Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unflake thedropped_client_doesnt_create_duplicate_carbons test #4437

Merged
merged 1 commit into from
Dec 17, 2024

Conversation

gustawlippa
Copy link
Contributor

@gustawlippa gustawlippa commented Dec 13, 2024

The description of #4423 was not entirely correct, and this PR should unlflake the tests for good. The reason why the tests are changed, and not the implementation is because the underlying race condition is a bit tricky to fix elegantly.

mod_carboncopy:remove_connection is a handler for the unset_presence hook, and it removes CC state from the ejabberd_sm info field in the session record. This is however done in an async manner by the C2S process, and when the user terminates, the request is never handled. This doesn't matter much, as the user process exits shortly, and is removed from ejabberd_sm altogether. However, if they receives a message in this short time, CC might run, and see the process still with the CC state in the ejabberd_sm info field. This results in a CC copy sent on behalf of this exiting process. Since a fix would be overly complicated, and to proceed with unflaking the tests, the test will now wait for C2S process exit.

GH Actions run manually: https://github.com/esl/MongooseIM/actions/runs/12318614146/job/34383742037

Unfortunately, the underlying race condition is a bit tricky to fix elegantly.

`mod_carboncopy:remove_connection` is a handler for the `unset_presence` hook,
and it removes CC state from the ejabberd_sm info field in the session record.
This is however done in an async manner by the C2S process, and when the user
terminates, the request is never handled. This doesn't matter much, as the user
process exits shortly, and is removed from ejabberd_sm altogether. However, if
it receives a message in this short time, CC might run, and see the process
still with the CC state in the ejabberd_sm info field. This results in a CC copy
sent on behalf of this exiting process. Since a fix would be overly complicated,
and to proceed with unflaking the tests, the test will now wait for C2S process
exit.
@mongoose-im
Copy link
Collaborator

mongoose-im commented Dec 13, 2024

elasticsearch_and_cassandra_27 / elasticsearch_and_cassandra_mnesia / 4d7b462
Reports root/ big
OK: 472 / Failed: 0 / User-skipped: 49 / Auto-skipped: 0


small_tests_26 / small_tests / 4d7b462
Reports root / small


small_tests_27 / small_tests / 4d7b462
Reports root / small


small_tests_27_arm64 / small_tests / 4d7b462
Reports root / small


ldap_mnesia_26 / ldap_mnesia / 4d7b462
Reports root/ big
OK: 2353 / Failed: 0 / User-skipped: 912 / Auto-skipped: 0


ldap_mnesia_27 / ldap_mnesia / 4d7b462
Reports root/ big
OK: 2353 / Failed: 0 / User-skipped: 912 / Auto-skipped: 0


dynamic_domains_pgsql_mnesia_26 / pgsql_mnesia / 4d7b462
Reports root/ big
OK: 4761 / Failed: 0 / User-skipped: 119 / Auto-skipped: 0


dynamic_domains_mysql_redis_27 / mysql_redis / 4d7b462
Reports root/ big
OK: 4726 / Failed: 0 / User-skipped: 154 / Auto-skipped: 0


internal_mnesia_27 / internal_mnesia / 4d7b462
Reports root/ big
OK: 2495 / Failed: 0 / User-skipped: 770 / Auto-skipped: 0


pgsql_cets_27 / pgsql_cets / 4d7b462
Reports root/ big
OK: 4850 / Failed: 0 / User-skipped: 188 / Auto-skipped: 0


dynamic_domains_pgsql_mnesia_27 / pgsql_mnesia / 4d7b462
Reports root/ big
OK: 4761 / Failed: 0 / User-skipped: 119 / Auto-skipped: 0


dynamic_domains_mssql_mnesia_27 / odbc_mssql_mnesia / 4d7b462
Reports root/ big
OK: 4756 / Failed: 0 / User-skipped: 124 / Auto-skipped: 0


mysql_redis_27 / mysql_redis / 4d7b462
Reports root/ big
OK: 5130 / Failed: 0 / User-skipped: 149 / Auto-skipped: 0


pgsql_mnesia_26 / pgsql_mnesia / 4d7b462
Reports root/ big
OK: 5151 / Failed: 0 / User-skipped: 128 / Auto-skipped: 0


pgsql_mnesia_27 / pgsql_mnesia / 4d7b462
Reports root/ big
OK: 5151 / Failed: 0 / User-skipped: 128 / Auto-skipped: 0


cockroachdb_cets_27 / cockroachdb_cets / 4d7b462
Reports root/ big
OK: 4850 / Failed: 0 / User-skipped: 188 / Auto-skipped: 0


mssql_mnesia_27 / odbc_mssql_mnesia / 4d7b462
Reports root/ big
OK: 5146 / Failed: 0 / User-skipped: 133 / Auto-skipped: 0


elasticsearch_and_cassandra_27 / elasticsearch_and_cassandra_mnesia / 4d7b462
Reports root/ big
OK: 472 / Failed: 0 / User-skipped: 49 / Auto-skipped: 0


small_tests_26 / small_tests / 4d7b462
Reports root / small


small_tests_27 / small_tests / 4d7b462
Reports root / small


small_tests_27_arm64 / small_tests / 4d7b462
Reports root / small


ldap_mnesia_27 / ldap_mnesia / 4d7b462
Reports root/ big
OK: 2353 / Failed: 0 / User-skipped: 912 / Auto-skipped: 0


dynamic_domains_mysql_redis_27 / mysql_redis / 4d7b462
Reports root/ big
OK: 4726 / Failed: 0 / User-skipped: 154 / Auto-skipped: 0


ldap_mnesia_26 / ldap_mnesia / 4d7b462
Reports root/ big
OK: 2353 / Failed: 0 / User-skipped: 912 / Auto-skipped: 0


dynamic_domains_pgsql_mnesia_26 / pgsql_mnesia / 4d7b462
Reports root/ big
OK: 4761 / Failed: 0 / User-skipped: 119 / Auto-skipped: 0


dynamic_domains_pgsql_mnesia_27 / pgsql_mnesia / 4d7b462
Reports root/ big
OK: 4761 / Failed: 0 / User-skipped: 119 / Auto-skipped: 0


internal_mnesia_27 / internal_mnesia / 4d7b462
Reports root/ big
OK: 2495 / Failed: 0 / User-skipped: 770 / Auto-skipped: 0


dynamic_domains_mssql_mnesia_27 / odbc_mssql_mnesia / 4d7b462
Reports root/ big
OK: 4756 / Failed: 0 / User-skipped: 124 / Auto-skipped: 0


pgsql_cets_27 / pgsql_cets / 4d7b462
Reports root/ big
OK: 4850 / Failed: 0 / User-skipped: 188 / Auto-skipped: 0


pgsql_mnesia_27 / pgsql_mnesia / 4d7b462
Reports root/ big
OK: 5151 / Failed: 0 / User-skipped: 128 / Auto-skipped: 0


pgsql_mnesia_26 / pgsql_mnesia / 4d7b462
Reports root/ big
OK: 5151 / Failed: 0 / User-skipped: 128 / Auto-skipped: 0


cockroachdb_cets_27 / cockroachdb_cets / 4d7b462
Reports root/ big
OK: 4850 / Failed: 0 / User-skipped: 188 / Auto-skipped: 0


mysql_redis_27 / mysql_redis / 4d7b462
Reports root/ big
OK: 5130 / Failed: 0 / User-skipped: 149 / Auto-skipped: 0


mssql_mnesia_27 / odbc_mssql_mnesia / 4d7b462
Reports root/ big
OK: 5146 / Failed: 0 / User-skipped: 133 / Auto-skipped: 0

Copy link

codecov bot commented Dec 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.42%. Comparing base (952be3c) to head (4d7b462).
Report is 32 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4437      +/-   ##
==========================================
+ Coverage   85.16%   85.42%   +0.25%     
==========================================
  Files         549      549              
  Lines       33875    33875              
==========================================
+ Hits        28849    28937      +88     
+ Misses       5026     4938      -88     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@gustawlippa gustawlippa marked this pull request as ready for review December 16, 2024 09:45
@gustawlippa
Copy link
Contributor Author

The setup CI job turned red because I accidentally rerun the CircleCI job after a long time I believe - there is no issue with CircleCI, and the tests have passed in reality.

Copy link
Collaborator

@NelsonVides NelsonVides left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 💪🏽

Comment on lines +183 to +192
C2SPid = mongoose_helper:get_session_pid(Alice2),
escalus_client:stop(Config, Alice2),
escalus:assert(is_presence_with_type, [<<"unavailable">>],
escalus_client:wait_for_stanza(Alice1)),
%% Ensure there is no session so that the test doesn't flake and fail by chance.
%% The issue comes from the fact that mod_presence sends the "unavailable" stanza
%% just before removing the is removed from ejabberd_sm. If the msg is received by
%% mod_carboncopy in this short window, it still considers the resource enabled.
%% It's a race condition, but from the user's perspective it shouldn't be a big issue.
mongoose_helper:wait_for_pid_to_die(C2SPid),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a fine one, we've used this trick to ensure we wait for the pid to die many other times 👌🏽

@NelsonVides NelsonVides merged commit c92b6b6 into master Dec 17, 2024
22 of 23 checks passed
@NelsonVides NelsonVides deleted the carboncopy-flaky-test branch December 17, 2024 07:04
@jacekwegr jacekwegr added this to the 6.3.1 milestone Dec 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants