-
Notifications
You must be signed in to change notification settings - Fork 429
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unflake thedropped_client_doesnt_create_duplicate_carbons
test
#4437
Conversation
Unfortunately, the underlying race condition is a bit tricky to fix elegantly. `mod_carboncopy:remove_connection` is a handler for the `unset_presence` hook, and it removes CC state from the ejabberd_sm info field in the session record. This is however done in an async manner by the C2S process, and when the user terminates, the request is never handled. This doesn't matter much, as the user process exits shortly, and is removed from ejabberd_sm altogether. However, if it receives a message in this short time, CC might run, and see the process still with the CC state in the ejabberd_sm info field. This results in a CC copy sent on behalf of this exiting process. Since a fix would be overly complicated, and to proceed with unflaking the tests, the test will now wait for C2S process exit.
elasticsearch_and_cassandra_27 / elasticsearch_and_cassandra_mnesia / 4d7b462 small_tests_26 / small_tests / 4d7b462 small_tests_27 / small_tests / 4d7b462 small_tests_27_arm64 / small_tests / 4d7b462 ldap_mnesia_26 / ldap_mnesia / 4d7b462 ldap_mnesia_27 / ldap_mnesia / 4d7b462 dynamic_domains_pgsql_mnesia_26 / pgsql_mnesia / 4d7b462 dynamic_domains_mysql_redis_27 / mysql_redis / 4d7b462 internal_mnesia_27 / internal_mnesia / 4d7b462 pgsql_cets_27 / pgsql_cets / 4d7b462 dynamic_domains_pgsql_mnesia_27 / pgsql_mnesia / 4d7b462 dynamic_domains_mssql_mnesia_27 / odbc_mssql_mnesia / 4d7b462 mysql_redis_27 / mysql_redis / 4d7b462 pgsql_mnesia_26 / pgsql_mnesia / 4d7b462 pgsql_mnesia_27 / pgsql_mnesia / 4d7b462 cockroachdb_cets_27 / cockroachdb_cets / 4d7b462 mssql_mnesia_27 / odbc_mssql_mnesia / 4d7b462 elasticsearch_and_cassandra_27 / elasticsearch_and_cassandra_mnesia / 4d7b462 small_tests_26 / small_tests / 4d7b462 small_tests_27 / small_tests / 4d7b462 small_tests_27_arm64 / small_tests / 4d7b462 ldap_mnesia_27 / ldap_mnesia / 4d7b462 dynamic_domains_mysql_redis_27 / mysql_redis / 4d7b462 ldap_mnesia_26 / ldap_mnesia / 4d7b462 dynamic_domains_pgsql_mnesia_26 / pgsql_mnesia / 4d7b462 dynamic_domains_pgsql_mnesia_27 / pgsql_mnesia / 4d7b462 internal_mnesia_27 / internal_mnesia / 4d7b462 dynamic_domains_mssql_mnesia_27 / odbc_mssql_mnesia / 4d7b462 pgsql_cets_27 / pgsql_cets / 4d7b462 pgsql_mnesia_27 / pgsql_mnesia / 4d7b462 pgsql_mnesia_26 / pgsql_mnesia / 4d7b462 cockroachdb_cets_27 / cockroachdb_cets / 4d7b462 mysql_redis_27 / mysql_redis / 4d7b462 mssql_mnesia_27 / odbc_mssql_mnesia / 4d7b462 |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #4437 +/- ##
==========================================
+ Coverage 85.16% 85.42% +0.25%
==========================================
Files 549 549
Lines 33875 33875
==========================================
+ Hits 28849 28937 +88
+ Misses 5026 4938 -88 ☔ View full report in Codecov by Sentry. |
The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 💪🏽
C2SPid = mongoose_helper:get_session_pid(Alice2), | ||
escalus_client:stop(Config, Alice2), | ||
escalus:assert(is_presence_with_type, [<<"unavailable">>], | ||
escalus_client:wait_for_stanza(Alice1)), | ||
%% Ensure there is no session so that the test doesn't flake and fail by chance. | ||
%% The issue comes from the fact that mod_presence sends the "unavailable" stanza | ||
%% just before removing the is removed from ejabberd_sm. If the msg is received by | ||
%% mod_carboncopy in this short window, it still considers the resource enabled. | ||
%% It's a race condition, but from the user's perspective it shouldn't be a big issue. | ||
mongoose_helper:wait_for_pid_to_die(C2SPid), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a fine one, we've used this trick to ensure we wait for the pid to die many other times 👌🏽
The description of #4423 was not entirely correct, and this PR should unlflake the tests for good. The reason why the tests are changed, and not the implementation is because the underlying race condition is a bit tricky to fix elegantly.
mod_carboncopy:remove_connection
is a handler for theunset_presence
hook, and it removes CC state from theejabberd_sm
info
field in the session record. This is however done in an async manner by the C2S process, and when the user terminates, the request is never handled. This doesn't matter much, as the user process exits shortly, and is removed fromejabberd_sm
altogether. However, if they receives a message in this short time, CC might run, and see the process still with the CC state in theejabberd_sm
info
field. This results in a CC copy sent on behalf of this exiting process. Since a fix would be overly complicated, and to proceed with unflaking the tests, the test will now wait for C2S process exit.GH Actions run manually: https://github.com/esl/MongooseIM/actions/runs/12318614146/job/34383742037