-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Moderated message lost if someone clicks on "distribute" link twice in quick succession, #1886
Comments
Hi @dpc22,
|
I think that the only significant customization is Steve Shipley's modlist plugin. I'm not in a position to disable this on our live lists servers as (literally) thousands of list owners depend on this. Other that this there are mostly fixes that you yourself have proposed to problems that I have had with 6.2.72, namely: I do have a test server which is fair game, but I am unable to reproduce the issue at the moment either on the live server or the test server, which unfortunately is often the case with awkward race conditions. I attached the logs to the original bug report as a text attachment "sympa.txt". Everything happens in the space of 4 seconds, and appears to be a consequence of my user clicking a link twice in the moderation message that I quoted:
Here are the corresponding entries in my Apache access_log, which correspond to log entries in the sympa.txt log.
At the moment the obvious smoking gun is the following file in /var/spool/sympa/moderation/ -rw-r----- 1 sympa sympa 17859 Sep 2 09:29 [email protected]_e9f3e269192343c654798f6681a91f37.distribute which is the missing message. This file is still there 23 hours later, and remains the only file with a ".distribute" suffix in that directory. I don't know if these .distribute files time out after a certain number of days or weeks. The fact that there is only one of them feels significant. What we are looking at is not normal on a busy Sympa server with 8000 active lists. |
By undoing all these customizations and modifications It would be more helpful for other users to try to reproduce the problem. But that is not absolutely necessary. In any case, you need to reproduce the problem on your end to get a clue as to how to solve it.
If you saw this log and decided that it was a sign that "someone clicks on that link twice in rapid succession", you are wrong. When the user follows on the Visiting the link and clicking on the button will log
Renaming |
I just tried this on a test list and there was no confirmation dialogue. The following URL: https://lists.cam.ac.uk/sympa/distribute/test-dpc22/ebe7220098759658fd75ad6d302bd450 flashed up a message "distribute: action will complete soon", and then showed me the output of: https://lists.cam.ac.uk/sympa/modindex/test-dpc22 without any HTTP redirect. In fact I ran the same test before I submitted the original ticket to confirm that there wasn't a confirmation dialogue, before I asserted that this was the case there. Is is possible this behaviour is influenced by a setting in sympa.conf or a scenari rule? The only custom scenari that we have are send.* rules. We do use web_sso in /etc/sympa/auth.conf (Shibboleth), but I do see the same thing if I use a username and password to authenticate instead. |
|
I see a different message in my Web browser if the message was already distributed:
(I think that is what you are asking here).
I think that you have all the log output that I have now: the sympa.txt attachment at the bottom of the original bug report, plus the two lines from the Apache access_log later on. I'm happy to ramp up logging on my test server or add custom logging to either the test or live system, but that probably isn't going to get us very far if I can't reproduce an issue reported by a random list owners. I don't think that this can be common given that this is the first time that a list owner has reported a problem to me. |
No. As long as the operation was performed at different times and under different conditions, it is impossible for the exact same log to be output. Unless objective information is provided that can reproduce the problem, it cannot be investigated for resolution. I'm going to bed for now. Please continue to keep us informed if you like. |
I wrote:
This was my misunderstanding. Strangely, Sympa currently confirms the Anyway, it is normal behaviour that "Unable to access the message" is reported on the second click. The anomaly is that the first click has not initiated message delivery. More information is awaited. |
I only have a single operation which led to a missing message, and I attached the logs relating to that event to this ticket. The only part that I elided was the original message delivery which generated the moderation request, which I can include if you like. The list moderator (who was the same person to send the message) clicked on the "distribute" link in the moderation message that she received 27 seconds later, and again 4 seconds after that. I agree that we need a reproducible test case, but as I indicated yesterday I'm not having any success in generating one. The only obvious curiosity that I have to work with is the following in /var/spool/sympa/moderation/ -rw-r----- 1 sympa sympa 17859 Sep 2 09:29 [email protected]_e9f3e269192343c654798f6681a91f37.distribute which is still there two days later. This remains the only ".distribute" file in that directory, on a busy Sympa system. If I have some spare time later today I will trace through the code and see if I can work out what generated that specific file, which appears to have been orphaned. |
FWIW, a successful moderation cycle on the same list a few hours later looks like the following:
compared to the one that failed, which has a repeated do_distribute action, and two different errrors:
"Unable to find message with <..." is the normal message for repeated do_distribute actions, leading to the error screen that we were both expecting to see. "Unable to find message with key <..." (note "with key") is the only instance of that error in the last 30 days of logs. The key corresponds to the file in /var/spool/sympa/moderation/ -rw-r----- 1 sympa sympa 17859 Sep 2 09:29 [email protected]_e9f3e269192343c654798f6681a91f37.distribute |
Okay, the Sympa::Spool stuff is causing my brain to leak out of my ears,. I do now understand that wwsympa do_distribute() renames the file in the moderation spool to add a ".distribute" suffix, which should stop repeat distribute actions on the same key. So that is where that file came from. It then injects "QUIET DISTRIBUTE ucam-language-sciences e9f3e269192343c654798f6681a91f37" into the sympa incoming mail queue to be processed by the sympa_msg daemon. I don't understand why lib/Sympa/Request/Handler/distribute.pm
failed to find the following file in /var/spool/sympa/moderation: -rw-r----- 1 sympa sympa 17859 2024-09-02 09:29:08.000000000 +0100 [email protected]_e9f3e269192343c654798f6681a91f37.distribute with the following error:
given lib/Sympa/Spool/Moderation.pm
localpart, domainpart, AUTHKEY all seem to match, and perldoc on that module tells me:
We also have the following, which seems to match:
|
Version
6.2.72
Installation method
My own rpm, derived from "official" rpm for RHEL 9.
Expected behavior
Moderation messages contain some some text of the form:
If someone clicks on that link twice in rapid succession, I would expect the first click to distribute the message, and the second click to generate an error page.
Actual behavior
The user reported the error page, but the message was not distributed.
I did find the following in /var/spool/sympa/moderation/
-rw-r----- 1 sympa sympa 17859 Sep 2 09:29 [email protected]_e9f3e269192343c654798f6681a91f37.distribute
I don't know what these ".distribute" files are used for, but this is the only file with that suffix in /var/spool/sympa/moderation on a busy server, and it is still there 7 hours later. So I think and hope that this might be a smoking gun for whatever went wrong.
Additional information
Here are the two incoming do_distribute requests for e9f3e269192343c654798f6681a91f37
Sep 2 09:29:36 lists-1 wwsympa[3097526]: info main::do_distribute(e9f3e269192343c654798f6681a91f37) [robot lists.cam.ac.uk] [session 95252205934297] [client 10.249.206.106] [user [email protected]] [list ucam-language-sciences]
Sep 2 09:29:40 lists-1 wwsympa[3097453]: info main::do_distribute(e9f3e269192343c654798f6681a91f37) [robot lists.cam.ac.uk] [session 95252205934297] [client 10.249.206.106] [user [email protected]] [list ucam-language-sciences]
sympa_msg generated the following error at 09:29:40
Sep 2 09:29:40 lists-1 sympa_msg[1946661]: err main::#247 > Sympa::Spindle::spin#95 > Sympa::Spindle::DoCommand::_twist#120 > Sympa::Spindle::spin#95 > Sympa::Request::Handler::distribute::_twist#61 Unable to find message with key for list Sympa::List [email protected]
That is the only time that "Unable to find message with key" appears in my logs in the last 28 days. (I have lots of "Unable to find message with" from wwsympa, but nothing similar from sympa_msg).
This feels very much like a race condition when wwsympa and sympa_msg are trying to work on the same moderation files at the same time. I hope that the single "Unable to find message with key" error combined with the single distribute file in /var/spool/sympa/moderation/ will be enough to narrow down a cause.
I will attach the full logs as a text attachment to this ticket
sympa.txt
The text was updated successfully, but these errors were encountered: