regression since Jan 2021 breaks ansible #346
Comments
Hi, thank you for reporting this issue. The https://github.com/Scribery/tlog/releases/tag/v12 release should already include commit f03ff12, so I'm not exactly sure what is happening. The only differences between v12 and master are CI-related (nothing that would affect tlog functionality). Reproducer steps with exact package versions would be great. |
Sorry, I guess I had the wrong idea of what the v12 release is :) I mean this commit from 20 Jan 2021: 31f198c |
Okay, then yes, f03ff12 is most likely the culprit. If you can provide a minimal ansible reproducer, that would be great. |
Reproduced at home on Fedora 34. Compiled and installed master (c23a145) and set my shell to /usr/bin/tlog-rec-session. The Ansible version below is from the Fedora 34 RPM, but the issue also occurs with earlier 2.8.
Pipelining enabled (ansible.cfg in the directory the playbook is run from).
A simple playbook that runs a command and copies a file.
Run the playbook (a hedged sketch of the whole setup follows).
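A sketch of the setup described above, in shell form; the user name, host, file names, and task contents are placeholders picked for illustration, not the exact files from this report:

```sh
# Illustrative reproducer sketch only -- names and contents are placeholders.

# Set the login shell of the test account to tlog-rec-session.
sudo usermod -s /usr/bin/tlog-rec-session testuser

# Enable SSH pipelining via ansible.cfg in the playbook directory.
cat > ansible.cfg <<'EOF'
[ssh_connection]
pipelining = True
EOF

# A simple playbook that runs a command and copies a file.
cat > repro.yml <<'EOF'
- hosts: all
  tasks:
    - name: run a command
      command: dmesg
    - name: copy a file
      copy:
        src: testfile
        dest: /tmp/testfile
EOF
touch testfile

# Run the playbook over local SSH against the tlog-enabled account.
ansible-playbook -i '127.0.0.1,' -u testuser repro.yml
```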
And it's stuck. Here we can see the ansible controller process running, and the connection over local SSH.
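(The original process listing is not preserved in this copy of the thread; one way to see the same picture, purely for illustration:)

```sh
# Illustrative only: list the stuck ansible controller, the local SSH
# connection, and the tlog-rec-session child it spawned.
ps -ef | grep -E 'ansible-playbook|ssh|tlog-rec-session'
```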
This reproduces what I saw earlier, when it was getting stuck specifically on a copy task while some other tasks ran OK. SSH pipelining was enabled. Interestingly, when I remove SSH pipelining, the first command doesn't work either.
|
This should be fixed in the latest release |
Hi @justin-stephenson I've just encountered the same or similar issue on an Ubuntu 20.04 host, using a package built from master a few weeks ago. I'll try to get another reproducer together. To test, I was running "dmesg" using ansible in a loop (a hedged sketch follows); it got stuck on the 9th iteration. Pipelining enabled.
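A minimal sketch of that kind of test loop, assuming an ad-hoc ansible command against the tlog-enabled host; the inventory, user, and iteration count are placeholders, not the reporter's actual setup:

```sh
# Illustrative only: run dmesg repeatedly via ansible until a hang appears.
for i in $(seq 1 20); do
    echo "iteration $i"
    ansible all -i '127.0.0.1,' -u testuser -m command -a dmesg
done
```
|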
@ajf8 Okay, thanks for letting me know; I was hoping this was taken care of once and for all. Do you see it fail frequently? I'll try to reproduce on my end with Fedora/RHEL. |
I also ran into this issue on a number of machines. After taking a closer look, this appears to be some kind of race condition, as it is somewhat time-dependent. Running:
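(The exact command is not preserved in this copy of the thread; an illustrative guess at its shape, with the host name as a placeholder. The idea is a short non-interactive command over SSH to an account whose shell is tlog-rec-session:)

```sh
# Illustrative only, not the original command from this comment.
ssh fasthost true
```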
(hangs indefinitely, or until Return is pressed). By adding a sleep, it starts to work:
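(Again only an illustration of the shape, not the original command:)

```sh
# Illustrative only: the same kind of command with a short sleep added.
ssh fasthost 'sleep 1'
```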
Exits after a second, as expected. This only seems to happen on very fast (64+ modern physical cores) machines, which might be related to why I couldn't reproduce this in a VM. I have attached GDB to the process and taken a backtrace in the hung state, which looks as follows:
Running strace on the hung process looks like this:
until after a while this happens:
Pressing Return in the hung ssh connection leads to this trace:
Any ideas on what could cause this? Thank you! |
Hi!
I've been running the v12 release for a while on many hosts without issue. I recently upgraded to master, and running ansible against a host with tlog enabled became quite unstable, with tasks hanging. There are defunct processes, such as sleep, left under ansible. Downgrading to the v12 release fixes the issue.
OS is CentOS 8.4. Pipelining is enabled on the SSH connection in ansible. It doesn't seem to happen on all ansible tasks, but soon after the upgrade it was pretty frequent.
Looking at the changes since v12, I believe the commit below to be the most likely cause of the regression. Could the void packet be missing, and therefore not breaking out of the recording loop?
f03ff12
I will try to get a reproducer together.