podman healthcheck + sdnotify: Error: container is stopped #22760

Closed
edsantiago opened this issue May 20, 2024 · 3 comments · Fixed by #22764
Labels: flakes, jira, locked - please file new issue/PR


@edsantiago
Member

<+015ms> # # podman run --name IK28z8EJtw --health-cmd=touch /terminate --sdnotify=healthy quay.io/libpod/testimage:20240123 sh -c while test \! -e /terminate; do sleep 0.1; done; echo finished
<+572ms> # finished
         # Error: container is stopped
<+004ms> # [ rc=126 (** EXPECTED 0 **) ]

Reproducer fails within seconds on 1mt:

# while :;do bin/podman run --health-cmd="touch /terminate" --sdnotify=healthy quay.io/libpod/testimage:20240123 sh -c "while test \! -e /terminate; do sleep 0.1; done; echo finished" || break; done
finished
finished
finished
finished
finished
finished
finished
finished
finished
finished
Error: container is stopped

While I'm at it, this is probably a bug too: the same command with run --rm fails instantly:

# bin/podman run --rm --health-cmd="touch /terminate" --sdnotify=healthy quay.io/libpod/testimage:20240123 sh -c "while test \! -e /terminate; do sleep 0.1; done; echo finished"
finished
# echo $?
127

Almost certainly related to #22658. @giuseppe PTAL. Only seen on aarch64, but that's consistent with the previous flake that you fixed in your PR.

Flake summary: sys(2), podman(2), fedora-40-aarch64(2), root(2), host(2), sqlite(2)
edsantiago added the flakes label May 20, 2024
@edsantiago
Member Author

OBTW: vim /usr/share/containers/storage.conf and comment out the thinpool line, otherwise you'll get lots of nasty warnings.

@giuseppe
Member

Thanks for the report. The --sdnotify=healthy implementation has a race condition: we release the lock on the container midway (https://github.com/containers/podman/blob/main/libpod/container_internal.go#L1316-L1323). While the lock is released, the cleanup process can delete the container. I am not sure yet how this can be solved, other than not reporting "container is stopped" as an error.
The issue is even more evident with --rm, since the container is already gone once the lock is released (if the cleanup process was faster). Maybe the easiest option for now is to just disallow combining --rm with --sdnotify=healthy.

@giuseppe
Member

Opened a tentative PR: #22764

Marked as a draft because I want to test it more. It doesn't solve the root issue: it would still fail if the healthcheck status change takes much longer than one waiting interval.

giuseppe added the jira label May 21, 2024
giuseppe added a commit to giuseppe/libpod that referenced this issue May 22, 2024
wait for another interval when the container has transitioned to "stopped",
to give the healthcheck status more time to change.

Closes: containers#22760

Signed-off-by: Giuseppe Scrivano <[email protected]>
stale-locking-app bot added the "locked - please file new issue/PR" label Aug 27, 2024
stale-locking-app bot locked as resolved and limited conversation to collaborators Aug 27, 2024