podman start --filter restart-policy=always : container state improper #23246
Comments
FYI reproduced locally. I didn't add a timer in my loop so I'm not sure how long it took, but it was somewhere around 30-60 mins.
I patched the binary to show the current state and it was
I think I see the issue given that, our code does: My fix would be to move the running state check into the locked Start() function
I feel like locking the existing check would be most correct, so as not to make Start() more complicated than it is right now, but it would also add an extra lock/unlock so I won't argue too hard.
That wouldn't work because we must stay locked for both checks. If we unlock in between, it again opens up the window for another state change.
Ew. Maybe add an ErrCtrRunning, return that from Start if the container is in the running or paused state, and ignore it at the caller?
I need to take a look tomorrow, but if all callers then have to ignore it, that just complicates the code. If some code paths want this to fail, though, then a typed error sounds good to me.
#23258
The current code did something like this:

    lock()
    getState()
    unlock()

    if state != running
      lock()
      getState() == running -> error
      unlock()

This of course is wrong because between the first unlock() and second lock() call another process could have modified the state. This meant that sometimes you would get a weird error on start because the internal setup errored as the container was already running.

In general any state check without holding the lock is incorrect and will result in race conditions. As such refactor the code to combine both StartAndAttach and Attach() into one function that can handle both. With that we can move the running check into the locked code.

Also use typed error for this specific error case then the callers can check and ignore the specific error when needed. This also allows us to fix races in the compat API that did a similar racy state check.

This commit changes slightly how we output the result, previously a start on already running container would never print the id/name of the container which is confusing and sort of breaks idempotence. Now it will include the output except when --all is used. Then it only reports the ids that were actually started.

Fixes containers#23246

Signed-off-by: Paul Holzinger <[email protected]>
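For readers who want to see the locking rule in isolation, the following is a rough sketch of doing the state check and the transition inside a single critical section. It uses an in-process `sync.Mutex` as a stand-in (podman's container locks work across processes), and `container`, `start` and `startConcurrently` are hypothetical names, not libpod code.

```go
package main

import (
	"fmt"
	"sync"
)

type state int

const (
	stateStopped state = iota
	stateRunning
)

// container is a hypothetical stand-in, not libpod's Container type.
type container struct {
	mu     sync.Mutex
	state  state
	setups int // how many times the start-up work actually ran
}

// start checks the state and performs the transition while holding the
// lock, so no other caller can change the state between check and act.
// It reports whether this call actually started the container.
func (c *container) start() bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.state == stateRunning {
		return false
	}
	c.setups++ // stands in for the internal setup work
	c.state = stateRunning
	return true
}

// startConcurrently calls start from n goroutines and returns how many of
// those calls did the start-up work.
func startConcurrently(c *container, n int) int {
	var wg sync.WaitGroup
	results := make([]bool, n)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = c.start()
		}(i)
	}
	wg.Wait()
	started := 0
	for _, ok := range results {
		if ok {
			started++
		}
	}
	return started
}

func main() {
	c := &container{}
	fmt.Println("calls that started the container:", startConcurrently(c, 50))
}
```

Because the check and the state change share one lock hold, exactly one of the concurrent callers does the setup. If the check were done under a separate, earlier lock/unlock pair, several callers could pass it before any of them set the state.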
Looks related to #22914, although probably not a regression (that one was a reliable panic, this is a flaky podman error):
For extra credit, maybe someone could fix that error message to indicate the actual current container state