-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
libpod: fix broken saveContainerError() #23577
Conversation
We cannot unlock then lock again without syncing the state as this will then save a potentially old state causing very bad things, such as double netns cleanup issues. The fix here is simple move the saveContainerError() under the same lock. The comment about the re-lock is just wrong. Not doing this under the same lock would cause us to update the error after something else changed the container alreayd. Most likely this was caused by a misunderstanding on how go defer's work. Given they run Last In - First Out (LIFO) it is safe as long as out defer function is after the defer unlock() call. I think this issue is very bad and might have caused a variety of other weird flakes. As fact I am confident that this fixes the double cleanup errors. Fixes containers#21569 Also fixes the netns removal ENOENT issues seen in containers#19721. Signed-off-by: Paul Holzinger <[email protected]>
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Luap99 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@mheon PTAL, I am marking this as 5.2 backport candidate as I think this is rather important |
If we try to stop a contianer that is not running or paused we get an ErrCtrStateInvalid or ErrCtrStopped error. As podman stop is idempotent this is not a user visable error at all so we should also never log it in the container state. Signed-off-by: Paul Holzinger <[email protected]>
LGTM |
libpod/container_internal.go
Outdated
@@ -1300,6 +1300,9 @@ func (c *Container) start() error { | |||
|
|||
c.newContainerEvent(events.Start) | |||
|
|||
// Reset any previous errors as we managed to start the container successfully here. | |||
c.state.Error = "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like it should happen in init() instead, as part of resetting the state from the last run of the container, but I won't argue too loud if you want to keep it here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree it would fit there, I thought it may best to only do when we now started worked but thinking about it if init or start didn't work we overwrite with the new error anyway so there is no practical difference.
One small comment, LGTM otherwise, strongly agree this needs to make it into 5.2.1 |
If we manage to init/start a container successfully we should unset any previously stored state errors. Otherwise a user might be confused why there is an error in the state about some old error even though the container works/runs. Signed-off-by: Paul Holzinger <[email protected]>
/lgtm |
LGTM for the record |
/cherry-pick v5.2 |
@baude: once the present PR merges, I will cherry-pick it on top of v5.2 in a new PR and assign it to you. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
6ef3a23
into
containers:main
@baude: new pull request created: #23580 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
We cannot unlock then lock again without syncing the state as this will
then save a potentially old state causing very bad things, such as
double netns cleanup issues.
The fix here is simple move the saveContainerError() under the same
lock. The comment about the re-lock is just wrong. Not doing this under
the same lock would cause us to update the error after something else
changed the container alreayd.
Most likely this was caused by a misunderstanding on how go defer's work.
Given they run Last In - First Out (LIFO) it is safe as long as out
defer function is after the defer unlock() call.
I think this issue is very bad and might have caused a variety of other
weird flakes. As fact I am confident that this fixes the double cleanup
errors.
Fixes #21569
Also fixes the netns removal ENOENT issues seen in #19721.
libpod: do not save expected stop errors in ctr state
If we try to stop a contianer that is not running or paused we get an
ErrCtrStateInvalid or ErrCtrStopped error. As podman stop is idempotent
this is not a user visable error at all so we should also never log it
in the container state.
libpod: reset state error on start
If we manage to start a container successfully we should unset any
previously stored state errors. Otherwise a user might be confused why
there is an error in the state about some old error even though the
container works/runs.
Does this PR introduce a user-facing change?