e2e: kube play, huge annotation: podman rm hangs #22246
I instrumented my no-retries PR to dump the YAML, and here it is (f39 remote root):
Still happening. The recent logs below include a dump of the annotation in case it helps, though I doubt it will.
This one is failing multiple times a day in my no-retry PR. Here are the last two weeks:
I'll start poking at this one. Probably something to do with the sheer size of the annotation making our REST API rather angry.
To debug containers#22246 Signed-off-by: Paul Holzinger <[email protected]>
Based on the error here, I see what is happening: the code reads stderr until EOF first, and only then reads stdout until EOF. Once we know this, it is in fact trivial to reproduce. The real question is why this is flaky, because UpdateContainerStatus is never called by default. It only gets called when the crun kill command fails, which can happen if the container has already stopped/exited (a normal race condition, because we unlock during stop). This isn't specific to remote either; remote most likely just makes the race easier to hit.
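The hang is the classic sequential-drain problem: a child process that writes more than the pipe buffer (typically 64 KiB on Linux) to stdout blocks until someone reads it, so it never exits and never closes stderr, and a reader waiting for EOF on stderr waits forever. Below is a minimal Go sketch of the safe pattern, not Podman's actual code: `fakeRuntime`, `readBoth`, and `run` are hypothetical names, and a goroutine stands in for the crun process. Draining both pipes concurrently avoids the hang, whereas reading `errR` to EOF before touching `outR` would block indefinitely with the 1 MiB payload used here.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
	"sync"
)

// fakeRuntime is a hypothetical stand-in for the OCI runtime process. It writes
// a large payload to stdout, then a short message to stderr, and closes both
// pipes only when finished, just as a real process only yields EOF on exit.
func fakeRuntime(stdout, stderr *os.File, payloadSize int) {
	defer stdout.Close()
	defer stderr.Close()
	payload := bytes.Repeat([]byte("a"), payloadSize)
	// This Write blocks once the kernel pipe buffer is full and nobody reads stdout.
	if _, err := stdout.Write(payload); err != nil {
		return
	}
	_, _ = stderr.Write([]byte("done\n"))
}

// readBoth drains stdout and stderr concurrently, so neither pipe can fill up
// while we are waiting for EOF on the other one.
func readBoth(stdout, stderr io.Reader) ([]byte, []byte, error) {
	var (
		wg             sync.WaitGroup
		outBuf, errBuf []byte
		outErr, errErr error
	)
	wg.Add(2)
	go func() {
		defer wg.Done()
		outBuf, outErr = io.ReadAll(stdout)
	}()
	go func() {
		defer wg.Done()
		errBuf, errErr = io.ReadAll(stderr)
	}()
	wg.Wait()
	if outErr != nil {
		return nil, nil, outErr
	}
	if errErr != nil {
		return nil, nil, errErr
	}
	return outBuf, errBuf, nil
}

// run wires two OS pipes to the fake runtime and collects both streams.
func run(payloadSize int) ([]byte, []byte, error) {
	outR, outW, err := os.Pipe()
	if err != nil {
		return nil, nil, err
	}
	errR, errW, err := os.Pipe()
	if err != nil {
		outR.Close()
		outW.Close()
		return nil, nil, err
	}
	defer outR.Close()
	defer errR.Close()

	go fakeRuntime(outW, errW, payloadSize)
	return readBoth(outR, errR)
}

func main() {
	// 1 MiB is far larger than a typical pipe buffer.
	out, errb, err := run(1 << 20)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("stdout: %d bytes, stderr: %q\n", len(out), errb)
}
```

When the runtime is spawned via `os/exec`, assigning `bytes.Buffer` values to `cmd.Stdout` and `cmd.Stderr` has the same effect, because `os/exec` then copies each stream in its own goroutine.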
There are two major problems with UpdateContainerStatus(). First, it can deadlock when the state JSON is too big: it tries to read stderr until EOF, but it will never hit EOF as long as the runtime process is alive. This means that if the runtime JSON is too big to fit into the pipe buffer, we deadlock ourselves. Second, the function modifies the container state struct and even adds an exit code to the DB; however, when it is called from the stop() code path, we are unlocked at that point. While the first problem is easy to fix, the second one is not. And when we cannot update the state, there is no point in reading from the runtime in the first place, so remove the function, as it does more harm than good. Also add some warnings to the functions that might be called unlocked. Fixes containers#22246 Signed-off-by: Paul Holzinger <[email protected]>
Seeing a new flake recently, so far only in podman-remote root. Not OS-specific: