

Add timeout to container.exec #218

Open
wants to merge 3 commits into
base: main
from


amandahla

Issue

For some reason, the Alertmanager and Prometheus pods are constantly being restarted, and we couldn't figure out why.

Versions:
Juju 3.1.6
Alertmanager latest/edge 98
Prometheus latest/edge 170

Solution

I wasn't able to pin down the root cause, but this PR adds a timeout to the container.exec calls to make sure they are not leaking and making Pebble unresponsive.
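A minimal sketch of the pattern described above, assuming the charm uses the ops library's Container.exec (which accepts a timeout in seconds) and ExecProcess.wait_output(). The helper name exec_with_timeout and the 30-second default are hypothetical and are not taken from this PR's diff. The container is duck-typed so the sketch has no hard dependency on ops.

```python
from typing import List

# Hypothetical default: pick a value comfortably above the slowest expected command.
DEFAULT_EXEC_TIMEOUT = 30.0


def exec_with_timeout(container, command: List[str], timeout: float = DEFAULT_EXEC_TIMEOUT) -> str:
    """Run command in the workload container and return its stdout.

    container is expected to be an ops.Container; it is duck-typed here so
    the sketch does not need ops installed.
    """
    # Passing timeout asks Pebble to stop the command if it runs longer than
    # this many seconds, instead of letting it hang indefinitely.
    process = container.exec(command, timeout=timeout)
    # wait_output() blocks until the command ends. In ops it raises
    # ops.pebble.ExecError on a non-zero exit and ops.pebble.ChangeError if the
    # timeout elapsed. Neither is caught here, so an unhandled error fails the
    # hook and the charm goes into error state.
    stdout, _stderr = process.wait_output()
    return stdout
```

In charm code the container argument would come from self.unit.get_container(...). Letting the exception propagate matches the "error state might be ok" option raised later in the discussion. Catching it instead would let the charm set a status and retry on a later event.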

Context

Pebble

Testing Instructions

No reproduction steps; the restarts happen at random.

Release Notes

@sed-i
Contributor

sed-i commented Feb 16, 2024

Thanks @amandahla, this seems like a good direction.
I wonder what happens when the timeout elapses. (See the outlinks in canonical/operator#1131.)
If it raises, the charm will go into error state, which might be OK.

@lucabello
Contributor

What's the status of this? Is this something we need/want to merge? @sed-i

@amandahla
Author

Sorry for the delay.
From what I checked, once the timeout is reached, an "ops.pebble.ChangeError" exception is raised.

This is also confirmed by this test in the canonical/operator repository:
https://github.com/canonical/operator/blob/449793f9867eba11dc2e7e1c28ea3dca1e1da231/test/test_real_pebble.py#L214
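Pebble enforces the exec timeout on its own side, so that behaviour cannot be reproduced here without a running Pebble. As a rough local analogue only, the sketch below uses Python's standard subprocess module rather than Pebble or ops. subprocess.run has the same overall behaviour: once the timeout elapses, the child is killed and an exception is raised to the caller, which can either let it propagate or handle it. The exception here is subprocess.TimeoutExpired, not ops.pebble.ChangeError. The helper name run_bounded is hypothetical.

```python
import subprocess
import sys
from typing import List


def run_bounded(argv: List[str], timeout: float) -> str:
    """Run argv and return its stdout, failing if it runs longer than timeout seconds."""
    # On expiry, subprocess.run kills the child, reaps it, and raises
    # subprocess.TimeoutExpired; check=True raises CalledProcessError on a
    # non-zero exit status.
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout, check=True)
    return result.stdout


# A command that finishes in time returns normally.
print(run_bounded([sys.executable, "-c", "print('ok')"], timeout=10))

# A command that overruns raises instead of blocking the caller indefinitely.
try:
    run_bounded([sys.executable, "-c", "import time; time.sleep(5)"], timeout=0.5)
except subprocess.TimeoutExpired as exc:
    print(f"timed out after {exc.timeout} seconds")
```

The trade-off mirrors the one in this thread. An uncaught timeout surfaces loudly, which in a charm means error state. A caught one keeps the caller running but has to be reported some other way.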

@amandahla amandahla requested a review from a team as a code owner August 2, 2024 14:01

3 participants