Adjust cadvisor container settings #57

Merged
merged 3 commits into from
Jan 17, 2024

@drmatthews drmatthews commented Jan 15, 2024

According to the cadvisor docs, the container should be run as privileged when the host is RedHat 7. The mounted volumes have also been adjusted to include /sys/fs/cgroup/cpu,cpuacct:/sys/fs/cgroup/cpuacct,cpu. However, the container still shows as unhealthy when running the molecule tests.
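For reference, the two settings described above can be sketched as a plain `docker run` argument list. This is only an illustration, not the Ansible task changed in this PR: the image name and container name are assumptions, and any other volume mounts the role uses are left out.

```python
import shlex

# Assumptions for illustration only: neither name is taken from this PR.
IMAGE = "gcr.io/cadvisor/cadvisor"
NAME = "cadvisor"

# The extra cgroup mount quoted in the PR description.
CGROUP_MOUNT = "/sys/fs/cgroup/cpu,cpuacct:/sys/fs/cgroup/cpuacct,cpu"


def cadvisor_run_args(image=IMAGE, name=NAME, extra_volumes=()):
    """Build a `docker run` argument list with the two settings from the PR."""
    args = ["docker", "run", "--detach", "--name", name, "--privileged"]
    for volume in (CGROUP_MOUNT, *extra_volumes):
        args += ["--volume", volume]
    args.append(image)
    return args


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(cadvisor_run_args()))
```

Printing the command rather than executing it keeps the sketch safe to run on any machine, including ones without Docker.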

On the monitoring host (mirsg-linux) the deployed container is also showing as unhealthy, and its logs have many entries like:

W0111 18:01:59.875044       1 container.go:549] Failed to update stats for container "/docker/fbbd1164db6eaf5e18d046532fdbda77ba4ea54a4e676f46eab939370804d4f1": failed to parse memory.usage_in_bytes - read /sys/fs/cgroup/memory/docker/fbbd1164db6eaf5e18d046532fdbda77ba4ea54a4e676f46eab939370804d4f1/memory.usage_in_bytes: no such device, continuing to push stats

The above error is generated because the container it refers to (fbbd1164db6e) no longer exists, so collecting metrics for it fails. The cadvisor container continues to run but reports as unhealthy. Recreating the cadvisor container (stopping and removing it manually, then redeploying with Ansible) clears the errors from the logs, but the status is still unhealthy.
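A minimal sketch of how the stale-container diagnosis could be checked from the logs: pull the container IDs out of the "Failed to update stats" warnings and compare them with the IDs that are currently running. The function names are hypothetical, and the set of running IDs is assumed to be supplied by the caller (for example from `docker ps -q --no-trunc`).

```python
import re

# Matches the warning format shown in the log excerpt above.
FAILED_STATS_RE = re.compile(
    r'Failed to update stats for container "/docker/(?P<id>[0-9a-f]+)"'
)


def failed_container_ids(log_lines):
    """Return the set of container IDs named in 'Failed to update stats' warnings."""
    ids = set()
    for line in log_lines:
        match = FAILED_STATS_RE.search(line)
        if match:
            ids.add(match.group("id"))
    return ids


def stale_container_ids(log_lines, running_ids):
    """IDs that appear in warnings but not in the caller-supplied running set."""
    return failed_container_ids(log_lines) - set(running_ids)


if __name__ == "__main__":
    sample = (
        'W0111 18:01:59.875044       1 container.go:549] Failed to update stats '
        'for container "/docker/fbbd1164db6eaf5e18d046532fdbda77ba4ea54a4e676f46'
        'eab939370804d4f1": failed to parse memory.usage_in_bytes'
    )
    # With no running containers supplied, every ID in the warnings counts as stale.
    for container_id in stale_container_ids([sample], running_ids=[]):
        print(container_id[:12])
```

If every ID reported this way is absent from the running set, the warnings are consistent with the explanation above rather than with a problem reading cgroups for live containers.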

@drmatthews drmatthews marked this pull request as ready for review January 16, 2024 18:22
@drmatthews drmatthews requested a review from p-j-smith January 16, 2024 18:22
@p-j-smith p-j-smith left a comment

looks good

just to check, even with these changes the container is still unhealthy? Should we look for an alternative to cadvisor?

@drmatthews

> just to check, even with these changes the container is still unhealthy?

Apologies for being unclear. Yes, still unhealthy in the tests. The actual deployed container is also still unhealthy but I managed to get rid of the error messages in the logs.

> Should we look for an alternative to cadvisor?

I'll create an issue. Right now cadvisor isn't really doing anything other than collecting metrics from the various Prometheus-related containers (which we're not alerting on).

@drmatthews drmatthews merged commit 57fe11d into main Jan 17, 2024
3 checks passed
@drmatthews drmatthews deleted the drmatthews-fix-cadvisor-container branch January 17, 2024 08:45