Setup Mimir's monitoring #3162

Closed
Tracked by #3039
QuentinBisson opened this issue Jan 22, 2024 · 5 comments

@QuentinBisson

QuentinBisson commented Jan 22, 2024

Towards #3039

To make Mimir easier to debug and observe, we should provide the official Mimir dashboards and alerts from the mixins.

Mixins are here: https://github.com/grafana/mimir/tree/main/operations/mimir-mixin

A compiled version is available here: https://github.com/grafana/mimir/tree/main/operations/mimir-mixin-compiled/dashboards

Also, let's make sure all components are monitored via service monitors.

Related issues:

@QuantumEnigmaa

We decided not to implement all of the Mimir mixin dashboards at once. Instead, we will add the most useful ones one by one first and add the less important ones later, when we have time.

Currently, two things need to be fixed for the Mimir writes dashboard (the first one to be implemented) to work in our setup:

  • Some graphs' queries need a min interval of 1 to 2 minutes set in the query options to work.
  • For some reason, many queries that use the Mimir mixin recording rules do not work, even though those recording rules are deployed on the cluster thanks to the prometheus-rules app.
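
One possible explanation for both symptoms, offered as an assumption rather than a confirmed diagnosis: PromQL's `rate()` needs at least two samples inside its range window, so a window that is too short relative to the scrape interval returns no data. The sketch below checks that rule of thumb. The 60s and 30s values are illustrative and were not measured on our setup, and both function names are hypothetical.

```python
def guaranteed_samples(window_s: int, scrape_interval_s: int) -> int:
    """Lower bound on the number of samples that fall inside a range window."""
    if window_s <= 0 or scrape_interval_s <= 0:
        raise ValueError("window and scrape interval must be positive")
    return window_s // scrape_interval_s


def rate_has_data(window_s: int, scrape_interval_s: int) -> bool:
    """rate() needs at least two samples in the window to return a value."""
    return guaranteed_samples(window_s, scrape_interval_s) >= 2


if __name__ == "__main__":
    # (window, scrape interval) pairs in seconds; values are illustrative only.
    for window_s, scrape_s in [(60, 60), (120, 60), (60, 30)]:
        ok = rate_has_data(window_s, scrape_s)
        print(f"window={window_s}s scrape={scrape_s}s ok={ok}")
```

Under this rule of thumb, either widening the window (a larger min interval) or scraping more often brings the window back to two or more samples.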

@QuantumEnigmaa

QuantumEnigmaa commented Feb 29, 2024

After setting a scrape interval of 30s on Mimir's service monitors, all graphs now work. In my opinion, these are the dashboards we should add first, including the writes dashboard:

  • writes
  • writes-resources
  • reads
  • reads-resources
  • ruler
  • compactor
  • compactor-resources

And maybe overview and overview-resources
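
For reference, here is a minimal sketch of how a 30s scrape interval might be expressed in a ServiceMonitor, built as a Python dict and printed as JSON (which `kubectl apply -f` also accepts). The resource name, namespace, label selector, and port name are assumptions for illustration and are not taken from our actual manifests.

```python
import json


def mimir_service_monitor(name: str, namespace: str, interval: str = "30s") -> dict:
    """Build a ServiceMonitor manifest that scrapes its targets every `interval`."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Hypothetical selector: use whatever labels the Mimir services carry.
            "selector": {"matchLabels": {"app.kubernetes.io/name": "mimir"}},
            "endpoints": [
                # Hypothetical port name; `interval` sets the scrape interval.
                {"port": "http-metrics", "interval": interval},
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(mimir_service_monitor("mimir", "mimir"), indent=2))
```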

@QuentinBisson
Author

To be honest, I would like us to have all of them, so maybe you can list the remaining dashboards in another issue and we add only the main ones now? Maybe we can add the compactor ones here as well.

@QuantumEnigmaa

I discovered that the Mimir components' disk usage metrics are not exposed by default, which means we are blind to this aspect of the app and the related graphs on the dashboards will be empty. As this might require additional work on our side and is not a priority for now, I created a dedicated issue for this here.

@QuantumEnigmaa

All of the most needed dashboards are now deployed. A follow-up issue has been created to track the few remaining graphs that lack data: https://github.com/giantswarm/giantswarm/issues/30218

@github-project-automation github-project-automation bot moved this from Inbox 📥 to Done ✅ in Roadmap Mar 11, 2024