add mimir ruler mixin dashboard #451

Merged: 4 commits, Mar 7, 2024

QuantumEnigmaa
Contributor

Towards giantswarm/roadmap#3162

This PR adds the mixin dashboard for mimir-ruler.

Checklist

  • Update changelog in CHANGELOG.md in an end-user friendly language.

QuantumEnigmaa self-assigned this on Mar 4, 2024
QuantumEnigmaa requested a review from a team as a code owner on March 4, 2024 14:45
@QuantumEnigmaa
Contributor Author

Most of the graphs are not working:
[image]

@QuentinBisson
Contributor

Did we enable the ruler's service monitor?

@QuantumEnigmaa
Contributor Author

> Did we enable the ruler's service monitor?

Well, the mimir-ruler SM is present on the cluster

@QuentinBisson
Contributor

Do we have the ruler recording rules? Sorry, I've been playing with alerts all day, my brain is fried like chicken.

@QuantumEnigmaa
Contributor Author

> Do we have the ruler recording rules?

Yes, the ruler is working fine:

ts=2024-03-04T15:10:35.364014431Z caller=ruler.go:564 level=info msg="syncing rules" reason=periodic

The issue here comes from the queries used in the graphs. They are not even recording rules; they are the odd cortex-prefixed metrics that mimir uses in all of its mixins (for example, cortex_prometheus_rule_evaluations_total here).
As I mentioned earlier, these queries really puzzle me because I don't understand where they come from. I'm going to ask in the Grafana Slack.

@QuentinBisson
Contributor

Makes sense to ask upstream. It's possible the mixins are not up to date with the exposed metrics.

@QuantumEnigmaa
Contributor Author

Magic happened after the golem upgrade:
[image]

Only 2 graphs are still missing some data:
[image]

However, I have some concerns about the datasource. Those screenshots were taken with the default datasource (i.e. prometheus), but whenever I switch to mimir, the graphs are not the same:
[image]

@QuantumEnigmaa
Contributor Author

As for the graphs with no data displayed, that is because there is actually no data. For example, the Queue length panel is actually working:
[image]

But since the graph is supposed to show only queues longer than 0 and there are none during the selected time period, no data is displayed.
Likewise, since there are no Missed iterations either, no data is displayed for that panel.
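
To illustrate why a healthy panel can still render nothing, here is a minimal Python sketch of a filtering comparison. PromQL comparison operators such as `> 0` (without the `bool` modifier) drop samples that fail the comparison instead of returning them as zero. The actual panel query is not shown in this thread, so the sample data, label strings, and the `filter_greater_than` helper are hypothetical.

```python
from typing import Dict, List, Tuple

Sample = Tuple[int, float]  # (unix timestamp in seconds, value)


def filter_greater_than(
    series: Dict[str, List[Sample]], threshold: float
) -> Dict[str, List[Sample]]:
    """Mimic a filtering comparison: keep only samples above the threshold.

    Series that end up with no samples are dropped entirely, which is why
    a panel built on such a query shows "No data" rather than a flat line.
    """
    result: Dict[str, List[Sample]] = {}
    for labels, samples in series.items():
        kept = [(ts, value) for ts, value in samples if value > threshold]
        if kept:
            result[labels] = kept
    return result


# Hypothetical queue-length samples: the queue stays empty the whole time.
idle = {'{pod="ruler-0"}': [(0, 0.0), (60, 0.0), (120, 0.0)]}

# Hypothetical samples where the queue briefly holds 3 items.
busy = {'{pod="ruler-0"}': [(0, 0.0), (60, 3.0), (120, 0.0)]}

idle_result = filter_greater_than(idle, 0)
busy_result = filter_greater_than(busy, 0)

print(idle_result)  # nothing survives the filter
print(busy_result)  # only the non-zero sample survives
```

With the idle data the result is empty, matching the "no data" panels described above; only the busy data leaves a sample to plot.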

So in my opinion, all good here (except for the difference I mentioned between the prometheus and mimir datasources).

QuantumEnigmaa requested a review from a team on March 6, 2024 16:14
QuantumEnigmaa merged commit b56b861 into main on Mar 7, 2024
4 checks passed
QuantumEnigmaa deleted the add-mimir-ruler-dashboard branch on March 7, 2024 08:30