Make Atlas rules compatible with Mimir #1102
Changes from 4 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -28,20 +28,20 @@ spec: | |
# If a prometheus is missing, this alert will fire. This alert will not check if a prometheus is running when it should not (e.g. deleted cluster) | ||
expr: | | ||
( | ||
sum by(cluster_id) ( | ||
sum by(cluster_id, installation, provider, pipeline) ( | ||
{__name__=~"cluster_service_cluster_info|cluster_operator_cluster_status", status!="Deleting"} | ||
) unless sum by(cluster_id) ( | ||
) unless sum by(cluster_id, installation, provider, pipeline) ( | ||
label_replace( | ||
kube_pod_container_status_running{container="prometheus", namespace!="{{ .Values.managementCluster.name }}-prometheus", namespace=~".*-prometheus"}, | ||
"cluster_id", "$2", "pod", "(prometheus-)(.+)(-.+)" | ||
) | ||
) | ||
) + ( | ||
sum by (cluster_name) ( | ||
) or ( | ||
That query should not work, as cluster_name does not exist.
Why would it not work? cluster_name came from the label_replace, so it did work, and we tested it with herve. Also, why was this changed from a + to an or?
label_replace is missing in
|
||
sum by (name, installation, provider, pipeline) ( | ||
capi_cluster_status_phase{phase!="Deleting"} | ||
) unless sum by (cluster_name) ( | ||
label_replace(kube_pod_container_status_running{container="prometheus",namespace=~".*-prometheus"}, | ||
"cluster_name", "$2", "pod", "(prometheus-)(.+)(-.+)" | ||
) unless sum by (name, installation, provider, pipeline) ( | ||
label_replace(kube_pod_container_status_running{container="prometheus",namespace=~".*-prometheus"}, | ||
"name", "$2", "pod", "(prometheus-)(.+)(-.+)" | ||
) | ||
) | ||
) | ||
|
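For context on the `+` versus `or` question raised above, here is a minimal sketch of how standard PromQL vector matching treats the three operators that appear in this rule. The metric names `metric_a` and `metric_b` are hypothetical placeholders, not series from this repository, and the sketch does not claim to reproduce what was observed when the original query was tested.

```promql
# Hypothetical series, for illustration only:
#   sum by (cluster_id)   (metric_a)  -> {cluster_id="abc"}
#   sum by (cluster_name) (metric_b)  -> {cluster_name="xyz"}

# `+` pairs samples whose label sets are identical on both sides and
# returns only the matched pairs. {cluster_id="abc"} and
# {cluster_name="xyz"} never match, so this returns nothing.
sum by (cluster_id) (metric_a) + sum by (cluster_name) (metric_b)

# `or` is a union: every left-hand series, plus each right-hand series
# whose label set has no match on the left.
sum by (cluster_id) (metric_a) or sum by (cluster_name) (metric_b)

# `unless` drops left-hand series that have an exact label-set match on
# the right, so both sides need the same grouping labels (here: name)
# for the exclusion to take effect.
sum by (name) (metric_a) unless sum by (name) (metric_b)
```

Under these semantics, grouping both sides of each `unless` by the same label list, as the new version of the rule does, is what allows the running-Prometheus series to cancel out the matching cluster series.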
Currently the Loki ServiceMonitor scrapes every 15s, so that change is not mandatory.
But IMO we should change the scrapeInterval to 1m, so it's safer to make that change now...