Commit

[mixin/alerts]: Enable configuring job prefix for alerts to prevent clashes with metrics from Loki/Tempo
mtweten committed Oct 17, 2024
1 parent 16f7f5f commit 7c58e37
Showing 7 changed files with 11 additions and 7 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -53,6 +53,7 @@
### Mixin

* [ENHANCEMENT] Unify ingester autoscaling panels on 'Mimir / Writes' dashboard to work for both ingest-storage and non-ingest-storage autoscaling. #9617
+ * [ENHANCEMENT] Alerts: Enable configuring job prefix for alerts to prevent clashes with metrics from Loki/Tempo. #9659
* [BUGFIX] Dashboards: Fix autoscaling metrics joins when series churn. #9412 #9450 #9432
* [BUGFIX] Alerts: Fix autoscaling metrics joins in `MimirAutoscalerNotActive` when series churn. #9412
* [BUGFIX] Alerts: Exclude failed cache "add" operations from alerting since failures are expected in normal operation. #9658
@@ -537,7 +537,7 @@ spec:
expr: |
max by (cluster, namespace) (memberlist_client_cluster_members_count)
>
- (sum by (cluster, namespace) (up{job=~".+/(admin-api|alertmanager|compactor.*|distributor.*|ingester.*|querier.*|ruler|ruler-querier.*|store-gateway.*|cortex|mimir|mimir-write.*|mimir-read.*|mimir-backend.*)"}) + 10)
+ (sum by (cluster, namespace) (up{job=~".*/(admin-api|alertmanager|compactor.*|distributor.*|ingester.*|querier.*|ruler|ruler-querier.*|store-gateway.*|cortex|mimir|mimir-write.*|mimir-read.*|mimir-backend.*)"}) + 10)
for: 20m
labels:
severity: warning
2 changes: 1 addition & 1 deletion operations/mimir-mixin-compiled-baremetal/alerts.yaml
@@ -515,7 +515,7 @@ groups:
expr: |
max by (cluster, namespace) (memberlist_client_cluster_members_count)
>
- (sum by (cluster, namespace) (up{job=~".+/(admin-api|alertmanager|compactor.*|distributor.*|ingester.*|querier.*|ruler|ruler-querier.*|store-gateway.*|cortex|mimir|mimir-write.*|mimir-read.*|mimir-backend.*)"}) + 10)
+ (sum by (cluster, namespace) (up{job=~".*/(admin-api|alertmanager|compactor.*|distributor.*|ingester.*|querier.*|ruler|ruler-querier.*|store-gateway.*|cortex|mimir|mimir-write.*|mimir-read.*|mimir-backend.*)"}) + 10)
for: 20m
labels:
severity: warning
2 changes: 1 addition & 1 deletion operations/mimir-mixin-compiled/alerts.yaml
@@ -525,7 +525,7 @@ groups:
expr: |
max by (cluster, namespace) (memberlist_client_cluster_members_count)
>
- (sum by (cluster, namespace) (up{job=~".+/(admin-api|alertmanager|compactor.*|distributor.*|ingester.*|querier.*|ruler|ruler-querier.*|store-gateway.*|cortex|mimir|mimir-write.*|mimir-read.*|mimir-backend.*)"}) + 10)
+ (sum by (cluster, namespace) (up{job=~".*/(admin-api|alertmanager|compactor.*|distributor.*|ingester.*|querier.*|ruler|ruler-querier.*|store-gateway.*|cortex|mimir|mimir-write.*|mimir-read.*|mimir-backend.*)"}) + 10)
for: 20m
labels:
severity: warning
4 changes: 2 additions & 2 deletions operations/mimir-mixin/alerts/alerts-utils.libsonnet
@@ -10,10 +10,10 @@
$._config.product + name,

jobMatcher(job)::
- 'job=~".*/%s"' % formatJobForQuery(job),
+ '%s=~"%s%s"' % [$._config.per_job_label, $._config.alert_job_prefix, formatJobForQuery(job)],

jobNotMatcher(job)::
- 'job!~".*/%s"' % formatJobForQuery(job),
+ '%s!~"%s%s"' % [$._config.per_job_label, $._config.alert_job_prefix, formatJobForQuery(job)],

local formatJobForQuery(job) =
if std.isArray(job) then '(%s)' % std.join('|', job)
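
To make the effect of the new matcher concrete, here is a minimal standalone sketch of how it renders. It is not the mixin's actual file: `config` is a stand-in for `$._config`, the job list is shortened for illustration, and the `else` branch of `formatJobForQuery` is an assumption because the diff above elides it.

```jsonnet
// Stand-in for $._config (assumed values; the defaults mirror this commit).
local config = {
  per_job_label: 'job',
  alert_job_prefix: '.*/',
};

// Assumption: non-array jobs pass through unchanged (this branch is not shown in the diff).
local formatJobForQuery(job) =
  if std.isArray(job) then '(%s)' % std.join('|', job)
  else job;

// Simplified copy of the new jobMatcher, taking the config as an argument.
local jobMatcher(cfg, job) =
  '%s=~"%s%s"' % [cfg.per_job_label, cfg.alert_job_prefix, formatJobForQuery(job)];

{
  // Renders: job=~".*/(ingester.*|querier.*)"
  default: jobMatcher(config, ['ingester.*', 'querier.*']),
  // Renders: job=~"mimir/(ingester.*|querier.*)" ('mimir/' is a hypothetical namespace prefix)
  scoped: jobMatcher(config { alert_job_prefix: 'mimir/' }, ['ingester.*', 'querier.*']),
}
```

With the default prefix the rendered selector matches the compiled `alerts.yaml` output in this commit; a fixed prefix narrows it to jobs in a single namespace.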
4 changes: 2 additions & 2 deletions operations/mimir-mixin/alerts/alerts.libsonnet
@@ -779,8 +779,8 @@ local utils = import 'mixin-utils/utils.libsonnet';
|||
max by (%s) (memberlist_client_cluster_members_count)
>
- (sum by (%s) (up{%s=~".+/%s"}) + 10)
- ||| % [$._config.alert_aggregation_labels, $._config.alert_aggregation_labels, $._config.per_job_label, simpleRegexpOpt($._config.job_names.ring_members)],
+ (sum by (%s) (up{%s}) + 10)
+ ||| % [$._config.alert_aggregation_labels, $._config.alert_aggregation_labels, $.jobMatcher($._config.job_names.ring_members)],
'for': '20m',
labels: {
severity: 'warning',
3 changes: 3 additions & 0 deletions operations/mimir-mixin/config.libsonnet
@@ -203,6 +203,9 @@
// Used to add extra annotations to all alerts, Careful: takes precedence over default annotations.
alert_extra_annotations: {},

+ // Used as the job prefix in alerts that select on job label (e.g. GossipMembersTooHigh, RingMembersMismatch). This can be set to a known namespace to prevent those alerts from firing incorrectly due to selecting similar metrics from Loki/Tempo.
+ alert_job_prefix: '.*/',
+
// Whether alerts for experimental ingest storage are enabled.
ingest_storage_enabled: true,
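
As a hedged illustration of how a mixin consumer might use the new option, the sketch below overrides `alert_job_prefix` so that job matchers only select jobs in one namespace. The import path and the `mimir-prod` namespace are assumptions for illustration and are not taken from this commit.

```jsonnet
// Hypothetical consumer file; the import path depends on how the mixin is vendored.
local mixin = import 'mimir-mixin/mixin.libsonnet';

mixin {
  _config+:: {
    // Assumed namespace name. The default '.*/' matches jobs in any namespace;
    // a fixed prefix restricts job-based alerts to jobs named "mimir-prod/<component>".
    alert_job_prefix: 'mimir-prod/',
  },
}
```

Because the prefix is interpolated into a regular expression, any regex metacharacters in a namespace name would need escaping.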
