This document assumes that you have already read and understood the general README. If not, start reading there.
This document contains instructions on how to deal with absence alerts.
Note: make sure that you use the quoted string "true" and not the unquoted boolean true as the value for the labels mentioned below.
If you want to disable the operator for only a specific alert rule, add the no_alert_on_absence label to that alert rule.
Example:
    alert: ImportantAlert
    expr: foo_bar > 0
    for: 5m
    labels:
      no_alert_on_absence: "true"
      ...
You can disable the operator for a specific PrometheusRule resource by adding the following label to it:

    absent-metrics-operator/disable: "true"
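
For context, here is a minimal sketch of where this label might sit in a complete manifest. The resource name, namespace, and group name are hypothetical placeholders; the alert rule is reused from the example above.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-alerts        # hypothetical name
  namespace: example          # hypothetical namespace
  labels:
    # Quoted so that YAML parses the value as a string, not a boolean.
    absent-metrics-operator/disable: "true"
spec:
  groups:
    - name: example.alerts    # hypothetical group name
      rules:
        - alert: ImportantAlert
          expr: foo_bar > 0
          for: 5m
```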
If you disable the operator for a specific alert or a specific PrometheusRule resource, but other alerts or PrometheusRule resources have alert definitions that use the same metrics, then the absence alert rules for those metrics will be created regardless.
For example, consider the following rule definitions:
    - alert: ImportantAlert
      expr: foo_bar > 0
      for: 5m
      labels:
        no_alert_on_absence: "true"
        ...
    - alert: ImportantServiceAlert
      expr: max(foo_bar) BY (service, region) > 0
      for: 5m
      labels:
        ...
An absence alert rule for the foo_bar metric will be created because it is used in ImportantServiceAlert, even though ImportantAlert specifies the no_alert_on_absence label.
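
As a rough illustration only, the absence alert rule created for foo_bar might look something like the sketch below. The alert name and the `for` duration are assumptions, not the operator's documented output. The sketch also assumes that the rule is built around PromQL's absent() function, which returns a value only when no series for the given metric exists.

```yaml
# Hypothetical sketch; the real generated rule may differ in name, duration, and labels.
- alert: AbsentFooBar        # assumed naming scheme
  expr: absent(foo_bar)      # has a result only when no foo_bar series exists
  for: 10m                   # assumed duration
```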
The support_group and service labels are a special case. We (SAP Converged Cloud) use them for routing alert notifications to different channels.
These labels are defined using different strategies in the following order (highest to lowest priority):
- Alert rule labels: if the alert rule has the support_group OR service label and the label doesn't use templating (e.g. $labels.some_label), then that label is carried over as is.
- K8s object level labels: if the support_group OR service labels are defined at the object (i.e. PrometheusRule) level, then their values are used.
- Most common support_group/service combination: a default value for the support_group and service labels is found by traversing all the alert rules defined in the PrometheusRule object. The support_group AND service label combination that is most common amongst all those alerts is used as the default.
- Most common support_group/service combination across the namespace: all the alert rule definitions for the relevant Prometheus server in the relevant namespace are traversed. The support_group AND service label combination that is most common amongst all those alerts is used as the default.
If all of the above strategies fail, i.e. a value for support_group and service cannot be determined, then the absence alert rules won't have these labels.
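
To illustrate the first strategy, the sketch below shows an alert rule whose support_group label is a literal value and whose service label uses templating. Under the rules described above, only support_group would be carried over as is; the value for service would have to come from one of the lower-priority strategies. The label values are hypothetical.

```yaml
- alert: ImportantServiceAlert
  expr: max(foo_bar) BY (service, region) > 0
  for: 5m
  labels:
    support_group: containers           # hypothetical literal value; carried over as is
    service: "{{ $labels.service }}"    # templated; not carried over
```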
Tip: add ccloud/support-group and ccloud/service labels to your PrometheusRule objects. These values will be used as defaults if your alert rule definitions are missing these labels or if templating is used. This ensures that the notifications for absence alerts are routed to the correct channels.
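
A minimal sketch of a PrometheusRule object carrying these labels could look like the following. The resource name, namespace, group name, and both label values are hypothetical placeholders; substitute the support group and service that should receive the notifications.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-alerts                  # hypothetical name
  namespace: example                    # hypothetical namespace
  labels:
    ccloud/support-group: containers    # hypothetical value
    ccloud/service: example-service     # hypothetical value
spec:
  groups:
    - name: example.alerts              # hypothetical group name
      rules:
        - alert: ImportantAlert
          expr: foo_bar > 0
          for: 5m
```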