diff --git a/content/Products/OpenshiftMonitoring/alerting.md b/content/Products/OpenshiftMonitoring/alerting.md
index 9f12973..da09fe6 100644
--- a/content/Products/OpenshiftMonitoring/alerting.md
+++ b/content/Products/OpenshiftMonitoring/alerting.md
@@ -4,10 +4,41 @@
 
 This document is intended for OpenShift developers that want to write alerting rules for their operators and operands.
 
+## Configuring alerting rules
+
+You configure alerting rules based on the metrics collected for your component(s). To do so, create `PrometheusRule` objects in your operator/operand namespace and the Prometheus operator picks them up automatically (for layered operators, the namespace must also carry the `openshift.io/cluster-monitoring="true"` label).
+
+Here is an example of a `PrometheusRule` object with a single alerting rule:
+
+```yaml
+apiVersion: monitoring.coreos.com/v1
+kind: PrometheusRule
+metadata:
+  name: cluster-example-operator-rules
+  namespace: openshift-example-operator
+spec:
+  groups:
+  - name: operator
+    rules:
+    - alert: ClusterExampleOperatorUnhealthy
+      annotations:
+        description: Cluster Example operator running in pod {{$labels.namespace}}/{{$labels.pod}} is not healthy.
+        summary: Example operator not healthy
+      expr: |
+        max by(pod, namespace) (last_over_time(example_operator_healthy[5m])) == 0
+      for: 15m
+      labels:
+        severity: warning
+```
+
+You can define all your alerting rules in a single `PrometheusRule` object or split them across several objects (for instance, one per component). How you deploy the object(s) depends on the context: they can be shipped by the Cluster Version Operator (CVO), by the Operator Lifecycle Manager (OLM), or by your own operator.
+
 ## Guidelines
 
 Please refer to the [Alerting Consistency](https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md) OpenShift enhancement proposal for the recommendations applying to OCP built-in alerting rules.
 
+If you need the OCP monitoring team to review your alerting rules, you can reach them on the `#forum-openshift-monitoring` channel.
+
 ## Identifying alerting rules without a namespace label
 
 The enhancement proposal mentioned above states the following for OCP built-in alerts:
diff --git a/content/Products/OpenshiftMonitoring/collecting_metrics.md b/content/Products/OpenshiftMonitoring/collecting_metrics.md
index 8c8f1d7..2cef710 100644
--- a/content/Products/OpenshiftMonitoring/collecting_metrics.md
+++ b/content/Products/OpenshiftMonitoring/collecting_metrics.md
@@ -248,10 +248,6 @@ spec:
       app.kubernetes.io/name: my-app
 ```
 
-## Configuring Prometheus rules
-
-In a similar way, you can configure the Prometheus pods with recording and alerting rules based on the metrics being collected. To do so, you should create `PrometheusRule` objects in your operator/operand namespace which will also be picked up by the Prometheus operator.
-
 ## Next steps
 
 * [Configure alerting](alerting.md) with Prometheus.