Merge pull request #77 from simonpasquier/update-alerting
Clarify platform alerting
simonpasquier authored Nov 14, 2024
2 parents 4ad7294 + 23a4704 commit 1fd37ff
Showing 2 changed files with 31 additions and 4 deletions.
31 changes: 31 additions & 0 deletions content/Products/OpenshiftMonitoring/alerting.md
@@ -4,10 +4,41 @@

This document is intended for OpenShift developers who want to write alerting rules for their operators and operands.

## Configuring alerting rules

You configure alerting rules based on the metrics collected for your component(s). To do so, create `PrometheusRule` objects in your operator/operand namespace; the Prometheus operator picks them up automatically (for layered operators, provided that the namespace has the `openshift.io/cluster-monitoring="true"` label).
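
For layered operators, the label mentioned above can be applied with a command along these lines (the namespace name is illustrative, matching the example below):

```shell
# Label the operator namespace so that the Prometheus operator discovers
# PrometheusRule objects in it (namespace name is an example).
oc label namespace openshift-example-operator openshift.io/cluster-monitoring="true"
```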

Here is an example of a PrometheusRule object with a single alerting rule:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cluster-example-operator-rules
  namespace: openshift-example-operator
spec:
  groups:
  - name: operator
    rules:
    - alert: ClusterExampleOperatorUnhealthy
      annotations:
        description: Cluster Example operator running in pod {{$labels.namespace}}/{{$labels.pod}} is not healthy.
        summary: Operator Example not healthy
      expr: |
        max by(pod, namespace) (last_over_time(example_operator_healthy[5m])) == 0
      for: 15m
      labels:
        severity: warning
```

You can keep all your alerting rules in a single `PrometheusRule` object or split them across several objects (for instance, one per component). The mechanism for deploying the object(s) depends on the context: they can be deployed by the Cluster Version Operator (CVO), by the Operator Lifecycle Manager (OLM), or by your own operator.
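
As a sketch of the split-per-component approach, the following Python helper (hypothetical, not part of any OpenShift tooling) renders one `PrometheusRule` manifest per component, following the schema of the example object above:

```python
# Hypothetical helper: render one PrometheusRule manifest per component.
# The field names follow the PrometheusRule schema shown above; the
# component and namespace names are examples.
def prometheus_rule(component: str, namespace: str, rules: list) -> dict:
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PrometheusRule",
        "metadata": {
            "name": f"{component}-rules",
            "namespace": namespace,
        },
        "spec": {"groups": [{"name": component, "rules": rules}]},
    }

# One manifest per component, all in the same operator namespace.
manifests = [
    prometheus_rule(c, "openshift-example-operator", [])
    for c in ("operator", "operand")
]
```

Each resulting dict can be serialized to YAML and applied by whichever mechanism (CVO, OLM, or your operator) deploys the rest of your manifests.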

## Guidelines

Please refer to the [Alerting Consistency](https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md) OpenShift enhancement proposal for the recommendations that apply to OCP built-in alerting rules.

If you need a review of alerting rules from the OCP monitoring team, you can reach them on the `#forum-openshift-monitoring` channel.

## Identifying alerting rules without a namespace label

The enhancement proposal mentioned above states the following for OCP built-in alerts:
4 changes: 0 additions & 4 deletions content/Products/OpenshiftMonitoring/collecting_metrics.md
@@ -248,10 +248,6 @@ spec:
app.kubernetes.io/name: my-app
```
## Configuring Prometheus rules
In a similar way, you can configure the Prometheus pods with recording and alerting rules based on the metrics being collected. To do so, you should create `PrometheusRule` objects in your operator/operand namespace which will also be picked up by the Prometheus operator.

## Next steps
* [Configure alerting](alerting.md) with Prometheus.
