Clarify platform alerting
Signed-off-by: Simon Pasquier <[email protected]>
simonpasquier committed Oct 24, 2024
1 parent 4ad7294 commit 23a4704
Showing 2 changed files with 31 additions and 4 deletions.
31 changes: 31 additions & 0 deletions content/Products/OpenshiftMonitoring/alerting.md
@@ -4,10 +4,41 @@

This document is intended for OpenShift developers who want to write alerting rules for their operators and operands.

## Configuring alerting rules

You configure alerting rules based on the metrics collected for your component(s). To do so, create `PrometheusRule` objects in your operator/operand namespace. The Prometheus operator picks them up automatically (for layered operators, provided that the namespace has the `openshift.io/cluster-monitoring="true"` label).

Here is an example of a PrometheusRule object with a single alerting rule:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cluster-example-operator-rules
  namespace: openshift-example-operator
spec:
  groups:
  - name: operator
    rules:
    - alert: ClusterExampleOperatorUnhealthy
      annotations:
        # The template labels must match the labels kept by the expression (pod, namespace).
        description: Cluster Example operator running in pod {{$labels.namespace}}/{{$labels.pod}} is not healthy.
        summary: Operator Example not healthy
      expr: |
        max by(pod, namespace) (last_over_time(example_operator_healthy[5m])) == 0
      for: 15m
      labels:
        severity: warning
```
You can put all your alerting rules in a single `PrometheusRule` object or split them across several objects (one per component). How the object(s) get deployed depends on the context: the Cluster Version Operator (CVO), the Operator Lifecycle Manager (OLM), or your own operator can deploy them.
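
For layered operators, the namespace label mentioned earlier could be set as in the following minimal sketch. It reuses the `openshift-example-operator` namespace from the example above as an illustration; how the namespace itself is created and managed is an assumption left open here.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  # Illustrative name, taken from the PrometheusRule example.
  name: openshift-example-operator
  labels:
    # The value must be the string "true", hence the quotes.
    openshift.io/cluster-monitoring: "true"
```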

## Guidelines

Please refer to the [Alerting Consistency](https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md) OpenShift enhancement proposal for the recommendations applying to OCP built-in alerting rules.

If you need a review of alerting rules from the OCP monitoring team, you can reach them on the `#forum-openshift-monitoring` channel.

## Identifying alerting rules without a namespace label

The enhancement proposal mentioned above states the following for OCP built-in alerts:
4 changes: 0 additions & 4 deletions content/Products/OpenshiftMonitoring/collecting_metrics.md
@@ -248,10 +248,6 @@ spec:
app.kubernetes.io/name: my-app
```
## Configuring Prometheus rules
In a similar way, you can configure the Prometheus pods with recording and alerting rules based on the metrics being collected. To do so, you should create `PrometheusRule` objects in your operator/operand namespace which will also be picked up by the Prometheus operator.

## Next steps
* [Configure alerting](alerting.md) with Prometheus.