Clarify platform alerting #77

Merged 1 commit on Nov 14, 2024
content/Products/OpenshiftMonitoring/alerting.md (31 additions, 0 deletions)
@@ -4,10 +4,41 @@

This document is intended for OpenShift developers who want to write alerting rules for their operators and operands.

## Configuring alerting rules

You configure alerting rules based on the metrics collected for your component(s). To do so, create `PrometheusRule` objects in your operator/operand namespace; they are picked up by the Prometheus operator (for layered operators, provided that the namespace has the `openshift.io/cluster-monitoring="true"` label).

Here is an example of a `PrometheusRule` object with a single alerting rule:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cluster-example-operator-rules
  namespace: openshift-example-operator
spec:
  groups:
  - name: operator
    rules:
    - alert: ClusterExampleOperatorUnhealthy
      annotations:
        description: Cluster Example operator running in pod {{$labels.namespace}}/{{$labels.pod}} is not healthy.
        summary: Operator Example not healthy
      expr: |
        max by(pod, namespace) (last_over_time(example_operator_healthy[5m])) == 0
      for: 15m
      labels:
        severity: warning
```
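For layered operators, the rules are only picked up when the namespace carries the label mentioned above. Here is a minimal sketch of such a namespace manifest, reusing the illustrative `openshift-example-operator` name from the example:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-example-operator
  labels:
    # Required for layered operators so that the Prometheus operator
    # picks up PrometheusRule objects from this namespace.
    openshift.io/cluster-monitoring: "true"
```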

You can define all your alerting rules in a single `PrometheusRule` object or split them across several objects (one per component), as sketched below. How the object(s) are deployed depends on the context: they can be shipped by the Cluster Version Operator (CVO), the Operator Lifecycle Manager (OLM), or your own operator.
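For example, splitting the rules per component could look like the following sketch (object and group names are hypothetical, and the rule lists are left empty for brevity):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cluster-example-operator-rules
  namespace: openshift-example-operator
spec:
  groups:
  - name: operator
    rules: [] # alerting rules covering the operator itself
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cluster-example-operand-rules
  namespace: openshift-example-operator
spec:
  groups:
  - name: operand
    rules: [] # alerting rules covering the operand
```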

## Guidelines

Please refer to the [Alerting Consistency](https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md) OpenShift enhancement proposal for the recommendations that apply to OCP built-in alerting rules.

If you need a review of alerting rules from the OCP monitoring team, you can reach them on the `#forum-openshift-monitoring` channel.

## Identifying alerting rules without a namespace label

The enhancement proposal mentioned above states the following for OCP built-in alerts:
content/Products/OpenshiftMonitoring/collecting_metrics.md (0 additions, 4 deletions)
@@ -248,10 +248,6 @@ spec:
app.kubernetes.io/name: my-app
```

## Configuring Prometheus rules

In a similar way, you can configure the Prometheus pods with recording and alerting rules based on the metrics being collected. To do so, you should create `PrometheusRule` objects in your operator/operand namespace which will also be picked up by the Prometheus operator.

## Next steps

* [Configure alerting](alerting.md) with Prometheus.