From 3d4a3f4011d2dd46b11c701ea75fc66b736de90f Mon Sep 17 00:00:00 2001
From: Quentin Bisson
Date: Tue, 14 May 2024 11:06:35 +0200
Subject: [PATCH] fix: cert-manager related alerts for mimir (#1161)

* fix: cert-manager related alerts for mimir

* Update CHANGELOG.md
---
 CHANGELOG.md                                                  | 1 +
 .../templates/alerting-rules/cert-manager.rules.yml           | 4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 0c1f44153..2bd37533f 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -24,6 +24,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Fixed
 
 - Remove cilium entry from KAAS SLOs.
+- Fix cert-manager rules for mimir.
 - Fix operatorkit related alerts for mimir.
 - Fix Loki/Mimir and Tempo mixins according to `pint` recommendations
 - Fix cilium related alerts for mimir.
diff --git a/helm/prometheus-rules/templates/alerting-rules/cert-manager.rules.yml b/helm/prometheus-rules/templates/alerting-rules/cert-manager.rules.yml
index da8c67b95..f30cb9147 100644
--- a/helm/prometheus-rules/templates/alerting-rules/cert-manager.rules.yml
+++ b/helm/prometheus-rules/templates/alerting-rules/cert-manager.rules.yml
@@ -17,7 +17,7 @@ spec:
           If memory usage value is equal to memory limit value then it is likely the pod will be evicted.
           If no limits are set then the pod will burst.
           `}}
-      expr: (sum by (cluster_id, pod, namespace, container) (container_memory_working_set_bytes{container=~"(cert-manager|cert-manager-app-controller)"}) / 1024 / 1024 / 1024) >= 0.85
+      expr: (sum by (cluster_id, installation, pipeline, provider, pod, namespace, container) (container_memory_working_set_bytes{container=~"(cert-manager|cert-manager-app-controller)"}) / 1024 / 1024 / 1024) >= 0.85
       for: 10m
       labels:
         area: kaas
@@ -44,7 +44,7 @@ spec:
       annotations:
         description: '{{`There are too many CertificateRequests in cluster {{ $labels.cluster_id }}.`}}'
         opsrecipe: cert-requests-too-many/
-      expr: sum by (cluster_id) (etcd_kubernetes_resources_count{kind="certificaterequests.cert-manager.io"}) > 10000
+      expr: sum by (cluster_id, installation, pipeline, provider) (etcd_kubernetes_resources_count{kind="certificaterequests.cert-manager.io"}) > 10000
       for: 15m
       labels:
         area: kaas