From d7279a792a36e9b0dfa23b86c024e86661c1590a Mon Sep 17 00:00:00 2001
From: QuentinBisson
Date: Fri, 3 May 2024 15:50:12 +0200
Subject: [PATCH 1/2] fix: cert-manager related alerts for mimir

---
 CHANGELOG.md                                                  | 1 +
 .../templates/alerting-rules/cert-manager.rules.yml           | 4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index e01ce920d..a34cfe5cc 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -14,6 +14,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Fixed
 
 - Remove cilium entry from KAAS SLOs.
+- Fix cert-manager rules for mimir.
 
 ## [3.13.1] - 2024-04-30
 
diff --git a/helm/prometheus-rules/templates/alerting-rules/cert-manager.rules.yml b/helm/prometheus-rules/templates/alerting-rules/cert-manager.rules.yml
index da8c67b95..f30cb9147 100644
--- a/helm/prometheus-rules/templates/alerting-rules/cert-manager.rules.yml
+++ b/helm/prometheus-rules/templates/alerting-rules/cert-manager.rules.yml
@@ -17,7 +17,7 @@ spec:
           If memory usage value is equal to memory limit value then it is likely the pod will be evicted.
           If no limits are set then the pod will burst.
           `}}
-      expr: (sum by (cluster_id, pod, namespace, container) (container_memory_working_set_bytes{container=~"(cert-manager|cert-manager-app-controller)"}) / 1024 / 1024 / 1024) >= 0.85
+      expr: (sum by (cluster_id, installation, pipeline, provider, pod, namespace, container) (container_memory_working_set_bytes{container=~"(cert-manager|cert-manager-app-controller)"}) / 1024 / 1024 / 1024) >= 0.85
       for: 10m
       labels:
         area: kaas
@@ -44,7 +44,7 @@ spec:
       annotations:
         description: '{{`There are too many CertificateRequests in cluster {{ $labels.cluster_id }}.`}}'
         opsrecipe: cert-requests-too-many/
-      expr: sum by (cluster_id) (etcd_kubernetes_resources_count{kind="certificaterequests.cert-manager.io"}) > 10000
+      expr: sum by (cluster_id, installation, pipeline, provider) (etcd_kubernetes_resources_count{kind="certificaterequests.cert-manager.io"}) > 10000
       for: 15m
       labels:
         area: kaas

From 0b93354a2454eea382b16ef6a2d4b216adff139e Mon Sep 17 00:00:00 2001
From: Quentin Bisson
Date: Tue, 14 May 2024 11:02:01 +0200
Subject: [PATCH 2/2] Update CHANGELOG.md

---
 CHANGELOG.md | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index d6a99f918..2bd37533f 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -24,11 +24,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Fixed
 
 - Remove cilium entry from KAAS SLOs.
-
 - Fix cert-manager rules for mimir.
-
 - Fix operatorkit related alerts for mimir.
-
 - Fix Loki/Mimir and Tempo mixins according to `pint` recommendations
 - Fix cilium related alerts for mimir.
 - Fix etcd alerts for Mimir.