Add missing prometheusagentfailing inhibition (#911)
QuentinBisson authored Sep 19, 2023
1 parent 94b8cac commit 737a5e9
Showing 5 changed files with 15 additions and 2 deletions.
6 changes: 5 additions & 1 deletion CHANGELOG.md
@@ -7,6 +7,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Changed

- Add missing prometheus-agent inhibition to `KubeStateMetricsDown` alert
- Change the time duration before `ManagementClusterDeploymentMissingAWS` pages, because it is dependent on the `PrometheusAgentFailing` alert.

## [2.132.0] - 2023-09-15

### Changed
@@ -168,7 +173,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [2.115.0] - 2023-07-20


### Added

- New alert `KubeStateMetricsSlow` that inhibits KSM related alerts.
@@ -149,7 +149,7 @@ spec:
description: '{{`Deployment {{ $labels.deployment }} is missing.`}}'
opsrecipe: management-cluster-deployment-is-missing/
expr: absent(kube_deployment_status_condition{namespace="giantswarm", condition="Available", deployment="aws-admission-controller"})
-for: 5m
+for: 15m
labels:
area: kaas
cancel_if_prometheus_agent_down: "true"
@@ -25,6 +25,7 @@ spec:
cancel_if_kubelet_down: "true"
cancel_if_cluster_has_no_workers: "true"
cancel_if_outside_working_hours: {{ include "workingHoursOnly" . }}
cancel_if_prometheus_agent_down: "true"
severity: notify
team: honeybadger
topic: releng
@@ -41,6 +42,7 @@ spec:
cancel_if_kubelet_down: "true"
cancel_if_cluster_has_no_workers: "true"
cancel_if_outside_working_hours: "true"
cancel_if_prometheus_agent_down: "true"
severity: page
team: atlas
topic: observability
@@ -73,6 +75,7 @@ spec:
inhibit_kube_state_metrics_down: "true"
cancel_if_kubelet_down: "true"
cancel_if_outside_working_hours: "false"
cancel_if_prometheus_agent_down: "true"
severity: page
team: atlas
topic: observability
@@ -57,6 +57,7 @@ spec:
labels:
area: kaas
cancel_if_outside_working_hours: {{ include "workingHoursOnly" . }}
cancel_if_prometheus_agent_down: "true"
severity: page
team: {{ include "providerTeam" . }}
topic: vault
5 changes: 5 additions & 0 deletions test/tests/providers/global/up.all.rules.test.yml
@@ -57,6 +57,7 @@ tests:
cancel_if_kubelet_down: "true"
cancel_if_outside_working_hours: "false"
inhibit_kube_state_metrics_down: "true"
cancel_if_prometheus_agent_down: "true"
severity: "page"
team: "atlas"
topic: "observability"
@@ -79,6 +80,7 @@ tests:
cancel_if_kubelet_down: "true"
cancel_if_outside_working_hours: "false"
inhibit_kube_state_metrics_down: "true"
cancel_if_prometheus_agent_down: "true"
severity: "page"
team: "atlas"
topic: "observability"
@@ -107,6 +109,7 @@ tests:
cancel_if_kubelet_down: "true"
cancel_if_outside_working_hours: "false"
inhibit_kube_state_metrics_down: "true"
cancel_if_prometheus_agent_down: "true"
severity: "page"
team: "atlas"
topic: "observability"
@@ -160,6 +163,7 @@ tests:
cancel_if_kubelet_down: "true"
cancel_if_outside_working_hours: "false"
inhibit_kube_state_metrics_down: "true"
cancel_if_prometheus_agent_down: "true"
severity: "page"
team: "atlas"
topic: "observability"
@@ -182,6 +186,7 @@ tests:
cancel_if_kubelet_down: "true"
cancel_if_outside_working_hours: "false"
inhibit_kube_state_metrics_down: "true"
cancel_if_prometheus_agent_down: "true"
severity: "page"
team: "atlas"
topic: "observability"
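The `cancel_if_prometheus_agent_down: "true"` labels added in this commit only suppress notifications if Alertmanager has a matching inhibition rule, and that rule is not part of this diff. A minimal sketch of such a rule might look like the following. It assumes that `PrometheusAgentFailing` is the source alert and that both alerts share a hypothetical `cluster_id` label. It is an illustration, not the project's actual Alertmanager configuration.

```yaml
# Sketch only: the real Alertmanager configuration is not part of this commit.
inhibit_rules:
  # While PrometheusAgentFailing is firing, mute every alert that opts in
  # through the cancel_if_prometheus_agent_down label.
  - source_matchers:
      - alertname="PrometheusAgentFailing"
    target_matchers:
      - cancel_if_prometheus_agent_down="true"
    # Hypothetical: only inhibit alerts that come from the same cluster.
    equal:
      - cluster_id
```

Inhibition only applies while the source alert is firing. A target alert whose `for` duration elapses before `PrometheusAgentFailing` starts firing can therefore still page. This appears to be the reasoning behind the changelog entry that raises `ManagementClusterDeploymentMissingAWS` from `for: 5m` to `for: 15m`.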
