Commit

merge with main
QuantumEnigmaa committed Sep 19, 2023
2 parents d7f5331 + 737a5e9 commit b67266c
Showing 6 changed files with 12 additions and 2 deletions.
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -10,6 +10,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Changed

- Split `KubeStateMetricsDown` alert into 2 alerts: `KubeStateMetricsDown` and `KubeStateMetricsNotRetrievingMetrics`
- Add missing prometheus-agent inhibition to `KubeStateMetricsDown` alert
- Change time duration before `ManagementClusterDeploymentMissingAWS` pages because it is dependent on the `PrometheusAgentFailing` alert.

## [2.132.0] - 2023-09-15

@@ -172,7 +174,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [2.115.0] - 2023-07-20


### Added

- New alert `KubeStateMetricsSlow` that inhibits KSM related alerts.
@@ -149,7 +149,7 @@ spec:
description: '{{`Deployment {{ $labels.deployment }} is missing.`}}'
opsrecipe: management-cluster-deployment-is-missing/
expr: absent(kube_deployment_status_condition{namespace="giantswarm", condition="Available", deployment="aws-admission-controller"})
- for: 5m
+ for: 15m
labels:
area: kaas
cancel_if_prometheus_agent_down: "true"
@@ -32,6 +32,7 @@ spec:
cancel_if_cluster_status_deleting: "true"
cancel_if_cluster_has_no_workers: "true"
inhibit_kube_state_metrics_down: "true"
cancel_if_prometheus_agent_down: "true"
cancel_if_kubelet_down: "true"
cancel_if_outside_working_hours: "false"
severity: page
@@ -25,6 +25,7 @@ spec:
cancel_if_kubelet_down: "true"
cancel_if_cluster_has_no_workers: "true"
cancel_if_outside_working_hours: {{ include "workingHoursOnly" . }}
cancel_if_prometheus_agent_down: "true"
severity: notify
team: honeybadger
topic: releng
@@ -41,6 +42,7 @@ spec:
cancel_if_kubelet_down: "true"
cancel_if_cluster_has_no_workers: "true"
cancel_if_outside_working_hours: "true"
cancel_if_prometheus_agent_down: "true"
severity: page
team: atlas
topic: observability
@@ -57,6 +57,7 @@ spec:
labels:
area: kaas
cancel_if_outside_working_hours: {{ include "workingHoursOnly" . }}
cancel_if_prometheus_agent_down: "true"
severity: page
team: {{ include "providerTeam" . }}
topic: vault
5 changes: 5 additions & 0 deletions test/tests/providers/global/up.all.rules.test.yml
@@ -57,6 +57,7 @@ tests:
cancel_if_kubelet_down: "true"
cancel_if_outside_working_hours: "false"
inhibit_kube_state_metrics_down: "true"
cancel_if_prometheus_agent_down: "true"
severity: "page"
team: "atlas"
topic: "observability"
@@ -79,6 +80,7 @@ tests:
cancel_if_kubelet_down: "true"
cancel_if_outside_working_hours: "false"
inhibit_kube_state_metrics_down: "true"
cancel_if_prometheus_agent_down: "true"
severity: "page"
team: "atlas"
topic: "observability"
@@ -107,6 +109,7 @@ tests:
cancel_if_kubelet_down: "true"
cancel_if_outside_working_hours: "false"
inhibit_kube_state_metrics_down: "true"
cancel_if_prometheus_agent_down: "true"
severity: "page"
team: "atlas"
topic: "observability"
@@ -160,6 +163,7 @@ tests:
cancel_if_kubelet_down: "true"
cancel_if_outside_working_hours: "false"
inhibit_kube_state_metrics_down: "true"
cancel_if_prometheus_agent_down: "true"
severity: "page"
team: "atlas"
topic: "observability"
@@ -182,6 +186,7 @@ tests:
cancel_if_kubelet_down: "true"
cancel_if_outside_working_hours: "false"
inhibit_kube_state_metrics_down: "true"
cancel_if_prometheus_agent_down: "true"
severity: "page"
team: "atlas"
topic: "observability"
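
Note on the `cancel_if_prometheus_agent_down` label added throughout this commit: the label does nothing on its own and only takes effect if Alertmanager has an inhibit rule that targets it. The commit does not show that rule, so the fragment below is a minimal sketch of what such a rule might look like. The source label `inhibit_prometheus_agent_down` and the `cluster_id` entry under `equal` are assumptions, chosen by analogy with the `inhibit_kube_state_metrics_down` label visible in the hunks above.

```yaml
# Hypothetical Alertmanager fragment, not taken from this repository.
inhibit_rules:
  # While any firing alert carries the (assumed) source label,
  # alerts tagged cancel_if_prometheus_agent_down="true" are muted.
  - source_matchers:
      - 'inhibit_prometheus_agent_down="true"'   # assumed source label name
    target_matchers:
      - 'cancel_if_prometheus_agent_down="true"' # label added in this commit
    # Only inhibit when source and target share this label value (assumed label).
    equal:
      - cluster_id
```

Under this kind of rule, raising `for:` from 5m to 15m on `ManagementClusterDeploymentMissingAWS` would give the agent-failure alert time to start firing before the `absent()`-based alert pages, which matches the reasoning given in the CHANGELOG entry.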