Add karpenter alerts (#1449)
* Add karpenter alerts

* Add cluster_id label
fiunchinho authored Dec 9, 2024
1 parent b666430 commit 1a163b2
Showing 2 changed files with 58 additions and 0 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Added

- Add alerts for `karpenter` issues.

## [4.29.0] - 2024-12-09

### Changed
@@ -0,0 +1,54 @@
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
labels:
labels: {{- include "labels.common" . | nindent 4}}
name: karpenter.rules
namespace: {{.Values.namespace}}
name: karpenter
spec:
  groups:
  - name: karpenter
    rules:
    - alert: KarpenterCanNotRegisterNewNodes
      annotations:
        description: |
          Karpenter provisioner {{`{{ $labels.provisioner }}`}} on cluster {{`{{ $labels.cluster_id }}`}} launched new nodes, but some of the nodes did not register in the cluster.
        opsrecipe: karpenter/
      expr: sum by (provisioner, cluster_id, installation, pipeline, provider) (karpenter_machines_launched) - sum by (provisioner, cluster_id, installation, pipeline, provider) (karpenter_machines_registered) != 0
      for: 1h
      labels:
        area: kaas
        cancel_if_monitoring_agent_down: "true"
        cancel_if_outside_working_hours: "true"
        severity: page
        team: {{ include "providerTeam" . }}
        topic: karpenter
    - alert: KarpenterProvisionerAlmostFull
      annotations:
        description: |
          Provisioner {{`{{ $labels.provisioner }}`}} on cluster {{`{{ $labels.cluster_id }}`}} is almost full.
        opsrecipe: karpenter/
      expr: karpenter_provisioner_usage_pct > 90
      for: 72h
      labels:
        area: kaas
        cancel_if_monitoring_agent_down: "true"
        cancel_if_outside_working_hours: "true"
        severity: page
        team: {{ include "providerTeam" . }}
        topic: karpenter
    - alert: KarpenterCloudproviderErrors
      annotations:
        description: |
          Karpenter on cluster {{`{{ $labels.cluster_id }}`}} is getting errors during API calls to the cloud provider.
        opsrecipe: karpenter/
      expr: rate(karpenter_cloudprovider_errors_total{}[5m]) > 0.1
      for: 10m
      labels:
        area: kaas
        cancel_if_monitoring_agent_down: "true"
        cancel_if_outside_working_hours: "true"
        severity: page
        team: {{ include "providerTeam" . }}
        topic: karpenter
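
A possible way to sanity-check the `KarpenterCanNotRegisterNewNodes` expression is a `promtool` unit test. The sketch below is not part of this commit: the label values (`default`, `demo`, `testing`, `aws`) are made up for illustration. It evaluates the expression directly through `promql_expr_test`, so it does not depend on the Helm-rendered `team` label or on a rendered rule file.

```yaml
# Hypothetical promtool unit test; run with `promtool test rules <this-file>`.
# All label values below are illustrative assumptions, not taken from the commit.
evaluation_interval: 1m

tests:
  # Case 1: 5 machines launched but only 4 registered, so the difference is 1.
  - interval: 1m
    input_series:
      - series: 'karpenter_machines_launched{provisioner="default", cluster_id="demo", installation="demo", pipeline="testing", provider="aws"}'
        values: '5+0x60'
      - series: 'karpenter_machines_registered{provisioner="default", cluster_id="demo", installation="demo", pipeline="testing", provider="aws"}'
        values: '4+0x60'
    promql_expr_test:
      - expr: sum by (provisioner, cluster_id, installation, pipeline, provider) (karpenter_machines_launched) - sum by (provisioner, cluster_id, installation, pipeline, provider) (karpenter_machines_registered) != 0
        eval_time: 30m
        exp_samples:
          - labels: '{provisioner="default", cluster_id="demo", installation="demo", pipeline="testing", provider="aws"}'
            value: 1

  # Case 2: the "registered" series does not exist at all for this label set.
  # The subtraction only yields results for label sets present on both sides,
  # so the expression returns nothing here (no exp_samples expected).
  - interval: 1m
    input_series:
      - series: 'karpenter_machines_launched{provisioner="default", cluster_id="demo", installation="demo", pipeline="testing", provider="aws"}'
        values: '5+0x60'
    promql_expr_test:
      - expr: sum by (provisioner, cluster_id, installation, pipeline, provider) (karpenter_machines_launched) - sum by (provisioner, cluster_id, installation, pipeline, provider) (karpenter_machines_registered) != 0
        eval_time: 30m
```

The second case shows a property of the expression as written: if `karpenter_machines_registered` is never exported for a provisioner, the alert stays silent for it. Whether that situation can occur depends on how Karpenter exposes the metric, which this commit does not state.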
