Update Linkerd alerts (#1132)
* Cancel linkerd alert outside business hours

Signed-off-by: Matias Charriere <[email protected]>

* Remove Linkerd deployments from SLO

Signed-off-by: Matias Charriere <[email protected]>

* Cover all linkerd namespaces with Linkerd alert

Signed-off-by: Matias Charriere <[email protected]>

* update changelog

Signed-off-by: Matias Charriere <[email protected]>

* fix test

Signed-off-by: Matias Charriere <[email protected]>

---------

Signed-off-by: Matias Charriere <[email protected]>
mcharriere authored Apr 19, 2024
1 parent a6f165e commit 079013a
Showing 4 changed files with 10 additions and 7 deletions.
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -10,6 +10,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Changed

- Update ops-recipe link for promtail alerts.
+- Remove Linkerd from Service SLO alerts.
+- Include all Linkerd Namespaces in LinkerdDeploymentNotSatisfied alert.
+- Make LinkerdDeploymentNotSatisfied alert business hours only.

## [3.11.2] - 2024-04-18

@@ -14,11 +14,11 @@ spec:
annotations:
description: '{{`Linkerd Deployment {{ $labels.namespace}}/{{ $labels.deployment }} is not satisfied.`}}'
opsrecipe: managed-app-linkerd/
-expr: managed_app_deployment_status_replicas_unavailable{deployment=~"linkerd.*"} > 0
+expr: managed_app_deployment_status_replicas_unavailable{namespace=~"linkerd.*"} > 0
for: 30m
labels:
area: managedservices
-cancel_if_outside_working_hours: {{ include "workingHoursOnly" . }}
+cancel_if_outside_working_hours: true
severity: page
team: cabbage
topic: linkerd
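
To illustrate why the selector moved from `deployment` to `namespace`: Prometheus regex matchers are fully anchored, so `deployment=~"linkerd.*"` only matches deployments whose own name starts with `linkerd`. The Python sketch below imitates that matching with `re.fullmatch`. The sample series, including the `linkerd-viz` namespace and its deployment names, are assumed for illustration and are not taken from this commit.

```python
import re

# Hypothetical (namespace, deployment) label pairs; illustrative only.
SERIES = [
    ("linkerd", "linkerd-destination"),
    ("linkerd-viz", "metrics-api"),
    ("linkerd-viz", "tap"),
    ("kube-system", "coredns"),
]


def matches(pattern: str, value: str) -> bool:
    """Mimic a Prometheus =~ matcher, which is anchored at both ends."""
    return re.fullmatch(pattern, value) is not None


def select(label: str, pattern: str) -> list[tuple[str, str]]:
    """Return the series whose given label matches the pattern."""
    index = {"namespace": 0, "deployment": 1}[label]
    return [s for s in SERIES if matches(pattern, s[index])]


# Old selector: only deployments named linkerd* are covered.
old_selection = select("deployment", "linkerd.*")
# New selector: every deployment in a linkerd* namespace is covered.
new_selection = select("namespace", "linkerd.*")
```

Under these assumed labels, the old selector would miss deployments such as `metrics-api` that live in a Linkerd namespace but do not carry the `linkerd` prefix, while the new selector picks them up.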
@@ -18,21 +18,21 @@ spec:
expr: |
label_replace(
(
-slo_errors_per_request:ratio_rate1h{service!~"efk-.*|.*external-dns.*|kong-.*|.*(ingress-nginx|nginx-ingress-controller).*"}
+slo_errors_per_request:ratio_rate1h{service!~"efk-.*|.*external-dns.*|kong-.*|.*(ingress-nginx|nginx-ingress-controller).*", namespace!~"linkerd.*"}
> on (cluster_id, service) group_left ()
slo_threshold_high
and
-slo_errors_per_request:ratio_rate5m{service!~"efk-.*|.*external-dns.*|kong-.*|.*(ingress-nginx|nginx-ingress-controller).*"}
+slo_errors_per_request:ratio_rate5m{service!~"efk-.*|.*external-dns.*|kong-.*|.*(ingress-nginx|nginx-ingress-controller).*", namespace!~"linkerd.*"}
> on (cluster_id, service) group_left ()
slo_threshold_high
)
or
(
-slo_errors_per_request:ratio_rate6h{service!~"efk-.*|.*external-dns.*|kong-.*|.*(ingress-nginx|nginx-ingress-controller).*"}
+slo_errors_per_request:ratio_rate6h{service!~"efk-.*|.*external-dns.*|kong-.*|.*(ingress-nginx|nginx-ingress-controller).*", namespace!~"linkerd.*"}
> on (cluster_id, service) group_left ()
slo_threshold_low
and
-slo_errors_per_request:ratio_rate30m{service!~"efk-.*|.*external-dns.*|kong-.*|.*(ingress-nginx|nginx-ingress-controller).*"}
+slo_errors_per_request:ratio_rate30m{service!~"efk-.*|.*external-dns.*|kong-.*|.*(ingress-nginx|nginx-ingress-controller).*", namespace!~"linkerd.*"}
> on (cluster_id, service) group_left ()
slo_threshold_low
),
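
The boolean structure of the expression in this hunk can be sketched in Python as follows: a fast-burn pair (1h and 5m against the high threshold) or a slow-burn pair (6h and 30m against the low threshold), with Linkerd namespaces filtered out first. This is a minimal sketch, not the rule itself. The threshold values and the function name are hypothetical, and the existing `service!~` filter is left out for brevity.

```python
import re

# Hypothetical thresholds; the real values come from the slo_threshold_high
# and slo_threshold_low series, which are not shown in this diff.
THRESHOLD_HIGH = 0.02
THRESHOLD_LOW = 0.01


def slo_alert_fires(namespace: str, rates: dict[str, float]) -> bool:
    """Evaluate the alert condition for one service from its error ratios per window."""
    # namespace!~"linkerd.*" drops every series from a Linkerd namespace.
    if re.fullmatch(r"linkerd.*", namespace):
        return False
    # Fast burn: both the 1h and the 5m window exceed the high threshold.
    fast_burn = rates["1h"] > THRESHOLD_HIGH and rates["5m"] > THRESHOLD_HIGH
    # Slow burn: both the 6h and the 30m window exceed the low threshold.
    slow_burn = rates["6h"] > THRESHOLD_LOW and rates["30m"] > THRESHOLD_LOW
    return fast_burn or slow_burn
```

Requiring the short window alongside the long one means the condition stops holding soon after the error ratio recovers, and the added namespace filter removes Linkerd services from this check regardless of their error ratios.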
@@ -18,7 +18,7 @@ tests:
- exp_labels:
alertname: LinkerdDeploymentNotSatisfied
area: managedservices
-cancel_if_outside_working_hours: "false"
+cancel_if_outside_working_hours: "true"
namespace: linkerd
deployment: linkerd-destination
managed_app: destination