add MimirHPANeedsToBeScaledUp alert #1340

Merged: 7 commits, Sep 8, 2024

TheoBrigitte
Member

@TheoBrigitte TheoBrigitte commented Sep 2, 2024

Towards: https://github.com/giantswarm/giantswarm/issues/31385

This PR adds the MimirHPANeedsToBeScaledUp alert to detect when a HorizontalPodAutoscaler in the mimir namespace has reached its maximum capacity, meaning its replicas are maxed out and resource usage is still above target.

For this alert I used the kube_horizontalpodautoscaler_* metrics exposed by kube-state-metrics:

  • the first part of the query detects when the desired replica count has reached the maximum allowed replicas
  • the second part of the query detects when the current value of the metric used for autoscaling (CPU or memory) is above its target

When both conditions are met, the alert fires.
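
The two conditions could be combined roughly as follows. This is a minimal sketch, not the exact rule merged in this PR: the group name, the `for` duration, the severity label, and the annotation text are assumptions for illustration, and the metric names are the kube-state-metrics HPA series as I understand them.

```yaml
groups:
  - name: mimir-hpa-sketch            # hypothetical group name
    rules:
      - alert: MimirHPANeedsToBeScaledUp
        expr: |
          # Part 1: desired replicas have reached the configured maximum.
          (
            kube_horizontalpodautoscaler_status_desired_replicas{namespace="mimir"}
              >= kube_horizontalpodautoscaler_spec_max_replicas{namespace="mimir"}
          )
          # Both parts must hold for the same HPA.
          and on (namespace, horizontalpodautoscaler)
          # Part 2: the current autoscaling metric (cpu or memory) is above its target.
          (
            kube_horizontalpodautoscaler_status_target_metric{namespace="mimir"}
              > kube_horizontalpodautoscaler_spec_target_metric{namespace="mimir"}
          )
        for: 15m                      # assumed duration
        labels:
          severity: page              # assumed severity
        annotations:
          description: 'HPA {{ $labels.horizontalpodautoscaler }} is at max replicas and still above its target.'
```

The `and on (namespace, horizontalpodautoscaler)` clause keeps only the HPAs for which both comparisons return a result. A multi-cluster setup would likely need extra labels (for example a cluster label) in the `on (...)` list.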

@TheoBrigitte TheoBrigitte requested a review from a team as a code owner September 2, 2024 17:02
@TheoBrigitte TheoBrigitte self-assigned this Sep 2, 2024
@TheoBrigitte
Member Author

@QuantumEnigmaa should we change this so that it also works for Loki?

@QuantumEnigmaa
Contributor

I think we can do that later. Let's merge this one and the Loki one first so that we have those enabled ASAP, and then create an issue to refactor this so that we can discuss how we want to do it.

@TheoBrigitte TheoBrigitte enabled auto-merge (squash) September 3, 2024 15:10
Contributor

@QuantumEnigmaa QuantumEnigmaa left a comment

Shouldn't you add unit tests?

Contributor

@QuentinBisson QuentinBisson left a comment

LGTM apart from missing tests :)

@TheoBrigitte TheoBrigitte merged commit 0666a68 into main Sep 8, 2024
6 of 7 checks passed
@TheoBrigitte TheoBrigitte deleted the hpa-scaling-alert branch September 8, 2024 12:24