support fallback configuration for KEDA autoscaling #9846

Open
wants to merge 2 commits into base: main

@hobbsh hobbsh commented Nov 6, 2024

What this PR does

Adds support for a fallback configuration to the KEDA autoscaling configuration in the mimir-distributed Helm chart, so that if the metrics endpoint used for scaling becomes unavailable, the ScaledObject falls back to the configured replica count.
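
For illustration, here is a minimal sketch of what a ScaledObject with a fallback section could look like. It is not the chart's actual template output: the resource names, Prometheus address, query, and all numeric values are assumptions chosen for the example. Only the `fallback.failureThreshold` and `fallback.replicas` fields come from KEDA's ScaledObject spec.

```yaml
# Sketch only: names, addresses, and numbers below are hypothetical.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: mimir-distributor            # hypothetical name
spec:
  scaleTargetRef:
    name: mimir-distributor          # hypothetical Deployment to scale
  minReplicaCount: 3
  maxReplicaCount: 20
  fallback:
    failureThreshold: 3              # consecutive failed metric fetches before fallback applies
    replicas: 10                     # replica count used while the scaler keeps failing
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.example.svc:9090   # hypothetical metrics endpoint
        query: sum(rate(container_cpu_usage_seconds_total{container="distributor"}[5m]))  # hypothetical query
        threshold: "1"
```

With values like these, KEDA would report 10 replicas' worth of load once the query has failed three times in a row, and would resume normal scaling when the endpoint recovers. KEDA's documentation also notes that fallback does not apply to the CPU and memory triggers, so it is worth checking which triggers a given component uses.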

Which issue(s) this PR fixes or relates to

n/a

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

@hobbsh hobbsh requested a review from a team as a code owner November 6, 2024 22:53
@CLAassistant

CLAassistant commented Nov 6, 2024

CLA assistant check
All committers have signed the CLA.

@jhesketh
Contributor

Thank you for the contribution!

This will need an entry in operations/helm/charts/mimir-distributed/CHANGELOG.md.

I'm also curious about your opinion on whether a fallback is preferable to just maintaining the current number of replicas, given that a deployment is also protected by minReplicas. Would it not be better to maintain the current replicas and alert when KEDA isn't able to get the metrics? Or is the intention generally to have the fallback end up scaling up a deployment?

@hobbsh
Author

hobbsh commented Nov 11, 2024

First off, thank you for the engagement!

Would it not be better to maintain the current replicas and alert when KEDA isn't able to get the metrics?

In my experience, if the distributors aren't scaling, they quickly OOM, and then we are in a catch-22: we need to scale but can't, because the metrics can't be retrieved. KEDA created fallback for exactly this reason, so I would much prefer to use it rather than rely on minReplicas, which for one thing would mean additional overhead and cost. We use fallback for some internal services and it works well, so it would be great to have this option for Mimir. So yes, the intention is to have fallback scale up the deployment if we have a metrics blip. Obviously, it would be great to have the autoscaling metrics coming from an external source, but we are quite resource-strapped and that's not the easiest option.

This will need an entry in operations/helm/charts/mimir-distributed/CHANGELOG.md

Done, thanks!

@chencs
Contributor

chencs commented Dec 20, 2024

The CHANGELOG has just been cut to prepare for the next release. Please rebase onto main and, if needed, move the CHANGELOG entry added or updated in this PR to the top of the operations/helm/charts/mimir-distributed/CHANGELOG.md document. Thanks!


4 participants