Add guard against missing samples in KEDA queries #7691

Merged: 2 commits, Mar 22, 2024

@jhalterman (Member) commented Mar 21, 2024

What this PR does

This PR adds clauses to some of our KEDA queries to guard against unintended scaledowns. These can happen when the Prometheus instance targeted by KEDA is down or unresponsive for some time, which leaves gaps in the samples that the KEDA queries rely on. When Prometheus comes back up, the queries return lower averages for a while because of the missing samples, and those lower averages can trigger a scaledown.
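To illustrate the failure mode, here is a minimal Python sketch. It is not the KEDA query or the clause added in this PR; it is a simplified model in which samples missed during an outage contribute nothing to a windowed average. The function names, the 15-sample window, and the load value of 1000 are all assumptions made for illustration.

```python
from typing import Optional, Sequence

# None marks a sample that was missed while Prometheus was down (assumed model).
Sample = Optional[float]


def naive_window_avg(samples: Sequence[Sample]) -> float:
    """Average over the full window; missing samples contribute nothing."""
    present = [s for s in samples if s is not None]
    return sum(present) / len(samples)


def guarded_window_avg(samples: Sequence[Sample], min_samples: int) -> Optional[float]:
    """Return None (no result) unless the window holds enough real samples."""
    present = [s for s in samples if s is not None]
    if len(present) < min_samples:
        return None
    return sum(present) / len(present)


# Hypothetical window: steady load of 1000, but 10 of the 15 samples were missed.
window = [1000.0] * 5 + [None] * 10

print(naive_window_avg(window))        # about 333.3, a third of the real load
print(guarded_window_avg(window, 15))  # None: no value instead of a misleadingly low one
```

The guarded variant trades a temporarily missing metric for a wrong one, which is the general idea behind guarding a query against missing samples.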

An example of what an unintended scaledown looks like in kubectl output before, during, and after a Prometheus outage:

NAME                   REFERENCE                TARGETS                                             MINPODS   MAXPODS   REPLICAS   AGE
keda-hpa-distributor   Deployment/distributor   1046561m/1k (avg), 1633790418097m/2147483648 (avg)   10        40        29         171d
keda-hpa-distributor   Deployment/distributor   <unknown>/1k (avg), 0/2147483648 (avg)               10        40        29         171d
keda-hpa-distributor   Deployment/distributor   61190m/1k (avg), 488859859863m/2147483648 (avg)      10        40        29         171d
keda-hpa-distributor   Deployment/distributor   81245m/1k (avg), 488973756911m/2147483648 (avg)      10        40        26         171d
keda-hpa-distributor   Deployment/distributor   81245m/1k (avg), 488973756911m/2147483648 (avg)      10        40        26         171d
keda-hpa-distributor   Deployment/distributor   274241m/1k (avg), 1511713092856m/2147483648 (avg)   10        40        26         171d
keda-hpa-distributor   Deployment/distributor   275528m/1k (avg), 1511747993600m/2147483648 (avg)   10        40        26         171d
keda-hpa-distributor   Deployment/distributor   307320m/1k (avg), 1686180454400m/2147483648 (avg)   10        40        26         171d
keda-hpa-distributor   Deployment/distributor   528739m/1k (avg), 1686180454400m/2147483648 (avg)   10        40        26         171d
keda-hpa-distributor   Deployment/distributor   528688m/1k (avg), 1686180454400m/2147483648 (avg)   10        40        23         171d

With this change, taking Prometheus down for some time and then bringing it back up does not result in an unintended scaledown:

NAME                   REFERENCE                TARGETS                                          MINPODS   MAXPODS   REPLICAS   AGE
keda-hpa-distributor   Deployment/distributor   965560m/1k (avg), 1626915225600m/2147483648 (avg)    10        40        32         171d
keda-hpa-distributor   Deployment/distributor   <unknown>/1k (avg), <unknown>/2147483648 (avg)   10        40        32         171d
keda-hpa-distributor   Deployment/distributor   948552m/1k (avg), 1631584217600m/2147483648 (avg)   10        40        32         171d
keda-hpa-distributor   Deployment/distributor   948046m/1k (avg), 1632766489600m/2147483648 (avg)   10        40        32         171d
keda-hpa-distributor   Deployment/distributor   947425m/1k (avg), 1632022976/2147483648 (avg)       10        40        32         171d
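For context on why an `<unknown>` target is safer than a low one, the sketch below applies the replica formula from the Kubernetes HPA documentation, desiredReplicas = ceil(currentReplicas * currentValue / targetValue), and takes the largest proposal across metrics. It is a simplification under stated assumptions: it ignores the tolerance band, stabilization windows, and scaling policies (so it returns the unconstrained target rather than the gradual steps seen above), and it models the documented rule that a scale-down is skipped when any metric is unavailable. The function name is hypothetical and the inputs are taken loosely from the listings above.

```python
import math
from typing import Optional, Sequence, Tuple

# (current average value, target average value); None means the value is <unknown>.
Metric = Tuple[Optional[float], float]


def desired_replicas(current: int, metrics: Sequence[Metric],
                     min_pods: int, max_pods: int) -> int:
    """Simplified HPA target: largest per-metric proposal, clamped to bounds."""
    proposals = []
    unavailable = False
    for value, target in metrics:
        if value is None:
            unavailable = True
            continue
        proposals.append(math.ceil(current * value / target))
    if not proposals:
        return current  # nothing to act on, keep the current size
    desired = max(proposals)
    if unavailable and desired < current:
        return current  # scale-down skipped while a metric is unavailable
    return max(min_pods, min(max_pods, desired))


# Low values right after recovery: the target drops to the minimum.
print(desired_replicas(29, [(61.19, 1000.0), (488859859.863, 2147483648.0)], 10, 40))  # 10

# Both metrics unknown: the replica count is left alone.
print(desired_replicas(32, [(None, 1000.0), (None, 2147483648.0)], 10, 40))  # 32
```

Under this model, a query that returns no result until it has enough samples keeps the replica count stable, whereas a query that returns a low value right after recovery pulls the target down.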

Which issue(s) this PR fixes or relates to

Fixes #

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

@jhalterman jhalterman requested a review from a team as a code owner March 21, 2024 23:27
@jhalterman jhalterman force-pushed the safe-keda-queries branch 2 times, most recently from 171227b to bdefe9c Compare March 21, 2024 23:48

@jhesketh (Contributor) left a comment

Minor nit, otherwise lgtm

operations/mimir/autoscaling.libsonnet (review thread resolved)
@jhalterman jhalterman merged commit dea203b into grafana:main Mar 22, 2024
29 checks passed
@jhalterman jhalterman deleted the safe-keda-queries branch March 22, 2024 19:01