Question about scaling with AlertManager #1271
In the picture I shared above, we can see that the curl function will be scaled up every 40 seconds, according to the default AlertManager settings.
Hi @rrrrover, thanks for your interest in the auto-scaling. I think you've described how the AlertManager option works reasonably well. It's not the only option and this is customisable. If you are not satisfied with the default auto-scaling for your use-case, you can edit it:
HPAv2 would allow you to use either CPU, memory, or custom metrics, e.g. QPS (see the metrics gathered from the watchdog / function for this option)
As you identified, scaling down to min replicas corresponds to a resolved alert from AlertManager. I am not sure how much you can expect to edit that experience whilst retaining that semantic. You can edit the AlertManager rules for scaling up, and that's something I've seen other users doing too. I would suggest you try out your sample PromQL and report back on how it compares for your use-case. Looking forward to hearing from you soon, Alex
Hi @alexellis , thanks for the reply and the patient guidance. My use case was inspired by the HPAv2 rules in Kubernetes; that's why I observe QPS per pod rather than total QPS in Prometheus. I've tried my new PromQL, which fires an alert when each pod handles over 5 requests per second.
I send 6 requests to the function pod every second, so it scales up to 5 pods to resolve the alert. I found that when the replica count finally reaches the desired number, the alert is resolved and the pods are scaled down to 1, and then the alert fires again. So my proposal to scale down via a new Prometheus alert is meant to break this infinite loop. We could still observe QPS per pod, but this time we should pick the threshold carefully, so that after scaling down, the QPS per pod will not trigger a scale-up again. In the example above, we could scale down in steps of 4 pods (20% * maxReplicas) when QPS per pod is less than 1. Since QPS(6) / replicas(5) > 1, no scale-down is triggered and the replicas are stable.
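The scale-down rule described above can be sketched as a small Go helper. This is only an illustration of the arithmetic in the comment, not code from faas-autoscaler; the function name and parameters are hypothetical, while the step of 20% * maxReplicas and the low-QPS threshold of 1 come from the example.

```go
package main

import (
	"fmt"
	"math"
)

// scaleDownStep returns the replica count after one scale-down step.
// It only steps down when the per-replica QPS is below lowQPS, using a
// step of stepPercent*maxReplicas, and never goes below minReplicas.
// A stable load (e.g. QPS 6 across 5 replicas = 1.2 > 1) is left alone.
func scaleDownStep(current, minReplicas, maxReplicas int, qps, lowQPS, stepPercent float64) int {
	if current == 0 || qps/float64(current) >= lowQPS {
		return current // per-replica load still high enough: no scale-down
	}
	step := int(math.Ceil(stepPercent * float64(maxReplicas)))
	next := current - step
	if next < minReplicas {
		next = minReplicas
	}
	return next
}

func main() {
	// QPS 6 over 5 replicas = 1.2 per replica: stable, no change.
	fmt.Println(scaleDownStep(5, 1, 20, 6, 1, 0.2)) // 5
	// QPS 2 over 5 replicas = 0.4 per replica: step down by 20%*20 = 4.
	fmt.Println(scaleDownStep(5, 1, 20, 2, 1, 0.2)) // 1
}
```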
My use case is not a real-world request; I was just studying openfaas and thinking about the auto-scaling. If this is not openfaas's main focus right now, I can close this issue. BTW I joined the community a few days ago, very willing to contribute :D
Hi @rrrrover, I think you have a valid point and I'd like to see how far you can push AlertManager. It may require a separate Go process, similar to faas-idler, to make sure that scale-up and scale-down don't work against each other. What's your name on Slack?
Hi @alexellis , my name is also rrrrover on slack |
@rrrrover would you also be interested in working on this issue? openfaas/faas-netes#483 |
@alexellis thank you for your trust, I'd like to work on that issue too. |
Hi @alexellis , I've created a project, faas-autoscaler, to do autoscaling for openfaas. Would you mind taking some time to have a look at it? Currently I use two Prometheus rules, one for scale-up and one for scale-down.
Now faas-autoscaler can scale functions up and down normally. I'll do some math later to find proper QPS thresholds for scaling up/down.
Hi @alexellis , it's been a while since our last talk. I've updated my faas-autoscaler project.
With this rule set, faas-autoscaler will know the desired metric for each function replica, defined by a label.
As the rule expr is always true, the alert keeps firing, so faas-autoscaler acts as if it's checking function replicas periodically (every 40 seconds).
How about simply scaling down from the current replicas to "currentReplicas - math.Ceil(currentReplicas * scalingFactor)" when a resolved event is received? Then we would need no scale-down endpoint.
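The formula in this suggestion can be written out as a short Go sketch (the function name and the clamp to minReplicas are my assumptions; the expression itself is from the comment above):

```go
package main

import (
	"fmt"
	"math"
)

// scaleDownOnResolved applies the suggested formula on a resolved alert:
// drop from currentReplicas to currentReplicas - ceil(currentReplicas*scalingFactor),
// clamped at minReplicas, instead of jumping straight to minReplicas.
func scaleDownOnResolved(currentReplicas, minReplicas int, scalingFactor float64) int {
	next := currentReplicas - int(math.Ceil(float64(currentReplicas)*scalingFactor))
	if next < minReplicas {
		next = minReplicas
	}
	return next
}

func main() {
	fmt.Println(scaleDownOnResolved(10, 1, 0.2)) // 10 - ceil(2.0) = 8
	fmt.Println(scaleDownOnResolved(2, 1, 0.2))  // 2 - ceil(0.4) = 1
}
```

Because the step is proportional to the current count, repeated resolved events shrink the deployment geometrically rather than all at once.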
Hi @lmxia , thanks for the tip. I've improved faas-autoscaler a little; it now uses only one endpoint.
I'm still keeping the "old" faas-autoscaler endpoints. Let's assume we need to autoscale functions according to the RPS (requests per second) for each replica, and we want RPS in the range [50, 100]. When the system receives 1000 function calls per second, the optimal replica count is 10. With the old config set, we will scale up step by step, bringing the replicas to 10. And when the system RPS drops to only 100, we should scale down to 1 replica, step by step, starting when the resolved event is received. If we only scaled down when the alert resolves, we would drop straight to minReplicas.
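The worked numbers in that scenario can be sketched as a small Go helper. This is an illustration of the target-band idea only, under my own assumptions (it computes the target directly instead of ramping step by step, and the name `desiredReplicas` is hypothetical):

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas returns a replica count that keeps per-replica RPS inside
// [minRPS, maxRPS]: size so each replica serves at most maxRPS, but keep the
// current count while it already sits inside the band (to avoid flapping).
func desiredReplicas(current, minReplicas int, totalRPS, minRPS, maxRPS float64) int {
	per := totalRPS / float64(current)
	if per >= minRPS && per <= maxRPS {
		return current // already in range: no change
	}
	want := int(math.Ceil(totalRPS / maxRPS))
	if want < minReplicas {
		want = minReplicas
	}
	return want
}

func main() {
	// 1000 RPS with a 100 RPS ceiling per replica -> 10 replicas.
	fmt.Println(desiredReplicas(1, 1, 1000, 50, 100)) // 10
	// Load drops to 100 RPS: 10 per replica is under 50, so shrink to 1.
	fmt.Println(desiredReplicas(10, 1, 100, 50, 100)) // 1
}
```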
I think this would be a good topic for the next community call, would you be interested in presenting your scaler @rrrrover ? |
Hi @alexellis , thanks for this opportunity. But first I'd like to know when the community call is, because I'm in China and there's a 9-hour time difference, so I might not be able to join.
Thank you for your work on this |
So when I invoke a long-running process, it takes a few seconds to give a response. Similarly, once a burst of invocations completes, the function scales up and then back down (by the looks of it, while the function is still running) because not enough invocations have completed in the last 5 seconds.

My initial thought is to alter the alert rule to take in-flight invocations into account. That said, it might simply be more appropriate to calculate a new metric specifically for currently running invocations, and then provide (or calculate) a number of invocations that a particular pod should be able to handle concurrently.

As it stands, autoscaling doesn't really appear to work for longer-running functions (on the order of a minute or two per invocation), because it rubber-bands the scaling size based on recently completed invocations, not current invocations. I'm currently experimenting with a slightly altered alert rule of something like:
Apologies for the less-than-optimal query, I'm not super experienced with PromQL. I see there's some documentation about it being able to be set via
Is this issue still active? There hasn't been any activity in a year. |
My actions before raising this issue
Openfaas uses Prometheus to monitor function calls, and when a function's QPS is higher than some threshold, autoscaling is triggered.
But after functions are scaled up, the total QPS won't go down, so functions will keep being scaled up until maxReplicas is reached.
In my opinion, when we scale up a function, the QPS for each function replica goes down, which means the load on each replica goes down.
So when we scale the function to X replicas, where QPS/X is relatively small, we can stop scaling up.
Also, when the alert stops, replicas will be set to minReplicas, and the QPS per replica will rise, probably higher than we'd expect.
Expected Behaviour
When the APIHighInvocationRate alert is fired, the function should only scale up to an appropriate level, not to maxReplicas.
When APIHighInvocationRate stops, we should scale the function down gracefully just like we scale up, little by little, to finally reach a safe QPS per replica.
Current Behaviour
When APIHighInvocationRate alert keeps firing (function QPS is high), function replicas will soon reach maxReplicas (default 20)
When the APIHighInvocationRate alert stops, function replicas will drop to minReplicas (default 1)
Possible Solution
sum by(function_name) (rate(gateway_function_invocation_total{code="200"}[10s]) / ignoring(code) gateway_service_count) > 5
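For context, an expression like this would normally live in a Prometheus alerting-rules file, roughly as sketched below. The alert name matches the one discussed in this thread, but the `for` duration, labels, and annotation are illustrative assumptions, not the stock OpenFaaS rule:

```yaml
groups:
  - name: openfaas
    rules:
      # Illustrative sketch: fires when the successful-invocation rate
      # per replica exceeds 5 req/s, using the expression above.
      - alert: APIHighInvocationRate
        expr: |
          sum by(function_name) (rate(gateway_function_invocation_total{code="200"}[10s])
            / ignoring(code) gateway_service_count) > 5
        for: 5s
        labels:
          service: gateway
          severity: major
        annotations:
          description: High per-replica invocation rate on {{ $labels.function_name }}
```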
Steps to Reproduce (for bugs)
hey -m POST -q 6 -c 1 -d http://some-test-service:8080/ -z 30m http://192.168.99.100:31112/function/curl
kubectl logs -f deploy/gateway -c gateway -n openfaas | grep Scale
to watch scale up/down logs