Add max-inflight-requests limit to store gateway #5553
Signed-off-by: Justin Jung <[email protected]>
I like what you are doing.
Isn't this more like an instance limit?
It would be nice to have metrics like it was done in #4071
We already have metrics for current usage in https://github.com/cortexproject/cortex/blob/master/vendor/github.com/weaveworks/common/server/server.go#L316
Right, that is one. The other one was instanceLimitsMetric, which shows the current limit, if it is indeed an instance limit. I mean, I am just helping 😄, not blocking. The request for changes is for the tiny nit on the flag.
@friedrichg Thanks for the review and help. I was just saying we already have the usage metric, and I agree we can add the limit metrics if necessary. All good, not blocking at all.
…ight before making the series call Signed-off-by: Justin Jung <[email protected]>
I liked the idea of grouping all instance limits, so on top of this new limit I also created metrics for the existing limits:
LGTM
@justinjung04 test failure seems related to the new metrics added. Once we fix that we can merge
What this PR does:
The store gateway can only handle a set number of requests at a time; the rest are queued, waiting in the queryGate (a channel). Since queriers select store gateways randomly without any information about their load, many requests can flood a single store gateway instance, which slows down overall query latency (or can even lead to timeouts).
This PR adds a new configuration flag, -blocks-storage.bucket-store.max-inflight-request, that lets the store gateway reject requests fast, so that queriers can try other store gateway replicas without waiting.
Which issue(s) this PR fixes:
n/a
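For reviewers who want a picture of the fail-fast behavior described under "What this PR does", here is a minimal sketch, not the code in this PR. The names inflightLimiter, acquire, release, handleSeries, and errTooManyInflight are hypothetical and chosen only for illustration. It assumes a limit of 0 means "disabled", which is an assumption rather than something stated above.

```go
package main

import (
	"errors"
	"sync/atomic"
)

// errTooManyInflight is a hypothetical error returned when the limit is hit.
var errTooManyInflight = errors.New("too many inflight requests")

// inflightLimiter is a hypothetical counter-based limiter. Unlike a gate
// that queues callers, it rejects immediately once max is reached.
type inflightLimiter struct {
	max     int64        // assumed: 0 disables the limit
	current atomic.Int64 // number of requests currently being served
}

// acquire reserves a slot, or fails fast without blocking.
func (l *inflightLimiter) acquire() error {
	n := l.current.Add(1)
	if l.max > 0 && n > l.max {
		// Over the limit: give the slot back and reject right away.
		l.current.Add(-1)
		return errTooManyInflight
	}
	return nil
}

// release frees a slot previously reserved by a successful acquire.
func (l *inflightLimiter) release() {
	l.current.Add(-1)
}

// handleSeries shows where the check would sit: before the (possibly
// slow) work starts, so a rejected caller never waits in a queue.
func handleSeries(l *inflightLimiter, fn func() error) error {
	if err := l.acquire(); err != nil {
		return err
	}
	defer l.release()
	return fn()
}
```

With a limiter like this, a querier that receives the rejection error can move on to another store gateway replica right away instead of waiting for a slot to free up.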
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]