-
Notifications
You must be signed in to change notification settings - Fork 806
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Query priority #5605
Query priority #5605
Conversation
9a9275c
to
5a57717
Compare
fc1260d
to
c75a0ca
Compare
Since this is a big change, I'm running test in my dev env as well. Once I have the result I'll share it here. |
Can you please add an E2E test case for this feature? It would be good to show that the reserved querier only processes high priority query |
I know we are using a I would also have try to have some kind of heuristic to set this priority? setting this manually seems not scalable. |
fc9351a
to
20692fa
Compare
280dbab
to
6044ec7
Compare
09610c2
to
17bdef6
Compare
Major upgrade completed, @yeya24 and @alanprot could you take a look? Meanwhile I'll run beta test to check the followings:
|
5d0746f
to
ededdc2
Compare
E2E testing completed and the result looks awesome. Added to the PR description. |
queue queue | ||
lengthGauge prometheus.Gauge | ||
} | ||
|
||
// Op is an operation on the priority queue. | ||
type Op interface { | ||
Key() string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generalizing PriorityQueue by removing the Key and duplicate key check. Confirmed that this code is not used elsewhere in the codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Overall I think the pr is in the right direction and in good shape. I have added some comments.
Signed-off-by: Justin Jung <[email protected]>
Signed-off-by: Justin Jung <[email protected]>
Signed-off-by: Justin Jung <[email protected]>
Signed-off-by: Justin Jung <[email protected]>
Signed-off-by: Justin Jung <[email protected]>
Signed-off-by: Justin Jung <[email protected]>
Signed-off-by: Justin Jung <[email protected]>
Signed-off-by: Justin Jung <[email protected]>
…auge vector in user request queue Signed-off-by: Justin Jung <[email protected]>
Signed-off-by: Justin Jung <[email protected]>
Signed-off-by: Justin Jung <[email protected]>
Signed-off-by: Justin Jung <[email protected]>
Signed-off-by: Justin Jung <[email protected]>
Signed-off-by: Justin Jung <[email protected]>
Signed-off-by: Justin Jung <[email protected]>
Signed-off-by: Justin Jung <[email protected]>
Signed-off-by: Justin Jung <[email protected]>
621ca87
to
aa38a8c
Compare
Signed-off-by: Justin Jung <[email protected]>
We should also consider this scenario: a large number of high-priority requests could potentially cause lower-priority ones to time out. |
Yes, I wanted to make this PR as a preliminary feature that assumes small list of priorities and querier scale up if too many queries are timing out. I agree that this feature should be improved by introducing fairness logic between different priorities, but wanted to keep the initial PR simple as the code change involved with introducing priority itself seemed big enough. |
Signed-off-by: Justin Jung <[email protected]>
Signed-off-by: Justin Jung <[email protected]>
Signed-off-by: Justin Jung <[email protected]>
Signed-off-by: Justin Jung <[email protected]>
Signed-off-by: Justin Jung <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Awesome work
Performed another end-to-end test, and things are still working as expected. |
What this PR does:
Currently, each tenant has single FIFO request queue in the query path, where queriers assigned for the tenant will pick up next request from the queue.
Having a single FIFO queue results in situations where
This PR suggests to introduce new "query priority" for each tenant that does the following:
Here's an example configuration:
Which issue(s) this PR fixes:
n/a
End-to-end test result:
There were 4 things to verify from the end-to-end test:
In short, all of them were verified successfully with some cool remarks:
Image 1: Metrics with new labels showing smooth transition between FIFO and priority queue.
Image 2: Change in QPS per status code and latency.
Image 3: Change in CPU and memory usage.
Image 4: One reserved querier waiting for priority 3 query that was never enqueued.
Test setup:
Queries ran continuously:
Query priority config used:
Timeline:
14:00 | Applied new image that contains this PR
14:50 | Enabled query priority config
15:45 | Disabled query priority config without removing the priority definitions
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]