Add optional querier response streaming #7173
Conversation
can you add a few sentences to the description with some more details for how this works? It will make the review easier. Or if you have a design doc, that'd suffice too.
Thanks for the suggestion, and excuse the previously barren PR description @dimitarvdimitrov; I have added some context.
I had a high-level look and the implementation looks good. It fits fairly naturally in the current model. I will take a deeper look later on too.
Perhaps, if we want to take this one step further, we could do the preliminary work of adding a streaming proto file in httpgrpc instead of only in frontend.proto. I am less hopeful that we will actually do proper streaming httpgrpc, because it would require changing a lot of code, but I decided to bring it up.
Would be nice to sync with @aknuds1 on his plans for streaming to make sure we don't implement the same thing in two different ways.
From internal discussions I gathered that the general sentiment in the team is that streaming through the frontend is not very useful for most endpoints (e.g. because of caching done by frontends). Given that, I decided it would be better to keep the change as local as possible for now.
very solid work! 💪
My main comments are around making the tests a bit more elaborate; most of the others are nitpicks. Otherwise I couldn't find anything surprising.
final set of comments and then we can merge :)
What this PR does
Some Mimir API endpoints can generate responses with a very large response body. Today, responses are transmitted from queriers to query-frontends via the non-streaming gRPC call `QueryResult`. In the case of the `/series` and `/active_series` APIs, the (partial) responses generated by queriers can get so large that they cause the query-frontend handling the request to OOM on unmarshaling the `QueryResultRequest` containing the response. An example profile from an internal cell demonstrates this behaviour.

While queriers need to accumulate the entire response in memory to be able to deduplicate results from ingesters, query-frontends do not need to bring the full response into memory in all cases. The `/active_series` endpoint in particular never needs to do this and was implemented in a streaming fashion to avoid OOMs for large result sets. However, since all response bodies for all types of queries are fully loaded into query-frontend memory in today's implementation of the communication between queriers and query-frontends, this streaming is essentially useless.
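For concreteness, here is a minimal Go sketch of the existing non-streaming handoff. The `frontendClient` interface and `queryResultRequest` type are simplified, hypothetical stand-ins for the real definitions in frontend.proto, not Mimir's actual API:

```go
// Sketch of the existing non-streaming handoff (simplified). The querier
// buffers the whole response body, then ships it to the query-frontend in
// a single message, which the frontend must unmarshal fully into memory.
package sketch

import (
	"context"
	"io"
	"net/http"
)

// queryResultRequest is a hypothetical stand-in for the proto message.
type queryResultRequest struct {
	QueryID uint64
	Body    []byte // the entire response body, however large
}

// frontendClient is an illustrative stand-in for the querier's view of
// the query-frontend's gRPC service.
type frontendClient interface {
	QueryResult(ctx context.Context, req *queryResultRequest) error
}

func sendResult(ctx context.Context, fc frontendClient, queryID uint64, resp *http.Response) error {
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body) // full response buffered in querier memory
	if err != nil {
		return err
	}
	// One large message: the frontend allocates the whole body on unmarshal,
	// which is what can OOM it for /series and /active_series responses.
	return fc.QueryResult(ctx, &queryResultRequest{QueryID: queryID, Body: body})
}
```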
To avoid this issue and enable a single query-frontend instance to process arbitrarily large responses, this PR proposes the introduction of a new (streaming) rpc that queriers can invoke instead of the `QueryResult` call to transmit responses to query-frontends in a streaming fashion. This essentially requires two changes:

1. The `frontendRequest` data-structure can no longer be canceled/cleaned up on return from `RoundTripGRPC`. Instead, the required cleanup is captured in a closure and passed to the caller, who must invoke it by closing the response body once the transfer is complete (see the sketch below this list). Because of that, we risk introducing a memory leak that could be triggered by client code not closing the response body (today, not closing the response body would not lead to a leak, because the body is an `io.NopCloser`).
2. The streaming behaviour is gated behind a feature flag, and only requests to the `/active_series` endpoint request it; all other responses will continue to be transmitted via the existing rpc even when the feature flag is enabled.
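A minimal sketch of the cleanup-on-Close mechanism from change 1, under assumed names (`cleanupReadCloser` and `wrapResponse` are illustrative, not Mimir's actual identifiers):

```go
// Sketch of change 1: instead of cleaning up the frontend's per-request
// state when RoundTripGRPC returns, the cleanup is captured in a closure
// and wired into the response body, so it runs when the caller closes the
// body after consuming the stream.
package sketch

import (
	"io"
	"net/http"
	"sync"
)

// cleanupReadCloser runs the cleanup closure exactly once, when the
// response body is closed.
type cleanupReadCloser struct {
	io.Reader
	once    sync.Once
	cleanup func()
}

func (c *cleanupReadCloser) Close() error {
	c.once.Do(c.cleanup) // e.g. cancel the request context, drop queue state
	return nil
}

// wrapResponse attaches the cleanup to the streamed body. If the caller
// never closes the body, the cleanup never runs -- the leak risk noted
// above. (Previously the body was an io.NopCloser, so a missing Close
// was harmless.)
func wrapResponse(resp *http.Response, cleanup func()) *http.Response {
	resp.Body = &cleanupReadCloser{Reader: resp.Body, cleanup: cleanup}
	return resp
}
```

Tying the cleanup to `Close` keeps the `RoundTripGRPC` call signature unchanged while shifting the cleanup responsibility to the caller, which is precisely where the leak risk described above comes from.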
One less-than-obvious caveat of this streaming behaviour is that queriers participate in the handling of the request for longer than they previously would have (namely until the response body is fully streamed into the query-frontend, which itself may block on streaming to the client). Since request concurrency in queriers is limited by the `-querier.max-concurrent` flag, this may lead to slow clients and/or big requests blocking querier workers. The following pair of example traces, with and without streaming, illustrates this difference.

Without response streaming

With response streaming
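To make this caveat concrete, here is a rough sketch of what a querier-side send loop could look like. `resultStream` is an assumed stand-in for the client side of the new streaming rpc, and the chunk size is arbitrary; the point is that the handler, and with it one `-querier.max-concurrent` worker slot, is held until the frontend has consumed every chunk:

```go
// Sketch of a querier-side streaming send loop. The worker goroutine
// running this does not return until the entire body has been streamed,
// so a slow consumer holds one -querier.max-concurrent slot for the
// whole transfer.
package sketch

import "io"

// resultStream is an illustrative stand-in for the client side of the
// new streaming rpc.
type resultStream interface {
	Send(chunk []byte) error // blocks when the receiver applies backpressure
	CloseAndRecv() error
}

func streamResult(stream resultStream, body io.Reader) error {
	buf := make([]byte, 1<<20) // stream in ~1 MiB chunks (illustrative size)
	for {
		n, err := body.Read(buf)
		if n > 0 {
			// Send blocks until the frontend accepts the chunk; if the
			// frontend is itself blocked streaming to a slow client, the
			// querier worker stays occupied here.
			if sendErr := stream.Send(buf[:n]); sendErr != nil {
				return sendErr
			}
		}
		if err == io.EOF {
			return stream.CloseAndRecv()
		}
		if err != nil {
			return err
		}
	}
}
```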
If this approach proves viable, other request types might also benefit from streaming in the future (e.g. remote read).
Which issue(s) this PR fixes or relates to
n/a
Checklist
- `CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`.
- `about-versioning.md` updated with experimental features.