fetching: export utilities for decompressing and parsing partition fetch responses #803

dimitarvdimitrov · 2024-08-08T16:00:42Z

Background

In grafana/mimir we are working towards issuing fetch requests ourselves. The primary reason is that individual requests to the Kafka backend are slow, so issuing them sequentially per partition becomes the bottleneck in our application. We want to fetch records in parallel to speed up consumption.

One difficulty I encountered when issuing FetchRequests ourselves is that parsing the response is non-trivial. That's why I'm proposing to export these functions for downstream projects to use.

Alternatively, I could also try contributing the concurrent fetching logic. However, I believe that logic is much more nuanced and involves more tradeoffs between fetched bytes and latency, so I wasn't sure whether it's a good fit for a general-purpose library. I'm open to discussing this further.

What this PR does

Moves (*kgo.cursorOffsetNext).processRespPartition from a method to a standalone function, kgo.ProcessRespPartition. A few small changes were also needed to make the interface suitable for public use (such as removing the *broker parameter).
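To illustrate the shape of this refactor, here is a heavily simplified Go sketch. All types are hypothetical stand-ins, not franz-go's real `cursorOffsetNext` or `broker`; the only behaviour modelled is skipping records that precede the requested offset, which can happen because a broker may return a batch that starts before the fetch offset.

```go
package main

// Hypothetical, simplified stand-ins for the real internal types.
type broker struct{ id int32 }

type partitionResponse struct {
	Partition int32
	Offsets   []int64 // stands in for the records decoded from the batches
}

type cursor struct {
	nextOffset int64
}

// Before: parsing hangs off an internal cursor and also receives a *broker,
// so it can only be called from inside the client.
func (c *cursor) processRespPartition(br *broker, resp partitionResponse) []int64 {
	_ = br // present only to mirror the old signature
	return ProcessRespPartition(c.nextOffset, resp)
}

// After: a standalone function whose inputs are all explicit parameters,
// so code outside the package can call it on its own fetch responses.
func ProcessRespPartition(fetchOffset int64, resp partitionResponse) []int64 {
	var out []int64
	for _, o := range resp.Offsets {
		// A returned batch may start before the requested offset; skip those records.
		if o >= fetchOffset {
			out = append(out, o)
		}
	}
	return out
}
```

Once every input is a parameter, the function no longer depends on client-internal state, which is what makes it possible to export.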

Side effects

To minimize the necessary changes and the API surface of the package, I opted to use a single global decompressor for all messages. Previously, there was one decompressor per client, and it was passed down to (*cursorOffsetNext).processRespPartition. My understanding is that sharing the pooled readers (lz4, zstd, gzip) shouldn't hurt performance, because usage patterns do not affect the behaviour of a reader (for example, a consistent size of decompressed data doesn't make a reader more or less efficient). I have not thoroughly verified or tested this; let me know if you think that's important.

An alternative to this is to also export the decompressor along with newDecompressor() and the auxiliary types for decompression.

Note to reviewers

I haven't added explicit tests for this because it's not new code and consumer_direct_test.go already tests it. Happy to add tests if you think they're necessary now that this is exported.


twmb · 2024-08-27T05:41:16Z

I'm open to this. I'll probably look more closely at this PR within the next 3 weeks (at the moment I'm only using free weekends for feature work). I'm not entirely convinced by the options struct; I'll see if I can come up with a different proposal.

dimitarvdimitrov · 2024-08-29T07:35:52Z

Thanks for taking a look.

I also tried to think of alternatives to the options struct: we could create a FetchParser (name TBD, of course) struct, built by a constructor that takes the same parameters as the options fields. ProcessRespPartition would then be a method on that struct instead of a standalone function.
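For comparison, a hedged sketch of the FetchParser shape described above: configuration is captured once by a constructor, and `ProcessRespPartition` becomes a method. The single option shown, `keepControlRecords`, is a hypothetical field chosen because it echoes franz-go's `KeepControlRecords` client option; a real parser would carry more state and would decode record batches rather than receive already-decoded records.

```go
package main

// record is a hypothetical, already-decoded record.
type record struct {
	Offset    int64
	IsControl bool
}

// FetchParser fixes its configuration at construction time.
type FetchParser struct {
	keepControlRecords bool
}

// NewFetchParser takes the parameters an options struct would otherwise hold.
func NewFetchParser(keepControlRecords bool) *FetchParser {
	return &FetchParser{keepControlRecords: keepControlRecords}
}

// ProcessRespPartition drops records before fetchOffset and, unless
// configured otherwise, control records (e.g. transaction markers).
func (p *FetchParser) ProcessRespPartition(fetchOffset int64, recs []record) []record {
	var out []record
	for _, r := range recs {
		if r.Offset < fetchOffset {
			continue
		}
		if r.IsControl && !p.keepControlRecords {
			continue
		}
		out = append(out, r)
	}
	return out
}
```

A constructor-based type can also hold per-instance state, such as its own decompressor, which a bare function with an options struct cannot do without extra parameters.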

gaffneyc · 2024-11-13T03:10:56Z

We ran into this problem as well today while converting a custom Fetch from Sarama to Franz, which pulled a single record given a topic, partition, and offset. I haven't looked at the PR itself, but I would love to see a way in the API to fully parse the responses into records.

dimitarvdimitrov added 2 commits August 8, 2024 17:28

Restore multiline processV0OuterMessage

396d6f9

dimitarvdimitrov mentioned this pull request Sep 4, 2024

kafka replay speed: use franz-go fork grafana/mimir#9203

Merged


This was referenced Sep 27, 2024

kafka replay speed: upstream concurrent fetchers grafana/mimir#9452

Merged

fetching: export utilities for decompressing and parsing partition fetch responses grafana/franz-go#4

Merged

twmb added the waiting label Oct 11, 2024
