Implement routing for ListOffsets #1767
Merged
Inspecting the messages passing through shotover, I could see that ListOffsets requests are hitting NOT_LEADER_OR_FOLLOWER errors (error code 6).
For example:
After some retries it will eventually hit the correct node and succeed:
It seems that the Java driver is robust enough to retry only the parts of the request that failed, so eventually it will successfully complete and move on. However, it is still quite slow and wasteful to error like this.
To fix these errors, we need to split the request by destination broker and combine the responses, like we do for fetch and produce messages.
So this PR implements that split/combine logic for ListOffsets.
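As a rough illustration of the split/combine idea, here is a minimal sketch. It is not the code from this PR: `TopicPartition`, `PartitionOffset`, `split_by_leader`, `combine` and the `fallback` broker parameter are all hypothetical names, and the real Kafka message types are replaced by simplified stand-ins. It assumes the proxy already knows which broker leads each partition.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical stand-in for a partition named in a ListOffsets request.
#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct TopicPartition {
    pub topic: String,
    pub partition: i32,
}

/// Hypothetical stand-in for one partition's entry in a ListOffsets response.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PartitionOffset {
    pub tp: TopicPartition,
    pub error_code: i16,
    pub offset: i64,
}

pub type BrokerId = i32;

/// Split: group the requested partitions by the broker that leads them,
/// so each sub-request only contains partitions its target can answer.
/// Partitions with no known leader go to `fallback` (an assumption of this sketch).
pub fn split_by_leader(
    partitions: &[TopicPartition],
    leaders: &HashMap<TopicPartition, BrokerId>,
    fallback: BrokerId,
) -> BTreeMap<BrokerId, Vec<TopicPartition>> {
    let mut per_broker: BTreeMap<BrokerId, Vec<TopicPartition>> = BTreeMap::new();
    for tp in partitions {
        let broker = leaders.get(tp).copied().unwrap_or(fallback);
        per_broker.entry(broker).or_default().push(tp.clone());
    }
    per_broker
}

/// Combine: merge the per-broker responses back into a single response,
/// sorted by topic and partition so the result is deterministic.
pub fn combine(responses: Vec<Vec<PartitionOffset>>) -> Vec<PartitionOffset> {
    let mut combined: Vec<PartitionOffset> = responses.into_iter().flatten().collect();
    combined.sort_by(|a, b| a.tp.cmp(&b.tp));
    combined
}
```

With this shape, every sub-request reaches a broker that leads all of its partitions, so the client should no longer see NOT_LEADER_OR_FOLLOWER for partitions whose leader is known to the proxy.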
I added an integration test to call the listOffsets method from the admin API.
This test does not fail without the fix, due to Java's robust retry mechanism, but I've manually verified that the errors are gone with the new routing logic, and the new test adds more coverage of the driver.
Another nice outcome of this fix is that `cluster_1_rack_single_shotover::case_2_java` now completes in 100s, down from 110s.