kafka_int_tests::cluster_1_rack_single_shotover::case_1_cpp fix intermittent failure #1834
Attempt to fix intermittent failure: https://github.com/shotover/shotover-proxy/actions/runs/11926547980/job/33240776520
The panic occurs because we attempted to route to a random node while shotover did not have any nodes in its metadata.
This should be impossible because we always query for a list of nodes before we reach the routing stage.
From the log I can see that we hit the panic 15ms after shotover was ready to accept connections.
This indicates that, while the Kafka cluster was ready for connections, it likely still had a little initialization work left to do and therefore returned an empty list of nodes.
I have seen this kind of behavior before from Redis, and I think Cassandra as well.
So I am fairly confident that adding a retry to our broker list query will resolve this intermittent failure.
Unfortunately I cannot reproduce the issue locally to verify.
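For illustration, the retry idea described above could look roughly like the sketch below. This is not the code in this PR: `Broker`, `query_brokers_with_retry`, and the `query` closure are hypothetical names, and the sketch is synchronous for simplicity, whereas the real query would presumably be async. It only shows the shape of the logic: treat an empty broker list as "not ready yet" and ask again after a short delay.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical stand-in for a broker entry held in the proxy's metadata.
#[derive(Debug, Clone, PartialEq)]
pub struct Broker {
    pub id: i32,
    pub address: String,
}

/// Calls `query` until it returns a non-empty broker list or `max_attempts`
/// is exhausted, sleeping `delay` between attempts.
/// A hard error from `query` is returned immediately without retrying.
pub fn query_brokers_with_retry<F>(
    mut query: F,
    max_attempts: u32,
    delay: Duration,
) -> Result<Vec<Broker>, String>
where
    F: FnMut() -> Result<Vec<Broker>, String>,
{
    for attempt in 1..=max_attempts {
        let brokers = query()?;
        if !brokers.is_empty() {
            return Ok(brokers);
        }
        // The cluster accepted the connection but reported no nodes yet;
        // wait before asking again (no sleep after the final attempt).
        if attempt < max_attempts {
            sleep(delay);
        }
    }
    Err(format!(
        "broker list still empty after {} attempts",
        max_attempts
    ))
}
```

Bounding the number of attempts means a cluster that never reports any nodes surfaces as an error instead of hanging startup, and routing never sees an empty node list.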