Detect offline shotover nodes for KafkaSinkCluster #1762
CodSpeed Performance Report: merging #1762 will degrade performance by 11.83%.
Looking good, I've left some minor feedback.
LGTM, happy for this to land as is, we can make use of shotover/tokio-bin-process#41 and shotover/tokio-bin-process#42 in a follow up cleanup if we want.
After introducing `ShotoverNodeState` to `ShotoverNode` in #1758, we should add a task to detect down shotover nodes and set `ShotoverNodeState` accordingly.

This PR adds a background task, `check_shotover_peers`, that loops over the peer shotover nodes and tries to open a TCP connection to each one. If the connection cannot be established within `connect_timeout_ms`, the peer node is marked as down.

- `connect_timeout_ms` is the same configuration used when creating a connection to a destination kafka broker.
- The task waits for `check_shotover_peers_delay_ms` + random(-`check_shotover_peers_delay_ms`/10, `check_shotover_peers_delay_ms`/10) before moving to the next peer shotover node.
- `start_shotover_peers_check` is called when the instance of `KafkaSinkClusterBuilder` is being created and hence is called exactly once.
- `check_shotover_peers` is not invoked at all if there is no peer shotover node (i.e., there is only 1 shotover node in the cluster).
- `check_shotover_peers` is restarted if the creation of the random number generator fails.

The next PR will change metadata rewrites to exclude down shotover nodes.
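
For readers skimming the description, the probe and the jittered delay can be sketched roughly as below. This is a minimal, blocking, std-only illustration and not the code in this PR: `NodeState`, `probe_peer` and `jittered_delay_ms` are hypothetical names, the real task is a background task on the async runtime, and the random sample is passed in by the caller here so that no random number crate is needed.

```rust
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::time::Duration;

/// Hypothetical stand-in for the state recorded on each peer node.
#[derive(Debug, Clone, Copy, PartialEq)]
enum NodeState {
    Up,
    Down,
}

/// Try to open a TCP connection to the peer within `connect_timeout`.
/// Any error, including a timeout, is treated as the peer being down.
fn probe_peer(addr: &SocketAddr, connect_timeout: Duration) -> NodeState {
    match TcpStream::connect_timeout(addr, connect_timeout) {
        Ok(_) => NodeState::Up,
        Err(_) => NodeState::Down,
    }
}

/// Delay before probing the next peer: the base delay plus a jitter
/// in the range [-base/10, base/10].
/// `unit` is a random sample in [0.0, 1.0] supplied by the caller (an assumption
/// of this sketch; values outside that range are clamped).
fn jittered_delay_ms(base_ms: u64, unit: f64) -> u64 {
    let spread = base_ms as f64 / 10.0;
    let offset = (unit.clamp(0.0, 1.0) * 2.0 - 1.0) * spread;
    (base_ms as f64 + offset).round() as u64
}

fn main() {
    // Bind to an ephemeral local port so there is a reachable "peer" to probe.
    let listener = TcpListener::bind("127.0.0.1:0").expect("bind failed");
    let peer = listener.local_addr().expect("no local address");

    let state = probe_peer(&peer, Duration::from_millis(500));
    let delay = jittered_delay_ms(3000, 0.25);
    println!("peer {peer} is {state:?}, next probe in {delay} ms");
}
```

The jitter keeps several shotover nodes from probing each other in lockstep, which is presumably why the delay is randomized within 10% of the configured value.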