
chore: retry followers gathering #954

Closed

Conversation

renancloudwalk
Contributor

No description provided.

@renancloudwalk requested a review from a team as a code owner on May 30, 2024, 01:44

PR Review 🔍

⏱️ Estimated effort to review [1-5]

3, because the PR involves changes to the consensus mechanism in a distributed system, which is inherently complex. The introduction of a retry mechanism for network connections requires careful consideration of edge cases and potential failure modes.

🧪 Relevant tests

No

⚡ Possible issues

Possible Bug: The retry_connect function does not handle the case where RETRY_ATTEMPTS is set to zero, which could lead to an infinite loop or immediate failure without any retry attempts.

Performance Concern: The use of sleep(RETRY_DELAY) within the retry loop could introduce significant delays in the consensus process, especially if RETRY_ATTEMPTS is set to a high number and the delay is substantial.

🔒 Security concerns

No

Code feedback:
relevant file: src/eth/consensus.rs
suggestion:

Consider implementing exponential backoff for the retry mechanism instead of a fixed delay. This can help in handling high loads or network issues more gracefully by gradually increasing the wait time between retries, thus reducing the load on the network and potentially increasing the chance of a successful connection. [important]

relevant line: sleep(RETRY_DELAY).await;
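
A minimal sketch of what such a backoff could look like, assuming RETRY_DELAY is a std::time::Duration as in the PR; the doubling policy, the MAX_BACKOFF cap, and both constant values are illustrative assumptions rather than the PR's code:

use std::time::Duration;

const RETRY_DELAY: Duration = Duration::from_millis(500); // base delay; value assumed for illustration
const MAX_BACKOFF: Duration = Duration::from_secs(10); // hypothetical cap on the wait between attempts

// Doubles the base delay on each attempt (1x, 2x, 4x, ...) and caps it at MAX_BACKOFF.
fn backoff_delay(attempt: u32) -> Duration {
    let exponential = RETRY_DELAY * 2u32.saturating_pow(attempt.saturating_sub(1));
    exponential.min(MAX_BACKOFF)
}

Inside the loop this would replace the fixed wait with sleep(backoff_delay(attempt)).await, keeping per-attempt waits bounded even when RETRY_ATTEMPTS is large.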

relevant file: src/eth/consensus.rs
suggestion:

It's important to ensure that RETRY_ATTEMPTS is greater than zero to avoid potential infinite loops or immediate failures. Adding a check at the start of the retry_connect function to return an error if RETRY_ATTEMPTS is zero can prevent such issues. [important]

relevant line: for attempt in 1..=RETRY_ATTEMPTS {
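
A short sketch of that guard, assuming retry_connect returns an anyhow::Result (an assumption about the PR's signature):

if RETRY_ATTEMPTS == 0 {
    anyhow::bail!("RETRY_ATTEMPTS must be greater than zero");
}
for attempt in 1..=RETRY_ATTEMPTS {
    // ... existing connect / retry logic ...
}
// Reaching this point means every attempt failed.
anyhow::bail!("could not connect after {} attempts", RETRY_ATTEMPTS)

Note that with an inclusive range of 1..=RETRY_ATTEMPTS a value of zero simply skips the loop body rather than looping forever, so the guard mainly turns a silent immediate failure into an explicit, descriptive error.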

relevant file: src/eth/consensus.rs
suggestion:

To improve the robustness of the retry mechanism, consider adding a maximum timeout for the total retry duration. This can prevent the system from hanging indefinitely in scenarios where the connection cannot be established despite multiple retries. [medium]

relevant line: sleep(RETRY_DELAY).await;
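
One way to sketch a total deadline, assuming tokio is already in use and the call site is an async function returning an anyhow::Result; the TOTAL_RETRY_TIMEOUT constant, its value, and the exact retry_connect signature are assumptions:

use std::time::Duration;

const TOTAL_RETRY_TIMEOUT: Duration = Duration::from_secs(30); // hypothetical overall budget

// Bounds the entire retry loop, not just a single attempt.
let client = tokio::time::timeout(TOTAL_RETRY_TIMEOUT, retry_connect(address))
    .await
    .map_err(|_| anyhow::anyhow!("gave up connecting after {:?}", TOTAL_RETRY_TIMEOUT))??;

The first question mark unwraps the timeout result and the second the connection result, so both "ran out of time" and "all attempts failed" surface as ordinary errors.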

relevant file: src/eth/consensus.rs
suggestion:

To enhance error handling, consider logging the number of successful connections versus failed attempts after the retry loop completes. This could provide valuable insights during debugging and monitoring of the system's connectivity. [medium]

relevant line: Err(e) => {
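
A sketch of how those counts could be tracked while keeping the loop shape shown in the PR; the failed_attempts counter and the summary log lines are assumptions added for illustration:

let mut failed_attempts = 0u32;
for attempt in 1..=RETRY_ATTEMPTS {
    match AppendEntryServiceClient::connect(address.clone()).await {
        Ok(client) => {
            // Summary of how many retries were needed before success.
            tracing::info!("connected to {} after {} failed attempt(s)", address, failed_attempts);
            return Ok(client);
        }
        Err(e) => {
            failed_attempts += 1;
            tracing::warn!("Failed to connect to {}: attempt {} of {}: {:?}", address, attempt, RETRY_ATTEMPTS, e);
            sleep(RETRY_DELAY).await;
        }
    }
}
// Summary when every attempt failed.
tracing::error!("all {} connection attempts to {} failed", failed_attempts, address);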


PR Code Suggestions ✨

Category: Performance
Use a reference instead of cloning the string in each iteration

Avoid cloning the address string in each loop iteration by using a reference instead.

src/eth/consensus.rs [215-220]

-match AppendEntryServiceClient::connect(address.clone()).await {
+match AppendEntryServiceClient::connect(&address).await {
     Ok(client) => return Ok(client),
     Err(e) => {
         tracing::warn!("Failed to connect to {}: attempt {} of {}: {:?}", address, attempt, RETRY_ATTEMPTS, e);
         sleep(RETRY_DELAY).await;
     }
 }
 
Suggestion importance[1-10]: 9

Why: Using a reference instead of cloning the string in each iteration improves performance by reducing unnecessary allocations, making the code more efficient.

Implement exponential backoff for retry delays

Consider using exponential backoff for the retry delay to handle high load or transient
issues more effectively.

src/eth/consensus.rs [219]

-sleep(RETRY_DELAY).await;
+let backoff_time = RETRY_DELAY * attempt.pow(2);
+sleep(backoff_time).await;
 
Suggestion importance[1-10]: 7

Why: Implementing exponential backoff can improve the retry mechanism's effectiveness under high load or transient issues, but it is a performance enhancement rather than a critical fix.

Category: Error handling
Handle errors from the sleep function to ensure robustness

Handle potential errors from the sleep function to ensure the retry mechanism is robust.

src/eth/consensus.rs [219]

-sleep(RETRY_DELAY).await;
+if let Err(e) = sleep(RETRY_DELAY).await {
+    tracing::error!("Error during sleep between retries: {:?}", e);
+    return Err(anyhow!("Error during sleep between retries: {:?}", e));
+}
 
Suggestion importance[1-10]: 8

Why: Handling potential errors from the sleep function improves the robustness of the retry mechanism, ensuring that any issues during sleep are properly logged and handled.


@renancloudwalk
Contributor, Author

We will try another strategy: instead of gathering followers at startup, let's keep gathering them as the leader.

@dinhani-cw deleted the retry-followers-gathering branch on July 4, 2024, 17:52