
Dbg peer discovery #1042

Closed · wants to merge 6 commits from dbg-peer-discovery

Conversation

renancloudwalk
Contributor

No description provided.

@renancloudwalk requested a review from a team as a code owner · June 8, 2024 19:07
@renancloudwalk enabled auto-merge (squash) · June 8, 2024 19:08

github-actions bot commented Jun 8, 2024

PR Review 🔍

⏱️ Estimated effort to review [1-5]

3, because the PR involves changes to the peer discovery mechanism which is critical for network operations. The changes include error handling and retry logic which need careful review to ensure they don't introduce new issues or affect the system's stability.

🧪 Relevant tests

No

⚡ Possible issues

Possible Bug: The retry mechanism in discover_peers_kubernetes uses a fixed sleep duration which might not be optimal under different network conditions or Kubernetes API server response times.

Performance Concern: Using a synchronous sleep inside an asynchronous context could block the executor or lead to inefficient task scheduling.

🔒 Security concerns

No

Code feedback:
relevant file: src/eth/consensus/mod.rs
suggestion:

Consider using an exponential backoff strategy for retrying peer discovery to handle failures more gracefully under varying network conditions. This approach can help avoid constant polling and reduce load on the Kubernetes API server. [important]

relevant line: sleep(Duration::from_millis(100)).await;
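
For illustration, a minimal sketch of what a backed-off delay could look like with tokio; the 100 ms base and the cap are arbitrary assumptions, not values from this PR:

```rust
use std::time::Duration;
use tokio::time::sleep;

// Illustrative only: 100ms, 200ms, 400ms, ... with an arbitrary cap so a long
// outage never produces multi-minute waits between discovery attempts.
async fn backoff(attempt: u32) {
    let delay_ms = 100u64 << attempt.min(6); // tops out at 6.4 seconds
    sleep(Duration::from_millis(delay_ms)).await;
}
```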

relevant file: src/eth/consensus/mod.rs
suggestion:

Replace sleep with tokio::time::sleep, ensuring that the async sleep function is used, which is non-blocking and more appropriate within an async function. This change is crucial to maintain the asynchronous nature of the operation without blocking the executor. [important]

relevant line: sleep(Duration::from_millis(100)).await;
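
For reference, a minimal sketch of the non-blocking variant being asked for, assuming the code runs on the tokio runtime:

```rust
use std::time::Duration;
use tokio::time::sleep; // async, non-blocking sleep

async fn wait_before_retry() {
    // Suspends only this task; the tokio worker thread stays free to run other tasks.
    sleep(Duration::from_millis(100)).await;
    // std::thread::sleep, by contrast, would block the whole worker thread.
}
```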

relevant file: src/eth/consensus/mod.rs
suggestion:

Implement structured logging for better traceability and debugging. Include more context in log messages, such as the current state of the system or values of critical variables at the time of logging. [medium]

relevant line: tracing::warn!("failed to discover peers from Kubernetes (attempt {}/{}): {:?}", attempts, max_attempts, e);
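
A small sketch of what structured fields could look like with tracing; the helper name, field names, and generic error bound are illustrative:

```rust
// Hypothetical helper: fields are recorded as key/value pairs, so log
// aggregators can filter on them without parsing the message text.
fn log_discovery_failure<E: std::fmt::Debug>(attempts: u32, max_attempts: u32, error: &E) {
    tracing::warn!(
        attempt = attempts,
        max_attempts,
        error = ?error,
        "failed to discover peers from Kubernetes"
    );
}
```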

relevant file: src/eth/consensus/mod.rs
suggestion:

Consider separating the retry logic into a dedicated async function to improve modularity and readability of the discover_peers function. This would also make unit testing easier by isolating the retry mechanism. [medium]

relevant line: while attempts < max_attempts {
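
To make the idea concrete, a rough sketch of such a helper; the name and signature are hypothetical, not the PR's code:

```rust
use std::time::Duration;
use tokio::time::sleep;

// Hypothetical extraction of the retry loop. Isolating it behind a closure
// makes it unit-testable with an operation that fails a known number of times,
// e.g. `retry_discovery(|| Self::discover_peers_kubernetes(Arc::clone(&consensus)), 3)`.
async fn retry_discovery<T, E, F, Fut>(mut op: F, max_attempts: u32) -> Result<T, E>
where
    F: FnMut() -> Fut,
    Fut: std::future::Future<Output = Result<T, E>>,
    E: std::fmt::Debug,
{
    let mut attempts = 0;
    loop {
        match op().await {
            Ok(value) => return Ok(value),
            Err(e) => {
                attempts += 1;
                tracing::warn!("attempt {}/{} failed: {:?}", attempts, max_attempts, e);
                if attempts >= max_attempts {
                    return Err(e);
                }
                // Same fixed delay as the current loop; easy to swap for backoff here.
                sleep(Duration::from_millis(100)).await;
            }
        }
    }
}
```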


github-actions bot commented Jun 8, 2024

PR Code Suggestions ✨

Enhancement
Implement exponential backoff for retrying peer discovery

Consider using an exponential backoff strategy for retrying peer discovery in Kubernetes
to handle failures more gracefully and reduce the load on the Kubernetes API server during
outages or high error rates.

src/eth/consensus/mod.rs [502]

-sleep(Duration::from_millis(100)).await;
+sleep(Duration::from_millis(100 * 2u64.pow(attempts as u32))).await;
 
Suggestion importance[1-10]: 9

Why: Implementing exponential backoff can significantly improve the robustness and efficiency of the retry mechanism, especially under failure conditions. This change is both relevant and beneficial.

Maintainability
Refactor peer discovery into separate, reusable functions

To improve the maintainability and readability of the code, consider refactoring the peer
discovery logic into separate functions or methods, especially since the Kubernetes and
environment-based discovery paths are quite distinct.

src/eth/consensus/mod.rs [486-504]

-match Self::discover_peers_kubernetes(Arc::clone(&consensus)).await {
-    Ok(k8s_peers) => {
-        new_peers.extend(k8s_peers);
-        tracing::info!("discovered {} peers from kubernetes", new_peers.len());
-        break;
-    }
-    Err(e) => {
-        attempts += 1;
-        tracing::warn!("failed to discover peers from Kubernetes (attempt {}/{}): {:?}", attempts, max_attempts, e);
-        if attempts >= max_attempts {
-            tracing::error!("exceeded maximum attempts to discover peers from kubernetes. initiating shutdown.");
-            GlobalState::shutdown_from("consensus", "failed to discover peers from Kubernetes");
-        }
-        sleep(Duration::from_millis(100)).await;
-    }
-}
+self.retry_kubernetes_discovery(&consensus, &mut new_peers).await;
 
Suggestion importance[1-10]: 7

Why: Refactoring the peer discovery logic into separate functions would enhance code readability and maintainability. This is a good practice but not critical for the current functionality.

Possible issue
Enhance error handling by providing more detailed error information

To handle errors more robustly, consider implementing a more comprehensive error handling
strategy that could include logging more details about the error or notifying an
administrator when critical thresholds are reached.

src/eth/consensus/mod.rs [494]

-tracing::warn!("failed to discover peers from Kubernetes (attempt {}/{}): {:?}", attempts, max_attempts, e);
+tracing::warn!("failed to discover peers from Kubernetes (attempt {}/{}): {:?}. Error details: {:?}", attempts, max_attempts, e, e.details());
 
Suggestion importance[1-10]: 6

Why: Enhancing error logging with more detailed information can be useful for debugging and monitoring. However, the current logging already provides sufficient context for most scenarios.

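
Whether an error type exposes a `details()` method depends on the type; a more general option, sketched here under the assumption that the error implements `std::error::Error`, is to walk the `source()` chain and include it in the log message:

```rust
// Hypothetical helper: collects the chain of underlying causes into one string
// that can be interpolated into the existing tracing::warn! call.
fn error_chain(e: &(dyn std::error::Error + 'static)) -> String {
    let mut out = e.to_string();
    let mut source = e.source();
    while let Some(cause) = source {
        out.push_str(&format!(" -> {cause}"));
        source = cause.source();
    }
    out
}
```
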
Best practice
Use asynchronous locking mechanisms to manage state safely

To avoid potential deadlocks and ensure that the mutex lock is released properly, consider
using asynchronous locking mechanisms provided by tokio::sync, such as RwLock.

src/eth/consensus/mod.rs [518]

-let mut peers_lock = consensus.peers.write().await;
+let peers_lock = consensus.peers.write().await;
 
Suggestion importance[1-10]: 3

Why: The suggestion to use RwLock is already being followed in the code. The change from let mut to let is minor and does not significantly impact the code's functionality or safety.

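
As a hedged illustration of the locking point (the map types below are placeholders, not the project's actual peer types): with `tokio::sync::RwLock`, the main thing to watch is how long the write guard stays alive.

```rust
use std::collections::HashMap;
use std::sync::Arc;
use tokio::sync::RwLock;

// Illustrative only: keeping the write guard in a narrow scope drops it before
// any later `.await`, so the lock is never held across a suspension point.
async fn replace_peers(peers: &Arc<RwLock<HashMap<String, String>>>, new_peers: HashMap<String, String>) {
    {
        let mut guard = peers.write().await;
        *guard = new_peers;
    } // write guard released here
    // ...subsequent awaits run without holding the lock...
}
```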

auto-merge was automatically disabled June 9, 2024 01:23

Pull request was closed

@dinhani-cw dinhani-cw deleted the dbg-peer-discovery branch July 4, 2024 17:53