Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: GlobalState::shutdown instead of CancellationToken everywhere #900

Merged
merged 4 commits into from
May 22, 2024

Conversation

dinhani-cw
Copy link
Contributor

@dinhani-cw dinhani-cw commented May 22, 2024

GlobalState::shutdown_from(caller: &str, reason: &str) signals that the application will shutdown.
GlobalState::is_shutdown() checks if the application is shutting down.

@dinhani-cw dinhani-cw requested a review from a team as a code owner May 22, 2024 20:43
Copy link

PR Review 🔍

⏱️ Estimated effort to review [1-5]

4, because the PR involves significant changes across multiple files with complex logic related to task cancellation and error handling. Understanding the impact of these changes on the system's behavior, especially in asynchronous and multi-threaded contexts, requires careful consideration and testing.

🧪 Relevant tests

No

⚡ Possible issues

Possible Bug: The removal of CancellationToken and its replacement with GlobalState might lead to scenarios where tasks do not terminate correctly or timely, especially if the shutdown signals are not propagated or checked frequently enough within long-running loops or blocking operations.

Error Handling: Several places in the code now directly return or log errors without rethrowing them, which might lead to suppressed exceptions that could affect the flow of the application, especially in critical sections where transaction consistency and data integrity are essential.

🔒 Security concerns

No

Code feedback:
relevant filesrc/bin/importer_offline.rs
suggestion      

Consider adding a check after long-running loops or blocking operations to ensure that the system responds promptly to shutdown signals. This can prevent the system from hanging or delaying shutdown unnecessarily. [important]

relevant lineif GlobalState::is_shutdown() {

relevant filesrc/bin/importer_offline.rs
suggestion      

Implement error handling for critical operations such as rpc_storage.read_blocks_in_range(start, end). Ensure that any failures in these operations are logged and handled appropriately to maintain system stability and data consistency. [important]

relevant linelet task = load_blocks_and_receipts(Arc::clone(&rpc_storage), start, end);

relevant filesrc/bin/importer_online.rs
suggestion      

To improve the robustness of the system, consider implementing a retry mechanism for operations that may fail due to transient issues, such as network requests or database operations. This can help in maintaining the service availability and data accuracy. [medium]

relevant lineif executor.reexecute_external(&block, &receipts).await.is_err() {

relevant filesrc/eth/block_miner.rs
suggestion      

Add detailed logging at various points in the mining process to provide insights into the mining operations and to aid in debugging issues related to block processing and mining. [medium]

relevant linetracing::info!(lag_ys = %tick.elapsed().as_micros(), "interval mining block");

Copy link

PR Code Suggestions ✨

CategorySuggestion                                                                                                                                    Score
Error handling
Replace unwrap with error handling to prevent potential runtime panics

Replace the unwrap calls with proper error handling to prevent potential panics if blocks
is empty. This ensures that the function can handle cases where no blocks are available
gracefully.

src/bin/importer_offline.rs [147-148]

-let block_start = blocks.first().unwrap().number();
-let block_end = blocks.last().unwrap().number();
+let block_start = blocks.first().ok_or(anyhow!("No blocks available"))?.number();
+let block_end = blocks.last().ok_or(anyhow!("No blocks available"))?.number();
 
Suggestion importance[1-10]: 10

Why: Replacing unwrap with proper error handling is crucial to prevent potential runtime panics, which can cause the application to crash. This change significantly improves the robustness and reliability of the code.

10
Improve error handling by propagating errors in the main function

Consider handling the potential error from block_on directly in the main function to
ensure that any initialization errors are caught and handled appropriately, rather than
implicitly ignoring them.

src/bin/importer_offline.rs [43]

-global_services.runtime.block_on(run(global_services.config))
+global_services.runtime.block_on(run(global_services.config))?
 
Suggestion importance[1-10]: 8

Why: Propagating errors in the main function ensures that initialization errors are not silently ignored, which is important for robust error handling. This change improves the reliability of the application by making sure that any issues during initialization are properly reported.

8
Possible bug
Handle potential errors during sleep duration calculation to prevent runtime panics

Handle the potential error from thread::sleep to avoid unwrapping and potentially
panicking at runtime.

src/config.rs [384]

-thread::sleep((next_second - Utc::now()).to_std().unwrap());
+if let Ok(duration) = (next_second - Utc::now()).to_std() {
+    thread::sleep(duration);
+} else {
+    tracing::error!("Failed to calculate sleep duration");
+}
 
Suggestion importance[1-10]: 10

Why: Handling potential errors during sleep duration calculation is crucial to prevent runtime panics, which is a significant improvement for code robustness.

10
Handle errors during asynchronous operations to prevent silent failures

Consider handling the potential error from
runtime.block_on(infra::init_tracing(config.common().tracing_url.as_ref())) to ensure that
any issues during tracing initialization are properly managed and do not silently fail.

src/globals.rs [58]

-runtime.block_on(infra::init_tracing(config.common().tracing_url.as_ref()));
+runtime.block_on(infra::init_tracing(config.common().tracing_url.as_ref()))?;
 
Suggestion importance[1-10]: 9

Why: Handling potential errors in asynchronous operations is important to prevent silent failures, which can lead to difficult-to-diagnose issues. This suggestion addresses a possible bug and improves the robustness of the code.

9
Maintainability
Use a constant for task names to improve code maintainability

Replace the hardcoded task name string with a constant to avoid typos and improve
maintainability.

src/config.rs [70]

-warn_task_cancellation("rpc subscription cleaner");
+const TASK_NAME: &str = "rpc subscription cleaner";
+warn_task_cancellation(TASK_NAME);
 
Suggestion importance[1-10]: 9

Why: Using a constant for task names improves maintainability and reduces the risk of typos, which is a significant improvement for code quality.

9
Improve the clarity and specificity of the environment variable name

Consider using a more specific environment variable name than PERM_STORAGE_CONNECTIONS.
This name is very generic and could potentially conflict with other applications or be
misleading about its purpose. A more descriptive name related to its specific use in
controlling connection behavior in your application would improve clarity and
maintainability.

src/globals.rs [40-41]

-if env::var_os("PERM_STORAGE_CONNECTIONS").is_some_and(|value| value == "1") {
-    println!("WARNING: env var PERM_STORAGE_CONNECTIONS is set to 1, if it cause connection problems, try increasing it");
+if env::var_os("APP_SPECIFIC_STORAGE_CONNECTIONS").is_some_and(|value| value == "1") {
+    println!("WARNING: env var APP_SPECIFIC_STORAGE_CONNECTIONS is set to 1, if it cause connection problems, try increasing it");
 }
 
Suggestion importance[1-10]: 7

Why: The suggestion improves the clarity and specificity of the environment variable name, which enhances maintainability and reduces the risk of conflicts with other applications. However, it is not a critical change.

7
Enhance logging by using structured key-value pairs

Use structured logging for better traceability and debugging. Replace string interpolation
in logging with structured key-value pairs.

src/bin/importer_offline.rs [131]

-tracing::info!("starting {}", TASK_NAME);
+tracing::info!(task = %TASK_NAME, "starting task");
 
Suggestion importance[1-10]: 6

Why: Using structured logging improves traceability and debugging, making logs easier to parse and analyze. However, this is a minor improvement focused on maintainability and does not address any critical issues.

6
Refactor the initialization function into smaller, more manageable functions

To improve the readability and maintainability of the GlobalServices::init function,
consider refactoring it into smaller functions. Each function would handle a specific part
of the initialization process (e.g., configuration parsing, runtime initialization,
metrics, sentry, signal handling, tracing).

src/globals.rs [31-64]

 pub fn init() -> anyhow::Result<Self> {
-    load_dotenv();
-    let config = T::parse();
-    if env::var_os("PERM_STORAGE_CONNECTIONS").is_some_and(|value| value == "1") {
-        println!("WARNING: env var PERM_STORAGE_CONNECTIONS is set to 1, if it cause connection problems, try increasing it");
-    }
-    let runtime = config.common().init_runtime();
-    #[cfg(feature = "metrics")]
-    infra::init_metrics(config.common().metrics_histogram_kind);
-    let _sentry_guard = config.common().sentry_url.as_ref().map(|sentry_url| infra::init_sentry(sentry_url));
-    runtime.block_on(spawn_signal_handler())?;
-    runtime.block_on(infra::init_tracing(config.common().tracing_url.as_ref()));
+    self.load_env_variables()?;
+    let config = self.parse_config()?;
+    self.handle_perm_storage_connections()?;
+    let runtime = self.initialize_runtime(&config)?;
+    self.initialize_metrics(&config)?;
+    let _sentry_guard = self.initialize_sentry(&config)?;
+    self.handle_signals(&runtime)?;
+    self.initialize_tracing(&runtime, &config)?;
     Ok(Self {
         config,
         runtime,
         _sentry_guard,
     })
 }
 
Suggestion importance[1-10]: 6

Why: Refactoring the initialization function into smaller functions can improve readability and maintainability. However, the current implementation is not incorrect, and this change is more about code style and organization.

6
Enhancement
Log error details before shutting down to provide more insight into failures

Consider handling the error from chain.subscribe_new_heads().await more gracefully by
logging the error details before shutting down, which could provide more insight into the
failure.

src/bin/importer_online.rs [164-167]

-Err(_) => {
+Err(e) => {
+    tracing::error!("Failed to subscribe to newHeads event: {:?}", e);
     GlobalState::shutdown_from(TASK_NAME, "cannot subscribe to newHeads event");
     return;
 }
 
Suggestion importance[1-10]: 9

Why: Logging the error details before shutting down provides valuable information for troubleshooting and understanding the cause of the failure. This is a significant improvement for maintainability and debugging.

9
Add detailed error logging before shutdown for better troubleshooting

Add error handling for executor.reexecute_external and
miner.mine_external_mixed_and_commit to log specific errors before shutdown, which can aid
in troubleshooting.

src/bin/importer_online.rs [132-139]

-if executor.reexecute_external(&block, &receipts).await.is_err() {
+if let Err(e) = executor.reexecute_external(&block, &receipts).await {
+    tracing::error!("Failed to re-execute block: {:?}", e);
     GlobalState::shutdown_from(TASK_NAME, "failed to re-execute block");
     return;
 };
-if miner.mine_external_mixed_and_commit().await.is_err() {
+if let Err(e) = miner.mine_external_mixed_and_commit().await {
+    tracing::error!("Failed to mine external block: {:?}", e);
     GlobalState::shutdown_from(TASK_NAME, "failed to mine external block");
     return;
 };
 
Suggestion importance[1-10]: 9

Why: Adding detailed error logging before shutdown is crucial for diagnosing issues and understanding why the shutdown occurred. This enhancement significantly improves the code's robustness and maintainability.

9
Add a retry mechanism for subscribing to newHeads to handle transient failures

Implement a retry mechanism for chain.subscribe_new_heads() before deciding to shut down,
as transient network issues could be causing the subscription to fail.

src/bin/importer_online.rs [160-167]

-match chain.subscribe_new_heads().await {
+match retry::retry(retry::delay::Fixed::from_millis(100).take(5), || chain.subscribe_new_heads()).await {
     Ok(sub) => Some(sub),
     Err(_) => {
-        GlobalState::shutdown_from(TASK_NAME, "cannot subscribe to newHeads event");
+        GlobalState::shutdown_from(TASK_NAME, "cannot subscribe to newHeads event after retries");
         return;
     }
 }
 
Suggestion importance[1-10]: 8

Why: Implementing a retry mechanism can handle transient network issues, reducing unnecessary shutdowns. This is a valuable enhancement for improving the reliability of the system.

8
Improve the clarity of task cancellation messages by including specific reasons

Replace the use of warn_task_cancellation with a more specific error message that includes
the reason for the task cancellation. This will help in debugging and understanding the
context of the shutdown more clearly.

src/bin/importer_online.rs [123]

-warn_task_cancellation(TASK_NAME);
+warn_task_cancellation(TASK_NAME, "Global state is shutting down");
 
Suggestion importance[1-10]: 7

Why: This suggestion improves the clarity of the task cancellation messages, which can aid in debugging. However, it is a minor enhancement and not critical to the functionality of the code.

7
Best practice
Use a consistent logging framework for all log messages

Replace the println! statement with a logging framework call to ensure that all logs are
consistently formatted and managed. This change will help in maintaining a clean and
scalable logging strategy.

src/globals.rs [41]

-println!("WARNING: env var PERM_STORAGE_CONNECTIONS is set to 1, if it cause connection problems, try increasing it");
+tracing::warn!("WARNING: env var PERM_STORAGE_CONNECTIONS is set to 1, if it cause connection problems, try increasing it");
 
Suggestion importance[1-10]: 8

Why: Using a consistent logging framework is a best practice that ensures all logs are managed and formatted uniformly, which is crucial for debugging and maintaining the application.

8
Enhance logging by using structured logs

Use structured logging for better traceability and filtering in log management systems.

src/config.rs [65]

-tracing::info!("spawning rpc subscriptions cleaner");
+tracing::info!(task = "rpc subscriptions cleaner", "Spawning task");
 
Suggestion importance[1-10]: 7

Why: Structured logging enhances traceability and filtering, which is beneficial for log management systems, but it is not a critical change.

7
Possible issue
Use unwrap_or_default to safely handle potential None values

Consider using unwrap_or_default for default string handling to avoid panics if to_str()
returns None.

src/config.rs [822]

-let binary_basename = binary.file_name().unwrap().to_str().unwrap().to_lowercase();
+let binary_basename = binary.file_name().unwrap().to_str().unwrap_or_default().to_lowercase();
 
Suggestion importance[1-10]: 8

Why: Using unwrap_or_default to handle potential None values safely is a good practice to avoid runtime panics, improving code reliability.

8
Performance
Optimize by reducing unnecessary use of Arc::clone

Replace the use of Arc::clone(&storage) with a direct reference to storage if the Arc is
not required for shared ownership across threads or asynchronous tasks within the init
function. This can reduce unnecessary reference counting operations if storage is not used
concurrently.

src/bin/importer_offline.rs [49]

-let storage = config.storage.init().await?;
+let storage = &config.storage.init().await?;
 
Suggestion importance[1-10]: 3

Why: While the suggestion to avoid unnecessary use of Arc::clone is valid for performance optimization, it is not crucial and does not address a major issue. The current use of Arc::clone is not incorrect and may be necessary if storage is used concurrently elsewhere.

3

renancloudwalk
renancloudwalk previously approved these changes May 22, 2024
@dinhani-cw dinhani-cw merged commit 466b94f into main May 22, 2024
27 checks passed
@dinhani-cw dinhani-cw deleted the global-cancellation branch May 22, 2024 21:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants