Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: compare final state after relaying a block #1183

Merged
merged 49 commits into from
Jun 27, 2024

Conversation

carneiro-cw
Copy link
Contributor

No description provided.

Copy link

PR Reviewer Guide 🔍

⏱️ Estimated effort to review [1-5] 4
🧪 Relevant tests No
🔒 Security concerns No
⚡ Key issues to review Retry Logic:
The retry logic in compare_final_state method might lead to infinite loops if the errors persist. Consider implementing a maximum retry limit.
Error Handling:
The error handling in the compare_final_state method logs warnings and errors but does not escalate or handle them in a way that might prevent further incorrect operations.
Span Fields:
The compare_final_state method is instrumented with tracing, but it does not record the block_number as a span field, which could be crucial for debugging.
Database Schema:
The primary key in the slot_mismatches table is defined only on address and index, which might not be unique across different block_number. Consider including block_number in the primary key.

Copy link

PR Code Suggestions ✨

CategorySuggestion                                                                                                                                    Score
Possible bug
Handle potential None from max() to prevent runtime panics

Consider handling the case where max() returns None which could happen if block_numbers is
empty, to prevent panics at runtime.

src/eth/relayer/external.rs [152]

-let max_number = block_numbers.iter().max().cloned().unwrap();
+let max_number = match block_numbers.iter().max().cloned() {
+    Some(max) => max,
+    None => {
+        tracing::warn!("No block numbers available for processing.");
+        return Ok(vec![]);
+    }
+};
 
Suggestion importance[1-10]: 10

Why: Handling the case where max() returns None is crucial to prevent runtime panics, ensuring the robustness and stability of the application.

10
Possible issue
Add error handling for continuous fetch failures to prevent infinite retries

Consider adding error handling for the case when substrate_chain.fetch_storage_at and
stratus_chain.fetch_storage_at continuously fail after several retries. This will prevent
the system from entering an infinite loop of retries.

src/eth/relayer/external.rs [94-97]

-match self.stratus_chain.fetch_storage_at(&addr, &index, point_in_time).await {
-    Ok(value) => break value,
-    Err(err) => tracing::warn!(?addr, ?index, ?err, "failed to fetch slot value from stratus, retrying..."),
-}
+let mut retry_count = 0;
+let stratus_slot_value = loop {
+    match self.stratus_chain.fetch_storage_at(&addr, &index, point_in_time).await {
+        Ok(value) => break value,
+        Err(err) => {
+            if retry_count >= MAX_RETRIES {
+                tracing::error!(?addr, ?index, ?err, "max retries reached for fetching slot value from stratus");
+                break None;
+            }
+            tracing::warn!(?addr, ?index, ?err, "failed to fetch slot value from stratus, retrying...");
+            retry_count += 1;
+        },
+    }
+};
 
Suggestion importance[1-10]: 9

Why: Adding error handling for continuous fetch failures is crucial to prevent the system from entering an infinite loop, which could lead to resource exhaustion and degraded performance.

9
Enhancement
Implement a backoff strategy for retrying database operations

Implement a backoff strategy for retrying database insert operations to handle transient
database connectivity issues more gracefully.

src/eth/relayer/external.rs [109-119]

+let mut attempts = 0;
 while let Err(e) = sqlx::query!(
     "INSERT INTO slot_mismatches (address, index, block_number, stratus_value, substrate_value) VALUES ($1, $2, $3, $4, $5) ON CONFLICT DO NOTHING",
     addr as _,
     index as _,
     block_number as _,
     stratus_slot_value as _,
     substrate_slot_value as _
 )
 .execute(&self.pool)
 .await {
-    tracing::warn!(?e, "failed to insert slot mismatch, retrying...")
+    if attempts >= MAX_RETRY_ATTEMPTS {
+        tracing::error!(?e, "failed to insert slot mismatch after several attempts");
+        break;
+    }
+    tracing::warn!(?e, "failed to insert slot mismatch, retrying...");
+    attempts += 1;
+    tokio::time::sleep(RETRY_BACKOFF[attempts]).await;
 }
 
Suggestion importance[1-10]: 8

Why: Implementing a backoff strategy for retrying database operations is a good enhancement to handle transient issues more gracefully and avoid overwhelming the database with rapid retries.

8
Best practice
Ensure thread safety for the compare_final_state function

To avoid potential data races or inconsistencies, ensure that the compare_final_state
function is thread-safe if it might be accessed concurrently.

src/eth/relayer/external.rs [89-123]

 async fn compare_final_state(&self, changed_slots: HashSet<(Address, SlotIndex)>, block_number: BlockNumber) {
-    let point_in_time = StoragePointInTime::Past(block_number);
+    let point_in_time = StoragePointIn_Time::Past(block_number);
+    let lock = self.state_lock.lock().await;  // Assuming `state_lock` is an async mutex
     for (addr, index) in changed_slots {
         let stratus_slot_value = loop {
             match self.stratus_chain.fetch_storage_at(&addr, &index, point_in_time).await {
                 Ok(value) => break value,
                 Err(err) => tracing::warn!(?addr, ?index, ?err, "failed to fetch slot value from stratus, retrying..."),
             }
         };
         ...
     }
+    drop(lock);
 }
 
Suggestion importance[1-10]: 7

Why: Ensuring thread safety for the compare_final_state function is a good practice to avoid potential data races or inconsistencies, especially in concurrent environments.

7

@carneiro-cw carneiro-cw merged commit 576001a into main Jun 27, 2024
33 checks passed
@carneiro-cw carneiro-cw deleted the relayer_compare_slots branch June 27, 2024 21:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant