Skip to content

Commit

Permalink
pd(migrate): trace and check state version divergence (#4964)
Browse files Browse the repository at this point in the history
## Describe your changes

This PR:
- logs state divergences when processing `EndBlock` messages
- prevents `pd` operators from migrating corrupted state unless
`--force` is on

## Checklist before requesting a review

- [x] I have added guiding text to explain how a reviewer should test
these changes: simple control flow.

- [x] If this code contains consensus-breaking changes, I have added the
"consensus-breaking" label. Otherwise, I declare my belief that there
are not consensus-breaking changes, for the following reason:
  • Loading branch information
erwanor authored and conorsch committed Dec 17, 2024
1 parent c4ab7d7 commit c9b5db8
Show file tree
Hide file tree
Showing 3 changed files with 20 additions and 1 deletion.
3 changes: 2 additions & 1 deletion crates/bin/pd/src/cli.rs
Original file line number Diff line number Diff line change
Expand Up @@ -133,7 +133,8 @@ pub enum RootCommand {
#[clap(long, display_order = 200)]
comet_home: Option<PathBuf>,
/// If set, force a migration to occur even if the chain is not halted.
/// Will not override a detected mismatch in state versions.
/// Will not override a detected mismatch in state versions, or on signs
/// of corruption. This is "expert mode" and potentially destructive.
#[clap(long, display_order = 1000)]
force: bool,
/// If set, edit local state to permit the node to start, despite
Expand Down
11 changes: 11 additions & 0 deletions crates/bin/pd/src/migrate.rs
Original file line number Diff line number Diff line change
Expand Up @@ -83,6 +83,17 @@ impl Migration {
storage.latest_snapshot().is_chain_halted().await || force,
"to run a migration, the chain halt bit must be set to `true` or use the `--force` cli flag"
);

// Assert that the local chain state version is not corrupted, see `v0.80.10` release notes.
let latest_version = storage.latest_version();
let block_height = storage.latest_snapshot().get_block_height().await?;
ensure!(
latest_version == block_height || force,
"local chain state version is corrupted: {} != {}",
latest_version,
block_height
);

tracing::info!("started migration");

// If this is `ReadyToStart`, we need to reset the halt bit and return early.
Expand Down
7 changes: 7 additions & 0 deletions crates/core/app/src/server/consensus.rs
Original file line number Diff line number Diff line change
Expand Up @@ -221,6 +221,13 @@ impl Consensus {
async fn end_block(&mut self, end_block: request::EndBlock) -> response::EndBlock {
let latest_state_version = self.storage.latest_version();
tracing::info!(height = ?end_block.height, ?latest_state_version, "ending block");
if latest_state_version >= end_block.height as u64 {
tracing::warn!(
%latest_state_version,
%end_block.height,
"chain state version is ahead of the block height, this is an unexpected corruption of chain state"
);
}
let events = self.app.end_block(&end_block).await;
trace_events(&events);

Expand Down

0 comments on commit c9b5db8

Please sign in to comment.