Move more logic in blockstore_processor behind allow_dead_slots flag #2341

steviez · 2024-07-29T19:43:11Z

Problem

A flag in ProcessOptions can be set to allow dead slots. If the flag is
not set and a block is marked dead, fetching the entries will fail because
the blockstore method that retrieves entries checks whether the slot is dead.
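The gate described above can be modeled in a few lines. This is a toy sketch, assuming a simplified store that only tracks which slots are dead; `ToyBlockstore`, `get_slot_entries`, and `FetchError` are hypothetical names for illustration, not the real Blockstore API.

```rust
use std::collections::HashSet;

/// Slot number (Solana uses `u64` for `Slot`).
type Slot = u64;

/// Hypothetical error returned when a dead slot's entries are requested.
#[derive(Debug, PartialEq)]
enum FetchError {
    DeadSlot(Slot),
}

/// Hypothetical stand-in for the blockstore: it only knows which slots are
/// dead and returns placeholder "entries" (strings) for everything else.
struct ToyBlockstore {
    dead_slots: HashSet<Slot>,
}

impl ToyBlockstore {
    fn is_dead(&self, slot: Slot) -> bool {
        self.dead_slots.contains(&slot)
    }

    /// Unless the caller explicitly allows dead slots, asking for the
    /// entries of a dead slot is an error.
    fn get_slot_entries(&self, slot: Slot, allow_dead_slots: bool) -> Result<Vec<String>, FetchError> {
        if !allow_dead_slots && self.is_dead(slot) {
            return Err(FetchError::DeadSlot(slot));
        }
        Ok(vec![format!("entries for slot {}", slot)])
    }
}
```

With slot 5 marked dead, a fetch with `allow_dead_slots = false` returns `Err(FetchError::DeadSlot(5))`, while the same fetch with the flag set succeeds.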

Within process_next_slots(), new Banks are created as the children of
already-replayed Banks are discovered. The child Banks are created
before their entries are fetched; thus, a Bank could be created for a
dead slot, only to be discarded later.

Summary of Changes

Instead of allowing the extra work of creating a Bank to proceed, check
if a slot is dead (and allow_dead_slots=false) BEFORE creating a Bank
for the slot.
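A minimal sketch contrasting the two orderings, assuming a toy model in which a `HashSet` of dead slots stands in for the blockstore and a counter stands in for Bank construction. `visit_children_before`, `visit_children_after`, and `Tally` are hypothetical; the real process_next_slots() also deals with halt_at_slot, the leader schedule cache, and pending_slots.

```rust
use std::collections::HashSet;

type Slot = u64;

/// Hypothetical tally of the work done while visiting child slots.
#[derive(Debug, Default, PartialEq)]
struct Tally {
    banks_created: usize,
    banks_discarded: usize,
}

/// Before: a Bank is created for every child; only when its entries are
/// fetched is the slot found to be dead, and the Bank is discarded.
fn visit_children_before(children: &[Slot], dead: &HashSet<Slot>, allow_dead_slots: bool) -> Tally {
    let mut tally = Tally::default();
    for slot in children {
        tally.banks_created += 1; // Bank construction happens unconditionally
        if !allow_dead_slots && dead.contains(slot) {
            tally.banks_discarded += 1; // entry fetch fails, Bank is removed
        }
    }
    tally
}

/// After: the dead-slot check runs first, so no Bank is built for a
/// slot that replay would reject anyway.
fn visit_children_after(children: &[Slot], dead: &HashSet<Slot>, allow_dead_slots: bool) -> Tally {
    let mut tally = Tally::default();
    for slot in children {
        if !allow_dead_slots && dead.contains(slot) {
            continue; // skip before paying for Bank creation
        }
        tally.banks_created += 1;
    }
    tally
}
```

For children `[2, 3, 4]` with slot 3 dead and `allow_dead_slots = false`, the old ordering builds three Banks and throws one away, while the new ordering builds only two.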

For a validator, allow_dead_slots=false during local ledger replay at startup, so we will now avoid processing any known dead slots altogether. ReplayStage, on the other hand, will create a Bank for a dead slot, see that it is dead, and then proceed on other forks. By avoiding the creation of that Bank in local ledger replay, we avoid this non-fatal error:

[... ERROR solana_accounts_db::accounts_db] set_hash: already exists; multiple forks with shared slot X as child (parent: Y)!?

Fixes solana-labs#28343

A flag in ProcessOptions can be set to allow dead slots. If the flag is not set and a block is marked dead, fetching the entries will fail as the entry fetch method internally checks if the slot is dead. Within process_next_slots(), new Banks are created as replay progresses through slots. If allow_dead_slots=false and a dead slot is loaded, replay of the slot will error. That error is handled and results in the Bank being removed from BankForks. Instead of allowing the extra work to proceed, check if new slots to replay are dead (and allow_dead_slots=false) BEFORE the creation of a new Bank.

bw-solana

LGTM

bw-solana · 2024-07-30T23:10:12Z

ledger/src/blockstore_processor.rs

@@ -1725,18 +1725,21 @@ fn process_next_slots(
    blockstore: &Blockstore,
    leader_schedule_cache: &LeaderScheduleCache,
    pending_slots: &mut Vec<(SlotMeta, Bank, Hash)>,
-    halt_at_slot: Option<Slot>,
+    opts: &ProcessOptions,


don't love the name, but I know you're just maintaining the existing convention, so I'll hold my nose
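The diff replaces the single `halt_at_slot` argument with `&ProcessOptions`, so process_next_slots() can consult allow_dead_slots as well. The sketch below shows the kind of per-child check this enables, assuming a hypothetical `ToyProcessOptions` carrying only the two fields relevant here (the real ProcessOptions has many more) and a hypothetical helper `should_create_child_bank`. The strict `<` comparison follows the wording of the "slot < halt_at_slot" commit in this PR and may not match the exact boundary used in the merged code.

```rust
type Slot = u64;

/// Hypothetical subset of ProcessOptions: just the two settings
/// this PR cares about.
#[derive(Default)]
struct ToyProcessOptions {
    halt_at_slot: Option<Slot>,
    allow_dead_slots: bool,
}

/// With the whole options struct in hand, one helper can apply both the
/// halt limit and the dead-slot policy before any Bank is created.
fn should_create_child_bank(child_slot: Slot, child_is_dead: bool, opts: &ToyProcessOptions) -> bool {
    // No halt slot configured means every slot is below the limit.
    let below_halt = opts.halt_at_slot.map_or(true, |halt| child_slot < halt);
    // A dead child is only acceptable when dead slots are explicitly allowed.
    let dead_ok = opts.allow_dead_slots || !child_is_dead;
    below_halt && dead_ok
}
```

Bundling the settings in one struct also means a future option can be threaded through without changing the function signature again.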

AshwinSekar

LGTM.
With this change, do we feel confident enough to upgrade the set_hash error to a panic? i.e., run solana-labs#33186 on master/inv net for a while and see what happens?

steviez · 2024-07-31T16:12:32Z

> LGTM. With this change, do we feel confident to upgrade the set_hash to a panic? i.e. running solana-labs#33186 on master/inv net for a while and see what happens?

Potentially; let me think about it a little and I'll follow up with you. Per this Discord message, Brooks had a similar idea at one point but mentioned seeing a CI failure. Obviously, if we have a test that hits this, that would be great to fix, but for a live cluster, there are still more unknowns.

steviez added 2 commits July 29, 2024 17:04

Use combinator to simplify check for slot < halt_at_slot

8bb0c3d
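The commit above doesn't show its code, so this is only a sketch, assuming halt_at_slot is an `Option<Slot>` where `None` means "no halt". It shows that an explicit `match` and the one-line `Option::map_or` combinator give the same answer; the function names are hypothetical.

```rust
type Slot = u64;

/// Verbose form: explicit match on the optional halt slot.
fn below_halt_verbose(slot: Slot, halt_at_slot: Option<Slot>) -> bool {
    match halt_at_slot {
        Some(halt) => slot < halt,
        None => true, // no halt slot configured: every slot may proceed
    }
}

/// Combinator form: the same check in one expression.
fn below_halt(slot: Slot, halt_at_slot: Option<Slot>) -> bool {
    halt_at_slot.map_or(true, |halt| slot < halt)
}
```

The `map_or` default (`true`) plays the role of the `None` arm, which is what lets the combinator replace the match.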

steviez force-pushed the bstore_proc_dead_slots branch from 1e76423 to 2dc1b2d July 29, 2024 22:04

steviez requested review from AshwinSekar and bw-solana July 30, 2024 03:44

bw-solana approved these changes Jul 30, 2024

AshwinSekar approved these changes Jul 31, 2024

steviez merged commit d7b22e2 into anza-xyz:master Jul 31, 2024
41 checks passed

steviez deleted the bstore_proc_dead_slots branch July 31, 2024 16:05