Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Blockstore::get_slots_since() filter out results for dead slots #28344

Closed
wants to merge 1 commit into from

Conversation

steviez
Copy link
Contributor

@steviez steviez commented Oct 11, 2022

Problem

This function is used by ReplayStage to find new slots to replay. If we know the slot is dead, then we can ignore the slot instead of creating the bank and later finding out that it is dead.

Moreso, if the slot was processed in validator startup processing (blockstore_processor::load_frozen_forks()) and then also processed by ReplayStage, we will have created a bank for that slot twice, which causes a downchain failure.

Summary of Changes

Filter out dead slots from Blockstore::get_slots_since().

Fixes #28343

This function is used by ReplayStage to find new slots to replay. If we
know the slot is dead, then we can ignore the slot instead of creating
the bank and later finding out that it is dead.
@steviez
Copy link
Contributor Author

steviez commented Oct 11, 2022

Think this one needs to go in v1.14 and v1.13/v1.10

.into_iter()
.collect();
let dead_slots = dead_slots?;
// If there is anything in the dead_slots cf for this slot, it is dead
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cf = column family?


let slot_metas: Result<Vec<Option<SlotMeta>>> = self
.meta_cf
.multi_get(alive_slots.to_vec())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alive_slots is already a vec. is this required to clone it?

@@ -5705,6 +5723,7 @@ pub mod tests {

#[test]
fn test_get_slots_since() {
solana_logger::setup();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can get rid of this

let slot_metas = slot_metas?;

let result: HashMap<u64, Vec<u64>> = slots
let result: HashMap<u64, Vec<u64>> = alive_slots
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

u64 could be Slot for clarity and consistency.

Copy link
Contributor

@jeffwashington jeffwashington left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. @carllin should review the meaning/concept of get_slots_since not returning dead slots for correctness.

@steviez
Copy link
Contributor Author

steviez commented Oct 11, 2022

Talking with Carl, this solution may not be the safest. I have another idea, but it is a little more involved. Will pursue that

@steviez
Copy link
Contributor Author

steviez commented Nov 1, 2022

Circling back, this PR is NOT the correct approach to take as this comment calls out

// All errors must lead to marking the slot as dead, otherwise,
// the `check_slot_agrees_with_cluster()` called by `replay_active_banks()`
// will break!

and which is later shown here

match bank_status {
BankStatus::Dead | BankStatus::Frozen(_) => (),
// No action to be taken yet
BankStatus::Unprocessed => {
return vec![];
}
}
let bank_hash = bank_status.bank_hash().expect("bank hash must exist");
let mut state_changes = vec![];
let is_dead = bank_status.is_dead();
if is_dead {
state_changes.push(ResultingStateChange::SendAncestorHashesReplayUpdate(
AncestorHashesReplayUpdate::DeadDuplicateConfirmed(slot),
));
}

ReplayStage needs to know about these dead slots, so changing the function to not return them will definitely break things. Abandoning this PR in lieu of another apporach

@steviez steviez closed this Nov 1, 2022
@steviez steviez deleted the get_slots_since_dead branch November 1, 2022 22:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Panic from attempting to replay a dead slot
2 participants