Exit with an error if mappings can’t be created for the chosen block #3242
Conversation
// If the user chooses an object mapping start block we don't have the data for, we can't
// create mappings for it, so the node must exit with an error.
let Some(last_archived_block) = client.block(last_archived_block_hash)? else {
    let mapping = if create_object_mappings {
        "mapping "
    } else {
        ""
    };
    let error = format!(
        "Missing data for {mapping}block {last_archived_block_number} hash {last_archived_block_hash}"
    );
    return Err(sp_blockchain::Error::Application(error.into()));
};
What about checking this in `initialize_archiver` instead? It should be sufficient to query `best_block_to_archive` there.
This isn't possible, because `find_last_archived_block` searches back through the segments until it finds a segment with a last archived block which is less than or equal to `best_block_to_archive`.
`initialize_archiver` doesn't know about this logic, so it can't check the actual `last_archived_block_number` found by `find_last_archived_block`.
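For illustration, here is a minimal sketch of that backward search, assuming a simplified representation of segment headers; the function name, the `(segment_index, last_archived_block)` tuple shape, and the signature are illustrative rather than the actual archiver API:

```rust
// Hypothetical, simplified model: each entry is (segment_index, last_archived_block).
fn find_last_archived_block_sketch(
    segment_headers: &[(u64, u32)],
    best_block_to_archive: u32,
) -> Option<(u64, u32)> {
    // Walk the segments from newest to oldest and stop at the first one whose
    // last archived block is at or below the block we want to start from.
    segment_headers
        .iter()
        .rev()
        .find(|(_, last_archived_block)| *last_archived_block <= best_block_to_archive)
        .copied()
}
```

The block actually used only becomes known after this search runs, which is why the check can't simply be done up front in `initialize_archiver`.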
Archiving has the invariant that from the tip of the chain (the best block) we can traverse back to some block that allows us to initialize the archiver. By overriding `best_block_to_archive` with the block for which to create object mappings, we constrain where to start.
Looking carefully, we can see the `(best_block_to_archive..best_block_number)` iteration that steps up closer to the tip to find blocks that actually exist (right now `tip - 100` may not actually exist in the database if we use Snap sync, and with future Snap sync we may have holes in the database).
One edge case that was problematic before was the genesis block, which you handled in #3247, but strictly speaking that wasn't necessary, since it is a special case of a more general problem. The exact same issue can happen if we prune blocks, because Substrate still never prunes headers (Shamil is working on fixing this in paritytech/polkadot-sdk#6451), so a quick check for block existence using the header in the above-mentioned loop will succeed while the block body is already gone.
So to handle both the genesis block case after Snap sync and this one, all you need to do is add an additional check after the loop that the block body exists. Since the only reason for it not to exist is incorrect user input for the object mapping creation CLI option, we can return a corresponding error. In all other cases node sync already maintains the mentioned invariants, hence there is no corresponding error handling in `find_last_archived_block`, and that's why I'm insisting no changes need to be made there.
Moving the override of `best_block_to_archive` to after the loop breaks these invariants and simply moves its handling to a different place unnecessarily.
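To make the suggested shape concrete, here is a rough sketch; the function and the `block_exists`/`block_body_exists` closures are hypothetical stand-ins for the client queries in `initialize_archiver`, and the exact fallback behaviour when a block is missing is only illustrative:

```rust
fn choose_block_to_archive(
    best_block_number: u32,
    mut best_block_to_archive: u32,
    block_exists: impl Fn(u32) -> bool,
    block_body_exists: impl Fn(u32) -> bool,
) -> Result<u32, String> {
    // If any block between the candidate start and the tip is missing (e.g.
    // after Snap sync `tip - 100` may not be in the database), step up closer
    // to the tip, where blocks are known to exist.
    if (best_block_to_archive..best_block_number).any(|block_number| !block_exists(block_number)) {
        best_block_to_archive = best_block_number;
    }

    // The extra check suggested above: headers can outlive bodies when blocks
    // are pruned, so verify the block body itself is still available. The only
    // way to end up here without a body is incorrect user input for the object
    // mapping CLI option, so return an error rather than continuing.
    if block_body_exists(best_block_to_archive) {
        Ok(best_block_to_archive)
    } else {
        Err(format!("Missing block body for block {best_block_to_archive}"))
    }
}
```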
// There aren't any mappings in the genesis block, so starting there is pointless.
// (And causes errors on restart, because genesis block data is pruned.)
best_block_to_archive = best_block_to_archive.min(block_number).max(1);
It will work fine with `--sync full`; maybe make the input `NonZero` if zero doesn't make sense as a value?
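As an illustration of the `NonZero` suggestion (the option name and the clap wiring below are assumptions, not the node's actual CLI):

```rust
use std::num::NonZeroU32;

use clap::Parser;

/// Illustrative options struct; the flag name is hypothetical.
#[derive(Debug, Parser)]
struct ObjectMappingOptions {
    /// Block number to start creating object mappings from.
    /// Using `NonZeroU32` rejects `0` at argument-parsing time.
    #[arg(long)]
    create_object_mappings_from: Option<NonZeroU32>,
}

fn main() {
    let options = ObjectMappingOptions::parse();
    println!("{options:?}");
}
```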
Yeah it’s kind of a weird case. There’s nothing wrong with having mappings in the genesis block, it’s just something we choose not to do.
I'm also not sure if we can get a zero `best_block_to_archive` here; I'll check the code again.
The non-zero change is now PR #3247
We can get a zero `best_block_to_archive` because we do a saturating subtraction on the best block, so we need to keep the `max(1)`. (PR #3247 doesn't fix that, it only stops the config being zero.)
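A worked example of that saturating subtraction, with the archiving offset of 100 used only for illustration:

```rust
fn main() {
    // On a short chain, subtracting the archiving offset saturates to zero.
    let best_block_number: u32 = 42;
    let best_block_to_archive = best_block_number.saturating_sub(100);
    assert_eq!(best_block_to_archive, 0);

    // Without `.max(1)` archiving would start at the genesis block, which has
    // no mappings and whose data may already be pruned, so the clamp stays.
    let best_block_to_archive = best_block_to_archive.max(1);
    assert_eq!(best_block_to_archive, 1);
}
```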
I've been working through the different cases here; it's a bit trickier than I thought. Ideally, if we're doing what the user said on the command line, we should handle it this way
In those two cases, we'd never get to the new error case in `find_last_archived_block()`, because the empty archiver case follows a different code path. And a block number in any other segment should just work, unless the block data has been pruned. In that case, we can either:
I'm not sure if we can do the check for the new error outside `find_last_archived_block()`, because we're not starting with the data of the exact block number the user asked for on the command line. We're starting with the last block in the last segment before that (and a few other conditions).
This is already happening; the only exception is if the node is started with the default
The same as above; it should already work correctly with
We certainly don't want to do any networking for the purpose of returning object mappings, just return an error. The only exception is if we have a fresh node: then we can take advantage of the target block support that Shamil introduced for domain snap sync, but I wouldn't go there unless we have an explicit request for such a feature.
Here’s where I got to today after checking the code in detail.
Replaces #3241.
If we can’t create mappings for the block chosen on the command line, we need to exit with an error.
Choosing the genesis block is useless because it has no mappings, and we often don’t have data for it in our store. So this PR changes mapping creation at the genesis block to start at block 1 instead.