
Exit with an error if mappings can’t be created for the chosen block #3242

Open
wants to merge 1 commit into main
Conversation

teor2345
Contributor

Replaces #3241.

If we can’t create mappings for the block chosen on the command line, we need to exit with an error.

Choosing the genesis block is useless because it has no mappings, and we often don’t have data for it in our store. So this PR changes mapping creation at the genesis block to start at block 1 instead.
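The genesis fallback described above can be sketched in isolation (a minimal, hypothetical standalone function; the real code operates inside the archiver on the node's block-number type):

```rust
// Hypothetical standalone version of the genesis clamp described above.
// `block_number` is the object mapping start block chosen on the command line.
fn clamp_mapping_start(best_block_to_archive: u64, block_number: u64) -> u64 {
    // Start no later than the chosen block, but never at genesis (block 0),
    // since the genesis block has no mappings and its data is often missing.
    best_block_to_archive.min(block_number).max(1)
}

fn main() {
    // Choosing the genesis block falls back to block 1.
    assert_eq!(clamp_mapping_start(100, 0), 1);
    // A normal choice is respected.
    assert_eq!(clamp_mapping_start(100, 50), 50);
    println!("ok");
}
```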


Comment on lines 449 to 462
// If the user chooses an object mapping start block we don't have the data for, we can't
// create mappings for it, so the node must exit with an error.
let Some(last_archived_block) = client.block(last_archived_block_hash)? else {
    let mapping = if create_object_mappings {
        "mapping "
    } else {
        ""
    };
    let error = format!(
        "Missing data for {mapping}block {last_archived_block_number} hash {last_archived_block_hash}"
    );
    return Err(sp_blockchain::Error::Application(error.into()));
};
Member

What about checking this in initialize_archiver instead? It should be sufficient to query best_block_to_archive there.

Contributor Author

This isn’t possible, because find_last_archived_block searches back through the segments until it finds a segment with a last archived block which is less than or equal to best_block_to_archive.

initialize_archiver doesn’t know about this logic, so it can’t check the actual last_archived_block_number found by find_last_archived_block.
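The backward search being described could be sketched roughly as follows (simplified stand-in types, not the actual Subspace API):

```rust
// Simplified stand-in for a segment header: only the field we need here.
struct SegmentHeader {
    last_archived_block: u64,
}

// Walk segment headers from newest to oldest and return the last archived
// block of the first segment whose last archived block is at or below
// `best_block_to_archive` -- the behaviour described for
// find_last_archived_block.
fn last_archived_block_at_or_below(
    segments_newest_first: &[SegmentHeader],
    best_block_to_archive: u64,
) -> Option<u64> {
    segments_newest_first
        .iter()
        .map(|header| header.last_archived_block)
        .find(|&block| block <= best_block_to_archive)
}

fn main() {
    let segments = [
        SegmentHeader { last_archived_block: 90 },
        SegmentHeader { last_archived_block: 60 },
        SegmentHeader { last_archived_block: 30 },
    ];
    // With best_block_to_archive = 70, the search skips the segment archived
    // up to block 90 and lands on the one archived up to block 60.
    assert_eq!(last_archived_block_at_or_below(&segments, 70), Some(60));
    println!("ok");
}
```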

Member

Archiving has the invariant that from the tip of the chain (the best block) we can traverse back to some block that allows us to initialize the archiver. By overriding best_block_to_archive with the block for which to create object mappings, we constrain where to start.

Looking carefully, we can see the (best_block_to_archive..best_block_number) iteration, which steps up closer to the tip to find blocks that actually exist (right now tip-100 may not actually exist in the database if we use Snap sync, and with future Snap sync we may have holes in the database).

One edge case that was problematic before was the genesis block, which you handled in #3247, but strictly speaking that was not necessary, since it is a special case of a more general problem. The exact same issue can happen if we prune blocks, because Substrate still never prunes headers (Shamil is working on fixing this in paritytech/polkadot-sdk#6451): a quick check for block existence using the header in the above-mentioned loop will succeed while the block body is already gone.

So to handle both the genesis block case after Snap sync and this one, all you need to do is add an additional check after the loop that the block body exists. Since the only reason for it not to exist is incorrect user input for the object mapping creation CLI option, we can return a corresponding error. In all other cases node sync already maintains the mentioned invariants, hence there is no corresponding error handling in find_last_archived_block, and that is why I'm insisting no changes need to be made there.

Moving the override of best_block_to_archive to after the loop breaks these invariants and simply moves its handling to a different place unnecessarily.
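The proposed post-loop check could look roughly like this (a sketch under assumed, simplified types; the real code would query the Substrate client for the block body):

```rust
// Hypothetical sketch: after the (best_block_to_archive..best_block_number)
// loop, confirm the chosen block's *body* exists, not just its header --
// headers survive pruning, bodies may not.
fn ensure_block_body_exists(
    block_body_exists: impl Fn(u64) -> bool, // stand-in for a client query
    best_block_to_archive: u64,
) -> Result<u64, String> {
    if block_body_exists(best_block_to_archive) {
        Ok(best_block_to_archive)
    } else {
        // The only expected cause is an invalid object mapping start block
        // from the CLI, so report it as a user-facing error.
        Err(format!(
            "Missing block body for object mapping start block {best_block_to_archive}"
        ))
    }
}

fn main() {
    // Pretend bodies below block 10 were pruned.
    let body_exists = |block: u64| block >= 10;
    assert_eq!(ensure_block_body_exists(body_exists, 42), Ok(42));
    assert!(ensure_block_body_exists(body_exists, 5).is_err());
    println!("ok");
}
```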

Comment on lines 623 to 625
// There aren't any mappings in the genesis block, so starting there is pointless.
// (And causes errors on restart, because genesis block data is pruned.)
best_block_to_archive = best_block_to_archive.min(block_number).max(1);
Member

It will work fine with --sync full; maybe make the input NonZero if zero doesn't make sense as a value?

Contributor Author

Yeah it’s kind of a weird case. There’s nothing wrong with having mappings in the genesis block, it’s just something we choose not to do.

I’m also not sure if we can get a zero best_block_to_archive here, I’ll check the code again.

Contributor Author

The non-zero change is now PR #3247

Contributor Author

We can get a zero best_block_to_archive because we do a saturating subtraction on the best block, so we need to keep the max(1). (PR #3247 doesn’t fix that, it only stops the config being zero.)
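The zero case can be reproduced in isolation (a sketch; `confirmation_depth` is a hypothetical stand-in for whatever depth the real code subtracts from the best block):

```rust
fn main() {
    // Early in a chain's life the best block can be below the depth we
    // subtract, so saturating subtraction bottoms out at zero...
    let best_block: u64 = 50;
    let confirmation_depth: u64 = 100; // hypothetical stand-in value
    let unclamped = best_block.saturating_sub(confirmation_depth);
    assert_eq!(unclamped, 0);

    // ...which is why the `.max(1)` clamp has to stay even when the *config*
    // value itself is non-zero.
    let best_block_to_archive = unclamped.max(1);
    assert_eq!(best_block_to_archive, 1);
    println!("ok");
}
```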

@teor2345
Contributor Author

I’ve been working through the different cases here; it’s a bit trickier than I thought.

Ideally, if we’re doing what the user said on the command line, we should handle it this way:

  • starting at block 0: just initialise an empty archiver like we would if there was no cached state
  • starting anywhere in segment 0: empty archiver again
    • but how can we tell the segment from the block number? Look up segment 1 and compare with the last block number in the parent segment header? (and if there’s no segment 1, just start empty)

In those two cases, we’d never get to the new error case in find_last_archived_block(), because the empty archiver case follows a different code path.
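The segment-0 test being asked about could be sketched like this (hypothetical; it assumes segment headers record the last archived block number, with segment 1's header pointing back at segment 0, as suggested above):

```rust
// Hypothetical sketch of "is this block in segment 0?". If segment 1 doesn't
// exist yet, everything archived so far is in segment 0. Otherwise compare
// against the last block number recorded for segment 0 (looked up via
// segment 1's parent segment header).
fn block_in_segment_zero(
    segment_zero_last_block: Option<u64>, // None if there is no segment 1 yet
    block_number: u64,
) -> bool {
    match segment_zero_last_block {
        None => true,
        Some(last_block) => block_number <= last_block,
    }
}

fn main() {
    assert!(block_in_segment_zero(None, 1_000)); // no segment 1: still segment 0
    assert!(block_in_segment_zero(Some(120), 100));
    assert!(!block_in_segment_zero(Some(120), 121));
    println!("ok");
}
```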

And a block number in any other segment should just work, unless the block data has been pruned. In that case, we can either:

  1. return the new error added in this PR, and let the user decide what to do
  2. start with an empty archiver (which could cost a lot of network and CPU)

I’m not sure if we can do the check for the new error outside find_last_archived_block(), because we’re not starting with the data of the exact block number the user asked for on the command line. We’re starting with the last block in the last segment before that (and a few other conditions).

@nazar-pc
Member

starting at block 0: just initialise an empty archiver like we would if there was no cached state

This is already happening; the only exception is if the node is started with the default --sync snap instead of --sync full, in which case the genesis state is missing, as described in paritytech/polkadot-sdk#5366.

starting anywhere in segment 0: empty archiver again

The same as above: it should already work correctly with --sync full, so I don't think we need to change anything.

start with an empty archiver (which could cost a lot of network and CPU)

We certainly don't want to do any networking for the purposes of returning object mappings; just return an error. The only exception is if we have a fresh node, in which case we can take advantage of the target block support that Shamil introduced for domain snap sync, but I wouldn't go there unless we have an explicit request for such a feature.

I’m not sure if we can do the check for the new error outside find_last_archived_block(), because we’re not starting with the data of the exact block number the user asked for on the command line. We’re starting with the last block in the last segment before that (and a few other conditions).

best_block_to_archive is an argument of find_last_archived_block; I see no reason why we can't check it before calling find_last_archived_block, unless I'm missing something.

Contributor Author

@teor2345 teor2345 left a comment

Here’s where I got to today after checking the code in detail.


Labels: bug (Something isn't working)