-
Notifications
You must be signed in to change notification settings - Fork 761
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change forks pruning algorithm. #3962
Change forks pruning algorithm. #3962
Conversation
- Prune all possible forks after block finalizing without height limit.
This pull request has been mentioned on Polkadot Forum. There might be relevant details there: https://forum.polkadot.network/t/block-header-pruning/7198/1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the logic is correct, it is much more costly doing this than doing the "lazy" pruning as before. If we would record the block at which a certain leaf forked off, we should probably be able to achieve the same without iterating always over all the headers from the leaf down to the finalized block?
Co-authored-by: Bastian Köcher <[email protected]>
Co-authored-by: Bastian Köcher <[email protected]>
Summary: |
This pull request has been mentioned on Polkadot Forum. There might be relevant details there: https://forum.polkadot.network/t/block-header-pruning/7198/2 |
Yes, I mean I would assume the same as well. I think the best is to ask @arkpar on why he hasn't done it this way from the beginning. There is maybe something we oversee, otherwise I'm also fine with the approach you are proposing here. |
This was written by someone else, but I guess it was done for simplicity, as the quoted comment explains. As for this PR, it looks good for me. I wonder if it can be optimized though. |
I implemented the suggestions from @arkpar The code seems simpler now. |
`finalize_height` method doesn’t exist. It was used to determine the forks and that algorithm changed.
The last commit removes the tests related to the removed method: |
The last commits fix tests by updating the expected block where stale heads appear. However, I also changed |
8fdcc7f
to
57816e9
Compare
@arkpar Do you mind looking at the |
When you call |
@@ -479,35 +436,4 @@ mod tests { | |||
assert!(set.contains(10, 1_2)); | |||
assert!(!set.contains(10, 1_3)); | |||
} | |||
|
|||
#[test] | |||
fn finalization_consistent_with_disk() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you remove this test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
finalize_height
was removed from LeafSet. Please, let me know if you see how this test should be reimplemented differently.
match tree_route(self, *fork_head, self.info().finalized_hash) { | ||
Ok(tree_route) => { | ||
for block in tree_route.retracted() { | ||
expanded_forks.insert(block.hash); | ||
} | ||
continue | ||
}, | ||
Err(_) => { | ||
// Continue with fallback algorithm | ||
}, | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
match tree_route(self, *fork_head, self.info().finalized_hash) { | |
Ok(tree_route) => { | |
for block in tree_route.retracted() { | |
expanded_forks.insert(block.hash); | |
} | |
continue | |
}, | |
Err(_) => { | |
// Continue with fallback algorithm | |
}, | |
} |
See my comment in the pr.
Co-authored-by: Bastian Köcher <[email protected]>
Sure. I removed the old code and updated the function comment as well as its dependencies. I also updated one of the tests to counter its flaky behavior. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
@shamil-gadelshin the rustdoc jobs are still failing. |
Head branch was pushed to by a user without write access
9c69bb9
This PR changes the fork calculation and pruning algorithm to enable future block header pruning. It's required because the previous algorithm relied on the block header persistence. It follows the [related discussion](paritytech#1570) The previous code contained this comment describing the situation: ``` /// Note a block height finalized, displacing all leaves with number less than the finalized /// block's. /// /// Although it would be more technically correct to also prune out leaves at the /// same number as the finalized block, but with different hashes, the current behavior /// is simpler and our assumptions about how finalization works means that those leaves /// will be pruned soon afterwards anyway. pub fn finalize_height(&mut self, number: N) -> FinalizationOutcome<H, N> { ``` The previous algorithm relied on the existing block headers to prune forks later and to enable block header pruning we need to clear all obsolete forks right after the block finalization to not depend on the related block headers in the future. --------- Co-authored-by: Bastian Köcher <[email protected]>
## Issue Currently, syncing parachains from scratch can lead to a very long finalization time once they reach the tip of the chain. The problem is that we try to finalize everything from 0 to the tip, which can be thousands or even millions of blocks. We finalize sequentially and try to compute displaced branches during finalization. So for every block on the way, we compute an expensive tree route. ## Proposed Improvements In this PR, I propose improvements that solve this situation: - **Skip tree route calculation if `leaves().len() == 1`:** This should be enough for 90% of cases where there is only one leaf after sync. - **Optimize finalization for long distances:** It can happen that the parachain has imported some leaf and then receives a relay chain notification with the finalized block. In that case, the previous optimization will not trigger. A second mechanism should ensure that we do not need to compute the full tree route. If the finalization distance is long, we check the lowest common ancestor of all the leaves. If it is above the to-be-finalized block, we know that there are no displaced leaves. This is fast because forks are short and close to the tip, so we can leverage the header cache. ## Alternative Approach - The problem was introduced in #3962. Reverting that PR is another possible strategy. - We could store for every fork where it begins, however sounds a bit more involved to me. fixes #4614
## Issue Currently, syncing parachains from scratch can lead to a very long finalization time once they reach the tip of the chain. The problem is that we try to finalize everything from 0 to the tip, which can be thousands or even millions of blocks. We finalize sequentially and try to compute displaced branches during finalization. So for every block on the way, we compute an expensive tree route. ## Proposed Improvements In this PR, I propose improvements that solve this situation: - **Skip tree route calculation if `leaves().len() == 1`:** This should be enough for 90% of cases where there is only one leaf after sync. - **Optimize finalization for long distances:** It can happen that the parachain has imported some leaf and then receives a relay chain notification with the finalized block. In that case, the previous optimization will not trigger. A second mechanism should ensure that we do not need to compute the full tree route. If the finalization distance is long, we check the lowest common ancestor of all the leaves. If it is above the to-be-finalized block, we know that there are no displaced leaves. This is fast because forks are short and close to the tip, so we can leverage the header cache. ## Alternative Approach - The problem was introduced in #3962. Reverting that PR is another possible strategy. - We could store for every fork where it begins, however sounds a bit more involved to me. fixes #4614
This PR changes the fork calculation and pruning algorithm to enable future block header pruning. It's required because the previous algorithm relied on the block header persistence. It follows the [related discussion](paritytech#1570) The previous code contained this comment describing the situation: ``` /// Note a block height finalized, displacing all leaves with number less than the finalized /// block's. /// /// Although it would be more technically correct to also prune out leaves at the /// same number as the finalized block, but with different hashes, the current behavior /// is simpler and our assumptions about how finalization works means that those leaves /// will be pruned soon afterwards anyway. pub fn finalize_height(&mut self, number: N) -> FinalizationOutcome<H, N> { ``` The previous algorithm relied on the existing block headers to prune forks later and to enable block header pruning we need to clear all obsolete forks right after the block finalization to not depend on the related block headers in the future. --------- Co-authored-by: Bastian Köcher <[email protected]>
This PR changes the fork calculation and pruning algorithm to enable future block header pruning. It's required because the previous algorithm relied on the block header persistence. It follows the [related discussion](paritytech#1570) The previous code contained this comment describing the situation: ``` /// Note a block height finalized, displacing all leaves with number less than the finalized /// block's. /// /// Although it would be more technically correct to also prune out leaves at the /// same number as the finalized block, but with different hashes, the current behavior /// is simpler and our assumptions about how finalization works means that those leaves /// will be pruned soon afterwards anyway. pub fn finalize_height(&mut self, number: N) -> FinalizationOutcome<H, N> { ``` The previous algorithm relied on the existing block headers to prune forks later and to enable block header pruning we need to clear all obsolete forks right after the block finalization to not depend on the related block headers in the future. --------- Co-authored-by: Bastian Köcher <[email protected]>
…tech#4721) ## Issue Currently, syncing parachains from scratch can lead to a very long finalization time once they reach the tip of the chain. The problem is that we try to finalize everything from 0 to the tip, which can be thousands or even millions of blocks. We finalize sequentially and try to compute displaced branches during finalization. So for every block on the way, we compute an expensive tree route. ## Proposed Improvements In this PR, I propose improvements that solve this situation: - **Skip tree route calculation if `leaves().len() == 1`:** This should be enough for 90% of cases where there is only one leaf after sync. - **Optimize finalization for long distances:** It can happen that the parachain has imported some leaf and then receives a relay chain notification with the finalized block. In that case, the previous optimization will not trigger. A second mechanism should ensure that we do not need to compute the full tree route. If the finalization distance is long, we check the lowest common ancestor of all the leaves. If it is above the to-be-finalized block, we know that there are no displaced leaves. This is fast because forks are short and close to the tip, so we can leverage the header cache. ## Alternative Approach - The problem was introduced in paritytech#3962. Reverting that PR is another possible strategy. - We could store for every fork where it begins, however sounds a bit more involved to me. fixes paritytech#4614
…tech#4721) ## Issue Currently, syncing parachains from scratch can lead to a very long finalization time once they reach the tip of the chain. The problem is that we try to finalize everything from 0 to the tip, which can be thousands or even millions of blocks. We finalize sequentially and try to compute displaced branches during finalization. So for every block on the way, we compute an expensive tree route. ## Proposed Improvements In this PR, I propose improvements that solve this situation: - **Skip tree route calculation if `leaves().len() == 1`:** This should be enough for 90% of cases where there is only one leaf after sync. - **Optimize finalization for long distances:** It can happen that the parachain has imported some leaf and then receives a relay chain notification with the finalized block. In that case, the previous optimization will not trigger. A second mechanism should ensure that we do not need to compute the full tree route. If the finalization distance is long, we check the lowest common ancestor of all the leaves. If it is above the to-be-finalized block, we know that there are no displaced leaves. This is fast because forks are short and close to the tip, so we can leverage the header cache. ## Alternative Approach - The problem was introduced in paritytech#3962. Reverting that PR is another possible strategy. - We could store for every fork where it begins, however sounds a bit more involved to me. fixes paritytech#4614
This PR changes the fork calculation and pruning algorithm to enable future block header pruning. It's required because the previous algorithm relied on the block header persistence. It follows the related discussion
The previous code contained this comment describing the situation:
The previous algorithm relied on the existing block headers to prune forks later and to enable block header pruning we need to clear all obsolete forks right after the block finalization to not depend on the related block headers in the future.