
Fix approval-voting canonicalize off by one #6864

Merged (4 commits) on Dec 13, 2024

@alexggh (Contributor) commented Dec 12, 2024

Approval voting canonicalize is off by one, which means that if we are finalizing blocks one by one, approval-voting only cleans them up every other block. For example:

  • With blocks 1, 2, 3, 4, 5, 6 created, the stored range is StoredBlockRange(1,7).
  • When block 3 is finalized, canonicalize works and StoredBlockRange becomes (4,7).
  • When block 4 is finalized, canonicalize exits early because of the `if range.0 >= canon_number` early-return clause (here 4 >= 4), so block 4 is not cleaned up.
  • When block 5 is finalized, canonicalize works, StoredBlockRange becomes (6,7), and both blocks 4 and 5 are cleaned up.
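The steps above can be reproduced with a small, self-contained model. This is a sketch under stated assumptions, not the approval-voting code: `Db`, `simulate` and the `inclusive_check` flag are hypothetical, the DB is an in-memory `BTreeSet`, and `StoredBlockRange` is treated as a half-open `[start, end)` interval. Because `range.0` and `canon_number` are both 4 at the skipped step, the sketch assumes the early return uses an inclusive comparison (`>=`) and models the fix as the strict `>`.

```rust
use std::collections::BTreeSet;

/// Simplified stand-in for approval-voting's stored range: a half-open
/// interval [start, end) of block numbers with entries still in the DB.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct StoredBlockRange(u32, u32);

/// Hypothetical in-memory "DB": the stored range plus the block numbers
/// whose entries have not been pruned yet.
struct Db {
    range: Option<StoredBlockRange>,
    entries: BTreeSet<u32>,
}

impl Db {
    fn new(first: u32, last: u32) -> Self {
        Db {
            range: Some(StoredBlockRange(first, last + 1)),
            entries: (first..=last).collect(),
        }
    }

    /// Prune every entry at or below `canon_number`.
    /// `inclusive_check = true` models the buggy early return (`>=`),
    /// `false` models the strict check (`>`).
    fn canonicalize(&mut self, canon_number: u32, inclusive_check: bool) {
        let range = match self.range {
            None => return,
            Some(range) => range,
        };
        let nothing_to_do = if inclusive_check {
            range.0 >= canon_number
        } else {
            range.0 > canon_number
        };
        if nothing_to_do {
            return;
        }
        // Drop entries for blocks in [range.0, canon_number].
        self.entries.retain(|&n| n > canon_number);
        // The oldest block still stored is now canon_number + 1.
        self.range = Some(StoredBlockRange(canon_number + 1, range.1.max(canon_number + 1)));
    }
}

/// Finalize blocks 3, 4 and 5 one at a time, starting from
/// StoredBlockRange(1, 7), and record the surviving entries after each step.
fn simulate(inclusive_check: bool) -> Vec<Vec<u32>> {
    let mut db = Db::new(1, 6);
    let mut snapshots: Vec<Vec<u32>> = Vec::new();
    for canon_number in 3..=5 {
        db.canonicalize(canon_number, inclusive_check);
        snapshots.push(db.entries.iter().copied().collect());
    }
    snapshots
}

fn main() {
    // [[4, 5, 6], [4, 5, 6], [6]]: block 4 survives its own finalization.
    println!("buggy: {:?}", simulate(true));
    // [[4, 5, 6], [5, 6], [6]]: one block pruned per finalization.
    println!("fixed: {:?}", simulate(false));
}
```

With the inclusive check, the entry for block 4 is still present after block 4 is finalized and is only removed together with block 5; with the strict check, each finalization prunes exactly the block that was finalized.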

The consequence of this is that we sometimes keep block entries around after they are finalized, so at restart we consider these blocks and send them to approval-distribution.

In most cases this is not a problem, but when finality is lagging on restart, approval-distribution will receive block 4 as the oldest block it needs to work on, and since BlockFinalized is never resent for block 4 after restart, it won't get the opportunity to clean it up. Therefore it will end up running approval-distribution aggression on block 4, because that is the oldest block it received from approval-voting for which it did not see a BlockFinalized signal.
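The restart scenario can be illustrated with a deliberately simplified model. It only assumes what the description states: after restart, approval-distribution works from the blocks approval-voting sends it and treats the oldest one without a BlockFinalized signal as the block to act on. `oldest_unfinalized` is a hypothetical helper, not the subsystem's API.

```rust
/// Hypothetical, heavily simplified view of approval-distribution after a
/// restart: it knows only the blocks approval-voting sent it and the finalized
/// number from any BlockFinalized signal seen since the restart.
fn oldest_unfinalized(received: &[u32], last_finalized_seen: Option<u32>) -> Option<u32> {
    received
        .iter()
        .copied()
        // Keep blocks not yet covered by a BlockFinalized signal.
        .filter(|&n| last_finalized_seen.map_or(true, |f| n > f))
        .min()
}

fn main() {
    // Block 4 is already finalized, but with the off-by-one its entry survived
    // in approval-voting's DB, so it is sent again after restart.
    let buggy_received = [4, 5, 6];
    // With the fix, block 4's entry was pruned when it was finalized.
    let fixed_received = [5, 6];
    // Finality is lagging: no new BlockFinalized has arrived since restart,
    // and the one for block 4 is never resent.
    let seen = None;
    println!("buggy: {:?}", oldest_unfinalized(&buggy_received, seen)); // Some(4)
    println!("fixed: {:?}", oldest_unfinalized(&fixed_received, seen)); // Some(5)
}
```

In the buggy case the already-finalized block 4 becomes the target; once the stale entry is pruned, the oldest block considered is one that is genuinely not finalized yet.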


Signed-off-by: Alexandru Gheorghe <[email protected]>
@sandreim (Contributor) left a comment

Nice catch!

@alexggh alexggh added T8-polkadot This PR/Issue is related to/affects the Polkadot network. A4-needs-backport Pull request must be backported to all maintained releases. labels Dec 13, 2024
@alexggh alexggh enabled auto-merge December 13, 2024 10:08
@paritytech-workflow-stopper commented:

All GitHub workflows were cancelled due to the failure of one of the required jobs.
Failed workflow url: https://github.com/paritytech/polkadot-sdk/actions/runs/12313266650
Failed job name: test-linux-stable

@alexggh alexggh added this pull request to the merge queue Dec 13, 2024
Merged via the queue into master with commit 2dd2bb5 Dec 13, 2024
198 of 200 checks passed
@alexggh alexggh deleted the alexaggh/fix_canonicalize branch December 13, 2024 13:06
github-actions bot pushed a commit that referenced this pull request Dec 13, 2024
(cherry picked from commit 2dd2bb5)
@paritytech-cmd-bot-polkadot-sdk

Successfully created backport PR for stable2407:

github-actions bot pushed a commit that referenced this pull request Dec 13, 2024
(cherry picked from commit 2dd2bb5)
@paritytech-cmd-bot-polkadot-sdk

Successfully created backport PR for stable2409:

github-actions bot pushed a commit that referenced this pull request Dec 13, 2024
(cherry picked from commit 2dd2bb5)
@paritytech-cmd-bot-polkadot-sdk

Successfully created backport PR for stable2412:

EgorPopelyaev pushed a commit that referenced this pull request Dec 13, 2024
Backport #6864 into `stable2412` from alexggh.

See the
[documentation](https://github.com/paritytech/polkadot-sdk/blob/master/docs/BACKPORT.md)
on how to use this bot.

Co-authored-by: Alexandru Gheorghe <[email protected]>
EgorPopelyaev pushed a commit that referenced this pull request Dec 17, 2024
Backport #6864 into `stable2409` from alexggh.

Co-authored-by: Alexandru Gheorghe <[email protected]>
EgorPopelyaev pushed a commit that referenced this pull request Dec 18, 2024
Backport #6864 into `stable2407` from alexggh.

Co-authored-by: Alexandru Gheorghe <[email protected]>
3 participants