Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[stable2412] Backport #6729 #6830

Merged
merged 1 commit into from
Dec 11, 2024
Merged

Conversation

paritytech-cmd-bot-polkadot-sdk[bot]

Backport #6729 into stable2412 from alexggh.

See the documentation on how to use this bot.

The way we build the messages we need to send to approval-distribution
can result in a situation where is we have multiple assignments covered
by a coalesced approval, the messages are sent in this order:

ASSIGNMENT1, APPROVAL, ASSIGNMENT2, because we iterate over each
candidate and add to the queue of messages both the assignment and the
approval for that candidate, and when the approval reaches the
approval-distribution subsystem it won't be imported and gossiped
because one of the assignment for it is not known.

So in a network where a lot of nodes are restarting at the same time we
could end up in a situation where a set of the nodes correctly received
the assignments and approvals before the restart and approve their
blocks and don't trigger their assignments. The other set of nodes
should receive the assignments and approvals after the restart, but
because the approvals never get broacasted anymore because of this bug,
the only way they could approve is if other nodes start broadcasting
their assignments.

I think this bug contribute to the reason the network did not recovered
on `25-11-25 15:55:40` after the restarts.

Tested this scenario with a `zombienet` where `nodes` are finalising
blocks because of aggression and all nodes are restarted at once and
confirmed the network lags and doesn't recover before and it does after
the fix

---------

Signed-off-by: Alexandru Gheorghe <[email protected]>
(cherry picked from commit 65a4e5e)
@github-actions github-actions bot added the A3-backport Pull request is already reviewed well in another branch. label Dec 10, 2024
Copy link

This pull request is amending an existing release. Please proceed with extreme caution,
as to not impact downstream teams that rely on the stability of it. Some things to consider:

  • Backports are only for 'patch' or 'minor' changes. No 'major' or other breaking change.
  • Should be a legit fix for some bug, not adding tons of new features.
  • Must either be already audited or not need an audit.
Emergency Bypass

If you really need to bypass this check: add validate: false to each crate
in the Prdoc where a breaking change is introduced. This will release a new major
version of that crate and all its reverse dependencies and basically break the release.

@EgorPopelyaev EgorPopelyaev merged commit 2f20a64 into stable2412 Dec 11, 2024
195 of 201 checks passed
@EgorPopelyaev EgorPopelyaev deleted the backport-6729-to-stable2412 branch December 11, 2024 07:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A3-backport Pull request is already reviewed well in another branch.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants