fix(network): Reconnect with peers after brief network interruption #7853

Merged: 2 commits merged on Oct 27, 2023

@arya2 (Contributor) commented Oct 26, 2023

Motivation

Zebra currently does not reconnect to peers after an internet connection failure, even once the connection is restored.

Part of #7772.

Solution

  • Always attempt to make an outbound connection in TimerCrawl
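
The idea behind this change can be sketched with a toy model. This is not Zebra's actual code: the `Crawler` struct, its `active_outbound` field, and the `drop_all_peers` and `on_timer_crawl` methods are hypothetical names invented for illustration, and the assumption that a dial succeeds exactly when the network is reachable is a simplification. The sketch only shows why a timer-driven connection attempt can recover from an outage without relying on a demand signal.

```rust
/// Hypothetical, simplified model of a peer crawler; not Zebra's real types.
#[derive(Debug, Default)]
struct Crawler {
    /// Number of currently open outbound connections.
    active_outbound: usize,
}

impl Crawler {
    /// Simulates a network outage: every open connection is dropped.
    fn drop_all_peers(&mut self) {
        self.active_outbound = 0;
    }

    /// Handles one tick of the periodic crawl timer.
    ///
    /// The tick always attempts one outbound connection, without waiting
    /// for a demand signal. Returns `true` if the attempt succeeded.
    fn on_timer_crawl(&mut self, network_up: bool) -> bool {
        // Assumption: a dial succeeds exactly when the network is reachable.
        if network_up {
            self.active_outbound += 1;
        }
        network_up
    }
}
```

In this model, a crawler that has lost all its peers regains a connection on the first timer tick after the network comes back, even if nothing else asks it for more peers.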

Review

Anyone can review.

Reviewer Checklist

  • Will the PR name make sense to users?
    • Does it need extra CHANGELOG info? (new features, breaking changes, large changes)
  • Are the PR labels correct?
  • Does the code do what the ticket and PR say?
    • Does it change concurrent code, unsafe code, or consensus rules?
  • How do you know it works? Does it have tests?

Follow Up Work

  • Investigate why it isn't sending a MorePeers message in PeerSet::poll_ready()
  • Fix that and revert this PR

@arya2 added the C-bug (Category: This is a bug), P-Medium ⚡, I-hang (A Zebra component stops responding to requests), I-usability (Zebra is hard to understand or use), and A-network (Area: Network protocol updates or fixes) labels Oct 26, 2023
@arya2 self-assigned this Oct 26, 2023
@arya2 requested a review from a team as a code owner October 26, 2023 22:41
@arya2 requested review from teor2345 and removed request for a team October 26, 2023 22:41
@teor2345 (Contributor) left a comment

This fix might work, but there are no new tests, and no comments about manual tests. So I can't be sure. It also has some network load drawbacks that we should fix.

There are other bugs we might need to follow up, but some haven't been fully analysed, and they don't have tickets yet. Can you work out if we need to add logs to diagnose them? This PR would be a good place to add those logs.

@teor2345 (Contributor) left a comment

Thanks! We might want to raise the minimum number of peers later, but 1 is an acceptable minimum for now.

mergify bot added a commit that referenced this pull request Oct 27, 2023
@mergify bot merged commit 5367ccb into main Oct 27, 2023
104 checks passed
@mergify bot deleted the fix-reconnect-bug branch October 27, 2023 06:13
@mpguerra linked an issue Oct 27, 2023 that may be closed by this pull request
4 tasks
Labels

  • A-network (Area: Network protocol updates or fixes)
  • C-bug (Category: This is a bug)
  • I-hang (A Zebra component stops responding to requests)
  • I-usability (Zebra is hard to understand or use)

Development

Successfully merging this pull request may close these issues.

bug: zebrad will not reconnect after an internet connection failure and restore
2 participants