[Bug] Potential temporary sync stall from weak-link or malicious peers #3322

Open
Meshiest opened this issue Jun 21, 2024 · 2 comments
bug Incorrect or unexpected behavior

Comments

@Meshiest
Contributor

Meshiest commented Jun 21, 2024

🐛 Bug Report

Block requests can be forced to time out when a peer with a pending request never sends a block response. This is similar to the temporary stall outlined in #3321.

Steps to Reproduce

  1. Prevent a block response from being created by the validator or the client's router.
  2. Observe that syncing nodes using the modified nodes as peers get stuck for ~10 minutes when they reach a block with an "incomplete request".

We encountered a 10-minute stall while syncing a node on high-performance hardware after merging the latest client sync fixes, and deduced that it may be related to a swarm of low-quality clients (from another test) among the connected peers. We were able to resume block sync immediately by forcibly disconnecting the weak links.

image

The 34.16.96.117 IP is one of 20 16-core servers under extreme stress from 10 clients syncing blocks with dozens of transactions per block. As a result, it cannot reliably respond to block requests, which prevents a block request from being marked as complete.

image

Expected Behavior

Blocks should sync without stalls even when a peer is unable to respond.

Potential Fixes

  • time out block requests on a per-peer basis
  • disconnect from peers that are unreliable
  • allow a certain number of peers to be unresponsive when marking a request as complete in block_sync.rs
  • shorten the total block request timeout
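
To make the first three ideas concrete, here is a minimal sketch of how they could fit together. It is not taken from block_sync.rs: `PendingRequest`, `expire`, and `is_complete` are hypothetical names, and the timeout and tolerance values are left to the caller. Each peer gets its own deadline, expired peers are returned so the caller can disconnect them, and a request counts as complete while a bounded number of peers are still silent.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;
use std::time::{Duration, Instant};

/// Hypothetical tracker for one block request sent to several peers.
/// Illustrative only; not the structure used in block_sync.rs.
pub struct PendingRequest {
    /// Send time for every peer that has not answered yet.
    sent_at: HashMap<SocketAddr, Instant>,
    /// Number of peers that have answered.
    responded: usize,
}

impl PendingRequest {
    /// Starts tracking a request sent to `peers` at time `now`.
    pub fn new(peers: &[SocketAddr], now: Instant) -> Self {
        Self { sent_at: peers.iter().map(|p| (*p, now)).collect(), responded: 0 }
    }

    /// Records a response from `peer`. Returns false if nothing was pending for it.
    pub fn on_response(&mut self, peer: SocketAddr) -> bool {
        let was_pending = self.sent_at.remove(&peer).is_some();
        if was_pending {
            self.responded += 1;
        }
        was_pending
    }

    /// Stops waiting on peers whose individual deadline has passed and
    /// returns them, so the caller can disconnect or penalize them.
    pub fn expire(&mut self, now: Instant, per_peer_timeout: Duration) -> Vec<SocketAddr> {
        let expired: Vec<SocketAddr> = self
            .sent_at
            .iter()
            .filter(|(_, sent)| now.duration_since(**sent) >= per_peer_timeout)
            .map(|(peer, _)| *peer)
            .collect();
        for peer in &expired {
            self.sent_at.remove(peer);
        }
        expired
    }

    /// The request is complete once at least one peer has answered and
    /// no more than `max_unresponsive` peers are still pending.
    pub fn is_complete(&self, max_unresponsive: usize) -> bool {
        self.responded >= 1 && self.sent_at.len() <= max_unresponsive
    }
}
```

With this shape, a single stressed peer delays a request by at most the per-peer timeout instead of the full request timeout, and a nonzero `max_unresponsive` removes the delay entirely when enough other peers respond.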

Your Environment

@Meshiest Meshiest added the bug Incorrect or unexpected behavior label Jun 21, 2024
@elderhammer
Contributor

#3320
I observed this a few days ago, but I thought the issue could be ignored because the team did not comment on it. I have now reopened it.

@raychu86
Contributor

This should be addressed by #3422, which bans peers that frequently reach timeouts.
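
For readers unfamiliar with the approach, the general idea of banning peers that frequently time out can be sketched as a sliding-window counter. This is an illustration only and not the code in #3422: `TimeoutTracker`, `record_timeout`, the window length, and the threshold are all assumptions.

```rust
use std::collections::{HashMap, VecDeque};
use std::net::SocketAddr;
use std::time::{Duration, Instant};

/// Hypothetical sliding-window counter of block request timeouts per peer.
/// Illustrative only; the names and policy are assumptions.
pub struct TimeoutTracker {
    /// How far back timeouts are remembered.
    window: Duration,
    /// A peer is flagged once it exceeds this many timeouts inside the window.
    max_timeouts: usize,
    /// Recent timeout instants per peer, oldest first.
    timeouts: HashMap<SocketAddr, VecDeque<Instant>>,
}

impl TimeoutTracker {
    pub fn new(window: Duration, max_timeouts: usize) -> Self {
        Self { window, max_timeouts, timeouts: HashMap::new() }
    }

    /// Records a timeout for `peer` at `now`.
    /// Returns true if the peer has exceeded the allowed count and should be banned.
    pub fn record_timeout(&mut self, peer: SocketAddr, now: Instant) -> bool {
        let entries = self.timeouts.entry(peer).or_default();
        entries.push_back(now);
        // Forget timeouts that fell out of the window.
        while let Some(&oldest) = entries.front() {
            if now.duration_since(oldest) > self.window {
                entries.pop_front();
            } else {
                break;
            }
        }
        entries.len() > self.max_timeouts
    }
}
```

A windowed count, rather than a lifetime count, lets a peer that was only briefly overloaded recover its standing once the old timeouts age out.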


3 participants