Connectivity issues on a kusama validator #6834

Closed
tdimitrov opened this issue Dec 10, 2024 · 4 comments

@tdimitrov
Contributor

I was on call and was paged because of a high warning rate on a Kusama validator (kusama-validator-bhs5-0).

The log I initially saw was:

2024-12-10 15:28:23.035  WARN tokio-runtime-worker parachain::availability-recovery: Recovery of available data failed. candidate_hash=0xfdcbf1ec1c07ec09c5b7fa28d65d8e8fcc58483951ddcdc54592450cd7d92557 traceID=337353625963102276686703469597376220815

Looking further, I also noticed:

 WARN tokio-runtime-worker parachain::availability-distribution: Some network error occurred when fetching erasure chunk origin=Public(34ae6b5f58f73ab9cd35fdd428cd10bb307ba76611ceca0d6f4c05b1be717d03 (DmPpai2j...)) relay_parent=0x0e784c18c498c1ad1cae22c164cb530b042c068a9eb124fddff14702e362f92a group_index=GroupIndex(79) session_index=44133 chunk_index=ValidatorIndex(325) candidate_hash=0x099ce3e573d068c4df41734ebfb3bc24fecdb6aaac51a73170bafb26a897da7b err=Network(DialFailure) traceID=12777672558067648119602414705356028964

There were also many low-connectivity warnings:

2024-12-10 16:51:36.674  WARN tokio-runtime-worker parachain::gossip-support: Low connectivity - authority lookup failed for too many validators. connected=0 target=1000

@tdimitrov
Contributor Author

Related litep2p issue: paritytech/litep2p#300

@lexnv lexnv self-assigned this Dec 10, 2024
@lexnv
Contributor

lexnv commented Dec 10, 2024

The validator has been running, for approximately 14 days, a debug version of litep2p built from:

I expect the root cause of the networking errors to be related to the following error:

tokio-runtime-worker litep2p::tcp::connection: failed to register opened substream to protocol protocol=Allocated("/b0a8d493285c2df73290dfb7e61f870f17b41801197a149ca93654499ea3dafe/kad") peer=PeerId("12D3KooWSKhrnPPYmAAfpA8TzE6NLkDooMzm7bApLYk86TNBmfcp") endpoint=Listener { address: "/ip6/2a02:1210:821b:7f00:ce28:aaff:fe0f:2762/tcp/37594", connection_id: ConnectionId(31845931) } error=ConnectionClosed

The node also shows another unexpected error from the mDNS component, which manifests as NetworkUnreachable. This is strange, considering that the 224.0.0.251 multicast address should be available:

tokio-runtime-worker litep2p::mdns: failed to send mdns query error=IoError(NetworkUnreachable)
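For context on the mDNS error: an mDNS query is an ordinary DNS message sent over UDP to the IPv4 multicast group 224.0.0.251, port 5353 (RFC 6762). The following is a minimal std-only Rust sketch, not litep2p's code; it assumes the libp2p mDNS service name _p2p._udp.local, and the helper names build_ptr_query and send_mdns_query are hypothetical. An error returned by send_to is the kind of io::Error that a log line such as IoError(NetworkUnreachable) would wrap; on Linux, NetworkUnreachable (ENETUNREACH) typically means the kernel found no route covering the multicast destination.

```rust
use std::io;
use std::net::{Ipv4Addr, SocketAddrV4, UdpSocket};

/// IPv4 mDNS multicast group and port (RFC 6762).
const MDNS_GROUP: Ipv4Addr = Ipv4Addr::new(224, 0, 0, 251);
const MDNS_PORT: u16 = 5353;

/// Hypothetical helper: builds a DNS query with a single PTR question for `name`.
/// Assumption: the service name follows the libp2p mDNS spec (_p2p._udp.local).
fn build_ptr_query(name: &str) -> Vec<u8> {
    let mut pkt = Vec::new();
    // Header: ID=0 (recommended for multicast queries), flags=0,
    // QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    pkt.extend_from_slice(&[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]);
    // QNAME: length-prefixed labels, terminated by a zero byte.
    for label in name.split('.') {
        pkt.push(label.len() as u8);
        pkt.extend_from_slice(label.as_bytes());
    }
    pkt.push(0);
    // QTYPE = PTR (12), QCLASS = IN (1).
    pkt.extend_from_slice(&[0, 12, 0, 1]);
    pkt
}

/// Hypothetical helper: sends one query to the mDNS group from an ephemeral port.
/// Any routing failure surfaces as the io::Error returned by send_to.
fn send_mdns_query(query: &[u8]) -> io::Result<usize> {
    let socket = UdpSocket::bind(SocketAddrV4::new(Ipv4Addr::UNSPECIFIED, 0))?;
    socket.send_to(query, SocketAddrV4::new(MDNS_GROUP, MDNS_PORT))
}

fn main() {
    let query = build_ptr_query("_p2p._udp.local");
    match send_mdns_query(&query) {
        Ok(n) => println!("sent {} bytes to {}:{}", n, MDNS_GROUP, MDNS_PORT),
        // On a host with no route covering 224.0.0.0/4 this is typically
        // ENETUNREACH, i.e. "Network is unreachable".
        Err(e) => println!("failed to send mdns query: {}", e),
    }
}
```

Whether the send succeeds depends entirely on the host's routing table, which is why the same binary can work on one machine and log NetworkUnreachable on another.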

@lexnv lexnv added this to Networking Dec 10, 2024
@lexnv
Contributor

lexnv commented Dec 10, 2024

Node Triage

Repo | Count | Level | Triage report
https://github.com/paritytech/polkadot-sdk/ | 35638 | warn_if_frequent | Some network error occurred when fetching erasure chunk
https://github.com/paritytech/litep2p/ | 18736 | error | failed to register opened substream to protocol
https://github.com/paritytech/polkadot-sdk/ | 475 | warn | Data unavailable for candidate .*
https://github.com/paritytech/polkadot-sdk/ | 475 | warn | Recovery of available data failed.
https://github.com/paritytech/polkadot-sdk/ | 9 | warn | fetch_pov_job
https://github.com/paritytech/polkadot-sdk/ | 9 | warn | Cluster has too many pending statements, something wrong with our connection to our group peers
https://github.com/paritytech/polkadot-sdk/ | 3 | warn | Report .*: .* to .*. Reason: .*. Banned, disconnecting. ( Same block request multiple times. Banned, disconnecting.)
https://github.com/paritytech/polkadot-sdk/ | 2 | warn | Report .*: .* to .*. Reason: .*. Banned, disconnecting. ( A collator provided a collation for the wrong para. Banned, disconnecting.)
https://github.com/paritytech/litep2p/ | 1 | error | failed to register substream open failure to protocol
https://github.com/paritytech/polkadot-sdk/ | 1 | warn | .*: .* is already a reserved peer
https://github.com/paritytech/litep2p/ | 1 | error | failed to send mdns query
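A report like the triage table above boils down to counting how many log lines match each known message pattern. A minimal sketch, assuming plain substring matching instead of the regex patterns listed; the tool that produced the report is not named in this thread, and Rule and triage are hypothetical names:

```rust
/// One known log message to look for. Hypothetical: the real triage tool
/// and its rule format are not described in this thread.
struct Rule {
    repo: &'static str,
    level: &'static str,
    /// Plain substring match (the report above uses regex patterns).
    pattern: &'static str,
}

/// Returns (repo, count, level, pattern) rows for every rule that matched at
/// least one line, most frequent first. Each rule is counted independently.
fn triage(rules: &[Rule], lines: &[&str]) -> Vec<(&'static str, usize, &'static str, &'static str)> {
    let mut rows: Vec<_> = rules
        .iter()
        .map(|r| {
            let count = lines.iter().filter(|l| l.contains(r.pattern)).count();
            (r.repo, count, r.level, r.pattern)
        })
        .filter(|row| row.1 > 0)
        .collect();
    // Stable sort: rules with equal counts keep their original order.
    rows.sort_by(|a, b| b.1.cmp(&a.1));
    rows
}

fn main() {
    let rules = [
        Rule { repo: "polkadot-sdk", level: "warn", pattern: "Recovery of available data failed." },
        Rule { repo: "litep2p", level: "error", pattern: "failed to register opened substream to protocol" },
        Rule { repo: "litep2p", level: "error", pattern: "failed to send mdns query" },
    ];
    // Sample lines modeled on the messages quoted in this issue.
    let lines = [
        "WARN parachain::availability-recovery: Recovery of available data failed.",
        "litep2p::tcp::connection: failed to register opened substream to protocol error=ConnectionClosed",
        "litep2p::tcp::connection: failed to register opened substream to protocol error=ConnectionClosed",
        "litep2p::mdns: failed to send mdns query error=IoError(NetworkUnreachable)",
    ];
    for (repo, count, level, pattern) in triage(&rules, &lines) {
        println!("{repo} | {count} | {level} | {pattern}");
    }
}
```

Because each rule is counted independently, a single log line can contribute to several rows; rules that never match are left out of the report.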

@lexnv
Contributor

lexnv commented Dec 13, 2024

Closed by: #6860

See the following litep2p PR for more details:

  • mdns/fix: Failed to register opened substream (#301)

@lexnv lexnv closed this as completed Dec 13, 2024
@github-project-automation github-project-automation bot moved this to Blocked ⛔️ in Networking Dec 13, 2024