"An unneeded collator connected" when parachain runs out of core time #6733

Open
2 tasks done
tmpolaczyk opened this issue Dec 2, 2024 · 6 comments
Labels
I2-bug The node fails to follow expected behavior. I10-unconfirmed Issue might be valid, but it's not yet known.


@tmpolaczyk
Contributor

Is there an existing issue?

  • I have searched the existing issues

Experiencing problems? Have you tried our Stack Exchange first?

  • This is not a support question.

Description of bug

Related: #616

I believe collators currently don't disconnect from validators properly, because I see these logs on the validators when a parachain runs out of core time:

2024-11-27 15:04:30.422 DEBUG tokio-runtime-worker parachain::network-bridge-rx: action="PeerConnected" peer_set=Collation version=2 peer=PeerId("12D3KooWPJT4QoqgwDWJzHZHDL8iCgkjKzswTwWGcYHHfEjBEerv") role=Full
2024-11-27 15:04:30.423 DEBUG tokio-runtime-worker parachain::collator-protocol: Declared as collator for unneeded para. Current assignments: {} peer_id=PeerId("12D3KooWPJT4QoqgwDWJzHZHDL8iCgkjKzswTwWGcYHHfEjBEerv") collator_id=Public(8e6e0feedba7494a19662e3178bc66b6801716ee4c12e304c78fde02cc96941c (14DkVhzA...)) para_id=Id(2001)
2024-11-27 15:04:30.423 DEBUG tokio-runtime-worker parachain::reputation-aggregator: Reduce reputation peer=PeerId("12D3KooWPJT4QoqgwDWJzHZHDL8iCgkjKzswTwWGcYHHfEjBEerv") rep=CostMinor("An unneeded collator connected")
2024-11-27 15:04:30.423 DEBUG tokio-runtime-worker parachain::network-bridge-rx: action="PeerDisconnected" peer_set=Collation peer=PeerId("12D3KooWPJT4QoqgwDWJzHZHDL8iCgkjKzswTwWGcYHHfEjBEerv")
2024-11-27 15:04:30.549 DEBUG tokio-runtime-worker parachain::network-bridge-rx: action="PeerConnected" peer_set=Collation version=2 peer=PeerId("12D3KooWPjcG7TYtZfkoyiTK1esowizuP48uff7RNZHR5BraMXqH") role=Full

It stays like this forever, with the same peer trying to connect and getting disconnected exactly once every second. Any advice on how to fix it? Then again, since it keeps reconnecting forever, the collator doesn't actually get banned, so maybe this is not a problem? Not sure.
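The sequence in the logs above (declare, "unneeded para", reputation cost, disconnect) can be modelled roughly as follows. This is only a simplified sketch of the behaviour the logs suggest, not the actual polkadot-sdk code: `DeclareOutcome`, `on_declare`, and the plain `u32` para ID are hypothetical names introduced for illustration.

```rust
use std::collections::BTreeSet;

// Simplified stand-in for the real para ID type (assumption).
type ParaId = u32;

/// What the validator does with a peer that declared itself a collator.
#[derive(Debug, PartialEq)]
enum DeclareOutcome {
    /// The para is among the validator's current assignments: keep the peer.
    Accept,
    /// The para is not assigned: apply a minor reputation cost and disconnect.
    PenalizeAndDisconnect { reason: &'static str },
}

/// Hypothetical model of the validator-side decision seen in the logs.
fn on_declare(current_assignments: &BTreeSet<ParaId>, declared: ParaId) -> DeclareOutcome {
    if current_assignments.contains(&declared) {
        DeclareOutcome::Accept
    } else {
        DeclareOutcome::PenalizeAndDisconnect {
            reason: "An unneeded collator connected",
        }
    }
}
```

With an empty assignment set (`Current assignments: {}` in the logs), a declaration for para 2001 ends in `PenalizeAndDisconnect`, which matches the connect/disconnect pairs shown above.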

Steps to reproduce

Start a local testnet and remove the core assignment from the para
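Once reproduced, the reconnect loop can be confirmed by tallying `PeerConnected` events per peer in the validator's debug logs. The following is a minimal sketch that assumes the log line format shown in the description; `peer_id` and `count_collation_connects` are hypothetical helper names, not part of any existing tool.

```rust
use std::collections::BTreeMap;

/// Extracts the peer ID from a log line containing `PeerId("...")`.
fn peer_id(line: &str) -> Option<&str> {
    let marker = "PeerId(\"";
    let start = line.find(marker)? + marker.len();
    let len = line[start..].find('"')?;
    Some(&line[start..start + len])
}

/// Counts `PeerConnected` events on the Collation peer set, per peer ID.
fn count_collation_connects(log: &str) -> BTreeMap<&str, usize> {
    let mut counts = BTreeMap::new();
    for line in log.lines() {
        let is_connect = line.contains("action=\"PeerConnected\"");
        let is_collation = line.contains("peer_set=Collation");
        if is_connect && is_collation {
            if let Some(peer) = peer_id(line) {
                *counts.entry(peer).or_insert(0) += 1;
            }
        }
    }
    counts
}
```

A peer whose count keeps growing while its para has no core assigned is stuck in the loop described above.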

@tmpolaczyk tmpolaczyk added I10-unconfirmed Issue might be valid, but it's not yet known. I2-bug The node fails to follow expected behavior. labels Dec 2, 2024
@tdimitrov
Contributor

Hey @tmpolaczyk
Sorry for the late reply! The most probable reason is that the collator doesn't use the claim queue to determine its assignments but blindly produces collations and gets disconnected. Can you tell me which binary you were using for the tests?

I think polkadot-parachain collator should handle this correctly.
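As a rough illustration of what "using the claim queue" could mean on the collator side, here is a minimal sketch. It assumes the claim queue can be viewed as a map from core index to a queue of para IDs; the `u32` aliases and the functions `cores_with_claim` and `should_collate` are hypothetical and do not correspond to actual collator code.

```rust
use std::collections::{BTreeMap, VecDeque};

// Simplified stand-ins for the relay chain types (assumption).
type CoreIndex = u32;
type ParaId = u32;
type ClaimQueue = BTreeMap<CoreIndex, VecDeque<ParaId>>;

/// Returns the cores on which `para` has at least one upcoming claim.
fn cores_with_claim(claim_queue: &ClaimQueue, para: ParaId) -> Vec<CoreIndex> {
    claim_queue
        .iter()
        .filter(|(_, claims)| claims.contains(&para))
        .map(|(core, _)| *core)
        .collect()
}

/// A collator with no upcoming claim has nothing to advertise, so in this
/// model it neither builds a collation nor connects to validators.
fn should_collate(claim_queue: &ClaimQueue, para: ParaId) -> bool {
    !cores_with_claim(claim_queue, para).is_empty()
}
```

In this model, a para that has run out of core time no longer appears in any core's queue, so `should_collate` returns `false` and the collator would stay idle instead of declaring itself to validators.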

@tmpolaczyk
Contributor Author

Hi @tdimitrov. The setup is a bit complex. This is our collator code:

https://github.com/moondance-labs/tanssi/blob/57704bf0cff1229a2c88aed1652b27d482f666c1/client/consensus/src/collators/lookahead.rs#L506

If you are willing to compile, it can be reproduced using pnpm moonwall test zombie_tanssi_relay_unneeded_para.

Can you show me which part of the code handles the unassignment? Maybe we missed that.

@tdimitrov
Contributor

Unfortunately I'm not an expert on the collator code and can't tell exactly where this is handled.

@skunert can you help? Does the lookahead collator disconnect from the validator if its para id is not assigned on any core?

@skunert
Contributor

skunert commented Dec 13, 2024

Took a look. It looks like you are running a modified version of the lookahead. Which version of polkadot-sdk are you using?

Is the collator still producing blocks even without coretime? Looking at the code, you are indeed checking for cores here and, if no cores are available, it should produce no blocks.

From the earlier description, it sounds to me like the collator is not producing blocks but is still connecting to and disconnecting from validators all the time.

@tmpolaczyk
Contributor Author

We are using stable2409: https://github.com/moondance-labs/polkadot-sdk/tree/tanssi-polkadot-stable2409

Indeed the collator is not trying to produce blocks, so that part is correct, but it is still connecting to the validators.

@tdimitrov
Contributor

From the earlier description, it sounds to me like the collator is not producing blocks but is still connecting to and disconnecting from validators all the time.

That's correct. Disconnecting from validators doesn't depend on the type of collator being used but on the collator protocol implementation. It seems that the collator disconnects from all validators at the point where there is nothing to advertise to them (here), but I don't understand what's going wrong in your case.

I need to investigate this further.

@tdimitrov tdimitrov self-assigned this Dec 16, 2024

3 participants