[5.0.3] P2P: Resolve on reconnect #2408
Conversation
Can a test be added?
fc_wlog( logger, "Unable to resolve ${host}:${port} ${error}",
         ("host", host)("port", port)( "error", err.message() ) );
c->set_state(connection::connection_state::closed);
++(c->consecutive_immediate_connection_close);
What does increasing this do? I thought it was going to cause some sort of cool-down when reaching `def_max_consecutive_immediate_connection_close`, but it doesn't seem to matter.
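For context, here is a minimal sketch of the kind of cool-down check being described; the threshold value and the helper are hypothetical illustrations, not the actual net_plugin logic:

```cpp
#include <cstdint>

// Hypothetical name/value for illustration only.
constexpr uint32_t def_max_consecutive_immediate_connection_close = 9;

// Expected effect: once a peer has closed immediately this many times in a
// row, back off (delay or stop) further reconnect attempts.
inline bool should_back_off( uint32_t consecutive_immediate_connection_close ) {
   return consecutive_immediate_connection_close >= def_max_consecutive_immediate_connection_close;
}
```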
Need to look into it, but I think the change that broke this re-connect also broke the back-off based on `consecutive_immediate_connection_close`.
Hmm, actually this is a reason to reuse the connection object. I guess I should maintain the reuse of the connection object for this.
It seems like now (on 51fd8fa), after failing to resolve a host `def_max_consecutive_immediate_connection_close` times, it just never tries again. But if the resolve completes and it is simply unable to connect to the remote host, it will retry indefinitely. I can't tell if this discrepancy is intentional or not. It makes me wonder if we should not be increasing this counter here for consistency between the two, but realistically if someone has a completely bad hostname it's probably not fixing itself.
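Stated compactly, the discrepancy described above looks roughly like this (a simplified assumption about the observed behavior, not the plugin's code):

```cpp
#include <cstdint>

// Assumed summary of the observed behavior on 51fd8fa:
// resolve failures give up for good once the threshold is reached,
// while connect failures keep retrying indefinitely.
inline bool will_retry( bool resolve_failed, uint32_t consecutive_closes, uint32_t max_closes ) {
   if( resolve_failed )
      return consecutive_closes < max_closes; // stops permanently after max_closes failures
   return true;                               // connect failures retry forever
}
```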
if( !err ) {
   c->connect( results );
} else {
   fc_wlog( logger, "Unable to resolve ${host}:${port} ${error}",
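For readers outside the diff, here is a self-contained sketch of the resolve-then-connect flow that the hunk above sits in, written against plain Boost.Asio; the `connection` struct, counter, and logging here are simplified stand-ins rather than the actual net_plugin types:

```cpp
#include <boost/asio.hpp>
#include <iostream>
#include <memory>
#include <string>

using boost::asio::ip::tcp;

// Simplified stand-in for the plugin's connection object (hypothetical).
struct connection : std::enable_shared_from_this<connection> {
   explicit connection( boost::asio::io_context& ioc ) : resolver( ioc ), socket( ioc ) {}

   void resolve_and_connect( const std::string& host, const std::string& port ) {
      auto self = shared_from_this();
      resolver.async_resolve( host, port,
         [self, host, port]( const boost::system::error_code& err, tcp::resolver::results_type results ) {
            if( !err ) {
               // Resolution succeeded: hand the endpoint list to async_connect.
               boost::asio::async_connect( self->socket, results,
                  [self]( const boost::system::error_code& ec, const tcp::endpoint& ) {
                     if( ec ) {
                        // Connect failure: in the plugin this path goes through close(false) -> _close().
                        std::cerr << "connect failed: " << ec.message() << "\n";
                        ++self->consecutive_immediate_connection_close;
                     }
                  } );
            } else {
               // Resolve failure: log it and bump the counter (mirrors the diff above).
               std::cerr << "Unable to resolve " << host << ":" << port << " " << err.message() << "\n";
               ++self->consecutive_immediate_connection_close;
            }
         } );
   }

   tcp::resolver resolver;
   tcp::socket   socket;
   unsigned      consecutive_immediate_connection_close = 0;
};

int main() {
   boost::asio::io_context ioc;
   auto c = std::make_shared<connection>( ioc );
   c->resolve_and_connect( "example.invalid", "4444" );
   ioc.run();
}
```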
FWIW, this is a warning log, whereas failing to connect is an info log. Not sure if you want to make them consistent or not.
If the `async_connect` fails it calls `c->close(false)`, which calls `_close()`, which increments `consecutive_immediate_connection_close`.
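Condensed into a sketch, that chain reads roughly as below; the signatures and the reconnect handling are assumptions for illustration, not the plugin's actual implementation:

```cpp
// Hypothetical condensation of the close chain described above.
struct connection_sketch {
   unsigned consecutive_immediate_connection_close = 0;

   // The failed async_connect path calls close(false), i.e. no reconnect.
   void close( bool reconnect ) {
      _close( reconnect );
   }

private:
   void _close( bool reconnect ) {
      // An immediately-closed connection bumps the counter used for back-off decisions.
      ++consecutive_immediate_connection_close;
      if( reconnect ) {
         // (re-connection scheduling would go here)
      }
   }
};
```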
I think a warn is appropriate if unable to resolve.
> If the `async_connect` fails it calls `c->close(false)`, which calls `_close()`, which increments `consecutive_immediate_connection_close`.
There is something different between the two though. For example, when I run nodeos with `--p2p-peer-address fdsfdsfds.invalid:4444 --p2p-peer-address fddsfdsfdsfds.invalid:4444 --p2p-peer-address 127.0.0.1:2121 --p2p-peer-address www.google.com:4444`, you'll see that the latter two keep trying to reconnect forever whereas the first two don't.
There is also some log oddness at shutdown:
info 2024-09-26T02:45:43.993 nodeos net_plugin.cpp:4578 close_all ] close all 4 connections
info 2024-09-26T02:45:43.993 net-1 net_plugin.cpp:1460 _close ] ["127.0.0.1:2121" - 1 <unknown>:<unknown>] closing
info 2024-09-26T02:45:43.993 net-2 net_plugin.cpp:1460 _close ] ["fddsfdsfdsfds.invalid:4444" - 2 <unknown>:<unknown>] closing
info 2024-09-26T02:45:43.993 nodeos net_plugin.cpp:4385 plugin_shutdown ] exit shutdown
This difference in backoff behavior seems to match Leap 4.0 behavior.
Resolve address on reconnect.
See AntelopeIO/spring#525
Instead of:
Now you get: