Connection is getting dropped with transport error #428

cdecker opened this issue May 8, 2024 · 0 comments · May be fixed by #548

cdecker commented May 8, 2024

This is a tracking issue for the transport errors we have been seeing since we activated the GL-LB for client -> node connections.

The root cause is the load balancer's short keepalive timeout of approximately 30 seconds, combined with calls that take longer than that, such as some pay calls. The load balancer drops a connection once it has been idle, i.e., has not transferred any data, for that long. Normally the client connection is configured to send keepalive messages (PING) whenever the connection is idle while a call is pending (a call is also referred to by the generic term stream, as everything in gRPC is a stream). The connection is configured here:

    let chan = tonic::transport::Endpoint::from_shared(node_uri.to_string())?
        .tls_config(tls.inner)?
        .tcp_keepalive(Some(crate::TCP_KEEPALIVE))
        .http2_keep_alive_interval(crate::TCP_KEEPALIVE)
        .keep_alive_timeout(crate::TCP_KEEPALIVE_TIMEOUT)
        .keep_alive_while_idle(true)
        .connect_lazy();

This configures keepalives very aggressively, and also enables them when no call is pending. However, running with RUST_LOG=trace shows that no pings are actually being sent over the wire. This is strange, as the signer is configured the same way and does send pings (it also has an active stream to stream_hsm_requests, which might be the difference).
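To make the failure condition described above concrete, here is a minimal sketch of how the load balancer timeout, the ping interval and the call duration interact. It is a simplified model, not the actual behaviour of tonic or of the GL-LB: `KeepaliveConfig`, `connection_survives` and the `pings_on_wire` flag are hypothetical names introduced for illustration only, and any concrete durations passed in are assumptions (the real values of `TCP_KEEPALIVE` and `TCP_KEEPALIVE_TIMEOUT` are not stated here).

```rust
use std::time::Duration;

/// Hypothetical, simplified view of the client-side keepalive settings.
/// The fields loosely mirror `http2_keep_alive_interval` and
/// `keep_alive_while_idle`; this is not a tonic type.
#[derive(Debug, Clone, Copy)]
struct KeepaliveConfig {
    /// Interval between HTTP/2 PING frames.
    ping_interval: Duration,
    /// Whether PINGs are also sent when no call is pending.
    while_idle: bool,
}

/// Returns true if, in this simplified model, the load balancer keeps the
/// connection open during a period of `quiet_period` without data.
fn connection_survives(
    cfg: &KeepaliveConfig,
    call_pending: bool,
    pings_on_wire: bool,
    lb_idle_timeout: Duration,
    quiet_period: Duration,
) -> bool {
    // Data arrives again before the load balancer's idle timer fires.
    if quiet_period < lb_idle_timeout {
        return true;
    }
    // PINGs are expected while a call is pending, or always if enabled
    // for idle connections.
    let pings_configured = call_pending || cfg.while_idle;
    // PINGs only help if they actually reach the wire, and do so more
    // often than the idle timeout.
    pings_configured && pings_on_wire && cfg.ping_interval < lb_idle_timeout
}
```

With an assumed 10 second ping interval, a 30 second load balancer timeout and a pay call that stays quiet for 45 seconds, the model returns `true` when `pings_on_wire` is `true` and `false` when it is `false`. The second case corresponds to what the trace logs suggest is happening on the client connection.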

The documentation on the semantics of keepalives is here: https://grpc.io/docs/guides/keepalive/

cdecker linked a pull request on Nov 29, 2024 that will close this issue.