heartbeat is sent even after io loop panicked #401

slavik-pastushenko · 2024-04-18T12:32:06Z

Hello!
I am using tokio and trying to implement the reconnection flow.
I found out that even if your app panicked, the heartbeat is still sent.

2024-04-18T12:17:15.514560Z DEBUG send heartbeat
2024-04-18T12:17:19.316301Z DEBUG received heartbeat from server
2024-04-18T12:17:22.219640Z ERROR Socket was readable but we read 0. This usually means that the connection is half closed this mark it as broken
2024-04-18T12:17:22.219744Z ERROR error doing IO error=IOError(Kind(ConnectionAborted))
2024-04-18T12:17:22.219765Z ERROR Connection error error=IO error: connection aborted
thread 'lapin-io-loop' panicked at /xxx:144:42:
IO error: connection aborted
stack backtrace:
   0: rust_begin_unwind
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:645:5
...

after it panicked, you can see that there is no response but it still sends it.

2024-04-18T12:17:25.517414Z DEBUG send heartbeat
2024-04-18T12:17:35.520383Z DEBUG send heartbeat
2024-04-18T12:17:45.522635Z DEBUG send heartbeat
2024-04-18T12:17:55.525602Z DEBUG send heartbeat

The way I got it is:

channel.on_error(|error| panic!("{}", error));
connection.on_error(|error| panic!("{}", error));

The text was updated successfully, but these errors were encountered:

slavik-pastushenko · 2024-04-18T12:35:19Z

I also implemented the way @mjanssen suggested here: #389 (comment)

it works correctly when you gracefully shutdown your rmq cluster.

if you force shutdown a rmq cluster, either:

it says reconnected, no errors, but I cannot see my connection/exchange/queue in the rabbitmq management console.
does not fetch an error and continues to send a heartbeat (without any response from the server). After 10-12 minutes it throws an error below, reconnects and works fine:

2024-04-18T09:47:17.930807Z ERROR error doing IO error=IOError(Os { code: 104, kind: ConnectionReset, message: "Connection reset by peer" }) 2024-04-18T09:47:17.930854Z ERROR Connection error error=IO error: Connection reset by peer (os error 104)

slavik-pastushenko · 2024-04-18T12:53:13Z

any ideas, @Keruspe

slavik-pastushenko · 2024-04-19T10:03:48Z

2024-04-19T10:02:40.575129Z DEBUG send heartbeat
2024-04-19T10:02:40.575247Z TRACE send_frame channel=0
2024-04-19T10:02:40.575255Z TRACE wake channel=0
2024-04-19T10:02:40.575329Z TRACE Got event for socket event=Wake
2024-04-19T10:02:40.575393Z TRACE will write to buffer next_msg=AMQPFrame::Heartbeat
2024-04-19T10:02:40.575499Z TRACE wrote 8 bytes
2024-04-19T10:02:40.575560Z TRACE io_loop do_run done can_read=false can_write=true has_data=false status=Connected
2024-04-19T10:02:40.575571Z TRACE io_loop run
2024-04-19T10:02:40.575576Z TRACE io_loop do_run can_read=false can_write=true has_data=false

while my rmq cluster was completely off. if I restart my pod it will fetch that a connection cannot be established and will wait until my rmq cluster is back.

slavik-pastushenko · 2024-04-19T10:40:48Z

after 6-7 minutes it got an error:

2024-04-19T10:29:53.277538Z TRACE io_loop do_run can_read=false can_write=true has_data=false
2024-04-19T10:29:56.852285Z TRACE Got event for socket event=Readable
2024-04-19T10:29:56.852424Z ERROR Socket was readable but we read 0. This usually means that the connection is half closed this mark it as broken
2024-04-19T10:29:56.852431Z ERROR error doing IO error=IOError(Kind(ConnectionAborted))
2024-04-19T10:29:56.852443Z ERROR Connection error error=IO error: connection aborted
2024-04-19T10:29:56.852472Z TRACE set_error consumer_tag=my_tag
2024-04-19T10:29:56.852487Z TRACE cancel consumer_tag=my_tag

and successfully restarted.

slavik-pastushenko · 2024-04-19T14:27:33Z

After running tcpdump on the client we noticed that heartbeats get occasionally sent (to the pre-termination IP) but there is no response and the reconnection happens at completely random intervals eg. 1 service reconnects straight away and 3 others wait for up to 25 minutes.
Can we somehow validate a heartbeat response or throw an exception in the case where we don’t receive a heartbeat response from the server in let's say around after 2 or 3 heartbeats?

Keruspe · 2024-04-19T14:48:46Z

I thought it was already the case.
Will give it a proper look.

If you have a pure testing environment to which I could get tmp access it could help if the issue turns out to be... not that obvious

slavik-pastushenko · 2024-04-20T13:09:44Z

the easiest way to reproduce it locally (using rabbitmq in docker with the default configuration):

setup a connection and set the on_error handler:

let connection = Connection::connect(
                &addr,
                ConnectionProperties::default()
                    .with_executor(tokio_executor_trait::Tokio::current())
                    .with_reactor(tokio_reactor_trait::Tokio),
            )
            .await?;

connection.on_error(|error| panic!("{}", error));

create a channel and set the on_error handler:

let channel = connection.create_channel().await?;
channel.on_error(|error| panic!("{}", error));

stop the rabbitmq container
receive an error:

2024-04-20T13:05:15.522848Z ERROR error doing IO error=IOError(Kind(ConnectionAborted))
2024-04-20T13:05:15.522870Z ERROR Connection error error=IO error: connection aborted
thread 'lapin-io-loop' panicked at /my-path.../x.rs:
IO error: connection aborted
stack backtrace:
   0: rust_begin_unwind
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:645:5
   1: core::panicking::panic_fmt
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/panicking.rs:72:14
   2: core::panicking::panic_display

see we still send a heartbeat without a response:

2024-04-20T13:05:16.763136Z DEBUG send heartbeat
2024-04-20T13:05:16.763422Z TRACE send_frame channel=0
2024-04-20T13:05:16.763460Z TRACE wake channel=0
2024-04-20T13:05:26.766356Z DEBUG send heartbeat
2024-04-20T13:05:26.766625Z TRACE send_frame channel=0
2024-04-20T13:05:26.766658Z TRACE wake channel=0

2024-04-20T13:05:36.768294Z DEBUG send heartbeat
2024-04-20T13:05:36.768708Z TRACE send_frame channel=0
2024-04-20T13:05:36.768760Z TRACE wake channel=0
2024-04-20T13:05:46.770914Z DEBUG send heartbeat
2024-04-20T13:05:46.771068Z TRACE send_frame channel=0
2024-04-20T13:05:46.771093Z TRACE wake channel=0
2024-04-20T13:05:56.774126Z DEBUG send heartbeat
2024-04-20T13:05:56.774387Z TRACE send_frame channel=0
2024-04-20T13:05:56.774492Z TRACE wake channel=0

Fixes #401 #403

Keruspe · 2024-04-22T20:46:02Z

Can you give #404 a try?

Fixes #401 #403

slavik-pastushenko · 2024-04-23T09:45:41Z

#403 solved the issue.

slavik-pastushenko mentioned this issue Apr 22, 2024

fix(heartbeat): stop heartbeat if connection is not active #403

Merged

Keruspe mentioned this issue Apr 22, 2024

heartbeat: monitor incoming heartbeats #404

Closed

Keruspe added a commit that referenced this issue Apr 22, 2024

heartbeat: monitor incoming heartbeats

399c85f

Fixes #401 #403

Keruspe added a commit that referenced this issue Apr 22, 2024

heartbeat: monitor incoming heartbeats

33ace98

Fixes #401 #403

Keruspe closed this as completed in #403 Apr 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

heartbeat is sent even after io loop panicked #401

heartbeat is sent even after io loop panicked #401

slavik-pastushenko commented Apr 18, 2024 •

edited

Loading

slavik-pastushenko commented Apr 18, 2024 •

edited

Loading

slavik-pastushenko commented Apr 18, 2024

slavik-pastushenko commented Apr 19, 2024

slavik-pastushenko commented Apr 19, 2024

slavik-pastushenko commented Apr 19, 2024

Keruspe commented Apr 19, 2024

slavik-pastushenko commented Apr 20, 2024 •

edited

Loading

Keruspe commented Apr 22, 2024

slavik-pastushenko commented Apr 23, 2024

heartbeat is sent even after io loop panicked #401

heartbeat is sent even after io loop panicked #401

Comments

slavik-pastushenko commented Apr 18, 2024 • edited Loading

slavik-pastushenko commented Apr 18, 2024 • edited Loading

slavik-pastushenko commented Apr 18, 2024

slavik-pastushenko commented Apr 19, 2024

slavik-pastushenko commented Apr 19, 2024

slavik-pastushenko commented Apr 19, 2024

Keruspe commented Apr 19, 2024

slavik-pastushenko commented Apr 20, 2024 • edited Loading

Keruspe commented Apr 22, 2024

slavik-pastushenko commented Apr 23, 2024

slavik-pastushenko commented Apr 18, 2024 •

edited

Loading

slavik-pastushenko commented Apr 18, 2024 •

edited

Loading

slavik-pastushenko commented Apr 20, 2024 •

edited

Loading