drbd with rdma gets stuck when shutting down a resource #58
Comments
Edit: a plain up and down is enough to get stuck, so this appears to be a problem with RDMA in general as it currently stands.
I tested the other releases: this bug has existed since the first release, 9.2.0, and was never fixed, not even in 9.2.4.
As a test, I decremented the counter before the event at drbd/drbd/drbd_transport_rdma.c, line 572 in 460cfc1.
This fixes the issue, but I have not yet been able to debug why this decrement is never called. It really never happens: we had a server sitting in that state for a week.
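To illustrate the kind of hang described here, the following is a minimal user-space sketch of a teardown path that waits for a counter to reach zero. It is not the DRBD code: `cm_tracker`, `cm_get`, `cm_put` and `cm_wait_drained` are hypothetical names, and the kernel uses different primitives than pthreads. The only assumption is that shutdown blocks until every counted object has been released, so a single missing decrement leaves the waiter blocked. A timeout is used here so the stuck state can be observed instead of hanging forever.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for "number of live connection objects".
 * Names are illustrative only and do not come from drbd_transport_rdma.c. */
struct cm_tracker {
    pthread_mutex_t lock;
    pthread_cond_t  zero;   /* signalled when count drops to 0 */
    int             count;
};

void cm_tracker_init(struct cm_tracker *t)
{
    pthread_mutex_init(&t->lock, NULL);
    pthread_cond_init(&t->zero, NULL);
    t->count = 0;
}

/* Take one reference, e.g. when a connection object is created. */
void cm_get(struct cm_tracker *t)
{
    pthread_mutex_lock(&t->lock);
    t->count++;
    pthread_mutex_unlock(&t->lock);
}

/* Drop one reference and wake any waiter once the count reaches zero.
 * If this call is skipped on some path, the waiter below never wakes. */
void cm_put(struct cm_tracker *t)
{
    pthread_mutex_lock(&t->lock);
    assert(t->count > 0);
    if (--t->count == 0)
        pthread_cond_broadcast(&t->zero);
    pthread_mutex_unlock(&t->lock);
}

/* Wait until the count is zero or timeout_ms has elapsed.
 * Returns true if everything was released, false if still stuck. */
bool cm_wait_drained(struct cm_tracker *t, long timeout_ms)
{
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&t->lock);
    int rc = 0;
    while (t->count != 0 && rc != ETIMEDOUT)
        rc = pthread_cond_timedwait(&t->zero, &t->lock, &deadline);
    bool drained = (t->count == 0);
    pthread_mutex_unlock(&t->lock);
    return drained;
}
```

In this sketch, calling `cm_get` without a matching `cm_put` makes `cm_wait_drained` return false after the timeout. An untimed wait in the same situation would block indefinitely, which matches the symptom reported in this issue.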
Maybe @rck can help out from here?
When a connection is made, it gets stuck again in the same position, with higher cm_counts. When a connection is cut unexpectedly (we simulated a crash), we end up in the same situation. Nothing is freed anymore; it just stays stuck.
Further debugging shows that there are deeper problems with the RDMA driver. The whole system eventually locks up after a while, blocking any further file reads.
Update:
drbd/drbd/drbd_transport_rdma.c
Line 572 in 460cfc1
This is the line where it gets stuck.
DRBD with RDMA gets stuck when disconnecting a resource that is in sync.
Here are the logs we could retrieve:
DRBD version: 9.2.2
To reproduce:
Set up an RDMA-synced disk: set up the first node, then set up the second node. Connect them, disconnect them, then try to shut down either of the two. They will be stuck forever, and only a hard reboot releases them.