Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attempt to fix crash after receiving a KickedOff message #1521

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

dgelessus
Copy link
Contributor

The client reliably crashes after the server sends a Auth2Cli_KickedOff message. I assume this went unnoticed because DIRTSAND never sends that message and even on other servers it doesn't occur in normal gameplay.

I spent a while staring at this in the debugger. Apparently the underlying issue is that the auth server socket is closed by its own notify callback (by the NetCliAuthDisconnect call in RecvMsg<Auth2Cli_KickedOff>). The async socket code doesn't notice the disconnect in this case, so the disconnect notify callback is never called and the auth connection isn't shut down properly. This causes the client to hang on exit, until it tries to send something to the auth server (usually a ping), which then crashes from trying to write to a closed socket.

This PR works around the issue by closing the auth connection asynchronously outside the notify callback, and tightening the locking in the async socket code so nothing else can manipulate a socket while its callbacks are running. This seems to fix the hang/crash, but I'm not confident that this is a complete or good solution, so I've made this a draft for now.

I feel like the right solution would be to fire the close notify callback some other way, without relying on the read operation to detect that the socket was closed, but I have no idea how...

Otherwise the operations fail with an invalid handle error, which is a
bit less obvious when debugging.
Every AsyncSocket is guarded by its own fCritsect. All ConnectOperations
are guarded by the shared s_connectCrit.
Apparently the NetCli infrastructure and/or our asio-based sockets
really don't like it when a socket is closed inside its own notify read
callback. So instead, let ReportNetError disconnect the auth connection
via the error callback, which runs asynchronously on the main thread.
@dgelessus
Copy link
Contributor Author

Welp, I just hit a deadlock with these changes active: the main thread was sending a transaction that sends an auth message (i. e. plNglTrans s_critsect locked and waiting on the auth socket fCritsect), while one of the socket IO worker threads was dispatching an auth message whose handler sends a transaction (i. e. auth socket fCritsect locked and waiting on plNglTrans s_critsect).

So we can't just keep the socket locked during the entire notify callback. I suppose I could revert the locking changes - in my testing, 8a73bfc on its own already avoids the crash, but without the extra locking, the crash could still occur with some unlucky timing/scheduling...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant