Shutdown cleanup #98
Hi @bjornaxis - the fix is much appreciated! I hope to have time to review soon. I will have to refresh my memory on code I haven't looked at in a long time. Meanwhile, would you mind opening an issue on the crashes and providing any info you may have that would help me reproduce them? If you have any stack traces or other clues, those would also be great to have. Note that CI being stuck is probably due to the ubuntu-18.04 runner no longer being available on GitHub (will fix).
Hi,
I have created issue #99 and tried to explain what is happening at a higher level (and how to increase the chances of provoking the error).
The errors I see are mostly assertion failures in liblsd/hash.c or ordinary segfaults.
Since this is a file server and I access a lot of data to provoke the crash, I am not sure I can share any crash dumps, since I do not know what else could be in there (sorry).
/BA
Once refcount reaches zero the conn data could be deallocated, and then refcond is no longer valid. So always signal refcond while holding the lock to avoid this race.
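To illustrate the race described in this commit, here is a minimal sketch of signalling under the lock. It is not diod's code: `struct conn`, `conn_create()`, `conn_decref()`, `conn_wait_and_destroy()` and `release_ref()` are hypothetical stand-ins, and only the `refcount`/`refcond` names come from the commit message.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the connection object (not diod's real struct). */
struct conn {
    pthread_mutex_t lock;
    pthread_cond_t  refcond;
    int             refcount;
};

struct conn *conn_create(void)
{
    struct conn *c = malloc(sizeof(*c));
    if (c == NULL)
        return NULL;
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->refcond, NULL);
    c->refcount = 1;
    return c;
}

/* Drop a reference. The signal is sent while the lock is still held, so the
 * waiter cannot observe refcount == 0 and free the object before we are done
 * touching refcond. */
void conn_decref(struct conn *c)
{
    pthread_mutex_lock(&c->lock);
    assert(c->refcount > 0);
    c->refcount--;
    pthread_cond_signal(&c->refcond);
    pthread_mutex_unlock(&c->lock);
}

/* Wait for all references to go away, then free the object.
 * Returns the refcount seen just before destruction (always 0). */
int conn_wait_and_destroy(struct conn *c)
{
    int seen;

    pthread_mutex_lock(&c->lock);
    while (c->refcount > 0)
        pthread_cond_wait(&c->refcond, &c->lock);
    seen = c->refcount;
    pthread_mutex_unlock(&c->lock);

    pthread_cond_destroy(&c->refcond);
    pthread_mutex_destroy(&c->lock);
    free(c);
    return seen;
}

/* Thread body used only to exercise the sketch: drops the one reference. */
void *release_ref(void *arg)
{
    conn_decref(arg);
    return NULL;
}
```

If the signal were sent after the unlock instead, the waiter could wake up (for example spuriously), see a zero refcount, and free the object while the signalling thread is still about to touch `refcond`.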
Do the cleanup of the connection thread in a callback function so we can cancel the thread using pthread_cancel().
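A minimal sketch of this pattern, assuming POSIX cleanup handlers (`pthread_cleanup_push()`/`pthread_cleanup_pop()`) are what "callback function" refers to. The names `conn_ctx`, `conn_cleanup()`, `conn_thread()` and `run_and_cancel()` are hypothetical, and the sketch joins the thread only so the result can be observed, whereas the real connection thread is detached.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <unistd.h>

/* Hypothetical per-connection state (not diod's real struct). */
struct conn_ctx {
    atomic_int stop;        /* set to ask the thread to exit normally */
    atomic_int cleaned_up;  /* set by the cleanup handler */
};

/* Runs on normal exit and on cancellation. */
static void conn_cleanup(void *arg)
{
    struct conn_ctx *ctx = arg;
    atomic_store(&ctx->cleaned_up, 1);
}

void *conn_thread(void *arg)
{
    struct conn_ctx *ctx = arg;

    pthread_cleanup_push(conn_cleanup, ctx);
    while (!atomic_load(&ctx->stop))
        sleep(1);           /* cancellation point; stands in for a blocking read */
    pthread_cleanup_pop(1); /* 1 = also run the handler on normal exit */
    return NULL;
}

/* Start a thread, cancel it, and report whether the handler ran.
 * Returns 1 on success, 0 if cleanup did not run, -1 on create failure. */
int run_and_cancel(void)
{
    struct conn_ctx ctx;
    pthread_t t;
    void *res;

    atomic_init(&ctx.stop, 0);
    atomic_init(&ctx.cleaned_up, 0);
    if (pthread_create(&t, NULL, conn_thread, &ctx) != 0)
        return -1;
    pthread_cancel(t);
    pthread_join(t, &res);
    return res == PTHREAD_CANCELED && atomic_load(&ctx.cleaned_up) == 1;
}
```

With the default deferred cancellation, the thread is only cancelled at a cancellation point, so the handler is always registered before the cancel request takes effect.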
The connection thread is detached so we cannot join it to wait for its completion on shutdown. Use the conncount in server to keep track of when all connection threads have actually finished. To keep this safe we must wait to update this value until the thread has done all of its cleanup.
The connection threads are using resources that are deallocated in the main diod thread. If they are not shut down cleanly before this deallocation, we could get crashes at shutdown.
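The counting scheme could look roughly like the sketch below. It is an illustration under assumptions, not the patch itself: `struct server`, `server_init()`, `server_start_conn()`, `server_wait_conns()` and the thread functions are hypothetical, and only the `conncount` name is taken from the commit message.

```c
#include <pthread.h>

/* Hypothetical server state (not diod's real struct). */
struct server {
    pthread_mutex_t lock;
    pthread_cond_t  conncountcond;
    int             conncount;   /* connection threads still running */
};

void server_init(struct server *srv)
{
    pthread_mutex_init(&srv->lock, NULL);
    pthread_cond_init(&srv->conncountcond, NULL);
    srv->conncount = 0;
}

/* Last step of thread cleanup: only now is the thread counted as finished. */
static void counted_conn_done(void *arg)
{
    struct server *srv = arg;

    /* ... all other per-connection cleanup would happen before this ... */
    pthread_mutex_lock(&srv->lock);
    srv->conncount--;
    pthread_cond_signal(&srv->conncountcond);
    pthread_mutex_unlock(&srv->lock);
}

static void *counted_conn_thread(void *arg)
{
    pthread_cleanup_push(counted_conn_done, arg);
    /* ... serve requests here ... */
    pthread_cleanup_pop(1);
    return NULL;
}

/* Start a detached connection thread; returns 0 on success. */
int server_start_conn(struct server *srv)
{
    pthread_attr_t attr;
    pthread_t t;
    int rc;

    pthread_mutex_lock(&srv->lock);
    srv->conncount++;
    pthread_mutex_unlock(&srv->lock);

    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    rc = pthread_create(&t, &attr, counted_conn_thread, srv);
    pthread_attr_destroy(&attr);

    if (rc != 0) {               /* thread never started: undo the count */
        pthread_mutex_lock(&srv->lock);
        srv->conncount--;
        pthread_mutex_unlock(&srv->lock);
    }
    return rc;
}

/* Block until every connection thread has finished its cleanup.
 * Returns the remaining count (always 0). */
int server_wait_conns(struct server *srv)
{
    int n;

    pthread_mutex_lock(&srv->lock);
    while (srv->conncount > 0)
        pthread_cond_wait(&srv->conncountcond, &srv->lock);
    n = srv->conncount;
    pthread_mutex_unlock(&srv->lock);
    return n;
}
```

The main thread would call something like `server_wait_conns()` before freeing anything the connection threads use, which replaces the `pthread_join()` that is not available for detached threads.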
LGTM!
Hi,
We have had problems with diod crashing during shutdown and found that there are races between thread shutdown and data deallocation.
I have made some patches here that I think solve it (at least we can no longer trigger the crashes).