Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shutdown cleanup #98

Merged
merged 4 commits into from
Dec 29, 2024
Merged

Shutdown cleanup #98

merged 4 commits into from
Dec 29, 2024

Conversation

bjornaxis
Copy link
Contributor

Hi,

We have had problems with diod crashing during shutdown and found out that there are races between thread shutdown and data deallocations.

I have made some patches here that I think solves it (at least we cannot trigger the crashes anymore).

@garlick
Copy link
Member

garlick commented Mar 21, 2024

Hi @bjornaxis - the fix is much appreciated! I hope to have time to review soon. I will have to refresh my memory on code I haven't looked at in a long time. Meanwhile, would you mind opening an issue on the crashes and provide any info you may have that would help me reproduce it? If you have any stack traces or other clues that would also be great to have.

Note that CI being stuck is probably due to the ubuntu-18.04 runner not being available anymore on github (will fix).

@bjornaxis
Copy link
Contributor Author

bjornaxis commented Mar 21, 2024 via email

Björn Ardö added 4 commits December 29, 2024 07:45
Once refcount reaches zero the conn data could be deallocated
and then the refcond is not valid anymore. So always signal on
this under the lock to avoid this race.
Do the cleanup of the connection thread in a callback function
so we can cancel the thread using pthread_cancel().
The connection thread is detached so we cannot join it to wait
for its compleation on shutdown. Use the conncount in server to
keep track of when all connection threads have actually finished.
To keep this safe we must wait to update this value until the
thread has done all of its cleanup.
The conection threads are using the resources that are deallocated
in the main diod thread. If they are not shut down cleanly before
this deallocation we could get craches at shutdown due to this.
Copy link
Member

@garlick garlick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@mergify mergify bot merged commit ab8aab0 into chaos:master Dec 29, 2024
5 of 7 checks passed
@garlick garlick mentioned this pull request Dec 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants