Hi all,
I'm not completely sure yet what causes this. I changed two things recently. First, I switched the machine vcontrold runs on from Devuan to Alpine, which is built on musl rather than glibc. However, I'm pretty sure this does not cause the issue, because:
The other thing I changed was this: I noticed that the vcontrold log had grown large, as the daemon had run for years and I never took care of the log file. So I set up a logrotate script, and I thought it might be a good idea to restart the daemon via a postrotate script.
What happened was: today, for the second time now, vcontrold crashed. I have been running it since 2015 and had never experienced a single crash. The first time, I thought this could be due to Alpine/musl, restarted the daemon and moved on. But today, I had a closer look. The crash happened at the very time the logrotate script ran, right after the script stopped the daemon. The log contained "Received SIGTERM" as the last entry. After that, the init system (OpenRC) would not restart the daemon, because it considered it crashed.
The time this happened could also be the very moment at which my other automated scripts polled the daemon for data. The restart probably works in most cases, because it rarely falls into the very second a request is processed, but the two times it crashed, both may really have happened simultaneously.
I thus concluded that the daemon may crash if it receives a signal while a request is being processed. Could this be the case? I have write access here and I revised the build system back then – but I'm not a C pro and I'm not really into deep stuff like daemon programming. So I fear I can't say much about the code … It would be nice if someone with more insight into this could have a look at the sources and check whether we have a bug here.
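To illustrate the kind of race suspected here, the following is a minimal sketch of a common defensive pattern for daemons, not code taken from the vcontrold sources. The handler only sets a flag, and the main loop checks that flag between requests, so a request that is in flight finishes before shutdown. All names (`term_requested`, `install_term_handler`, `shutdown_requested`, `serve_until_terminated`) are hypothetical and chosen for this example only.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Set by the signal handler, read by the main loop. A volatile
 * sig_atomic_t is the object type that may safely be written from
 * a signal handler. */
static volatile sig_atomic_t term_requested = 0;

/* The handler does nothing but record the request: no logging,
 * no free(), no exit(), since those are not async-signal-safe. */
static void on_sigterm(int signo)
{
    (void)signo;
    term_requested = 1;
}

/* Install the SIGTERM handler. Returns 0 on success, -1 on error. */
int install_term_handler(void)
{
    struct sigaction sa;
    sa.sa_handler = on_sigterm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGTERM, &sa, NULL);
}

/* Returns 1 once SIGTERM has been received, 0 before that. */
int shutdown_requested(void)
{
    return term_requested != 0;
}

/* Hypothetical main loop: handle_one_request stands in for whatever
 * serves a single client and returns 0 on success. The flag is only
 * checked between requests, so a request that is being processed is
 * not torn down halfway. Returns the number of requests served. */
int serve_until_terminated(int (*handle_one_request)(void))
{
    int served = 0;
    while (!shutdown_requested()) {
        if (handle_one_request() != 0)
            break;
        served++;
    }
    return served;
}
```

With this structure, logging "Received SIGTERM" and cleaning up would happen in the main loop after `serve_until_terminated` returns, rather than inside the handler. Whether vcontrold's actual handler does more than that is something only a look at the sources can tell.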
Apart from that, I am now trying to simply copytruncate the log and not restart the daemon via logrotate, to see if this works – after all, there seems to be no real reason to restart it in the first place. But if we actually have a problem with concurrency, I think we should fix it. And I know that things that happen at the same time through different means can cause a lot of trouble ;-)
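For reference, a logrotate stanza along those lines could look roughly like the sketch below. The log path, rotation interval and retention count are assumptions for illustration, not the configuration actually in use here; only `copytruncate` is the point.

```
# Hypothetical path; adjust to wherever vcontrold writes its log.
/var/log/vcontrold.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
    # Copy the log, then truncate the original in place, so the
    # daemon keeps its open file handle and needs no restart.
    copytruncate
}
```

One known trade-off of `copytruncate` is that lines written between the copy and the truncate can be lost, which seems acceptable for a log like this one.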
Thanks a lot!
Cheers, Tobias
So apparently, it accepted a connection, then received SIGTERM (from the logrotate script), and then logged another SIGTERM under a different PID, which should be the actual crash. All of this happened within the very same second.