Crashes on DNS resolvers receiving thousands of requests #26
Comments
I believe this is happening due to this section: Lines 376 to 402 in a513e78
If it helps, in our case, our nproc is 8.
Interesting. If you can compile a custom version and test, see if changing the channel size at Line 379 in a513e78
Looking at this in a little more depth, I think this is a regression I introduced when I refactored handlePacket to take a channel instead of an individual packet. We turn on ZeroCopy when we capture packets because there is a performance win. However, that means libpcap uses an internal circular buffer to hold the packets for processing. If packets arrive fast enough that the one being processed is overwritten before we finish with it, we can get crazy-looking errors like the ones in your stack trace, because the underlying data changes underneath us. An easy way to test this is to comment out https://github.com/Phillipmartin/gopassivedns/blob/master/main.go#L394, recompile, and run the binary on one of these heavy-load systems. If it does not show the same failure, we can look at either turning off ZeroCopy or copying the packet data, depending on the performance impact of each strategy.
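To illustrate the failure mode described above, here is a minimal sketch in plain Go. It does not use gopacket or libpcap: `ringReader` and `capture` are hypothetical names made up for this example, and the reused 4-byte buffer only stands in for the capture library's internal buffer. The point it tries to show is that slices queued on a channel without copying all alias the same memory, so by the time the consumer reads them they reflect the most recent packet.

```go
package main

import "fmt"

// ringReader is a hypothetical stand-in for a zero-copy capture source:
// every call to next returns a slice backed by the same reused buffer.
type ringReader struct {
	buf []byte
	seq byte
}

// next overwrites the shared buffer with the next "packet" and returns it.
func (r *ringReader) next() []byte {
	r.seq++
	for i := range r.buf {
		r.buf[i] = r.seq
	}
	return r.buf
}

// capture queues n packets on a buffered channel, then drains the channel
// and returns the first byte of each queued packet. When copyData is true,
// each packet is copied before it is queued.
func capture(n int, copyData bool) []byte {
	r := &ringReader{buf: make([]byte, 4)}
	queue := make(chan []byte, n)
	for i := 0; i < n; i++ {
		pkt := r.next()
		if copyData {
			// Detach the packet from the reused buffer.
			pkt = append([]byte(nil), pkt...)
		}
		queue <- pkt
	}
	close(queue)

	var seen []byte
	for pkt := range queue {
		seen = append(seen, pkt[0])
	}
	return seen
}

func main() {
	fmt.Println("zero-copy:", capture(3, false)) // prints "zero-copy: [3 3 3]"
	fmt.Println("copied:   ", capture(3, true))  // prints "copied:    [1 2 3]"
}
```

In the uncopied case every queued slice shows the last packet written, which is the same kind of corruption a slow consumer would see. Copying costs an allocation per packet, which is the performance trade-off mentioned above.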
Increasing the channel buffer as previously mentioned (Line 379 in a513e78) has dramatically improved the stability of gopassivedns on our resolvers doing ~5k qps. However, as mentioned above, it is not a bulletproof fix. I will investigate the recommended NoCopy change further to see how any improvement in uptime weighs against the CPU usage penalty.
@jaredledvina a pull request has recently been merged. In our use case we have seen no crashes under significant DNS query load. If you have time to test it, I would be very interested to know whether you see improvements in stability.
Hey @jimmystewpot, oh nice, it's been a while since I've worked with our install of
@jaredledvina did you end up seeing any improvements?
We're currently looking at reworking how we've built our
Stack Trace:
We have a few DNS resolvers that handle between 2800-3800 requests per minute (according to the logs gopassivedns generates). After a while, though not consistently, the process dies and dumps the above stack trace. The crash is much more frequent on resolvers seeing >7500 requests per minute, and is not an issue at all on resolvers seeing between 200-700 requests per minute. Happy to provide any other information that might be useful!