-
Notifications
You must be signed in to change notification settings - Fork 544
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Query scheduler: Fix a panic in stats marshaling #8140
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
How hard would it be to add a test for this case that reliably fails without the fix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you can even add a test it would be great. But I wouldn't block on it. LGTM
I added a test that fails the race detector reliably without making the copy. (AFAICT, our tests in CI run with race detector enabled, right?)
I also rigged up a couple of hacky local tests that I ran 100 times and saw the expected panic. (Not including all of the querier/block store querier machinery, but with a goroutine that just spins and increments stats a million times.) But making those hacky tests deterministic would be... an undertaking. :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, the test looks good
we still track flaky tests and fix them if they become too annoying, so I think these tests are a pretty good addition |
I'll merge before the weekend (& shutdown day) |
* Querier: Copy the stats before marshaling. * Add to changelog. * Reuse Merge * Add a test for the race detector.
What this PR does
Fixes a panic in Querier services caused by request-processing goroutines that are still running for a short time after their contexts are canceled and are continuing to modify query stats while querier's scheduler attempts to marshal them.
Like so:
Which issue(s) this PR fixes or relates to
Fixes #7895
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]
.about-versioning.md
updated with experimental features.