Store: Add flag ignore-deletion-marks-errors to be able to ignore errors while retrieving deletion marks #7013
…ors while retrieving deletion marks Signed-off-by: Petter Solberg <[email protected]>
This doesn't sound right. An IO timeout shouldn't crash store.
I agree that an IO timeout should not crash Thanos store. To me it seems that the error handling does not deal with timeouts properly. Here is a complete log from thanos-store, which is currently crash-looping. We are running two replicas on v0.32.5 and both are crash-looping. The only workaround is to delete the whole chunk.
Store: Add flag ignore-deletion-marks-errors to be able to ignore errors while retrieving deletion marks.
Our S3 implementation (NetApp) has intermittent faults that cause timeouts when querying some non-existent objects.
The IgnoreDeletionMarkFilter queries every metrics block for the file deletion-mark.json, and when store receives a timeout or any other error from that request, it crashes. With this flag set, all errors from fetching deletion marks are ignored, so store no longer crashes.
Fixes errors like this:
{"caller":"grpc.go:164","component":"store","err":"bucket store initial sync: sync block: filter metas: filter blocks marked for deletion: get file: 01HA07EAKT1YPCMYC6SDHS58S0/deletion-mark.json: Get \"https:<S3-URL>/thanos-metrics/01HA07EAKT1YPCMYC6SDHS58S0/deletion-mark.json\": dial tcp <IP-address>:443: i/o timeout","level":"info","msg":"internal server is shutdown gracefully","service":"gRPC/server","ts":"2023-10-31T07:06:03.987065433Z"}
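The behaviour described above can be illustrated with a minimal, self-contained Go sketch. It is not the Thanos implementation: `markReader`, `filterDeletionMarks`, `syncOnce`, the block IDs, and the error values are all hypothetical names made up for illustration, and the real filter also takes a deletion delay into account. The sketch assumes that, with the flag set, a block whose deletion mark could not be fetched is treated as unmarked and kept.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Hypothetical error values standing in for what an object store client returns.
var (
	errNotFound = errors.New("object not found")
	errTimeout  = errors.New("i/o timeout")
)

// markReader is a hypothetical, minimal stand-in for a bucket client.
// It returns nil if <blockID>/deletion-mark.json exists, errNotFound if it
// does not, and any other error if the request itself failed.
type markReader func(ctx context.Context, blockID string) error

// filterDeletionMarks returns the blocks that are not marked for deletion.
// With ignoreErrors=false, any fetch error other than "not found" aborts the
// sync. With ignoreErrors=true, such a block is assumed to be unmarked.
func filterDeletionMarks(ctx context.Context, blocks []string, read markReader, ignoreErrors bool) ([]string, error) {
	kept := make([]string, 0, len(blocks))
	for _, id := range blocks {
		err := read(ctx, id)
		switch {
		case err == nil:
			// A deletion mark exists: drop the block from the result.
		case errors.Is(err, errNotFound):
			kept = append(kept, id)
		case ignoreErrors:
			// Assumption: an unreadable mark is treated as "no mark".
			kept = append(kept, id)
		default:
			return nil, fmt.Errorf("get file: %s/deletion-mark.json: %w", id, err)
		}
	}
	return kept, nil
}

// syncOnce runs the filter against a fake bucket in which block-a has no
// mark, block-b is marked for deletion, and block-c times out.
func syncOnce(ignoreErrors bool) ([]string, error) {
	read := func(_ context.Context, id string) error {
		switch id {
		case "block-b":
			return nil
		case "block-c":
			return errTimeout
		default:
			return errNotFound
		}
	}
	blocks := []string{"block-a", "block-b", "block-c"}
	return filterDeletionMarks(context.Background(), blocks, read, ignoreErrors)
}

func main() {
	for _, ignore := range []bool{false, true} {
		kept, err := syncOnce(ignore)
		fmt.Printf("ignoreErrors=%v kept=%v err=%v\n", ignore, kept, err)
	}
}
```

Without the flag, the timeout on a single block fails the whole sync, which matches the "filter blocks marked for deletion: get file: ..." error chain in the log above. With the flag, the sync completes, at the cost that a block whose mark could not be read may still be served.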
Changes
Verification