Skip to content

Commit

Permalink
Fix race between timeout gate timing out and delegated gate returning…
Browse files Browse the repository at this point in the history
… successfully (#8149)

* Fix race between timeout gate timing out and delegated gate returning successfully.

* Update changelog entry.
  • Loading branch information
charleskorn authored May 16, 2024
1 parent 307567a commit 9e2f4b7
Show file tree
Hide file tree
Showing 3 changed files with 23 additions and 2 deletions.
2 changes: 1 addition & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,7 @@
* [ENHANCEMENT] Rules: Add metric `cortex_prometheus_rule_group_last_restore_duration_seconds` which measures how long it takes to restore rule groups using the `ALERTS_FOR_STATE` series #7974
* [ENHANCEMENT] OTLP: Improve remote write format translation performance by using label set hashes for metric identifiers instead of string based ones. #8012
* [ENHANCEMENT] Querying: Remove OpEmptyMatch from regex concatenations. #8012
* [ENHANCEMENT] Store-gateway: add `-blocks-storage.bucket-store.max-concurrent-queue-timeout`. When set, queries at the store-gateway's query gate will not wait longer than that to execute. If a query reaches the wait timeout, then the querier will retry the blocks on a different store-gateway. If all store-gateways are unavailable, then the query will fail with `err-mimir-store-consistency-check-failed`. #7777
* [ENHANCEMENT] Store-gateway: add `-blocks-storage.bucket-store.max-concurrent-queue-timeout`. When set, queries at the store-gateway's query gate will not wait longer than that to execute. If a query reaches the wait timeout, then the querier will retry the blocks on a different store-gateway. If all store-gateways are unavailable, then the query will fail with `err-mimir-store-consistency-check-failed`. #7777 #8149
* [ENHANCEMENT] Ingester: Optimize querying with regexp matchers. #8106
* [ENHANCEMENT] Distributor: Introduce `-distributor.max-request-pool-buffer-size` to allow configuring the maximum size of the request pool buffers. #8082
* [ENHANCEMENT] Ingester: active series are now updated along with owned series. They decrease when series change ownership between ingesters. This helps provide a more accurate total of active series when ingesters are added. This is only enabled when `-ingester.track-ingester-owned-series` or `-ingester.use-ingester-owned-series-for-limits` are enabled. #8084
Expand Down
2 changes: 1 addition & 1 deletion pkg/storegateway/bucket_stores.go
Original file line number Diff line number Diff line change
Expand Up @@ -445,7 +445,7 @@ func (t timeoutGate) Start(ctx context.Context) error {
defer cancel()

err := t.delegate.Start(ctx)
if errors.Is(context.Cause(ctx), errGateTimeout) {
if err != nil && errors.Is(context.Cause(ctx), errGateTimeout) {
_ = spanlogger.FromContext(ctx, log.NewNopLogger()).Error(err)
err = errGateTimeout
}
Expand Down
21 changes: 21 additions & 0 deletions pkg/storegateway/bucket_stores_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -1042,3 +1042,24 @@ func must[T any](v T, err error) T {
}
return v
}

func TestTimeoutGate_CancellationRace(t *testing.T) {
gate := timeoutGate{
delegate: alwaysSuccessfulAfterDelayGate{time.Second},
timeout: time.Nanosecond,
}

err := gate.Start(context.Background())
require.NoError(t, err, "must not return failure if delegated gate returns success even after timeout expires")
}

type alwaysSuccessfulAfterDelayGate struct {
delay time.Duration
}

func (a alwaysSuccessfulAfterDelayGate) Start(_ context.Context) error {
<-time.After(a.delay)
return nil
}

func (a alwaysSuccessfulAfterDelayGate) Done() {}

0 comments on commit 9e2f4b7

Please sign in to comment.