store-gateway: Add timeout for index-header loading gate (#8138)
* store-gateway: Add timeout for index-header loading gate

This change introduces an optional timeout on the gate used to limit the
concurrency of index-header loads. This helps in cases where a store-gateway
may have to load a large index-header before it can serve a query. This
prevents an unbounded number of requests from blocking while index-headers
are loaded.

Fixes #8137

Signed-off-by: Nick Pillitteri <[email protected]>

* Note experimental flag and add comment

Signed-off-by: Nick Pillitteri <[email protected]>

---------

Signed-off-by: Nick Pillitteri <[email protected]>
56quarters authored May 16, 2024
1 parent aea91a4 commit a6fa9b0
Showing 7 changed files with 30 additions and 3 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -34,6 +34,7 @@
* [ENHANCEMENT] OTLP: Improve remote write format translation performance by using label set hashes for metric identifiers instead of string based ones. #8012
* [ENHANCEMENT] Querying: Remove OpEmptyMatch from regex concatenations. #8012
* [ENHANCEMENT] Store-gateway: add `-blocks-storage.bucket-store.max-concurrent-queue-timeout`. When set, queries at the store-gateway's query gate will not wait longer than that to execute. If a query reaches the wait timeout, then the querier will retry the blocks on a different store-gateway. If all store-gateways are unavailable, then the query will fail with `err-mimir-store-consistency-check-failed`. #7777 #8149
* [ENHANCEMENT] Store-gateway: add `-blocks-storage.bucket-store.index-header.lazy-loading-concurrency-queue-timeout`. When set, loads of index-headers at the store-gateway's index-header lazy load gate will not wait longer than that to execute. If a load reaches the wait timeout, then the querier will retry the blocks on a different store-gateway. If all store-gateways are unavailable, then the query will fail with `err-mimir-store-consistency-check-failed`. #8138
* [ENHANCEMENT] Ingester: Optimize querying with regexp matchers. #8106
* [ENHANCEMENT] Distributor: Introduce `-distributor.max-request-pool-buffer-size` to allow configuring the maximum size of the request pool buffers. #8082
* [ENHANCEMENT] Ingester: active series are now updated along with owned series. They decrease when series change ownership between ingesters. This helps provide a more accurate total of active series when ingesters are added. This is only enabled when `-ingester.track-ingester-owned-series` or `-ingester.use-ingester-owned-series-for-limits` are enabled. #8084
11 changes: 11 additions & 0 deletions cmd/mimir/config-descriptor.json
@@ -8518,6 +8518,17 @@
"fieldType": "int",
"fieldCategory": "advanced"
},
{
"kind": "field",
"name": "lazy_loading_concurrency_queue_timeout",
"required": false,
"desc": "Timeout for the queue of index header loads. If the queue is full and the timeout is reached, the load will return an error. 0 means no timeout and the load will wait indefinitely.",
"fieldValue": null,
"fieldDefaultValue": 0,
"fieldFlag": "blocks-storage.bucket-store.index-header.lazy-loading-concurrency-queue-timeout",
"fieldType": "duration",
"fieldCategory": "experimental"
},
{
"kind": "field",
"name": "verify_on_load",
2 changes: 2 additions & 0 deletions cmd/mimir/help-all.txt.tmpl
@@ -527,6 +527,8 @@ Usage of ./cmd/mimir/mimir:
[experimental] If enabled, store-gateway will periodically persist block IDs of lazy loaded index-headers and load them eagerly during startup. Ignored if index-header lazy loading is disabled. (default true)
-blocks-storage.bucket-store.index-header.lazy-loading-concurrency int
Maximum number of concurrent index header loads across all tenants. If set to 0, concurrency is unlimited. (default 4)
-blocks-storage.bucket-store.index-header.lazy-loading-concurrency-queue-timeout duration
[experimental] Timeout for the queue of index header loads. If the queue is full and the timeout is reached, the load will return an error. 0 means no timeout and the load will wait indefinitely.
-blocks-storage.bucket-store.index-header.lazy-loading-enabled
If enabled, store-gateway will lazy load an index-header only once required by a query. (default true)
-blocks-storage.bucket-store.index-header.lazy-loading-idle-timeout duration
1 change: 1 addition & 0 deletions docs/sources/mimir/configure/about-versioning.md
@@ -146,6 +146,7 @@ The following features are currently experimental:
- Use of Redis cache backend (`-blocks-storage.bucket-store.chunks-cache.backend=redis`, `-blocks-storage.bucket-store.index-cache.backend=redis`, `-blocks-storage.bucket-store.metadata-cache.backend=redis`)
- `-blocks-storage.bucket-store.series-selection-strategy`
- Eagerly loading some blocks on startup even when lazy loading is enabled `-blocks-storage.bucket-store.index-header.eager-loading-startup-enabled`
- Set a timeout for index-header lazy loading (`-blocks-storage.bucket-store.index-header.lazy-loading-concurrency-queue-timeout`)
- Read-write deployment mode
- API endpoints:
- `/api/v1/user_limits`
@@ -3759,6 +3759,12 @@ bucket_store:
# CLI flag: -blocks-storage.bucket-store.index-header.lazy-loading-concurrency
[lazy_loading_concurrency: <int> | default = 4]
# (experimental) Timeout for the queue of index header loads. If the queue
# is full and the timeout is reached, the load will return an error. 0 means
# no timeout and the load will wait indefinitely.
# CLI flag: -blocks-storage.bucket-store.index-header.lazy-loading-concurrency-queue-timeout
[lazy_loading_concurrency_queue_timeout: <duration> | default = 0s]
# (advanced) If true, verify the checksum of index headers upon loading them
# (either on startup or lazily when lazy loading is enabled). Setting to
# true helps detect disk corruption at the cost of slowing down index header
8 changes: 6 additions & 2 deletions pkg/storegateway/bucket_stores.go
@@ -114,8 +114,9 @@ func NewBucketStores(cfg tsdb.BlocksStorageConfig, shardingStrategy ShardingStra
lazyLoadingGate := gate.NewNoop()
lazyLoadingMax := cfg.BucketStore.IndexHeader.LazyLoadingConcurrency
if lazyLoadingMax != 0 {
- blockingGate := gate.NewBlocking(cfg.BucketStore.IndexHeader.LazyLoadingConcurrency)
- lazyLoadingGate = gate.NewInstrumented(lazyLoadingGateReg, cfg.BucketStore.IndexHeader.LazyLoadingConcurrency, blockingGate)
+ lazyLoadingGate = gate.NewBlocking(cfg.BucketStore.IndexHeader.LazyLoadingConcurrency)
+ lazyLoadingGate = gate.NewInstrumented(lazyLoadingGateReg, cfg.BucketStore.IndexHeader.LazyLoadingConcurrency, lazyLoadingGate)
+ lazyLoadingGate = timeoutGate{delegate: lazyLoadingGate, timeout: cfg.BucketStore.IndexHeader.LazyLoadingConcurrencyQueueTimeout}
}

u := &BucketStores{
@@ -445,6 +446,9 @@ func (t timeoutGate) Start(ctx context.Context) error {
defer cancel()

err := t.delegate.Start(ctx)
// Note that we only return an error for a timeout when the delegate has also returned an
// error. This ensures that when we get a slot in the delegate, our caller will call Done()
// and release the slot.
if err != nil && errors.Is(context.Cause(ctx), errGateTimeout) {
_ = spanlogger.FromContext(ctx, log.NewNopLogger()).Error(err)
err = errGateTimeout
4 changes: 3 additions & 1 deletion pkg/storegateway/indexheader/header.go
@@ -70,7 +70,8 @@ type Config struct {
LazyLoadingIdleTimeout time.Duration `yaml:"lazy_loading_idle_timeout" category:"advanced"`

// Maximum index-headers loaded into store-gateway concurrently
- LazyLoadingConcurrency int `yaml:"lazy_loading_concurrency" category:"advanced"`
+ LazyLoadingConcurrency int `yaml:"lazy_loading_concurrency" category:"advanced"`
+ LazyLoadingConcurrencyQueueTimeout time.Duration `yaml:"lazy_loading_concurrency_queue_timeout" category:"experimental"`

VerifyOnLoad bool `yaml:"verify_on_load" category:"advanced"`
}
@@ -80,6 +81,7 @@ func (cfg *Config) RegisterFlagsWithPrefix(f *flag.FlagSet, prefix string) {
f.BoolVar(&cfg.LazyLoadingEnabled, prefix+"lazy-loading-enabled", DefaultIndexHeaderLazyLoadingEnabled, "If enabled, store-gateway will lazy load an index-header only once required by a query.")
f.DurationVar(&cfg.LazyLoadingIdleTimeout, prefix+"lazy-loading-idle-timeout", DefaultIndexHeaderLazyLoadingIdleTimeout, "If index-header lazy loading is enabled and this setting is > 0, the store-gateway will offload unused index-headers after 'idle timeout' inactivity.")
f.IntVar(&cfg.LazyLoadingConcurrency, prefix+"lazy-loading-concurrency", 4, "Maximum number of concurrent index header loads across all tenants. If set to 0, concurrency is unlimited.")
f.DurationVar(&cfg.LazyLoadingConcurrencyQueueTimeout, prefix+"lazy-loading-concurrency-queue-timeout", 0, "Timeout for the queue of index header loads. If the queue is full and the timeout is reached, the load will return an error. 0 means no timeout and the load will wait indefinitely.")
f.BoolVar(&cfg.EagerLoadingStartupEnabled, prefix+"eager-loading-startup-enabled", true, "If enabled, store-gateway will periodically persist block IDs of lazy loaded index-headers and load them eagerly during startup. Ignored if index-header lazy loading is disabled.")
f.BoolVar(&cfg.VerifyOnLoad, prefix+"verify-on-load", false, "If true, verify the checksum of index headers upon loading them (either on startup or lazily when lazy loading is enabled). Setting to true helps detect disk corruption at the cost of slowing down index header loading.")
}