Adding circuit breakers on ingester server side for write path
Signed-off-by: Yuri Nikolic <[email protected]>
duricanikolic committed May 25, 2024
1 parent 5f01872 commit 26fd19e
Showing 12 changed files with 653 additions and 2 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -19,6 +19,7 @@
* [FEATURE] Server: added experimental [PROXY protocol support](https://www.haproxy.org/download/2.3/doc/proxy-protocol.txt). The PROXY protocol support can be enabled via `-server.proxy-protocol-enabled=true`. When enabled, the support is added both to HTTP and gRPC listening ports. #7698
* [FEATURE] mimirtool: Add `runtime-config verify` sub-command, for verifying Mimir runtime config files. #8123
* [FEATURE] Query-frontend, querier: new experimental `/cardinality/active_native_histogram_metrics` API to get active native histogram metric names with statistics about active native histogram buckets. #7982 #7986 #8008
* [FEATURE] Ingester: add experimental support for server-side circuit breakers when writing to ingesters. This can be enabled using `-ingester.circuit-breaker.enabled`, and tuned using `-ingester.circuit-breaker.failure-threshold` and `-ingester.circuit-breaker.cooldown-period`, or their corresponding YAML options. Added metrics `cortex_ingester_circuit_breaker_results_total` and `cortex_ingester_circuit_breaker_transitions_total`. #8180
* [ENHANCEMENT] Reduced memory allocations in functions used to propagate contextual information between gRPC calls. #7529
* [ENHANCEMENT] Distributor: add experimental limit for exemplars per series per request, enabled with `-distributor.max-exemplars-per-series-per-request`, the number of discarded exemplars are tracked with `cortex_discarded_exemplars_total{reason="too_many_exemplars_per_series_per_request"}` #7989 #8010
* [ENHANCEMENT] Store-gateway: merge series from different blocks concurrently. #7456
87 changes: 87 additions & 0 deletions cmd/mimir/config-descriptor.json
@@ -3129,6 +3129,93 @@
"fieldFlag": "ingester.owned-series-update-interval",
"fieldType": "duration",
"fieldCategory": "experimental"
},
{
"kind": "block",
"name": "circuit_breaker",
"required": false,
"desc": "",
"blockEntries": [
{
"kind": "field",
"name": "enabled",
"required": false,
"desc": "Enable circuit breaking when making requests to ingesters",
"fieldValue": null,
"fieldDefaultValue": false,
"fieldFlag": "ingester.circuit-breaker.enabled",
"fieldType": "boolean",
"fieldCategory": "experimental"
},
{
"kind": "field",
"name": "failure_threshold",
"required": false,
"desc": "Max percentage of requests that can fail over period before the circuit breaker opens",
"fieldValue": null,
"fieldDefaultValue": 10,
"fieldFlag": "ingester.circuit-breaker.failure-threshold",
"fieldType": "int",
"fieldCategory": "experimental"
},
{
"kind": "field",
"name": "failure_execution_threshold",
"required": false,
"desc": "How many requests must have been executed in period for the circuit breaker to be eligible to open for the rate of failures",
"fieldValue": null,
"fieldDefaultValue": 100,
"fieldFlag": "ingester.circuit-breaker.failure-execution-threshold",
"fieldType": "int",
"fieldCategory": "experimental"
},
{
"kind": "field",
"name": "thresholding_period",
"required": false,
"desc": "Moving window of time that the percentage of failed requests is computed over",
"fieldValue": null,
"fieldDefaultValue": 60000000000,
"fieldFlag": "ingester.circuit-breaker.thresholding-period",
"fieldType": "duration",
"fieldCategory": "experimental"
},
{
"kind": "field",
"name": "cooldown_period",
"required": false,
"desc": "How long the circuit breaker will stay in the open state before allowing some requests",
"fieldValue": null,
"fieldDefaultValue": 10000000000,
"fieldFlag": "ingester.circuit-breaker.cooldown-period",
"fieldType": "duration",
"fieldCategory": "experimental"
},
{
"kind": "field",
"name": "initial_delay",
"required": false,
"desc": "How long the circuit breaker should wait between creation and starting up. During that time both failures and successes will not be counted.",
"fieldValue": null,
"fieldDefaultValue": 0,
"fieldFlag": "ingester.circuit-breaker.initial-delay",
"fieldType": "duration",
"fieldCategory": "experimental"
},
{
"kind": "field",
"name": "push_timeout",
"required": false,
"desc": "How long is execution of ingester's Push supposed to last before it is reported as timeout in a circuit breaker. This configuration is used for circuit breakers only, and timeout expirations are not reported as errors",
"fieldValue": null,
"fieldDefaultValue": 0,
"fieldFlag": "ingester.circuit-breaker.push-timeout",
"fieldType": "duration",
"fieldCategory": "experimental"
}
],
"fieldValue": null,
"fieldDefaultValue": null
}
],
"fieldValue": null,
14 changes: 14 additions & 0 deletions cmd/mimir/help-all.txt.tmpl
@@ -1301,6 +1301,20 @@ Usage of ./cmd/mimir/mimir:
After what time a series is considered to be inactive. (default 10m0s)
-ingester.active-series-metrics-update-period duration
How often to update active series metrics. (default 1m0s)
-ingester.circuit-breaker.cooldown-period duration
[experimental] How long the circuit breaker will stay in the open state before allowing some requests (default 10s)
-ingester.circuit-breaker.enabled
[experimental] Enable circuit breaking when making requests to ingesters
-ingester.circuit-breaker.failure-execution-threshold uint
[experimental] How many requests must have been executed in period for the circuit breaker to be eligible to open for the rate of failures (default 100)
-ingester.circuit-breaker.failure-threshold uint
[experimental] Max percentage of requests that can fail over period before the circuit breaker opens (default 10)
-ingester.circuit-breaker.initial-delay duration
[experimental] How long the circuit breaker should wait between creation and starting up. During that time both failures and successes will not be counted.
-ingester.circuit-breaker.push-timeout duration
[experimental] How long is execution of ingester's Push supposed to last before it is reported as timeout in a circuit breaker. This configuration is used for circuit breakers only, and timeout expirations are not reported as errors
-ingester.circuit-breaker.thresholding-period duration
[experimental] Moving window of time that the percentage of failed requests is computed over (default 1m0s)
-ingester.client.backoff-max-period duration
Maximum delay when backing off. (default 10s)
-ingester.client.backoff-min-period duration
2 changes: 2 additions & 0 deletions cmd/mimir/help.txt.tmpl
@@ -385,6 +385,8 @@ Usage of ./cmd/mimir/mimir:
Print basic help.
-help-all
Print help, also including advanced and experimental parameters.
-ingester.circuit-breaker.push-timeout duration
How long is execution of ingester's Push supposed to last before it is reported as timeout in a circuit breaker. This configuration is used for circuit breakers only, and timeout expirations are not reported as errors
-ingester.max-global-metadata-per-metric int
The maximum number of metadata per metric, across the cluster. 0 to disable.
-ingester.max-global-metadata-per-user int
10 changes: 9 additions & 1 deletion docs/sources/mimir/configure/about-versioning.md
@@ -115,12 +115,20 @@ The following features are currently experimental:
- `-ingester.track-ingester-owned-series`
- `-ingester.use-ingester-owned-series-for-limits`
- `-ingester.owned-series-update-interval`
- Per-ingester circuit breaking based on requests timing out or hitting per-instance limits
- `-ingester.circuit-breaker.enabled`
- `-ingester.circuit-breaker.failure-threshold`
- `-ingester.circuit-breaker.failure-execution-threshold`
- `-ingester.circuit-breaker.thresholding-period`
- `-ingester.circuit-breaker.cooldown-period`
- `-ingester.circuit-breaker.initial-delay`
- `-ingester.circuit-breaker.push-timeout`
- Ingester client
- Per-ingester circuit breaking based on requests timing out or hitting per-instance limits
- `-ingester.client.circuit-breaker.enabled`
- `-ingester.client.circuit-breaker.failure-threshold`
- `-ingester.client.circuit-breaker.failure-execution-threshold`
- `-ingester.client.circuit-breaker.period`
- `-ingester.client.circuit-breaker.thresholding-period`
- `-ingester.client.circuit-breaker.cooldown-period`
- Querier
- Use of Redis cache backend (`-blocks-storage.bucket-store.metadata-cache.backend=redis`)
38 changes: 38 additions & 0 deletions docs/sources/mimir/configure/configuration-parameters/index.md
@@ -1212,6 +1212,44 @@ instance_limits:
# owned series as a result of detected change.
# CLI flag: -ingester.owned-series-update-interval
[owned_series_update_interval: <duration> | default = 15s]
circuit_breaker:
# (experimental) Enable circuit breaking when making requests to ingesters
# CLI flag: -ingester.circuit-breaker.enabled
[enabled: <boolean> | default = false]
# (experimental) Max percentage of requests that can fail over period before
# the circuit breaker opens
# CLI flag: -ingester.circuit-breaker.failure-threshold
[failure_threshold: <int> | default = 10]
# (experimental) How many requests must have been executed in period for the
# circuit breaker to be eligible to open for the rate of failures
# CLI flag: -ingester.circuit-breaker.failure-execution-threshold
[failure_execution_threshold: <int> | default = 100]
# (experimental) Moving window of time that the percentage of failed requests
# is computed over
# CLI flag: -ingester.circuit-breaker.thresholding-period
[thresholding_period: <duration> | default = 1m]
# (experimental) How long the circuit breaker will stay in the open state
# before allowing some requests
# CLI flag: -ingester.circuit-breaker.cooldown-period
[cooldown_period: <duration> | default = 10s]
# (experimental) How long the circuit breaker should wait between creation and
# starting up. During that time both failures and successes will not be
# counted.
# CLI flag: -ingester.circuit-breaker.initial-delay
[initial_delay: <duration> | default = 0s]
# (experimental) How long is execution of ingester's Push supposed to last
# before it is reported as timeout in a circuit breaker. This configuration is
# used for circuit breakers only, and timeout expirations are not reported as
# errors
# CLI flag: -ingester.circuit-breaker.push-timeout
[push_timeout: <duration> | default = 0s]
```

### querier