From 2f179f60562a379402d8c1a4fa4da9e0cc2c542e Mon Sep 17 00:00:00 2001
From: Vladimir Varankin
Date: Wed, 25 Sep 2024 13:39:14 +0200
Subject: [PATCH 1/5] draft 2.14 release notes

Signed-off-by: Vladimir Varankin
---
 docs/sources/mimir/release-notes/v2.14.md | 139 ++++++++++++++++++++++
 1 file changed, 139 insertions(+)
 create mode 100644 docs/sources/mimir/release-notes/v2.14.md

diff --git a/docs/sources/mimir/release-notes/v2.14.md b/docs/sources/mimir/release-notes/v2.14.md
new file mode 100644
index 00000000000..a42b403c7ed
--- /dev/null
+++ b/docs/sources/mimir/release-notes/v2.14.md
@@ -0,0 +1,139 @@
+---
+title: "Grafana Mimir version 2.14 release notes"
+menuTitle: "V2.14 release notes"
+description: "Release notes for Grafana Mimir version 2.14"
+weight: 1100
+---
+
+# Grafana Mimir version 2.14 release notes
+
+
+
+
+
+Grafana Labs is excited to announce version 2.14 of Grafana Mimir.
+
+The highlights that follow include the top features, enhancements, and bug fixes in this release.
+For the complete list of changes, refer to the [CHANGELOG](https://github.com/grafana/mimir/blob/main/CHANGELOG.md).
+
+## Features and enhancements
+
+The minimum supported version of Go is updated to 1.22.
+
+Streaming of chunks from store-gateways to queriers is now enabled by default.
+
+Alertmanager adds the `-alertmanager.log-parsing-label-matchers` configuration option to control logging when parsing label matchers.
+This flag is intended to be used with `-alertmanager.utf8-strict-mode-enabled` to validate that UTF-8 strict mode is working as intended.
+The default value for the configuration option is `false`.
+
+Alertmanager adds the `-alertmanager.utf8-migration-logging-enabled` configuration option, which enables logging of tenant configurations
+that are incompatible with UTF-8 strict mode. The default value for the configuration option is `false`.
+
+Compactor adds a new `cortex_compactor_disk_out_of_space_errors_total` counter metric that tracks how many times a compaction failed
+due to the compactor being out of disk space. A new related alert `MimirCompactorHasRunOutOfDiskSpace` and its runbook are added.
+
+Incoming OTLP requests were previously size-limited with the distributor's `-distributor.max-recv-msg-size` configuration.
+Distributor has a new `-distributor.max-otlp-request-size` configuration for limiting OTLP requests. The default value limits requests to 100 MiB.
+
+Distributor now replies with the `Retry-After` header on retryable errors. The feature is controlled by the `-distributor.retry-after-header.enabled` configuration,
+which is enabled by default.
+
+## Important changes
+
+In Grafana Mimir 2.14 the following behaviors have changed:
+
+When executing a remote read request, the querier now honors the start/end time range specified in the read hints.
+
+To prevent queue starvation with the querier-worker queue prioritization algorithm, the querier now sets the minimum `-querier.max-concurrent` to four.
+Values below the allowed minimum are ignored.
+
+The distributor's `-distributor.retry-after-header.max-backoff-exponent` and `-distributor.retry-after-header.base-seconds` configuration options are replaced
+with `-distributor.retry-after-header.min-backoff` and `-distributor.retry-after-header.max-backoff` for easier configuration.
+
+The circuit breakers in ingester are improved to not open in case of per-instance limit errors. From now on, the circuit breakers
+open only if push and pull requests exceed the configured duration.
+
+The default inactivity timeout of active series in ingesters, controlled by the `-ingester.active-series-metrics-idle-timeout` configuration,
+is increased from `10m` to `20m`.
+
+Query-frontend now returns the "413 Request Entity Too Large" error when a response shard from the `/active_series` request exceeds the limit.
+
+The following store-gateway settings have changed: `-blocks-storage.bucket-store.max-concurrent-queue-timeout` is set to 5 seconds;
+`-blocks-storage.bucket-store.index-header.lazy-loading-concurrency-queue-timeout` is set to 5 seconds;
+`-blocks-storage.bucket-store.max-concurrent` is set to 200.
+
+A number of previously deprecated configuration options were removed in this release:
+
+- the `-ingester.return-only-grpc-errors` option in ingester
+- the `-ingester.client.circuit-breaker.*` options in ingester
+- the `-ingester.limit-inflight-requests-using-grpc-method-limiter` option in ingester
+- the `-ingester.client.report-grpc-codes-in-instrumentation-label-enabled` option in distributor and ruler
+- the `-distributor.limit-inflight-requests-using-grpc-method-limiter` option in distributor
+- the `-distributor.enable-otlp-metadata-storage` option in distributor
+- the `-ruler.drain-notification-queue-on-shutdown` option in ruler
+- the `-querier.max-query-into-future` option in querier
+- the `-querier.prefer-streaming-chunks-from-store-gateways` option in querier and store-gateway
+- the `-query-scheduler.use-multi-algorithm-query-queue` option in query-scheduler
+- the YAML configuration `frontend.align_queries_with_step` in query-frontend
+
+## Experimental features
+
+Grafana Mimir 2.14 includes new features that are considered experimental and disabled by default.
+Use them with caution and report any issues you encounter:
+
+A new experimental Kafka-based ingest storage architecture is added, which decouples the write and read paths through a Kafka-compatible backend.
+When enabled, distributors write incoming write requests to the Kafka-compatible backend, and the ingesters asynchronously replay ingested data from Kafka.
+
+The related configuration options that enable and configure the new architecture in Mimir are: `-ingest-storage.enabled`, `-ingest-storage.kafka.*`,
+`-ingest-storage.ingestion-partition-tenant-shard-size`, `-ingest-storage.read-consistency`, `-ingest-storage.migration.distributor-send-to-ingesters-enabled`,
+`-ingester.partition-ring.*`.
+
+Querier has an experimental streaming PromQL engine. The configuration option `-querier.query-engine=mimir` enables the engine.
+
+Ingester adds the experimental `-ingester.ignore-ooo-exemplars` configuration. When set, out-of-order exemplars are no longer reported
+to the remote write client.
+
+Querier now supports the experimental `limitk()` and `limit_ratio()` PromQL functions. The feature is disabled by default,
+but can be enabled with the `-querier.promql-experimental-functions-enabled=true` setting in the query-frontend and querier.
+
+Querier supports the experimental `X-Mimir-Chunk-Info-Logger` header, which triggers logging of information about TSDB chunks loaded from ingesters and store-gateways in the querier.
+The header should contain a comma-separated list of labels whose values are included in the logs.
+
+Ruler includes the experimental `-ruler.rule-evaluation-write-enabled` configuration, which can be used to disable writing the result of rule evaluation to ingesters.
+This feature is meant to be used for testing.
+
+## Bug fixes
+
+- Ruler: add support for draining any outstanding alert notifications before shutting down. This can be enabled with the `-ruler.drain-notification-queue-on-shutdown=true` CLI flag.
+- Query-frontend: fix `-querier.max-query-lookback` enforcement when `-compactor.blocks-retention-period` is not set, and vice versa.
+- Ingester: fix sporadic `not found` error causing an internal server error if label names are queried with matchers during head compaction.
+- Ingester, store-gateway: fix case-insensitive regular expressions not correctly matching some Unicode characters.
+- Query-frontend: "query stats" log now includes the actual `status_code` when the request fails due to an error occurring in the query-frontend itself.
+- Store-gateway: fixed a case where, on a quick subsequent restart, the previous lazy-loaded index header snapshot was overwritten by a partially loaded one.
+- Ingester: fixed timestamp reported in the "the sample has been rejected because its timestamp is too old" error when the write request contains only histograms.
+- Store-gateway: store sparse index headers atomically to disk.
+- Query scheduler: fix a panic in request queueing.
+- Querier: fix issue where "context canceled" is logged for trace spans for requests to store-gateways that return no series when chunks streaming is enabled.
+- Alertmanager: Fix per-tenant silence limits not reloaded during runtime.
+- Alertmanager: Fixes a number of bugs in silences which could cause an existing silence to be deleted/expired when updating the silence failed. This could happen when the replacing silence was invalid or exceeded limits.
+- Alertmanager: Fix help message for utf-8-strict-mode.
+- Query-frontend: Ensure that internal errors result in an HTTP 500 response code instead of 422.
+- Configuration: Multi-line environment variables are flattened during injection to be compatible with YAML syntax.
+- Querier: fix issue where queries can return incorrect results if a single store-gateway returns overlapping chunks for a series.
+- HA Tracker: store correct timestamp for last received request from elected replica.
+- Querier: do not return `grpc: the client connection is closing` errors as HTTP `499`.
+- Compactor: fix a race condition between different compactor replicas that may cause a deleted block to be still referenced as non-deleted in the bucket index.
+- Querier: fix issue where some native histogram-related warnings were not emitted when `rate()` was used over native histograms.
+- Ruler: map invalid org-id errors to 400 status code.
+- Querier: Fix invalid query results when multiple chunks are being merged.
+- Query-frontend: return annotations generated during evaluation of sharded queries.
+- Querier: Support optional start and end times on `/prometheus/api/v1/labels`, `/prometheus/api/v1/label/