From b54c0947ab9372ac47638280758a740c3aa2845f Mon Sep 17 00:00:00 2001
From: Bryan Boreham
Date: Wed, 25 Sep 2024 15:25:02 +0100
Subject: [PATCH] Runbook: clarify MimirIngesterReachingSeriesLimit errors and retries

---
 docs/sources/mimir/manage/mimir-runbooks/_index.md | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/docs/sources/mimir/manage/mimir-runbooks/_index.md b/docs/sources/mimir/manage/mimir-runbooks/_index.md
index 1b0ea26423f..6757555f9af 100644
--- a/docs/sources/mimir/manage/mimir-runbooks/_index.md
+++ b/docs/sources/mimir/manage/mimir-runbooks/_index.md
@@ -41,7 +41,15 @@ If nothing obvious from the above, check for increased load:
 
 ### MimirIngesterReachingSeriesLimit
 
-This alert fires when the `max_series` per ingester instance limit is enabled and the actual number of in-memory series in an ingester is reaching the limit. Once the limit is reached, writes to the ingester will fail (5xx) for new series, while appending samples to existing ones will continue to succeed.
+This alert fires when the `max_series` per ingester instance limit is enabled and the actual number of in-memory series in an ingester is reaching the limit.
+The threshold is set at 80%, to give some chance to react before the limit is reached.
+Once the limit is reached, writes to the ingester will fail for new series. Appending samples to existing ones will continue to succeed.
+
+Note that the error responses sent back to the sender are classed as "server error" (5xx), which should result in a retry by the sender.
+While this situation continues, these retries will stall the flow of data, and newer data will queue up on the sender.
+If the condition is cleared in a short time, service can be restored with no data loss.
+
+This is different to what happens when the `max_global_series_per_user` is exceeded, which is considered a "client error" (4xx) where excess data is discarded.
 
 In case of **emergency**: