Merge branch 'grafana:main' into feat/helm-support-dedicated-ruler-read-path
alex5517 authored May 27, 2024
2 parents 4e8cdc8 + cc7c348 commit 2efb611
Showing 34 changed files with 3,276 additions and 2,299 deletions.
8 changes: 6 additions & 2 deletions CHANGELOG.md
@@ -38,6 +38,7 @@
* [ENHANCEMENT] Store-gateway: add `-blocks-storage.bucket-store.index-header.lazy-loading-concurrency-queue-timeout`. When set, loads of index-headers at the store-gateway's index-header lazy load gate will not wait longer than that to execute. If a load reaches the wait timeout, then the querier will retry the blocks on a different store-gateway. If all store-gateways are unavailable, then the query will fail with `err-mimir-store-consistency-check-failed`. #8138
* [ENHANCEMENT] Ingester: Optimize querying with regexp matchers. #8106
* [ENHANCEMENT] Distributor: Introduce `-distributor.max-request-pool-buffer-size` to allow configuring the maximum size of the request pool buffers. #8082
* [ENHANCEMENT] Store-gateway: improve performance when streaming chunks to queriers is enabled (`-querier.prefer-streaming-chunks-from-store-gateways=true`) and the query selects fewer than `-blocks-storage.bucket-store.batch-series-size` series (defaults to 5000 series). #8039
* [ENHANCEMENT] Ingester: active series are now updated along with owned series. They decrease when series change ownership between ingesters. This helps provide a more accurate total of active series when ingesters are added. This is only enabled when `-ingester.track-ingester-owned-series` or `-ingester.use-ingester-owned-series-for-limits` are enabled. #8084
* [BUGFIX] Rules: improve error handling when querier is local to the ruler. #7567
* [BUGFIX] Querier, store-gateway: Protect against panics raised during snappy encoding. #7520
@@ -69,6 +70,7 @@
* [BUGFIX] Querying: matrix results returned from instant queries were not sorted by series. #8113
* [BUGFIX] Query scheduler: Fix a crash in result marshaling. #8140
* [BUGFIX] Store-gateway: Allow long-running index scans to be interrupted. #8154
* [BUGFIX] Query-frontend: fix splitting of queries using `@ start()` and `@ end()` modifiers on a subquery. Previously the `start()` and `end()` would be evaluated using the start and end of the split query instead of the original query. #8162

### Mixin

@@ -92,8 +94,9 @@
* Overview dashboard, Status panel, `cortex_request_duration_seconds` metric.
* [ENHANCEMENT] Alerts: exclude `529` and `598` status codes from failure codes in `MimirRequestsError`. #7889
* [ENHANCEMENT] Dashboards: renamed "TCP Connections" panel to "Ingress TCP Connections" in the networking dashboards. #8092
* [BUGFIX] Dashboards: Fix regular expression for matching read-path gRPC ingester methods to include querying of exemplars, label-related queries, or active series queries. #7676
* [BUGFIX] Dashboards: Fix user id abbreviations and column heads for Top Tenants dashboard. #7724
* [ENHANCEMENT] Dashboards: update the use of deprecated "table (old)" panels to "table". #8181
* [BUGFIX] Dashboards: fix regular expression for matching read-path gRPC ingester methods to include querying of exemplars, label-related queries, or active series queries. #7676
* [BUGFIX] Dashboards: fix user id abbreviations and column heads for Top Tenants dashboard. #7724
* [BUGFIX] Dashboards: fix incorrect query used for "queue length" panel on "Ruler" dashboard. #8006

### Jsonnet
@@ -115,6 +118,7 @@

* [CHANGE] Deprecated `--rule-files` flag in favor of CLI arguments. #7756
* [BUGFIX] Fix panic in `loadgen` subcommand. #7629
* [ENHANCEMENT] Add `mimir-http-prefix` configuration to set the Mimir URL prefix when using legacy routes. #8069
* [ENHANCEMENT] `mimirtool promql format`: Format PromQL query with Prometheus' string or pretty-print formatter. #7742
* [BUGFIX] `mimirtool rules prepare`: do not add aggregation label to `on()` clause if already present in `group_left()` or `group_right()`. #7839
* [BUGFIX] Analyze Grafana: fix parsing queries with variables. #8062
@@ -0,0 +1,94 @@
---
description:
  Learn how to configure the cluster label of a Helm-installed Grafana Mimir to prevent Mimir components from joining a
  different Memberlist cluster.
menuTitle: "Configure a unique Memberlist cluster label"
title: "Configure a unique Memberlist cluster label for Grafana Mimir in the mimir-distributed Helm chart installation"
weight: 110
---

# Configure a unique Memberlist cluster label for Grafana Mimir in the mimir-distributed Helm chart installation

This document describes how to configure cluster label verification in a Grafana Mimir installation deployed with Helm.
Without cluster label verification, multiple [Memberlist](https://grafana.com/docs/mimir/<MIMIR_VERSION>/references/architecture/memberlist-and-the-gossip-protocol/) [gossip ring](https://grafana.com/docs/mimir/<MIMIR_VERSION>/references/architecture/hash-ring/) clusters are at risk of merging into one.
For example, if Mimir, Tempo, or Loki run in the same Kubernetes cluster, they might communicate with each other without this configuration update.
Once cluster label verification is enabled, Mimir components verify that other components have the same cluster label before communicating with them.
Updating the configuration takes three rollouts of the whole cluster.

## Before you begin

- You have Grafana Mimir installed by the mimir-distributed Helm chart, with its Memberlist cluster label still set to the default value.
- You have the `kubectl` and `helm` command-line tools configured to connect to the Kubernetes cluster where Grafana Mimir is running.

## Configuration update steps

The configuration update consists of three steps:

1. Disable Memberlist cluster label verification
1. Set cluster label on all Mimir components
1. Enable Memberlist cluster label verification again

### 1. Disable Memberlist cluster label verification

Cluster label verification is enabled by default, with the cluster label set to an empty string.
Using the default cluster label can allow different systems that use Memberlist to communicate with each other if they also still use the default cluster label.
Setting the cluster label directly to a non-empty value without first disabling cluster label verification causes Memberlist to form a partition in the Grafana Mimir cluster.
In such a partition, some Mimir components have different cluster label values, which can prevent them from communicating with each other.
To disable cluster label verification, set the following structured configuration in the mimir-distributed `values.yaml` file.

```yaml
mimir:
  structuredConfig:
    memberlist:
      cluster_label_verification_disabled: true
```
Roll out the installation to apply the configuration changes by running `helm upgrade <my-mimir-release> mimir-distributed -f values.yaml`.
Replace `<my-mimir-release>` with the actual Mimir release name. Wait until all Pods are ready before going to the next step.
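
One way to check that all Pods are ready before moving on is to filter the output of `kubectl get pods`. The following is a minimal sketch and not part of the Helm chart: the helper name `not_ready_pods` is hypothetical, and it assumes the default `kubectl get pods` column layout (`NAME READY STATUS ...`).

```bash
# Hypothetical helper: print the names of Pods whose READY column is not N/N.
# Pods in the Completed state (finished Jobs) are skipped because they never become ready.
not_ready_pods() {
  kubectl get pods --namespace="$1" --no-headers |
    awk '$3 != "Completed" { split($2, ready, "/"); if (ready[1] != ready[2]) print $1 }'
}

# Usage: an empty result means every Pod in the namespace is ready.
# not_ready_pods <my-mimir-namespace>
```

If the function prints any Pod names, wait and run it again before continuing with the next step.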

### 2. Set cluster label on all Mimir components

Set the cluster label on all Mimir components by applying the following configuration.
The configuration sets `cluster_label` to the Helm release name and the namespace where the Helm release is installed.
Because cluster label verification is disabled, setting a new cluster label at this point does not cause Memberlist to form a partition.

```yaml
mimir:
  structuredConfig:
    memberlist:
      cluster_label_verification_disabled: true
      cluster_label: "{{.Release.Name}}-{{.Release.Namespace}}"
```

Apply the configuration changes again by running `helm upgrade <my-mimir-release> mimir-distributed -f values.yaml`.
Replace `<my-mimir-release>` with the actual Mimir release name. Wait until all Pods are ready before going to the next step.
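
To make the template in this step concrete, the sketch below reproduces in shell the value that `{{.Release.Name}}-{{.Release.Namespace}}` expands to, so that you know which label to expect on the components. The helper name `expected_cluster_label` and the example values `mimir` and `monitoring` are illustrative assumptions, not names taken from the chart.

```bash
# Hypothetical helper: join the release name and namespace the same way
# the cluster_label template in this step does.
expected_cluster_label() {
  printf '%s-%s\n' "$1" "$2"
}

# Usage (example values): prints "mimir-monitoring"
# expected_cluster_label mimir monitoring
```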

### 3. Enable Memberlist cluster label verification

Remove `mimir.structuredConfig.memberlist.cluster_label_verification_disabled` from the values.yaml file to re-enable Memberlist cluster label verification.

```yaml
mimir:
  structuredConfig:
    memberlist:
      cluster_label: "{{.Release.Name}}-{{.Release.Namespace}}"
```

Apply the configuration changes by running `helm upgrade <my-mimir-release> mimir-distributed -f values.yaml`.
Replace `<my-mimir-release>` with the actual Mimir release name. Wait until all Pods are ready before verifying that the configuration is applied correctly.
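
Before running the upgrade in this step, it can help to confirm that the override is really gone from the values file. The following is a rough sketch: the helper name `verification_override_removed` is hypothetical, and it only looks for an uncommented `cluster_label_verification_disabled:` key in a single file, so it does not cover values passed through other files or `--set`.

```bash
# Hypothetical helper: succeed only if the given values file contains no
# active (uncommented) cluster_label_verification_disabled key.
verification_override_removed() {
  ! grep -Eq '^[[:space:]]*cluster_label_verification_disabled:' "$1"
}

# Usage:
# verification_override_removed values.yaml && echo "override removed"
```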

## Verifying the configuration changes

Once the rollout is complete, verify the change by looking at the `/memberlist` endpoint on some of the Grafana Mimir pods.
Run the following port-forward commands against pods from several different Grafana Mimir components.

```bash
kubectl port-forward pod/<mimir-pod-1> --kube-context=<my-k8s-context> --namespace=<my-mimir-namespace> 8080:8080
kubectl port-forward pod/<mimir-pod-2> --kube-context=<my-k8s-context> --namespace=<my-mimir-namespace> 8081:8080
```

Replace `<mimir-pod-1>` and `<mimir-pod-2>` with the names of actual pods from different Mimir components.
Ensure that host ports 8080 and 8081 are available; otherwise, use different available ports.

Open the port-forwarded URLs, http://localhost:8080/memberlist and http://localhost:8081/memberlist, in a browser to see the Memberlist status.
Repeat this for a few other Grafana Mimir components. The Memberlist pages from different pods must show the same view of all cluster members.
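
As an alternative to opening each URL in a browser, the pages can be saved locally and compared side by side. The following is a minimal sketch under stated assumptions: the helper name `save_memberlist_pages` and the output file naming are hypothetical, and the port-forward commands must already be running in other terminals. It makes no assumption about the page layout, so comparing the member lists remains a manual step.

```bash
# Hypothetical helper: download the /memberlist page from each port-forwarded
# local port into <output-dir>/memberlist-<port>.html.
save_memberlist_pages() {
  out_dir="$1"
  shift
  mkdir -p "$out_dir" || return 1
  for port in "$@"; do
    curl -fsS "http://localhost:${port}/memberlist" > "${out_dir}/memberlist-${port}.html" || return 1
  done
}

# Usage (with the port-forwards running):
# save_memberlist_pages ./memberlist-pages 8080 8081
```

The function stops at the first port that cannot be reached, which usually means the corresponding port-forward is not running.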