Consider as an example: attempting to run Elasticsearch on Kubernetes. When running Elasticsearch, a given shard is replicated a configurable number of times across Elasticsearch nodes, to ensure that the shard continues to be available even if a specific Elasticsearch node is unavailable. In Kubernetes, the concept of an "Elasticsearch node" maps to a Pod. As such, we are faced with a dilemma as to when it is safe to terminate a Pod. On one hand, we need to ensure that each Elasticsearch shard always belongs to at least one Pod that is ready: if a shard is replicated twice, but both Pods to which the shard belongs are unavailable, then the shard itself will be unavailable, and the availability of the Elasticsearch cluster will have been disrupted by the Kubernetes cluster operation. This is unacceptable in production clusters. On the other hand, routine cluster operations require draining nodes, and therefore terminating Pods, so we cannot simply refuse every termination.
Proposed solutions that attempt to solve this problem with a `PodDisruptionBudget` (like Elastic's own cloud-on-k8s project) are naive and insufficient. Elastic's official approach is to not report a given Elasticsearch Pod as ready until the entire cluster is green. However, if the cluster is momentarily yellow, then every Pod reports as not ready at once, the entire cluster becomes unavailable, and failures cascade into dependent services, despite the cluster being in a yellow (i.e. under-replicated, but not unavailable) state in which it is still perfectly functional.
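To make the naive approach concrete, a minimal `PodDisruptionBudget` for an Elasticsearch cluster might look like the following sketch (the resource name and Pod labels are illustrative, not taken from cloud-on-k8s):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: elasticsearch-pdb    # illustrative name
spec:
  maxUnavailable: 1          # allow at most one Pod to be evicted at a time
  selector:
    matchLabels:
      app: elasticsearch     # assumed Pod label
```

The budget is expressed purely in terms of Pod readiness; it knows nothing about which shards live on which Pods, which is why it cannot guarantee shard availability on its own.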
The more mature approach is to check only the health of the local node, i.e. to run `GET /_cluster/health?local=true` as the readiness check, but this no longer couples Kubernetes's understanding of readiness to Elasticsearch's notion of shard availability. Therefore, the fact that specific Pods in Kubernetes are available or unavailable, and that specific `PodDisruptionBudget`s are satisfied or unsatisfied, is no longer in and of itself sufficient to signal to cluster tooling whether it is safe to terminate the underlying Kubernetes nodes.
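As a sketch, this local health check can be wired into the Pod spec as an HTTP readiness probe along the following lines (the container name, port, and probe timings are assumptions, not Elastic's published configuration):

```yaml
# Fragment of an Elasticsearch Pod spec; values are illustrative.
containers:
  - name: elasticsearch
    ports:
      - containerPort: 9200          # default Elasticsearch HTTP port
    readinessProbe:
      httpGet:
        # Asks only the local node for its view of cluster health,
        # without blocking on the state of the rest of the cluster.
        path: /_cluster/health?local=true
        port: 9200
      initialDelaySeconds: 30
      periodSeconds: 10
```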
The Prometheus exporter for Elasticsearch exposes cluster health information in a Prometheus metric called `elasticsearch_cluster_health_status`. Writing a Prometheus alert to notify when a cluster is unhealthy is then as simple as writing the following alert:
```yaml
alert: ElasticsearchClusterUnhealthy
expr: elasticsearch_cluster_health_status{color!="green"} != 0
labels:
  severity: warning
annotations:
  summary: ES cluster {{$labels.cluster}} is not healthy
  description: The ES cluster {{$labels.cluster}} is currently responding with color {{$labels.color}}.
```
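Note that this is the alert rule on its own; to actually load it, Prometheus expects rules to live inside a rule group in a rules file, along these lines (the group name is arbitrary):

```yaml
groups:
  - name: elasticsearch
    rules:
      - alert: ElasticsearchClusterUnhealthy
        expr: elasticsearch_cluster_health_status{color!="green"} != 0
        labels:
          severity: warning
        annotations:
          summary: ES cluster {{$labels.cluster}} is not healthy
          description: The ES cluster {{$labels.cluster}} is currently responding with color {{$labels.color}}.
```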
When this alert is firing, the `prometheus-alert-readiness` pod will report as `NotReady`, preventing the cluster tooling from draining any Elasticsearch nodes and therefore from evicting any additional Elasticsearch pods. When the cluster's health returns to green, the `prometheus-alert-readiness` pod will report as `Ready` again and allow the cluster tooling to proceed.
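As a rough sketch, the pod might be deployed along the following lines, assuming it exposes an HTTP readiness endpoint and discovers Prometheus via an environment variable; the image, endpoint path, port, and variable name here are all illustrative rather than the project's documented interface, and the cluster tooling is assumed to check this pod's readiness before starting any drain:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-alert-readiness
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-alert-readiness
  template:
    metadata:
      labels:
        app: prometheus-alert-readiness
    spec:
      containers:
        - name: prometheus-alert-readiness
          image: example.org/prometheus-alert-readiness:latest  # illustrative image
          env:
            - name: PROMETHEUS_URL          # assumed variable name
              value: http://prometheus:9090
          readinessProbe:
            # Assumed endpoint: succeeds only while no matching Prometheus
            # alerts are firing, so the Pod's Ready condition mirrors the
            # alert state.
            httpGet:
              path: /healthz
              port: 8080
            periodSeconds: 10
```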