docs: Add best practices for metrics
mrueg committed Oct 17, 2024
1 parent 75fba81 commit af6c15a
Showing 1 changed file with 71 additions and 0 deletions.
71 changes: 71 additions & 0 deletions docs/design/metrics-best-practices.md
# Kube-State-Metrics - Timeseries best practices

---

Author: Manuel Rüger (<[email protected]>)

Date: October 17th 2024

---

# Introduction

Kube-State-Metrics' goal is to provide insights into the state of Kubernetes objects by exposing them as metrics.
This document provides guidelines intended to create a good user experience when using these metrics.

Please be aware that this document was introduced at a later stage of the project, so there might be metrics that do not follow these best practices.
Feel encouraged to report such metrics and to provide a pull request that improves them.

# General best practices

We follow [Prometheus](https://prometheus.io/docs/practices/naming/) best practices in terms of naming and labeling.

# Best practices for kube-state-metrics

## Avoid pre-computation

kube-state-metrics should expose metrics at the level of individual objects and avoid any sort of pre-computation unless it is required, for example because of high cardinality on objects.
By exposing raw per-object metrics instead of pre-aggregated counts, kube-state-metrics gives users full control over how they want to use the metrics.
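
As a rough illustration of why per-object series are preferable, the sketch below keeps one sample per pod and leaves the aggregation to the consumer, similar in spirit to a `sum by (namespace)` query. The sample data, label names, and the function `ready_pods_by_namespace` are hypothetical and not part of kube-state-metrics.

```python
from collections import Counter

# Hypothetical per-object samples as (labels, value) pairs.
# One series per pod; the value 1 means the pod is ready, 0 means it is not.
samples = [
    ({"namespace": "default", "pod": "web-0"}, 1),
    ({"namespace": "default", "pod": "web-1"}, 0),
    ({"namespace": "kube-system", "pod": "dns-0"}, 1),
]


def ready_pods_by_namespace(samples):
    """Aggregate on the consumer side; the exporter never pre-computes this."""
    totals = Counter()
    for labels, value in samples:
        totals[labels["namespace"]] += value
    return dict(totals)


print(ready_pods_by_namespace(samples))
```

Because the raw series are still available, a different consumer could just as well list the pods that are not ready, which a pre-computed total would not allow.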

## Static object properties

An object usually has a stable set of properties that do not change during its lifecycle in Kubernetes.
This includes properties like name, namespace, uid etc.
It is a good practice to group those together into an `_info` metric.
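
A minimal sketch of what such an `_info` series could look like in the Prometheus text format follows. The metric name `example_object_info` and the helper `info_metric` are made up for illustration, and label values are not escaped here.

```python
def info_metric(name, labels):
    """Render one `_info` sample; the value is the constant 1 and all
    information is carried by the labels. Label values are not escaped."""
    pairs = ",".join(f'{key}="{value}"' for key, value in sorted(labels.items()))
    return f"{name}{{{pairs}}} 1"


line = info_metric(
    "example_object_info",  # hypothetical metric name
    {"namespace": "default", "name": "web-0", "uid": "abc-123"},
)
print(line)
# example_object_info{name="web-0",namespace="default",uid="abc-123"} 1
```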

## Dynamic object properties

An object can also have a dynamic set of properties, which are usually part of the status field.
These change during the lifecycle of the object.
For example, a pod can be in different states like "Pending", "Running" etc.
Such properties should be exposed in a separate metric whose labels identify the object and carry the dynamic property.
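
One possible shape for such a metric, sketched under the assumption of an enum-style encoding, is one series per possible state, with the value 1 for the current state and 0 for the others. The helper `phase_samples` and the label names are illustrative only.

```python
# Pod phases defined by the Kubernetes API.
PHASES = ["Pending", "Running", "Succeeded", "Failed", "Unknown"]


def phase_samples(namespace, pod, current_phase):
    """Return one (labels, value) pair per phase.

    The identifying labels (namespace, pod) stay the same across series;
    only the `phase` label and the value differ.
    """
    return [
        (
            {"namespace": namespace, "pod": pod, "phase": phase},
            1 if phase == current_phase else 0,
        )
        for phase in PHASES
    ]


for labels, value in phase_samples("default", "web-0", "Running"):
    print(labels["phase"], value)
```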

## Linked properties

If an object contains a substructure that links multiple properties together (e.g. endpoint address and port), those should be reported in the same metric.
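
To illustrate, the sketch below emits one label set per address and port combination so that the link between the two is preserved in a single metric. The input shape is a simplified, hypothetical stand-in for an endpoint subset, and the label names are assumptions.

```python
def endpoint_samples(namespace, endpoint, subsets):
    """Return one label set per (ip, port) pair within each subset.

    Reporting ip and port in separate metrics would lose the information
    about which port belongs to which address.
    """
    samples = []
    for subset in subsets:
        for ip in subset["addresses"]:
            for port in subset["ports"]:
                samples.append(
                    {
                        "namespace": namespace,
                        "endpoint": endpoint,
                        "ip": ip,
                        "port_number": str(port),
                    }
                )
    return samples


subsets = [{"addresses": ["10.0.0.1", "10.0.0.2"], "ports": [80, 443]}]
for labels in endpoint_samples("default", "web", subsets):
    print(labels)
```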

## Optional properties

Some Kubernetes objects have optional fields. If an optional value is not set, it is better to omit the label entirely than to expose a "nil" value or an empty string.
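
A small sketch of this rule, using a hypothetical helper `build_labels`: optional labels are added only when they carry a value.

```python
def build_labels(required, optional):
    """Merge required labels with those optional labels that are actually set.

    Unset optional values (None or empty string) are dropped instead of
    being exposed as "nil" or "".
    """
    labels = dict(required)
    for key, value in optional.items():
        if value is not None and value != "":
            labels[key] = value
    return labels


labels = build_labels(
    {"namespace": "default", "pod": "web-0"},
    {"priority_class": None, "node": "node-1"},  # hypothetical optional fields
)
print(labels)
# {'namespace': 'default', 'pod': 'web-0', 'node': 'node-1'}
```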

## Timestamps

Timestamps like creation time or modification time should be exposed as the value of the metric, not as a label. The metric name should end with `_timestamp_seconds`.
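
As a sketch, an RFC 3339 timestamp in UTC, as Kubernetes serializes it, can be converted to Unix seconds and used as the sample value. The metric name and the helper `to_unix_seconds` are hypothetical, and the parser below only handles the plain `...Z` form without fractional seconds.

```python
from datetime import datetime, timezone


def to_unix_seconds(rfc3339_utc):
    """Convert a timestamp like "2024-10-17T00:00:00Z" to Unix seconds."""
    parsed = datetime.strptime(rfc3339_utc, "%Y-%m-%dT%H:%M:%SZ")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


value = to_unix_seconds("2024-10-17T00:00:00Z")
# The timestamp is the sample value; labels only identify the object.
print(f'example_object_created_timestamp_seconds{{name="web-0"}} {value}')
```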

## Cardinality

Some object properties can cause cardinality issues if they can take many different values or are combined with other properties that also change frequently.
In this case, it is better to limit the number of values that kube-state-metrics exposes by allowing only a few of them and using a default for all others.
For example, if a Kubernetes object has a status field containing an error message that can change frequently, it is better to expose a boolean `error="true"` label when there is an error.
If some error messages are worth exposing, those can be exposed as they are, with a default value provided for any other message.
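
The allow-list approach described above could be sketched as follows. The set `ALLOWED_MESSAGES`, the `reason` label, the default value `"other"`, and the choice to emit `error="false"` when there is no error are all assumptions made for illustration.

```python
# Hypothetical allow-list of messages considered worth exposing verbatim.
ALLOWED_MESSAGES = {"ImagePullBackOff", "CrashLoopBackOff"}


def error_labels(message):
    """Map a free-form error message to a bounded set of label values."""
    if not message:
        # Assumption: expose an explicit "false" when there is no error.
        return {"error": "false"}
    # Any message outside the allow-list collapses into one default value,
    # so the number of distinct label values stays fixed.
    reason = message if message in ALLOWED_MESSAGES else "other"
    return {"error": "true", "reason": reason}


print(error_labels("CrashLoopBackOff"))
print(error_labels("dial tcp 10.0.0.7:443: i/o timeout"))
print(error_labels(""))
```

With this mapping the `reason` label can take at most `len(ALLOWED_MESSAGES) + 1` distinct values, regardless of how many different messages the cluster produces.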

# Stability

We follow the stability framework derived from Kubernetes, in which we expose experimental and stable metrics.
Experimental metrics are recently introduced or expose alpha/beta resources in the Kubernetes API.
They can change at any time and should be used with caution.
They can be promoted to stable metrics once the object has stabilized in the Kubernetes API, or once they have been part of two consecutive releases without any changes.

Stable metrics are considered frozen with the exception of new labels being added.
If a stable metric, or a label on a stable metric, is deprecated in release Major.Minor, the earliest release in which it can be removed is Major.Minor+2.
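
The deprecation window can be expressed as simple arithmetic on the version number. The helper `earliest_removal` below is a hypothetical illustration, not project tooling, and it assumes the major version stays the same.

```python
def earliest_removal(deprecated_in):
    """Given "Major.Minor", return the earliest release allowed to remove
    the metric or label, which is Major.Minor+2."""
    major, minor = (int(part) for part in deprecated_in.split("."))
    return f"{major}.{minor + 2}"


print(earliest_removal("2.14"))
# 2.16
```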
