From af6c15a408c46d46f8b49a80801904663fae3e40 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Manuel=20R=C3=BCger?=
Date: Thu, 17 Oct 2024 23:40:53 +0200
Subject: [PATCH] docs: Add best practices for metrics

---
 docs/design/metrics-best-practices.md | 71 +++++++++++++++++++++++++++
 1 file changed, 71 insertions(+)
 create mode 100644 docs/design/metrics-best-practices.md

diff --git a/docs/design/metrics-best-practices.md b/docs/design/metrics-best-practices.md
new file mode 100644
index 0000000000..9fd60a9cff
--- /dev/null
+++ b/docs/design/metrics-best-practices.md
@@ -0,0 +1,71 @@
+# Kube-State-Metrics - Timeseries best practices
+
+---
+
+Author: Manuel Rüger ()
+
+Date: October 17th 2024
+
+---
+
+# Introduction
+
+Kube-State-Metrics' goal is to provide insights into the state of Kubernetes objects by exposing them as metrics.
+This document provides guidelines intended to create a good user experience when using these metrics.
+
+Please be aware that this document was introduced at a later stage of the project, so there might be metrics that do not follow these best practices.
+Feel encouraged to report these metrics and provide a pull request to improve them.
+
+# General best practices
+
+We follow the [Prometheus](https://prometheus.io/docs/practices/naming/) best practices for naming and labeling.
+
+# Best practices for kube-state-metrics
+
+## Avoid pre-computation
+
+kube-state-metrics should expose metrics at the level of individual objects and avoid any pre-computation unless it is required, for example because of high object cardinality.
+By exposing raw metrics instead of pre-aggregated counts, kube-state-metrics gives users full control over how they use the metrics.
+
+## Static object properties
+
+An object usually has a stable set of properties that do not change during its lifecycle in Kubernetes.
+This includes properties such as name, namespace, and uid.
+It is good practice to group those together into an `_info` metric.
+
+## Dynamic object properties
+
+An object can also have a dynamic set of properties, which are usually part of the status field.
+These change during the lifecycle of the object.
+For example, a pod can be in different states such as "Pending" or "Running".
+These should be exposed in a separate metric whose labels identify the object as well as the dynamic property.
+
+## Linked properties
+
+If an object contains a substructure that links multiple properties together (e.g. endpoint address and port), those should be reported in the same metric.
+
+## Optional properties
+
+Some Kubernetes objects have optional fields. If an optional value is not set, it is better to not expose the label at all than to expose a "nil" value or an empty string.
+
+## Timestamps
+
+Timestamps such as creation time or modification time should be exposed as the metric value. The metric name should end with `_timestamp_seconds`.
+
+## Cardinality
+
+Some object properties can cause cardinality issues if they can take many different values or are combined with other properties that also change frequently.
+In this case, it is better to limit the number of values that kube-state-metrics exposes by allowing only a few of them and using a default for all others.
+For example, if the Kubernetes object has a status field containing an error message that can vary a lot, it is better to expose a boolean `error="true"` label when there is an error.
+If some error messages are worth exposing, those can be exposed, with a default value provided for any other message.
+
+# Stability
+
+We follow the stability framework derived from Kubernetes, in which we expose experimental and stable metrics.
+Experimental metrics are recently introduced or expose alpha/beta resources in the Kubernetes API.
+They can change at any time and should be used with caution.
+They can be promoted to stable metrics once the object has stabilized in the Kubernetes API, or once they have been part of two consecutive releases without any changes.
+
+Stable metrics are considered frozen, with the exception of new labels being added.
+A stable metric or a label on a stable metric can be deprecated in release Major.Minor, and the earliest release in which it can be removed is Major.Minor+2.
+
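The deprecation window for stable metrics described above can be illustrated with a small calculation. This is a minimal sketch, not part of kube-state-metrics: the function name `earliest_removal` and the version numbers are hypothetical, and it assumes the major version does not change during the deprecation window.

```python
def earliest_removal(deprecated_in: str) -> str:
    """Return the earliest release in which a stable metric (or label)
    deprecated in `deprecated_in` ("Major.Minor") could be removed.

    Hypothetical helper for illustration only; it assumes the major
    version stays the same across the deprecation window.
    """
    major_text, minor_text = deprecated_in.split(".")
    major, minor = int(major_text), int(minor_text)
    # Rule from the Stability section: deprecated in Major.Minor,
    # removable at the earliest in Major.Minor+2.
    return f"{major}.{minor + 2}"


# Illustrative version number, not an actual deprecation.
print(earliest_removal("2.13"))  # prints "2.15"
```

In this sketch, a metric deprecated in a hypothetical release 2.13 would still be present in 2.14 and could be removed in 2.15 at the earliest.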