Monitoring of prometheus-adapter metrics #636

matthewjstanford · 2024-01-18T16:09:56Z

What happened?:

I've got a set of custom metrics defined in prometheus-adapter. I was refactoring the source metrics in Prometheus (modifying labels) and inadvertently broke one of the custom metrics in prometheus-adapter.

This custom metric was used by an HPA, along with CPU and memory. When the custom metric stopped responding (returning a 404), the HPA went into the weeds and scaled the deployment way up. I believe this is mostly a bug in how the HPA handles missing metrics, but it raises the question: how can I monitor the health of custom metrics provided by prometheus-adapter?

What did you expect to happen?:

I expected prometheus-adapter to emit Prometheus metrics about itself. Something along these lines:

example metrics

# TYPE prometheus_adapter_custom_request_status_total gauge
prometheus_adapter_custom_request_status_total{metric="my_custom_metric", status="200"} 1
prometheus_adapter_custom_request_status_total{metric="my_invalid_custom_metric", status="404"} 2

# TYPE prometheus_adapter_external_request_status_total gauge
prometheus_adapter_external_request_status_total{metric="my_external_metric", status="200"} 5
prometheus_adapter_external_request_status_total{metric="my_invalid_external_metric", status="404"} 6

But I don't believe prometheus-adapter emits any metrics (hopefully I'm wrong!).

Having info like this would make it possible to actively monitor the availability of critical custom metrics, such as the ones discussed above.
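Until something like that exists, one stopgap is to probe the custom metrics API from outside and alert on anything that does not return 200. Below is a minimal sketch, not something prometheus-adapter ships. It assumes `kubectl proxy` is running on 127.0.0.1:8001 and that the metric is exposed on pods through `custom.metrics.k8s.io/v1beta1`; the function names and the metric names in the usage note are hypothetical.

```python
import urllib.error
import urllib.request

# Assumption: `kubectl proxy` is serving the Kubernetes API on this address.
API_BASE = "http://127.0.0.1:8001"


def custom_metric_path(namespace, metric, resource="pods"):
    """Path for a namespaced custom metric across all objects of `resource`."""
    return (f"/apis/custom.metrics.k8s.io/v1beta1/namespaces/{namespace}"
            f"/{resource}/*/{metric}")


def http_status(url):
    """Return the HTTP status code of a GET on `url` (404 included)."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx answers surface as HTTPError; the code is what we want.
        return err.code


def probe(metrics, fetch=http_status):
    """Map each (namespace, metric) pair to the status code the API returned."""
    return {
        (ns, name): fetch(API_BASE + custom_metric_path(ns, name))
        for ns, name in metrics
    }


def unavailable(results):
    """Keep only the metrics that did not answer with 200."""
    return {key: code for key, code in results.items() if code != 200}
```

For example, `unavailable(probe([("default", "my_custom_metric")]))` would return an empty dict while the metric is healthy and `{("default", "my_custom_metric"): 404}` once it breaks. Connection failures (proxy not running) are deliberately left to raise rather than being reported as a metric problem.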


matthewjstanford · 2024-01-23T15:25:32Z

It looks like I can monitor the availability of the prometheus-adapter metrics via a Horizontal Pod Autoscaler metric, kube_horizontalpodautoscaler_status_target_metric.

This is a bit backwards, IMO, but it at least provides a mechanism to monitor the metrics.
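To make that concrete, here is a rough sketch of the comparison I have in mind: every metric an HPA is configured with should also show up in its status. It assumes the label sets come from kube-state-metrics' `kube_horizontalpodautoscaler_spec_target_metric` and `kube_horizontalpodautoscaler_status_target_metric`, that both carry `namespace`, `horizontalpodautoscaler` and `metric_name` labels, and that the status series disappears when the HPA cannot read the metric. The function name is hypothetical, and fetching the series from Prometheus is left out.

```python
# Labels assumed to identify one metric of one HPA in both series.
KEY_LABELS = ("namespace", "horizontalpodautoscaler", "metric_name")


def missing_hpa_metrics(spec_series, status_series):
    """Return (namespace, hpa, metric) tuples configured but not reported.

    Both arguments are iterables of label dicts, one dict per time series.
    """
    def ident(labels):
        return tuple(labels.get(k, "") for k in KEY_LABELS)

    reported = {ident(labels) for labels in status_series}
    configured = {ident(labels) for labels in spec_series}
    return sorted(configured - reported)
```

Anything this returns is a metric the HPA wants but is not getting, which is the condition worth alerting on. The same set difference could likely be written directly as an alerting rule in Prometheus instead of a script.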

dgrisonnet · 2024-01-25T17:55:35Z

/triage accepted
/assign

pznamensky · 2024-06-11T13:51:47Z

Same for us - we broke an external metric and only found out about it several days later. It would be great to have some way to monitor prometheus-adapter itself.

matthewjstanford added the kind/bug label Jan 18, 2024

k8s-ci-robot added the needs-triage label Jan 18, 2024

k8s-ci-robot assigned dgrisonnet Jan 25, 2024

k8s-ci-robot added the triage/accepted label and removed the needs-triage label Jan 25, 2024
