Monitoring of prometheus-adapter metrics #636
Labels
kind/bug
Categorizes issue or PR as related to a bug.
triage/accepted
Indicates an issue or PR is ready to be actively worked on.
What happened?:
I've got a set of custom metrics defined in prometheus-adapter. I was refactoring the source metrics in Prometheus (modifying labels) and inadvertently broke one of the custom metrics in prometheus-adapter.
This custom metric was used by an HPA, along with CPU and memory. When the custom metric stopped responding (returning a 404), the HPA misbehaved and scaled the deployment way up. I believe this is mostly a bug in how the HPA handles missing metrics, but it raises the question: how can I monitor the health of custom metrics provided by prometheus-adapter?
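As a stopgap, the 404 described above can be detected from outside the adapter by querying the custom metrics API directly. The following is a minimal sketch, not something prometheus-adapter provides: it assumes the adapter serves the `custom.metrics.k8s.io/v1beta1` API group, that the metric is a per-pod metric, and that `kubectl proxy` is running on its default local address. The function names and the example metric name are hypothetical.

```python
import urllib.error
import urllib.request

# Assumption: the adapter is registered under this API group/version.
API_GROUP = "/apis/custom.metrics.k8s.io/v1beta1"


def custom_metric_path(namespace, metric, resource="pods"):
    """Build the API path that lists `metric` for every object of `resource`."""
    return f"{API_GROUP}/namespaces/{namespace}/{resource}/*/{metric}"


def http_status(url, timeout=5.0):
    """Return the HTTP status code for a GET on `url`.

    HTTP error responses (such as 404) are returned as codes; connection
    failures (urllib.error.URLError) are left to propagate to the caller.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def metric_is_served(namespace, metric,
                     base_url="http://127.0.0.1:8001",  # assumes `kubectl proxy`
                     fetch=http_status):
    """Return True only if the custom metrics API answers 200 for the metric."""
    return fetch(base_url + custom_metric_path(namespace, metric)) == 200


if __name__ == "__main__":
    # "http_requests" is a placeholder metric name, not one from this issue.
    print(metric_is_served("default", "http_requests"))
```

Running a check like this on a schedule would at least surface a broken custom metric before the HPA reacts to it. The `fetch` parameter is injectable so the logic can be exercised without a cluster.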
What did you expect to happen?:
I expected prometheus-adapter to emit Prometheus metrics itself. Something along these lines:
example metrics
But I don't believe prometheus-adapter emits any metrics (hopefully I'm wrong!).
Having this information would make it possible to actively monitor the availability of critical custom metrics, such as the one discussed above.
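To illustrate what actively monitoring availability could look like, here is a minimal sketch of an external probe's output rendered in the Prometheus text exposition format, so that Prometheus could scrape it and alert on it. This is an assumption-laden illustration: the metric name `custom_metric_probe_success` and its `metric` label are hypothetical, and are neither the example proposed in this issue nor anything prometheus-adapter is known to emit.

```python
# Hypothetical gauge name; not emitted by prometheus-adapter itself.
METRIC_NAME = "custom_metric_probe_success"


def escape_label_value(value):
    """Escape backslash, double quote and newline, as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_exposition(results):
    """Render {custom metric name: was it served?} as Prometheus text format."""
    lines = [
        f"# HELP {METRIC_NAME} 1 if the custom metric was served, 0 otherwise.",
        f"# TYPE {METRIC_NAME} gauge",
    ]
    for metric in sorted(results):
        value = 1 if results[metric] else 0
        label = escape_label_value(metric)
        lines.append(f'{METRIC_NAME}{{metric="{label}"}} {value}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder metric names and results, for illustration only.
    print(render_exposition({"http_requests": True, "queue_depth": False}), end="")
```

With a gauge like this, an alert on a value of 0 for a metric that an HPA depends on would flag the failure mode described above.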