-
Notifications
You must be signed in to change notification settings - Fork 163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Add back the helm suspended metric #1111
Conversation
At some point we had this and then we lost it. Discovered after we started suspending a bunch of things but could not get this metric to appear, meaning we are currently in a quasi-state of releases suspended across all our clusters that we don't know about. Signed-off-by: James M Leddy <[email protected]>
The suspend label has been included in the |
This is extra toil for us to get a back metric that we were alerting on and have completely lost. We have no idea how many helm releases are paused not applying resource request updates or whatever. And it's inconsistently applied. Why do our kustomizations still report when they are stalled but our helm releases don't? I realize that there are different maintainers that have different opinions about what metrics should be exposed, but to the end user this all just looks like "flux", since all the controllers come with flux. For anyone that might find this PR and wonder what the kube-state-metrics config is, seems to be here |
The core maintainers make the decisions for the common behaviour of all Flux controllers and the metrics fall into this category. We made the decision to drop the resource specific metrics from the controllers exporters and rely on kube-prometheus-stack. The deprecation notice can be found here: https://fluxcd.io/flux/monitoring/metrics/#warning-deprecated-resource-metrics This controller was last promoted to GA, so we removed the deprecated metrics from it, but we should've done that in all controllers. We'll make sure the old metrics are removed in the next release across all Flux components. |
Thank you, though I would prefer you add back this metric everywhere to avoid requiring everyone to add 275 lines of yaml to their kube-prometheus-stack helm chart, the inconsistency is even worse than the first decision as it feels uneven. And probably also led us to slower detection of the issue as it was still finding suspends "sometimes". So looking forward to having a consistent view from the Flux controller maintainers here :) |
Also, is this you? fluxcd/flux2-monitoring-example#35 (comment) |
@jmleddy you can read our motivation in this issue: fluxcd/flux2#4128 If you don't like the kube-state-metrics approach feel free to use the Flux Operator, the tradeoff is that you can't customise those metrics in any way. |
Okay I thought the controller was running as part of the operator I must not have my kube config right. I'll run it as part of the ksm thanks! |
At some point we had this and then we lost it. Discovered after we started suspending a bunch of things but could not get this metric to appear, meaning we are currently in a quasi-state of releases suspended across all our clusters that we don't know about.