Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Alternative Suspend Control #4271

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

haggishunk
Copy link

Added rfc-0006 for a proposal on an alternative suspend control for flux resources.


This feature would create an alternate path to suspending an object and would not violate the current apis.

### Resource Object Status
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should not be reflected in the Status, as the object would not be reconciled. Instead, the object should have a IsSuspended() method which would look at both the .spec.suspend and the set annotation.


## Summary

This RFC proposes an alternative method to indicate the suspended state of suspendable resources to flux controllers through object metadata. It presents an annotation key that can support suspending indefinitely or according to a schedule. It does not address the deprecation of the the `.spec.suspend` field from the resource apis.
Copy link
Contributor

@darkowlzz darkowlzz Sep 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

according to a schedule

This seems like a new thing that we didn't discuss in #4232 . Based on the the requirement of #4232, this seems out of scope. Although using annotations for suspending opens up some interesting ways to also potentially serve the needs of time based gating, based on my discussion with others about it, I don't think we anticipated to tackle that with this change. If things change and we decide to do something about it, we can consider it. But maybe for now, it'd be better to focus on the initial goal of #4232 to allow setting some reason for suspending the object that'll help the teams reason about why an object was suspended. I think it'd be good to include user stories about this use cases.

There are a few more things I would like to mention regarding the overall proposal.

  • In the summary and goals, I think it's okay to be more specific and mention the new annotation key that we discussed and seemed to have agreed upon reconcile.fluxcd.io/suspended.
  • In the goals section below, it mentions that the CLI will manipulate the .spec.suspend and suspend status in the suspend and resume subcommands. I think for setting suspend, the CLI should use only the new object annotation. We would like to gradually phase out the usage of .spec.suspend. If no reason flag/argument is provided, it can set the value of the annotation to "true". For resuming, the CLI can check in both .spec.suspend and the annotation for suspend key and remove them. And for the get subcommand, the CLI can take into consideration both the .spec.suspend and the annotation to compute the suspension state of the object. If any of the two methods enable suspend, the result should be SUSPENDED=True in the CLI table output.
  • In the proposal section below, it mentions "Register a resource object status for the suspended state". Reading some other parts of the document, I get the sense that this is referring to the .status field of the object. Is that right? We generally use the term "object status" to refer to .status. If that's what this is referring to, as hidde pointed out in another comment, we don't need to reflect the effective status/value of suspension in the object status. Just the presence of suspended annotation or .spec.suspend being true is enough to indicate that the object is suspended. Any reader of the API should look for these two fields. We can provide some helper function to make it easy to take into account both of these values and return an effective suspension value, also considering true/false, on/off, etc kind of values to make it more flexible. Also, based on some previous discussions, my understanding is that we don't want to reconcile at all when the object is marked as suspended. And this could be done using the predicates in controller-runtime, refer to https://github.com/kubernetes-sigs/controller-runtime/blob/v0.16.2/pkg/predicate/doc.go in case you're unfamiliar with it. We want to filter out object events if they are suspended and not reconcile at all. An example of such predicates in flux is https://github.com/fluxcd/source-controller/blob/v1.1.1/internal/controller/source_predicate.go. But since this will be a common predicate for all of the flux controllers, it'd be good to add it in https://github.com/fluxcd/pkg/tree/main/runtime/predicates where we already have a predicate for reconcile requested annotation, which is similar to this annotation we are about to add.
  • Regarding suspend metrics, in the last release of flux, v2.1.0, we deprecated some metrics that are specific to the object state, refer https://fluxcd.io/flux/monitoring/metrics/#-deprecated-resource-metrics. Suspend metrics is one of them. Users can now set up their own custom metrics for any fields in the API objects that are relevant for them. You can read more about it in the custom metrics docs. In the new flux metrics examples we provide, we have some default metrics labels which includes .spec.suspend value for suspended metric label, see https://github.com/fluxcd/flux2-monitoring-example/blob/21c9646949d7ec0c4352353b06771b18152f38e3/monitoring/controllers/kube-prometheus-stack/kube-state-metrics-config.yaml#L49. For annotation based suspend value, we can add another field, something like
suspendedAnnotation: [ metadata, annotations, reconcile.fluxcd.io/suspended ]

that'll extract the suspended annotation value. And then in metrics dashboard, one can make some query like

gotk_resource_info{suspendedAnnotation!=""} or gotk_resource_info{suspended="true"}

to get all the suspended objects. The query can be customized further to also take into account other values for the annotation.

Hope these are some useful information.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that scheduled suspension should be out of scope here because it opens up a whole new problem space and things to consider.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you all for the feedback. I'm going to make some edits to focus this RFC on the shift away from .spec.suspend and supporting an optional message/reason re: #4232.

@haggishunk haggishunk force-pushed the rfc-alternative-suspend-control branch from 6fe7ca5 to 507cd78 Compare October 10, 2023 20:19
Added discussion and implementation samples for update predicate.

Altered plans for handling suspended objects only by annotation and
omitting any status updates or .spec.suspend manipulation.

Signed-off-by: Travis Mattera <[email protected]>
@haggishunk haggishunk force-pushed the rfc-alternative-suspend-control branch from e65c427 to cb4db08 Compare October 18, 2023 15:32
Added implementation details regarding controller predicates.

Rephrased and wordsmithed content.

Signed-off-by: Travis Mattera <[email protected]>
@stefanprodan stefanprodan changed the title RFC-0006: Alternative Suspend Control [RFC] Alternative Suspend Control Jan 4, 2024
@stefanprodan stefanprodan marked this pull request as draft January 4, 2024 09:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants