-
Notifications
You must be signed in to change notification settings - Fork 612
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] Alternative Suspend Control #4271
base: main
Are you sure you want to change the base?
[RFC] Alternative Suspend Control #4271
Conversation
|
||
This feature would create an alternate path to suspending an object and would not violate the current apis. | ||
|
||
### Resource Object Status |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should not be reflected in the Status, as the object would not be reconciled. Instead, the object should have a IsSuspended()
method which would look at both the .spec.suspend
and the set annotation.
|
||
## Summary | ||
|
||
This RFC proposes an alternative method to indicate the suspended state of suspendable resources to flux controllers through object metadata. It presents an annotation key that can support suspending indefinitely or according to a schedule. It does not address the deprecation of the the `.spec.suspend` field from the resource apis. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
according to a schedule
This seems like a new thing that we didn't discuss in #4232 . Based on the the requirement of #4232, this seems out of scope. Although using annotations for suspending opens up some interesting ways to also potentially serve the needs of time based gating, based on my discussion with others about it, I don't think we anticipated to tackle that with this change. If things change and we decide to do something about it, we can consider it. But maybe for now, it'd be better to focus on the initial goal of #4232 to allow setting some reason for suspending the object that'll help the teams reason about why an object was suspended. I think it'd be good to include user stories about this use cases.
There are a few more things I would like to mention regarding the overall proposal.
- In the summary and goals, I think it's okay to be more specific and mention the new annotation key that we discussed and seemed to have agreed upon
reconcile.fluxcd.io/suspended
. - In the goals section below, it mentions that the CLI will manipulate the
.spec.suspend
and suspend status in thesuspend
andresume
subcommands. I think for setting suspend, the CLI should use only the new object annotation. We would like to gradually phase out the usage of.spec.suspend
. If no reason flag/argument is provided, it can set the value of the annotation to"true"
. For resuming, the CLI can check in both.spec.suspend
and the annotation for suspend key and remove them. And for theget
subcommand, the CLI can take into consideration both the.spec.suspend
and the annotation to compute the suspension state of the object. If any of the two methods enable suspend, the result should be SUSPENDED=True in the CLI table output. - In the proposal section below, it mentions "Register a resource object status for the suspended state". Reading some other parts of the document, I get the sense that this is referring to the
.status
field of the object. Is that right? We generally use the term "object status" to refer to.status
. If that's what this is referring to, as hidde pointed out in another comment, we don't need to reflect the effective status/value of suspension in the object status. Just the presence of suspended annotation or.spec.suspend
being true is enough to indicate that the object is suspended. Any reader of the API should look for these two fields. We can provide some helper function to make it easy to take into account both of these values and return an effective suspension value, also considering true/false, on/off, etc kind of values to make it more flexible. Also, based on some previous discussions, my understanding is that we don't want to reconcile at all when the object is marked as suspended. And this could be done using the predicates in controller-runtime, refer to https://github.com/kubernetes-sigs/controller-runtime/blob/v0.16.2/pkg/predicate/doc.go in case you're unfamiliar with it. We want to filter out object events if they are suspended and not reconcile at all. An example of such predicates in flux is https://github.com/fluxcd/source-controller/blob/v1.1.1/internal/controller/source_predicate.go. But since this will be a common predicate for all of the flux controllers, it'd be good to add it in https://github.com/fluxcd/pkg/tree/main/runtime/predicates where we already have a predicate for reconcile requested annotation, which is similar to this annotation we are about to add. - Regarding suspend metrics, in the last release of flux, v2.1.0, we deprecated some metrics that are specific to the object state, refer https://fluxcd.io/flux/monitoring/metrics/#-deprecated-resource-metrics. Suspend metrics is one of them. Users can now set up their own custom metrics for any fields in the API objects that are relevant for them. You can read more about it in the custom metrics docs. In the new flux metrics examples we provide, we have some default metrics labels which includes
.spec.suspend
value forsuspended
metric label, see https://github.com/fluxcd/flux2-monitoring-example/blob/21c9646949d7ec0c4352353b06771b18152f38e3/monitoring/controllers/kube-prometheus-stack/kube-state-metrics-config.yaml#L49. For annotation based suspend value, we can add another field, something like
suspendedAnnotation: [ metadata, annotations, reconcile.fluxcd.io/suspended ]
that'll extract the suspended annotation value. And then in metrics dashboard, one can make some query like
gotk_resource_info{suspendedAnnotation!=""} or gotk_resource_info{suspended="true"}
to get all the suspended objects. The query can be customized further to also take into account other values for the annotation.
Hope these are some useful information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that scheduled suspension should be out of scope here because it opens up a whole new problem space and things to consider.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you all for the feedback. I'm going to make some edits to focus this RFC on the shift away from .spec.suspend
and supporting an optional message/reason re: #4232.
6fe7ca5
to
507cd78
Compare
resources. Signed-off-by: Travis Mattera <[email protected]>
Added discussion and implementation samples for update predicate. Altered plans for handling suspended objects only by annotation and omitting any status updates or .spec.suspend manipulation. Signed-off-by: Travis Mattera <[email protected]>
Signed-off-by: Travis Mattera <[email protected]>
e65c427
to
cb4db08
Compare
Added implementation details regarding controller predicates. Rephrased and wordsmithed content. Signed-off-by: Travis Mattera <[email protected]>
using cli. Signed-off-by: Travis Mattera <[email protected]>
Signed-off-by: Travis Mattera <[email protected]>
Added rfc-0006 for a proposal on an alternative suspend control for flux resources.