From ec9e3474900a83e6230f85007700499821ea40af Mon Sep 17 00:00:00 2001 From: Matheus Pimenta Date: Thu, 23 May 2024 15:14:27 +0100 Subject: [PATCH] Add RFC 0008 - Custom Event Metadata from Annotations Signed-off-by: Matheus Pimenta --- .../README.md | 246 ++++++++++++++++++ 1 file changed, 246 insertions(+) create mode 100644 rfcs/0008-custom-event-metadata-from-annotations/README.md diff --git a/rfcs/0008-custom-event-metadata-from-annotations/README.md b/rfcs/0008-custom-event-metadata-from-annotations/README.md new file mode 100644 index 0000000000..c7f3b364bd --- /dev/null +++ b/rfcs/0008-custom-event-metadata-from-annotations/README.md @@ -0,0 +1,246 @@ +# RFC-0008 Custom Event Metadata from Annotations + +**Status:** implementable + + + +**Creation date:** 2024-05-23 + +**Last update:** 2024-12-17 + +## Summary + +Flux users often run into situations where they wish to send custom, static metadata fields defined +in Flux objects on the events dispatched by the respective Flux controller to Kubernetes and +notification-controller. This proposal offers a solution for supporting those use cases uniformly +across all Flux controllers by sending the annotation keys in Flux objects that are prefixed with +the API Group `event.toolkit.fluxcd.io` followed by a slash, i.e. `event.toolkit.fluxcd.io/`. + +## Motivation + +This RFC comes as a response to the need for adding custom metadata to events about Flux objects +sent to notification providers. See specific user stories in the [User Stories](#user-stories) section. + +### Goals + +Provide a method for Flux users to embed custom/static metadata in their Flux objects +and have that metadata propagated to the notification providers. + +### Non-Goals + +In this proposal we **do not** aim to provide a method for Flux users to send etcd-indexed custom metadata +fields from Flux objects in events to notification-controller, most specifically labels. By design an event +already contains enough identification information to locate the associated Flux object inside the cluster, +which covers the use case of labels. Flux does not wish to incentivize practices that are impactful to clusters +without a strong reason or benefit. + +## Proposal + +When sending events about Flux objects, we propose sending annotation keys prefixed with the well-defined +API Group `event.toolkit.fluxcd.io` followed by a slash, i.e. prefixed with `event.toolkit.fluxcd.io/`, in +addition to all the metadata that is already sent in the event. + +### User Stories + +#### Story 1 + +> As a user, I want to embed Flux into my GitHub Workflow in a way that it only succeeds if +> the deployment made by Flux is successful. + +For example, embedding a Deployment ID from the GitHub API in a `HelmRelease` object like the one below: + +```yaml +apiVersion: helm.toolkit.fluxcd.io/v2 +kind: HelmRelease +metadata: + name: podinfo + namespace: flux-system + annotations: + event.toolkit.fluxcd.io/deploymentID: e076e315-5a48-41c3-81c8-8d8bdee7d74d +spec: + chart: + spec: + chart: podinfo + version: 6.5.* + sourceRef: + kind: HelmRepository + name: podinfo +``` + +Should cause notification-controller to propagate an event like the one below (most fields omitted for brevity): + +```json +{ + "involvedObject": { + "apiVersion": "helm.toolkit.fluxcd.io/v2", + "kind": "HelmRelease", + "name": "podinfo", + "namespace": "flux-system", + "uid": "7d0cdc51-ddcf-4743-b223-83ca5c699632" + }, + "metadata": { + "deploymentID": "e076e315-5a48-41c3-81c8-8d8bdee7d74d" + } +} +``` + +#### Story 2 + +> As a user, I want to embed the new image tag in a `HelmRelease` object when the image is updated by an `ImageUpdateAutomation` +> and have that information propagated to the notification providers. + +For example: + +```yaml +apiVersion: helm.toolkit.fluxcd.io/v2 +kind: HelmRelease +metadata: + name: podinfo + namespace: flux-system + annotations: + event.toolkit.fluxcd.io/image: ghcr.io/stefanprodan/podinfo:latest # {"$imagepolicy": "flux-system:podinfo"} +spec: + chart: + spec: + chart: podinfo + sourceRef: + kind: HelmRepository + name: podinfo + values: + image: + tag: latest # {"$imagepolicy": "flux-system:podinfo:tag"} +``` + +In this example image-automation-controller would update the image and tag near the markers. If, for example, it +updates the image to `ghcr.io/stefanprodan/podinfo:6.5.0`, then it would cause notification-controller to start +propagating events like the one below (most fields omitted for brevity): + +```json +{ + "involvedObject": { + "apiVersion": "helm.toolkit.fluxcd.io/v2", + "kind": "HelmRelease", + "name": "podinfo", + "namespace": "flux-system", + "uid": "7d0cdc51-ddcf-4743-b223-83ca5c699632" + }, + "metadata": { + "image": "ghcr.io/stefanprodan/podinfo:6.5.0" + } +} +``` + +### Alternatives + +#### Alternative 1 + +An alternative for specifying custom metadata fields in Flux objects for sending on events +is defining `.spec` APIs for such, like `.spec.eventMetadata` available in the Alert API. +This alternative is not great because: + +* Such APIs would be fairly redundant with the well-known Kubernetes annotations. +* Technically speaking, it is much easier to implement an alternative where the +field storing the custom metadata is the same and is already available across all the +Flux objects rather than introducing a new API. + +In the specific case of the Alert API this field was introduced because the Alert API +is obviously a special one in the context of events and alerting. In particular, the +Alert objects do not generate events themselves, but rather serve as an aggregation +configuration for matching and propagating events from other Flux objects. + +#### Alternative 2 + +Instead of introducing a new API Group, i.e. `event.toolkit.fluxcd.io`, we could use the API +Group `notification.toolkit.fluxcd.io` for the same purpose. This alternative is not great +because it emphasizes an exclusive relationship with notification-controller, which is not the +case. The events here are also Kubernetes Events, and an API Group that is more general and +closer to Kubernetes Events is more appropriate. + +## Design Details + +All the Flux controllers use our implementation of the `EventRecorder` interface from the Go +package `k8s.io/client-go/tools/record`: [`(*github.com/fluxcd/pkg/runtime/event.Recorder).AnnotatedEventf()`](https://github.com/fluxcd/pkg/blob/6f2619522699f1a78e8c7b41583ad9f7b7c9544e/runtime/events/recorder.go#L119). +This implementation sends the events to notification-controller and also calls the same method +from an injected `EventRecorder`, which sends the events to Kubernetes. To support the use cases +discussed here we would modify this implementation to look for annotations prefixed with +`event.toolkit.fluxcd.io/` in the Flux objects and send them alongside the other metadata of +the event. Here we are talking specifically about the object annotations retrieved from the +Flux object itself, i.e. the first argument of the `AnnotatedEventf()` method: `object runtime.Object`. +This implementation would not change the interface `EventRecorder` used by the controllers, +so all we need to do is bump the Go package `github.com/fluxcd/pkg/runtime` across all controllers. + +On the notification-controller side we would start accepting metadata keys starting with this +prefix and remove it before sending the metadata key-value pair to the notification providers. +This is an important aspect of the implementation because notification-controller only +accepts metadata keys that are prefixed with the Group of the respective API the involved +Flux object belongs to, so we need to add an exception for the new prefix. + +The API Group `event.toolkit.fluxcd.io` would be introduced as a constant in the package +`github.com/fluxcd/pkg/apis/event` with the name `Group`. This constant would used in the package +`github.com/fluxcd/pkg/runtime/event` and notification-controller for the implementation described above. + +### Precedence Order + +After this change, there would be four sources of metadata being sent on notifications. +They are listed below with the proposed order of precedence, from lowest to highest: + +1. User-defined summary on the Alert object, set with `.spec.summary`. +2. User-defined metadata on Flux objects, set with the `event.toolkit.fluxcd.io/` prefix in the keys of the object's `.metadata.annotations`. +3. User-defined metadata on the Alert object, set with `.spec.eventMetadata`. +4. Controller-defined metadata, set with the `.toolkit.fluxcd.io/` prefix in the metadata keys of the event payload. + +Upon any key conflicts when combining all the metadata above, notification-controller would +resolve them according to the precedence order specified above, print an `info` log and emit a +Kubernetes Event containing all the key conflicts to warn the user and prompt them to change +their configuration to remove those conflicts. + +#### Reasoning + +Controller-defined metadata has the highest precedence because it integrates with external systems, +e.g. commit SHAs, digests, chart versions, etc. + +Alert-level metadata (`.spec.eventMetadata`) is usually also cluster-level, e.g. the cluster name, +region, environment, etc. We don't want tenants overriding cluster-level metadata. + +User-defined metadata on Flux objects, whose use cases are described in the [User Stories](#user-stories) +section, would usually be defined by cluster tenants. Hence it should not override cluster-level metadata. + +And last comes `.spec.summary` from the Alert API. For now it can serve as a default summary for alerts +in the cluster, e.g + +`Apps are failing in the cluster. Playbooks: ()` + +An object-level override could be: + +`App X is failing. Playbook: ()` + +The `.spec.summary` field from the Alert API was the first to support user-defined metadata. +Later `.spec.eventMetadata` was introduced to support structured user-defined metadata. The +Flux team considers the `.spec.eventMetadata` a generalization of `.spec.summary`, and the +`.spec.summary` field may be deprecated in the future. + +### How can this feature be enabled / disabled? + +To enable the feature, use the `event.toolkit.fluxcd.io/` prefix in Flux object annotations, +for example: + +* `event.toolkit.fluxcd.io/image: ghcr.io/stefanprodan/podinfo` +* `event.toolkit.fluxcd.io/deploymentID: e076e315-5a48-41c3-81c8-8d8bdee7d74d` + +It's important to notice that not all Flux objects emit events, e.g. Alert and Provider objects. +For a list of the Flux objects that emit events, see the kinds allowed on the +`.spec.eventSources[].kind` field of the Alert API. + +To disable the feature, do not use `event.toolkit.fluxcd.io/` as a prefix in Flux object annotations. + +## Implementation History + +