[BOP-86] Add status & events for Helm Chart Addons; Improve Status for Manifests #18

Merged
merged 7 commits into from
Dec 8, 2023

@tppolkow tppolkow commented Nov 29, 2023

Description/Summary

Addresses https://mirantis.jira.com/browse/BOP-86

This PR adds status and events to Helm Chart Addons. The addon controller now watches (https://book.kubebuilder.io/reference/watching-resources/operator-managed) the Jobs that the helm controller creates for each HelmChart custom resource, so an Addon displays the status of its helm chart install Job. The limitation is that the Job succeeds once all the resources specified in the helm chart (e.g. Deployments) are created; if a Deployment then fails to create its pods, that failure does not bubble up. We could do something similar to the manifest handling below, but it will not be as simple, since we don't know beforehand which pods, if any, the helm chart creates.

Additionally, this PR reworks how I previously implemented some of the events and status for manifest addons. I think the previous approach was brittle: it updated the status of the manifest after certain events but stopped updating it past a certain point. I kept some of the status/events from the previous PR (such as when the addon fails in an early stage of setting up the CR), and also set up the manifest controller to watch Deployment and DaemonSet resources and update status based on them. These resources were chosen because they have status fields we can use, and they are the main pod-deploying resources we currently support in Manifest. We will definitely need to update this once we support more complex manifests, but I think it is a good starting point for manifest status.

Testing

Test 1 : Happy Path

Using the addons below:

    addons:
      - name: calico
        kind: manifest
        enabled: true
        manifest:
          url: https://raw.githubusercontent.com/projectcalico/calico/v3.26.3/manifests/calico.yaml
      - name: metallb
        kind: manifest
        enabled: true
        manifest:
          url: https://raw.githubusercontent.com/kubernetes/website/main/content/en/examples/admin/namespace-dev.yaml
      - name: my-grafana
        enabled: true
        kind: chart
        namespace: monitoring
        chart:
          name: grafana
          repo: https://grafana.github.io/helm-charts
          version: 6.58.7
          values: |
            ingress:
              enabled: true

After deploying the above blueprint, we can watch the status of the addons:

tpolkowski@tpolkowski-MBP16-1947 boundless-cli % k get addon -n boundless-system -w
NAME     STATUS
calico
metallb
calico
metallb
my-grafana
calico       Progressing
metallb      Progressing
my-grafana
calico       Progressing
my-grafana   Progressing
calico       Available
metallb      Available
calico       Progressing
my-grafana   Available
calico       Progressing
calico       Available

Addons start in the Progressing status. For Helm chart addons, the status is updated as the associated Job progresses. Once the Job finishes, the status is set to Available.

For Manifest addons, metallb becomes Available almost instantly because its manifest is just a namespace. The calico manifest, on the other hand, stays in Progressing for several minutes, until the Deployments and DaemonSets in the manifest successfully create their pods.

Eventually all the addons become Available:

tpolkowski@tpolkowski-MBP16-1947 boundless-cli % k get addon -n boundless-system
NAME         STATUS
calico       Available
metallb      Available
my-grafana   Available

If we describe any addon, we can see a more detailed status and the events that have been emitted. For example, for my-grafana:

Status:
  Last Transition Time:  2023-12-05T03:31:52Z
  Reason:                Helm Chart helm-install-grafana successfully installed
  Type:                  Available
Events:
  Type    Reason            Age                    From              Message
  ----    ------            ----                   ----              -------
  Normal  SuccessfulCreate  5m46s (x2 over 5m46s)  addon controller  Created Chart Addon monitoring/my-grafana

Test 2 : Deploy manifest that fails to install

tpolkowski@tpolkowski-MBP16-1947 boundless-cli % k get addon -A
NAMESPACE          NAME                STATUS
boundless-system   blackbox-exporter   Unhealthy
boundless-system   calico              Available
tpolkowski@tpolkowski-MBP16-1947 boundless-cli % k get manifest -A
NAMESPACE          NAME                STATUS
boundless-system   blackbox-exporter   Unhealthy
boundless-system   calico              Available

We can get details by describing the manifest:

k describe manifest blackbox-exporter -n boundless-system

Status:
  Last Transition Time:  2023-12-05T22:04:20Z
  Message:               failed to update manifest  : yaml: line 199: mapping values are not allowed in this context
  Reason:                failed to update manifest
  Type:                  Unhealthy
Events:
  Type     Reason        Age                      From                 Message
  ----     ------        ----                     ----                 -------
  Warning  FailedCreate  8m15s                    manifest controller  failed to create objects for the manifest boundless-system/blackbox-exporter : yaml: line 199: mapping values are not allowed in this context
  Warning  FailedCreate  8m13s (x3 over 8m14s)    manifest controller  failed to update manifest crd while update operation boundless-system/blackbox-exporter : Operation cannot be fulfilled on manifests.boundless.mirantis.com "blackbox-exporter": the object has been modified; please apply your changes to the latest version and try again
  Warning  FailedCreate  8m11s (x3 over 8m16s)    manifest controller  failed to update manifest object with finalizer boundless-system/blackbox-exporter
  Warning  FailedCreate  2m20s (x231 over 8m15s)  manifest controller  failed to update manifest boundless-system/blackbox-exporter : yaml: line 199: mapping values are not allowed in this context

These errors are also bubbled up to the addon:

Status:
  Last Transition Time:  2023-12-05T22:04:08Z
  Message:               failed to update manifest  : yaml: line 199: mapping values are not allowed in this context
  Reason:                failed to update manifest
  Type:                  Unhealthy
Events:
  Type     Reason        Age                    From              Message
  ----     ------        ----                   ----              -------
  Warning  FailedCreate  8m45s (x25 over 9m3s)  addon controller  Failed to Create Manifest Addon default/blackbox-exporter : Operation cannot be fulfilled on manifests.boundless.mirantis.com "blackbox-exporter": the object has been modified; please apply your changes to the latest version and try again

@tppolkow tppolkow force-pushed the BOP-86-helmchart branch 2 times, most recently from 4d1688b to 4972eac Compare December 1, 2023 23:04
@tppolkow tppolkow changed the title WIP [BOP-86] Add status & events for Helm Chart Addons; Improve Status for Manifests Dec 5, 2023
@tppolkow tppolkow marked this pull request as ready for review December 5, 2023 22:12
- list
- patch
- update
- watch
Collaborator

Does the operator need permissions for all of these? If we only use watch/get/list etc, then we should only ask for those permissions.

Contributor Author

Good point. I copied the setup from kubebuilder, but I think the extra verbs were only needed in their specific example. I updated it to use only watch / get / list.

return ctrl.Result{}, err
}

result, err := r.setOwnerReferenceOnManifest(ctx, logger, instance, m)
Collaborator

question: How is calling setControllerReference() different from setting the Owns relationship in SetupWithManager?

return ctrl.NewControllerManagedBy(mgr).
    For(&boundlessv1alpha1.Addon{}).
    Owns(&boundlessv1alpha1.Manifest{}).
    Watches(
        &source.Kind{Type: &batch.Job{}},
        handler.EnqueueRequestsFromMapFunc(r.findAddonForJob),
        builder.WithPredicates(predicate.ResourceVersionChangedPredicate{}),
    ).
    Complete(r)

Also, if we set the Owns() reference in SetupWithManager(), why do we still need to set the controller reference for each instance of the CR?

Contributor Author

setControllerReference sets the following in the metadata of the manifest:

    "ownerReferences": [
        {
            "apiVersion": "boundless.mirantis.com/v1alpha1",
            "blockOwnerDeletion": true,
            "controller": true,
            "kind": "Addon",
            "name": "calico",
            "uid": "7d7a3f66-46c1-4a46-a3be-866d105d480e"
        }
    ]

Owns(&boundlessv1alpha1.Manifest{}) then tells the manager to watch every instance of boundlessv1alpha1.Manifest and issue a reconcile against the resource named in the ownerRef, if one is present.

Without setControllerReference, no reconcile is issued because there is no object in the ownerRef.
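
A minimal, stdlib-only sketch of the relationship described above: an ownership-based watch can only enqueue the parent if the child carries a controller ownerReference pointing at it. The types and the requestsForOwner helper are hypothetical stand-ins, not controller-runtime's API, and the real handler also compares the API group, which is omitted here.

```go
package main

import "fmt"

// OwnerReference mirrors the ownerReference fields that matter for this sketch.
type OwnerReference struct {
	APIVersion string
	Kind       string
	Name       string
	Controller bool
}

// Object is a hypothetical stand-in for a watched child resource such as a Manifest.
type Object struct {
	Namespace string
	Name      string
	Owners    []OwnerReference
}

// Request identifies the parent object that should be reconciled.
type Request struct {
	Namespace string
	Name      string
}

// requestsForOwner returns a reconcile request for each controller owner of
// obj that has the expected kind. An object without such an ownerReference
// produces no requests.
func requestsForOwner(obj Object, ownerKind string) []Request {
	var reqs []Request
	for _, ref := range obj.Owners {
		if ref.Controller && ref.Kind == ownerKind {
			// The owner is assumed to live in the child's namespace.
			reqs = append(reqs, Request{Namespace: obj.Namespace, Name: ref.Name})
		}
	}
	return reqs
}

func main() {
	owned := Object{
		Namespace: "boundless-system",
		Name:      "calico",
		Owners: []OwnerReference{{
			APIVersion: "boundless.mirantis.com/v1alpha1",
			Kind:       "Addon",
			Name:       "calico",
			Controller: true,
		}},
	}
	orphan := Object{Namespace: "boundless-system", Name: "calico"}

	fmt.Println("with ownerRef:   ", requestsForOwner(owned, "Addon"))
	fmt.Println("without ownerRef:", requestsForOwner(orphan, "Addon"))
}
```

The second call returns no requests, which is the case where the ownerReference was never set on the Manifest.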

Watches(
&source.Kind{Type: &batch.Job{}},
handler.EnqueueRequestsFromMapFunc(r.findAddonForJob),
builder.WithPredicates(predicate.ResourceVersionChangedPredicate{}),
Collaborator

Please add a comment here explaining which Jobs it will watch. Does it watch all Jobs in the namespace, or only a subset of them?

Contributor Author

It watches all Jobs in the cluster and runs the MapFunc we provide (findAddonForJob) on each one. The map function returns a list of reconcile requests (if any) that the manager should send.

In our case, for each Job it attempts to find an addon that has an index value of jobNamespace-jobName. If it finds one, it reconciles that addon; if not, it does nothing. So while it watches all Jobs in the cluster, only Jobs with the same namespace and name as one specified in an addon spec trigger a reconcile.

I will add a similar comment in the code.
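
A rough, stdlib-only sketch of the lookup described above. The Addon fields, the AddonIndex type and buildIndex are hypothetical; the real code uses the manager's cache index rather than a plain map, and the job namespace used in main is only illustrative.

```go
package main

import "fmt"

// Addon is a hypothetical, trimmed-down stand-in for the Addon custom resource.
type Addon struct {
	Namespace    string // namespace of the Addon object itself
	Name         string
	JobNamespace string // namespace of the Job this addon expects (assumed field)
	JobName      string // name of the Job this addon expects (assumed field)
}

// Request identifies an Addon to reconcile.
type Request struct {
	Namespace string
	Name      string
}

// AddonIndex maps a "jobNamespace-jobName" key to the addons that expect that Job.
type AddonIndex map[string][]Addon

// indexKey builds the "jobNamespace-jobName" index value.
func indexKey(namespace, name string) string {
	return namespace + "-" + name
}

// buildIndex indexes every addon under the key of the Job it expects.
func buildIndex(addons []Addon) AddonIndex {
	idx := AddonIndex{}
	for _, a := range addons {
		key := indexKey(a.JobNamespace, a.JobName)
		idx[key] = append(idx[key], a)
	}
	return idx
}

// findAddonForJob returns one reconcile request per addon indexed under the
// Job's key. It returns nothing for Jobs that no addon refers to.
func (idx AddonIndex) findAddonForJob(jobNamespace, jobName string) []Request {
	var reqs []Request
	for _, a := range idx[indexKey(jobNamespace, jobName)] {
		reqs = append(reqs, Request{Namespace: a.Namespace, Name: a.Name})
	}
	return reqs
}

func main() {
	idx := buildIndex([]Addon{{
		Namespace:    "boundless-system",
		Name:         "my-grafana",
		JobNamespace: "monitoring",
		JobName:      "helm-install-grafana",
	}})

	fmt.Println(idx.findAddonForJob("monitoring", "helm-install-grafana"))
	fmt.Println(idx.findAddonForJob("kube-system", "unrelated-job"))
}
```

One thing to keep in mind with a dash-joined key: namespace "a-b" with name "c" and namespace "a" with name "b-c" produce the same value, so a different separator may be safer.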

// updateHelmchartAddonStatus checks the status of the associated helm chart job and updates the status of the Addon CR accordingly
func (r *AddonReconciler) updateHelmchartAddonStatus(ctx context.Context, logger logr.Logger, namespacedName types.NamespacedName, job *batch.Job, addon *boundlessv1alpha1.Addon) error {
logger.Info("Updating Helm Chart Addon Status")
if job.Status.CompletionTime != nil && job.Status.Succeeded > 0 {
Collaborator

refactor: No change needed at the moment, but this if/else can be simplified by labeling these conditions. For example:

type JobStatus int

const (
    JobStatusSuccess JobStatus = iota
    JobStatusFailed
    JobStatusProgressing
)

// getJobStatusCondition maps the status of a Job to one of the labels above.
func getJobStatusCondition(job *batch.Job) JobStatus {
    if job.Status.CompletionTime != nil && job.Status.Succeeded > 0 {
        return JobStatusSuccess
    }
    ...
}
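
A self-contained sketch of how the labeled conditions could look end to end. JobState is a hypothetical stand-in for the Job status fields, the "Failed > 0" rule is an assumption (the comment above only spells out the success case), and the mapping to Available / Unhealthy / Progressing is a guess based on the statuses shown in the test output.

```go
package main

import (
	"fmt"
	"time"
)

// JobState is a hypothetical, trimmed-down view of a batch Job's status.
type JobState struct {
	CompletionTime *time.Time
	Succeeded      int32
	Failed         int32
}

// JobStatus labels the condition a Job is in.
type JobStatus int

const (
	JobStatusSuccess JobStatus = iota
	JobStatusFailed
	JobStatusProgressing
)

// String maps each label to the addon status it would be reported as (assumed mapping).
func (s JobStatus) String() string {
	switch s {
	case JobStatusSuccess:
		return "Available"
	case JobStatusFailed:
		return "Unhealthy"
	default:
		return "Progressing"
	}
}

// classifyJob labels a Job's state. Only the success rule comes from the
// original check; the failure rule is an assumption made for this sketch.
func classifyJob(st JobState) JobStatus {
	switch {
	case st.CompletionTime != nil && st.Succeeded > 0:
		return JobStatusSuccess
	case st.Failed > 0:
		return JobStatusFailed
	default:
		return JobStatusProgressing
	}
}

// completed builds a JobState that finished now with the given number of
// successful pods. It exists only to keep the examples short.
func completed(succeeded int32) JobState {
	now := time.Now()
	return JobState{CompletionTime: &now, Succeeded: succeeded}
}

func main() {
	fmt.Println(classifyJob(completed(1)))
	fmt.Println(classifyJob(JobState{Failed: 1}))
	fmt.Println(classifyJob(JobState{}))
}
```

With the conditions labeled, the status update becomes a single switch over classifyJob's result, and the classification can be unit tested without a cluster.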

// If no errors are found, we check if any deployments/daemonsets are still progressing and set the manifest status to Progressing
// Otherwise set the manifest status to Available
// This is not comprehensive and may need to be updated as we support more complex manifests
func (r *ManifestReconciler) checkManifestStatus(ctx context.Context, logger logr.Logger, namespacedName types.NamespacedName, objects []boundlessv1alpha1.ManifestObject) error {
Collaborator

refactor suggestion: This function can be broken down into smaller functions, and we can add unit tests for it. Currently it is hard to read.

Suggestion:

  1. Filter for Deployment and DaemonSets to remove switch statement
  2. Use label (for condition of manifest status) to make it easier to parse
  3. Unit tests

This does not need to be done with this PR.

Contributor Author

Agreed, we can break it down and make it easy to also add support for other resources in the future.
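
One possible shape for that breakdown, as a stdlib-only sketch. Workload, its fields, and both helpers are hypothetical; the real function reads Deployment and DaemonSet objects from the cluster, and how a failure is detected is left as a plain boolean here.

```go
package main

import "fmt"

// Workload is a hypothetical summary of one object created by a manifest.
type Workload struct {
	Kind    string // "Deployment", "DaemonSet", or any other kind
	Name    string
	Desired int32 // pods the workload wants
	Ready   int32 // pods currently ready
	Failed  bool  // assumed flag for a reported failure condition
}

// ManifestStatus is the label reported on the Manifest.
type ManifestStatus string

const (
	StatusAvailable   ManifestStatus = "Available"
	StatusProgressing ManifestStatus = "Progressing"
	StatusUnhealthy   ManifestStatus = "Unhealthy"
)

// filterWorkloads keeps only the kinds whose status is inspected, which
// replaces a switch over the object kind.
func filterWorkloads(objs []Workload) []Workload {
	var out []Workload
	for _, o := range objs {
		if o.Kind == "Deployment" || o.Kind == "DaemonSet" {
			out = append(out, o)
		}
	}
	return out
}

// manifestStatus reports Unhealthy if any workload failed, Progressing if any
// workload still has pods that are not ready, and Available otherwise.
func manifestStatus(objs []Workload) ManifestStatus {
	workloads := filterWorkloads(objs)
	for _, w := range workloads {
		if w.Failed {
			return StatusUnhealthy
		}
	}
	for _, w := range workloads {
		if w.Ready < w.Desired {
			return StatusProgressing
		}
	}
	return StatusAvailable
}

func main() {
	// A manifest that only creates a namespace has nothing to wait for.
	fmt.Println(manifestStatus([]Workload{{Kind: "Namespace", Name: "example-namespace"}}))

	// Illustrative numbers for a manifest whose pods are still starting.
	fmt.Println(manifestStatus([]Workload{
		{Kind: "Deployment", Name: "calico-kube-controllers", Desired: 1, Ready: 1},
		{Kind: "DaemonSet", Name: "calico-node", Desired: 3, Ready: 1},
	}))
}
```

Because both helpers take plain values, each rule can be covered by a table-driven unit test, and supporting another resource kind means extending the filter.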

@tppolkow tppolkow merged commit 3c32600 into Mirantis:main Dec 8, 2023
4 checks passed
@tppolkow tppolkow deleted the BOP-86-helmchart branch December 8, 2023 22:33