Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HelmRelease deletion not waiting for helm uninstall #1021

Closed
1 task done
cwrau opened this issue Jul 10, 2024 · 7 comments · Fixed by #1024
Closed
1 task done

HelmRelease deletion not waiting for helm uninstall #1021

cwrau opened this issue Jul 10, 2024 · 7 comments · Fixed by #1024

Comments

@cwrau
Copy link
Contributor

cwrau commented Jul 10, 2024

Describe the bug

When you delete a HelmRelease, even with uninstall.deletionPropagation=foreground, flux, via helm or otherwise, marks all created resources to be deleted and then removes the finalizer, resulting in the HelmRelease disappearing even though the uninstallation hasn't finished.

This, at least sometimes, leaves the "real" helm release to be stuck in uninstalling which has to be manually resolved.

Steps to reproduce

Create a helm chart like https://github.com/teutonet/teutonet-helm-charts/tree/main/charts/t8s-cluster, which creates CRs that are not just "no-ops" for deletion, like cluster-api resources.

Install it, wait for everything to be ready, uninstall.

Expected behavior

That flux leaves the finalizer on the HelmRelease until the "real" helm release is really uninstalled.

I suppose what I mean in a non-flux way would be for flux to wait until helm uninstall finishes successfully.

Screenshots and recordings

No response

OS / Distro

N/A

Flux version

N/A

Flux check

► checking prerequisites
✗ Kubernetes version v1.27.14 does not match >=1.28.0-0
► checking version in cluster
✔ distribution: flux-2.3.0
✔ bootstrapped: true
► checking controllers
✔ helm-controller: deployment ready
► ghcr.io/fluxcd/helm-controller:v1.0.1
✔ image-automation-controller: deployment ready
► ghcr.io/fluxcd/image-automation-controller:v0.38.0
✔ image-reflector-controller: deployment ready
► ghcr.io/fluxcd/image-reflector-controller:v0.32.0
✔ kustomize-controller: deployment ready
► ghcr.io/fluxcd/kustomize-controller:v1.3.0
✔ notification-controller: deployment ready
► ghcr.io/fluxcd/notification-controller:v1.3.0
✔ source-controller: deployment ready
► ghcr.io/fluxcd/source-controller:v1.3.0
► checking crds
✔ alerts.notification.toolkit.fluxcd.io/v1beta3
✔ buckets.source.toolkit.fluxcd.io/v1beta2
✔ gitrepositories.source.toolkit.fluxcd.io/v1
✔ helmcharts.source.toolkit.fluxcd.io/v1
✔ helmreleases.helm.toolkit.fluxcd.io/v2
✔ helmrepositories.source.toolkit.fluxcd.io/v1
✔ imagepolicies.image.toolkit.fluxcd.io/v1beta2
✔ imagerepositories.image.toolkit.fluxcd.io/v1beta2
✔ imageupdateautomations.image.toolkit.fluxcd.io/v1beta2
✔ kustomizations.kustomize.toolkit.fluxcd.io/v1
✔ ocirepositories.source.toolkit.fluxcd.io/v1beta2
✔ providers.notification.toolkit.fluxcd.io/v1beta3
✔ receivers.notification.toolkit.fluxcd.io/v1
✗ check failed

Git provider

No response

Container Registry provider

No response

Additional context

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct

// "moved" from fluxcd/flux2#4872

@cwrau
Copy link
Contributor Author

cwrau commented Jul 10, 2024

I've tried debugging this, and this line is the root cause.

Because the digest seems to change during the uninstall, which in fact returns an error, that error is not returned, helm-controller thinks it finished without an error and removes the finalizer, therefore removing the HelmRelease.

Is there a reason for the digest check? I would just open a PR removing that, if that's ok?

@stefanprodan
Copy link
Member

Because the digest seems to change during the uninstall

Why would it change? Are you using GitRepo as the source of the HelmRelease?

@cwrau
Copy link
Contributor Author

cwrau commented Jul 11, 2024

Because the digest seems to change during the uninstall

Why would it change?

I don't know, I debugged helm-controller and the .Digest of req.Object.Status.History.Latest() between

cur = req.Object.Status.History.Latest().DeepCopy()
and
if req.Object.Status.History.Latest().Digest == cur.Digest {
is different.

I tried looking through the code how the req.Object could get changed, but I couldn't figure it out😅

Are you using GitRepo as the source of the HelmRelease?

Nope, using HelmRepository, no update occured during these 60 lines 😅

@cwrau
Copy link
Contributor Author

cwrau commented Jul 15, 2024

@stefanprodan, ah, I think I've found it;

During helm uninstall, after changing info.Status, helm updates the release

After the update, flux notifies itself on the HelmRelease object about the history "change"

During the observation, ObservedToSnapshot calculates a new Digest with the updated info.Status, which of course changes the digest

I hope I didn't miss anything 😅

@cwrau
Copy link
Contributor Author

cwrau commented Jul 26, 2024

Is there any news on this? Maybe my PR, #1024, is even acceptable?

Would be great to have this fixed 😊

cwrau added a commit to cwrau/helm-controller that referenced this issue Aug 7, 2024
@cwrau
Copy link
Contributor Author

cwrau commented Aug 9, 2024

Ola, is there something I can do to help get this resolved? Do you need more information?

@stefanprodan
Copy link
Member

stefanprodan commented Sep 6, 2024

@cwrau we've talked about removing the check in a dev meeting and the conclusion was that it's safe to do so. For the PR to be merged you'll need to adapt the unit tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants