Helm release installation fails with etcdserver leader changed #4804

Closed
1 task done
DeyvsonL opened this issue May 20, 2024 · 4 comments · Fixed by fluxcd/helm-controller#1084

DeyvsonL commented May 20, 2024

Describe the bug

When my cluster is starting and installing several Helm releases at the same time, some of them frequently (about 20% of the time) fail with the error: Helm install failed for release "chart-name" with chart "chart-name@version": etcdserver: leader changed.

From what I can verify, Helm had this issue in the past: helm/helm#11426.

The same behavior applies to other errors raised by etcd, such as "etcdserver: request timed out".

In both cases, the Helm installation continues in the background without issues and the app is installed successfully, even though its Helm release is reported as failed. Other apps that depend on that Helm release will not start, because they think it failed.

Steps to reproduce

Create a new cluster.
Configure the cluster to install Flux on bootstrap and point it to an existing Git repository.
Wait for all Helm releases in the repository to be installed.

Sometimes the steps above cause some Helm releases to fail with the error "etcdserver: leader changed".

Expected behavior

When Helm encounters "etcdserver: leader changed", the Helm release should retry the installation, since the error doesn't impact the Helm installation and was already solved in the main Helm repository.

Screenshots and recordings

image

OS / Distro

Ubuntu 22.04

Flux version

v2.3.0

Flux check

► checking prerequisites
W0520 15:17:54.120603 41669 warnings.go:70] Use tokens from the TokenRequest API or manually created secret-based tokens instead of auto-generated secret-based tokens.
✔ Kubernetes 1.28.8+rke2r1 >=1.28.0-0
► checking version in cluster
✔ distribution: flux-v2.3.0
✔ bootstrapped: true
► checking controllers
✔ helm-controller: deployment ready
► fluxcd/helm-controller:v1.0.1
✔ image-automation-controller: deployment ready
► fluxcd/image-automation-controller:v0.38.0
✔ image-reflector-controller: deployment ready
► fluxcd/image-reflector-controller:v0.32.0
✔ kustomize-controller: deployment ready
► fluxcd/kustomize-controller:v1.3.0
✔ notification-controller: deployment ready
► fluxcd/notification-controller:v1.3.0
✔ source-controller: deployment ready
► fluxcd/source-controller:v1.3.0
► checking crds
✔ alerts.notification.toolkit.fluxcd.io/v1beta3
✔ buckets.source.toolkit.fluxcd.io/v1beta2
✔ gitrepositories.source.toolkit.fluxcd.io/v1
✔ helmcharts.source.toolkit.fluxcd.io/v1
✔ helmreleases.helm.toolkit.fluxcd.io/v2
✔ helmrepositories.source.toolkit.fluxcd.io/v1
✔ imagepolicies.image.toolkit.fluxcd.io/v1beta2
✔ imagerepositories.image.toolkit.fluxcd.io/v1beta2
✔ imageupdateautomations.image.toolkit.fluxcd.io/v1beta2
✔ kustomizations.kustomize.toolkit.fluxcd.io/v1
✔ ocirepositories.source.toolkit.fluxcd.io/v1beta2
✔ providers.notification.toolkit.fluxcd.io/v1beta3
✔ receivers.notification.toolkit.fluxcd.io/v1
✔ all checks passed

Git provider

Azure DevOps

Container Registry provider

Azure container registry

Additional context

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct

souleb (Member) commented May 21, 2024

We could include the fix from helm/helm#11426 in helm-controller.

The change should be done in https://github.com/fluxcd/helm-controller/blob/f731a805b1485f622ff08a63bb6558ba08296600/internal/kube/client.go#L129.

Are you willing to contribute this change, @DeyvsonL?

@Valgueiro

I can give it a try!

@luisdavim

Hi, this issue was affecting me, so instead of waiting for helm/helm#13052, I've opened fluxcd/helm-controller#1084.

I hope that's OK. I've copied the round tripper file from the Helm Git repo and added a TODO comment referencing the issue.
Another reason for taking this approach is that, in my experience, it takes a long time to get PRs reviewed and merged in Helm.

@souleb I've made the code change where you pointed. Could you have a look?

Thanks.

@luisdavim

I also opened helm/helm#13383. When/if that gets merged, I can open a PR to remove the copy of the round tripper and import the code from Helm directly.
