Make install and upgrade retries number configurable #394
Conversation
Hey @ubergesundheit, a test pull request has been created for you in the cluster-aws repo! Go to pull request giantswarm/cluster-aws#941 in order to test your cluster chart changes on AWS.
Looks like the tests work. The China and Cilium ENI mode tests are failing, but for unrelated reasons (see here).
There were differences in the rendered Helm template, please check!
Please see my comment here: giantswarm/roadmap#3664 (comment)
In general I agree with Marcus, but on the other hand this feels like setting timeouts in network apps: of course you can always just set huge timeouts, but that sometimes hides root causes you want to fix. So I'm not sure whether we should just pick a high value. Whichever way we go, having it configurable makes sense to me, as we might one day end up in an incident where it would be useful not to have it hardcoded.
We should still have alerting in place for when these are stuck pending for too long. That doesn't change. But all default apps (what we're talking about here) need to install for a WC to be considered successful. So there's no reason not to keep trying from what I see.
I don't think it adds anything to introduce more complexity "just in case". When we have an actual need, sure, but I don't see that we actually need to configure it right now. We just need it not to time out.
What does this PR do?
This PR enables users of the `cluster` chart to configure the install and upgrade retries of HelmReleases. During cluster upgrades, HelmRelease upgrade tries are exhausted because of non-ready nodes.
What is the effect of this change to users?
All components using HelmReleases can have different install and upgrade retry counts.
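For context, retries on a Flux HelmRelease live under `spec.install.remediation.retries` and `spec.upgrade.remediation.retries` (where `-1` means retry indefinitely). A minimal sketch of what the chart renders; the release name and chart reference here are hypothetical, not the actual cluster chart output:

```yaml
# Flux HelmRelease (helm.toolkit.fluxcd.io/v2beta1) with the retry
# fields this PR makes configurable.
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: example-app            # hypothetical release name
spec:
  interval: 5m
  chart:
    spec:
      chart: example-app       # hypothetical chart reference
      sourceRef:
        kind: HelmRepository
        name: example-repo
  install:
    remediation:
      retries: 30              # install retries before the release is marked failed
  upgrade:
    remediation:
      retries: 30              # upgrade retries; -1 retries indefinitely
```

Without this change, the retry counts are hardcoded, so a slow node rollout during a cluster upgrade can exhaust them.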
Any background context you can provide?
giantswarm/roadmap#3664
Should this change be mentioned in the release notes?