
feat: allow pod replacement policy override #328

Merged
1 commit merged into rancher:master from buroa/pod-replacement-policy on Aug 29, 2024

Conversation

buroa (Contributor) commented on Aug 27, 2024

This is causing jobs to fail again and again if the pod is terminating (node draining, etc.). This change ensures that the pod is only recreated if it actually failed, not when it is merely terminating.

https://kubernetes.io/blog/2023/08/21/kubernetes-1-28-jobapi-update/
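
For context, the blog post above describes the `podReplacementPolicy` field added to the Job spec in Kubernetes 1.28 (alpha behind the `JobPodReplacementPolicy` feature gate, beta in 1.29). A minimal sketch of a Job using it, with placeholder names and images, might look like this:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: example-upgrade-job        # hypothetical name, for illustration only
spec:
  backoffLimit: 6
  # Failed: a replacement pod is only created once the old pod reaches the
  # Failed phase, instead of as soon as it starts terminating
  # (TerminatingOrFailed, the default).
  podReplacementPolicy: Failed
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: upgrade
          image: alpine:3.20       # placeholder image
          command: ["sh", "-c", "echo upgrading && exit 0"]
```

The key difference for the drain case is when the replacement pod is created: immediately, while the old pod is still terminating, versus only once the old pod has actually reached the Failed phase.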

buroa (Contributor, Author) commented on Aug 27, 2024

Hey @brandond, I currently maintain a fork with this change that a lot of the community uses. The current behavior does not work for people running K8s > 1.28. Let me know if you need anything else here.

brandond (Member) commented on Aug 27, 2024

This is the first I've heard of a problem with this, and we deploy SUC for upgrades as part of Rancher. We also test it regularly for automated upgrades of standalone RKE2 and K3s clusters.

Can you point me at any issues reporting this problem and describing how to reproduce it?

buroa (Contributor, Author) commented on Aug 28, 2024

Can you point me at any issues reporting this problem and describing how to reproduce it?

I'll share this thread in our Discord so the community can weigh in. We are using Talos and want rolling upgrades much like k3s.

Here is our job spec; talosctl takes care of the drain.

Essentially, what talosctl does here is drain the workloads, apply the new kernel, and reboot. Once this happens, the job controller sees the pod status as terminating and recreates the pod; per the K8s 1.28 behavior linked above, the upgrade pod is created again.

So the pod keeps being recreated, which causes a reboot again and again. The job never succeeds, and system-upgrade-controller can't continue on to another node.

You can reproduce this by deploying a Talos cluster and using the system-upgrade-controller to do an upgrade.

@onedr0p @bjw-s @jfroy @JJGadgets
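
Since the job spec itself isn't reproduced above, here is a rough sketch of what such a Talos upgrade Plan can look like. Every value below (names, namespace, image references, and talosctl arguments) is an assumption for illustration, not the actual plan from this thread:

```yaml
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: talos-upgrade              # hypothetical name
  namespace: system-upgrade
spec:
  version: v1.7.6                  # placeholder Talos version
  concurrency: 1
  serviceAccountName: system-upgrade
  nodeSelector:
    matchExpressions:
      - key: kubernetes.io/os
        operator: In
        values: ["linux"]
  upgrade:
    image: ghcr.io/siderolabs/talosctl   # placeholder image reference
    args:
      - upgrade
      - --image=ghcr.io/siderolabs/installer:v1.7.6   # placeholder installer image
      # node targeting and client auth flags are omitted here; the point is
      # that talosctl drains the node, applies the new image, and reboots,
      # so this upgrade pod is evicted mid-run rather than exiting non-zero.
```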

brandond (Member) commented on Aug 28, 2024

Once this takes place, the job will see the pod status as terminating and recreate the pod. So per K8s 1.28, this will cause the job to be recreated again.

This sounds like desired behavior. The job image is supposed to be idempotent, and to exit successfully if the upgrade either completed successfully or has already been applied. There is no guarantee that the job will only be run once; it may be re-run several times before exiting successfully. You'll see this with the OS upgrade examples, where the nodes are rebooted if necessary. This causes the pod to terminate early and run again following the reboot; however, on the second pass the node is no longer rebooted, and the pod exits cleanly.
https://github.com/rancher/system-upgrade-controller/blob/master/examples/openSUSE/microos.yaml

Why is your job image exiting with a non-zero exit code following a successful upgrade, or if the upgrade is no longer necessary?
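
The idempotency pattern brandond is referring to looks roughly like the sketch below (the `upgrade` section of a Plan). This is paraphrased, not the actual contents of examples/openSUSE/microos.yaml; the version file, env var, and helper commands are all hypothetical:

```yaml
upgrade:
  image: alpine:3.20               # placeholder image
  command: ["sh", "-c"]
  args:
    - |
      # On every run, including re-runs after a reboot: if the node is
      # already on the desired version, exit 0 so the job completes.
      current="$(cat /host/etc/example-release)"        # hypothetical version file
      if [ "${current}" = "${DESIRED_VERSION}" ]; then  # hypothetical env var
        echo "already upgraded, nothing to do"
        exit 0
      fi
      # Otherwise apply the upgrade and reboot. The pod terminates here;
      # the replacement pod that runs after the reboot takes the exit-0
      # branch above, and the job finishes cleanly.
      apply-upgrade && reboot-host                      # hypothetical helpers
```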

buroa (Contributor, Author) commented on Aug 28, 2024

This sounds like desired behavior. The job image is supposed to be idempotent [...] Why is your job image exiting with a non-zero exit code following a successful upgrade, or if the upgrade is no longer necessary?

I'm not sure if you are following. The command does exit 0, but talosctl is draining the node, so the pod gets evicted during the drain. Since the pod's status is terminating, the job controller recreates it.

This isn't a hard change. If you are opposed to it, I will maintain my fork.

brandond (Member) commented:
I'm not sure if you are following. The command does exit 0.

That's not what you said; you indicated that the job was failing: "This is causing jobs to fail again and again." What you're describing is not failure, it's just that the job pod is getting recreated, which you seem to be rounding up to a failure state.

talosctl is draining the node so the pod gets evicted during the drain. Since the status was terminating, the job spec will recreate it. So the job will be recreated, which will cause a reboot again and again.

As I said, this is a problem with your job image. It should NOT drain the node if it does not need to upgrade the kernel and reboot.

This isn't a hard change. If you are opposed to it, I will maintain my fork.

I don't understand why you would want to maintain a fork instead of just making your job pod observe the correct idempotent behavior.

buroa (Contributor, Author) commented on Aug 28, 2024

It seems like you don't want this change and I'm certainly not going to "sell" you the idea when you are clearly against it in the first place.

buroa closed this on Aug 28, 2024
brandond (Member) commented on Aug 28, 2024

I'm not opposed to it per se, I'm just trying to understand if there's something this fixes other than improving handling of jobs that are not idempotent. This is a basic requirement that is literally called out in the readme:

https://github.com/rancher/system-upgrade-controller/blob/master/README.md#considerations

Additionally, one should take care when defining upgrades by ensuring that such are idempotent--there be dragons.

Is the whole point of your fork just to no longer require idempotent jobs? What else are you modifying to accommodate that goal?

kashalls commented on Aug 28, 2024

I'm a little less technical on this subject, but I'm going to leave my comment in support.

A number of us (about 62 according to kubesearch) are relying on buroa's fork because the Rancher upstream doesn't work for Talos-based environments. (I would very much like to rely on upstream instead of a fork.) Essentially, with the talosctl image, the plan's pod is still running while Talos drains the node and performs the upgrade. If we used the Rancher upstream, replacement pods would keep being created until the job reaches its backoffLimit. This just changes the spec to wait until the pod has actually failed, rather than replacing it while it is terminating because it has a deletionTimestamp.

His fork adds a lot of automation for updating and deploying the fork; other than that, the changes he proposes here are the only changes needed to bring support to us.

Maybe the right value to default to here is TerminatingOrFailed (the default in Kubernetes) instead of just Failed (what Talos needs), per the Job spec:
https://github.com/kubernetes/website/blob/dbfbc78c7eb4970fc003a932fcc79eb4a9f7155e/content/en/docs/reference/kubernetes-api/workload-resources/job-v1.md?plain=1#L246
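
For reference, the two values behave roughly as follows on the generated `batch/v1` Job (summarized from the Job API docs linked above):

```yaml
spec:
  # TerminatingOrFailed (the Kubernetes default): a replacement pod is created
  # as soon as the old pod starts terminating, e.g. when it is evicted by a drain.
  # Failed: a replacement pod is created only once the old pod has actually
  # reached the Failed phase, which is the behavior the Talos flow needs.
  podReplacementPolicy: TerminatingOrFailed
```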

brandond (Member) commented on Aug 28, 2024

Can someone point me at the fork in question? I am curious if the talos upgrade job would work as-is without forking if the talosctl upgrade image followed best practices around idempotency instead of unconditionally draining and rebooting the node.

kashalls commented on Aug 28, 2024

Can someone point me at the fork in question? [...]

https://github.com/buroa/system-upgrade-controller/

jfroy commented on Aug 29, 2024

talosctl upgrade itself sends an RPC to Talos's machined, which then unconditionally runs an upgrade sequence. So Talos does violate the assumptions or requirements SUC places on upgrade plans. It may be possible to fix this in Talos, and we can take it up with them. It may also be reasonable to consider bending SUC with something like this PR to accommodate systems that don't track previously applied upgrade requests/sequences/commands.

brandond (Member) commented:
@buroa I would be open to this if the default was the current Kubernetes default of TerminatingOrFailed, as proposed by @kashalls

Other than dependency bumps and example jobs, I'm not seeing anything else that you've changed in your fork. Is that accurate?

brandond reopened this on Aug 29, 2024
buroa (Contributor, Author) commented on Aug 29, 2024

@buroa I would be open to this if the default was the current Kubernetes default of TerminatingOrFailed, as proposed by @kashalls

Other than dependency bumps and example jobs, I'm not seeing anything else that you've changed in your fork. Is that accurate?

I can change that. And that is correct, these are the only real changes required for this to work.
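
With that agreed, usage after this change might look something like the sketch below. The field name and its placement in the Plan spec are assumptions based on the PR title and this discussion, not a confirmed excerpt of the merged schema:

```yaml
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: talos-upgrade              # hypothetical name
  namespace: system-upgrade
spec:
  # Assumed field: defaults to the Kubernetes default (TerminatingOrFailed);
  # Talos users would override it to Failed so that pods evicted during the
  # drain are not replaced (and the node rebooted) over and over.
  podReplacementPolicy: Failed
  # remaining fields (version, concurrency, nodeSelector, upgrade image/args)
  # as in the earlier sketch
```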

buroa force-pushed the buroa/pod-replacement-policy branch from b1e761a to 2b22f62 on August 29, 2024 16:18
buroa force-pushed the buroa/pod-replacement-policy branch from 2b22f62 to 61b4992 on August 29, 2024 16:19
buroa changed the title from "fix: pod replacement policy should be failed" to "feat: allow pod replacement policy override" on Aug 29, 2024
buroa (Contributor, Author) commented on Aug 29, 2024

@brandond Ready for you

brandond merged commit c900c39 into rancher:master on Aug 29, 2024
2 checks passed
lumiere-bot referenced this pull request in coolguy1771/home-ops on Sep 26, 2024: …0.14.0 ) (minor) (#5496)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| docker.io/rancher/system-upgrade-controller | minor | `v0.13.4` -> `v0.14.0` |
| [rancher/system-upgrade-controller](https://redirect.github.com/rancher/system-upgrade-controller) | minor | `v0.13.4` -> `v0.14.0` |

---

### Release Notes

<details>
<summary>rancher/system-upgrade-controller
(rancher/system-upgrade-controller)</summary>

###
[`v0.14.0`](https://redirect.github.com/rancher/system-upgrade-controller/releases/tag/v0.14.0)

[Compare
Source](https://redirect.github.com/rancher/system-upgrade-controller/compare/v0.13.4...v0.14.0)

##### What's Changed

- fix: Drop unneeded reorder surpression by
[@&#8203;SISheogorath](https://redirect.github.com/SISheogorath) in
[https://github.com/rancher/system-upgrade-controller/pull/296](https://redirect.github.com/rancher/system-upgrade-controller/pull/296)
- Update/fix README kustomize command by
[@&#8203;SISheogorath](https://redirect.github.com/SISheogorath) in
[https://github.com/rancher/system-upgrade-controller/pull/297](https://redirect.github.com/rancher/system-upgrade-controller/pull/297)
- Add GHA workflows by
[@&#8203;bfbachmann](https://redirect.github.com/bfbachmann) in
[https://github.com/rancher/system-upgrade-controller/pull/311](https://redirect.github.com/rancher/system-upgrade-controller/pull/311)
- Upgrade outdated golang dependencies by
[@&#8203;harsimranmaan](https://redirect.github.com/harsimranmaan) in
[https://github.com/rancher/system-upgrade-controller/pull/326](https://redirect.github.com/rancher/system-upgrade-controller/pull/326)
- feat: allow pod replacement policy override by
[@&#8203;buroa](https://redirect.github.com/buroa) in
[https://github.com/rancher/system-upgrade-controller/pull/328](https://redirect.github.com/rancher/system-upgrade-controller/pull/328)
- fix: system-upgrade-controller-drainer: add missing delete permission
for pods in clusterrole by
[@&#8203;damdo](https://redirect.github.com/damdo) in
[https://github.com/rancher/system-upgrade-controller/pull/320](https://redirect.github.com/rancher/system-upgrade-controller/pull/320)
- Bump golang.org/x/net from 0.17.0 to 0.23.0 in /pkg/apis by
[@&#8203;dependabot](https://redirect.github.com/dependabot) in
[https://github.com/rancher/system-upgrade-controller/pull/307](https://redirect.github.com/rancher/system-upgrade-controller/pull/307)
- Fix GHA release workflow by
[@&#8203;brandond](https://redirect.github.com/brandond) in
[https://github.com/rancher/system-upgrade-controller/pull/330](https://redirect.github.com/rancher/system-upgrade-controller/pull/330)
- Fix image tag by
[@&#8203;brandond](https://redirect.github.com/brandond) in
[https://github.com/rancher/system-upgrade-controller/pull/331](https://redirect.github.com/rancher/system-upgrade-controller/pull/331)
- Fix artifact permissions by
[@&#8203;brandond](https://redirect.github.com/brandond) in
[https://github.com/rancher/system-upgrade-controller/pull/332](https://redirect.github.com/rancher/system-upgrade-controller/pull/332)

##### New Contributors

- [@&#8203;bfbachmann](https://redirect.github.com/bfbachmann) made
their first contribution in
[https://github.com/rancher/system-upgrade-controller/pull/311](https://redirect.github.com/rancher/system-upgrade-controller/pull/311)
- [@&#8203;harsimranmaan](https://redirect.github.com/harsimranmaan)
made their first contribution in
[https://github.com/rancher/system-upgrade-controller/pull/326](https://redirect.github.com/rancher/system-upgrade-controller/pull/326)
- [@&#8203;damdo](https://redirect.github.com/damdo) made their first
contribution in
[https://github.com/rancher/system-upgrade-controller/pull/320](https://redirect.github.com/rancher/system-upgrade-controller/pull/320)

**Full Changelog**:
rancher/system-upgrade-controller@v0.13.4...v0.14.0

</details>

This PR has been generated by [Renovate Bot](https://redirect.github.com/renovatebot/renovate).