
feat: allow pod replacement policy override #328

Merged
1 commit merged into rancher:master from buroa/pod-replacement-policy on Aug 29, 2024

Conversation

buroa (Contributor) commented on Aug 27, 2024

This is causing jobs to fail again and again if the pod is terminating (node draining, etc.). This change ensures that the pod is only recreated if it actually failed, not when it is merely terminating.

https://kubernetes.io/blog/2023/08/21/kubernetes-1-28-jobapi-update/
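
For context, the blog post above describes the `podReplacementPolicy` field added to the Job spec in Kubernetes 1.28 (alpha behind the `JobPodReplacementPolicy` feature gate, beta in 1.29). A minimal sketch of a Job using it, with placeholder names and images, might look like this:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: example-upgrade-job        # hypothetical name, for illustration only
spec:
  backoffLimit: 6
  # Failed: a replacement pod is only created once the old pod reaches the
  # Failed phase, instead of as soon as it starts terminating
  # (TerminatingOrFailed, the default).
  podReplacementPolicy: Failed
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: upgrade
          image: alpine:3.20       # placeholder image
          command: ["sh", "-c", "echo upgrading && exit 0"]
```

The key difference for the drain case is when the replacement pod is created: immediately, while the old pod is still terminating, versus only once the old pod has actually reached the Failed phase.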

buroa (Contributor, Author) commented on Aug 27, 2024

Hey @brandond, I currently maintain a fork with this change that a lot of the community uses. The current behavior does not work for people running K8s > 1.28. Let me know if you need anything else here.

brandond (Member) commented on Aug 27, 2024

This is the first I've heard of a problem with this, and we deploy SUC for upgrades as part of Rancher. We also test it regularly for automated upgrades of standalone RKE2 and K3s clusters.

Can you point me at any issues reporting this problem and describing how to reproduce it?

buroa (Contributor, Author) commented on Aug 28, 2024

Can you point me at any issues reporting this problem and describing how to reproduce it?

I'll share this thread in our Discord so the community can weigh in. We are using Talos and want rolling upgrades much like k3s.

Here is our job spec; talosctl takes care of the drain.

Essentially, what talosctl does here is drain the workloads, apply the new kernel, and reboot. Once this happens, the job controller sees the pod status as terminating and recreates the pod; per the K8s 1.28 behavior linked above, the upgrade pod is created again.

So the pod keeps being recreated, which causes a reboot again and again. The job never succeeds, and system-upgrade-controller can't continue on to another node.

You can reproduce this by deploying a Talos cluster and using the system-upgrade-controller to do an upgrade.

@onedr0p @bjw-s @jfroy @JJGadgets
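
Since the job spec itself isn't reproduced above, here is a rough sketch of what such a Talos upgrade Plan can look like. Every value below (names, namespace, image references, and talosctl arguments) is an assumption for illustration, not the actual plan from this thread:

```yaml
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: talos-upgrade              # hypothetical name
  namespace: system-upgrade
spec:
  version: v1.7.6                  # placeholder Talos version
  concurrency: 1
  serviceAccountName: system-upgrade
  nodeSelector:
    matchExpressions:
      - key: kubernetes.io/os
        operator: In
        values: ["linux"]
  upgrade:
    image: ghcr.io/siderolabs/talosctl   # placeholder image reference
    args:
      - upgrade
      - --image=ghcr.io/siderolabs/installer:v1.7.6   # placeholder installer image
      # node targeting and client auth flags are omitted here; the point is
      # that talosctl drains the node, applies the new image, and reboots,
      # so this upgrade pod is evicted mid-run rather than exiting non-zero.
```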

brandond (Member) commented on Aug 28, 2024

Once this takes place, the job will see the pod status as terminating and recreate the pod. So per K8s 1.28, this will cause the job to be recreated again.

This sounds like desired behavior. The job image is supposed to be idempotent, and to exit successfully if the upgrade either completed successfully or has already been applied. There is no guarantee that the job will only be run once; it may be re-run several times before exiting successfully. You'll see this with the OS upgrade examples, where the nodes are rebooted if necessary. This causes the pod to terminate early and run again following the reboot; however, on the second pass the node is no longer rebooted, and the pod exits cleanly.
https://github.com/rancher/system-upgrade-controller/blob/master/examples/openSUSE/microos.yaml

Why is your job image exiting with a non-zero exit code following a successful upgrade, or if the upgrade is no longer necessary?
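
The idempotency pattern brandond is referring to looks roughly like the sketch below (the `upgrade` section of a Plan). This is paraphrased, not the actual contents of examples/openSUSE/microos.yaml; the version file, env var, and helper commands are all hypothetical:

```yaml
upgrade:
  image: alpine:3.20               # placeholder image
  command: ["sh", "-c"]
  args:
    - |
      # On every run, including re-runs after a reboot: if the node is
      # already on the desired version, exit 0 so the job completes.
      current="$(cat /host/etc/example-release)"        # hypothetical version file
      if [ "${current}" = "${DESIRED_VERSION}" ]; then  # hypothetical env var
        echo "already upgraded, nothing to do"
        exit 0
      fi
      # Otherwise apply the upgrade and reboot. The pod terminates here;
      # the replacement pod that runs after the reboot takes the exit-0
      # branch above, and the job finishes cleanly.
      apply-upgrade && reboot-host                      # hypothetical helpers
```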

buroa (Contributor, Author) commented on Aug 28, 2024

This sounds like desired behavior. The job image is supposed to be idempotent [...] Why is your job image exiting with a non-zero exit code following a successful upgrade, or if the upgrade is no longer necessary?

I'm not sure if you are following. The command does exit 0, but talosctl is draining the node, so the pod gets evicted during the drain. Since the pod's status is terminating, the job controller recreates it.

This isn't a hard change. If you are opposed to it, I will maintain my fork.

brandond (Member) commented:
I'm not sure if you are following. The command does exit 0.

That's not what you said; you indicated that the job was failing: "This is causing jobs to fail again and again." What you're describing is not failure, it's just that the job pod is getting recreated, which you seem to be rounding up to a failure state.

talosctl is draining the node so the pod gets evicted during the drain. Since the status was terminating, the job spec will recreate it. So the job will be recreated, which will cause a reboot again and again.

As I said, this is a problem with your job image. It should NOT drain the node if it does not need to upgrade the kernel and reboot.

This isn't a hard change. If you are opposed to it, I will maintain my fork.

I don't understand why you would want to maintain a fork instead of just making your job pod observe the correct idempotent behavior.

buroa (Contributor, Author) commented on Aug 28, 2024

It seems like you don't want this change and I'm certainly not going to "sell" you the idea when you are clearly against it in the first place.

buroa closed this on Aug 28, 2024
brandond (Member) commented on Aug 28, 2024

I'm not opposed to it per se, I'm just trying to understand if there's something this fixes other than improving handling of jobs that are not idempotent. This is a basic requirement that is literally called out in the readme:

https://github.com/rancher/system-upgrade-controller/blob/master/README.md#considerations

Additionally, one should take care when defining upgrades by ensuring that such are idempotent--there be dragons.

Is the whole point of your fork just to no longer require idempotent jobs? What else are you modifying to accommodate that goal?

kashalls commented on Aug 28, 2024

I'm a little less technical on this subject, but I'm going to leave my comment in support.

A number of us (about 62 according to kubesearch) are relying on buroa's fork because the Rancher upstream doesn't work for Talos-based environments. (I would very much like to rely on upstream instead of a fork.) Essentially, with the talosctl image, the plan's pod is still running while Talos drains the node and performs the upgrade. If we used the Rancher upstream, replacement pods would keep being created until the job reaches its backoffLimit. This just changes the spec to wait until the pod has actually failed, rather than replacing it while it is terminating because it has a deletionTimestamp.

His fork adds a lot of automation for updating and deploying the fork; other than that, the changes he proposes here are the only changes needed to bring support to us.

Maybe the right value to default to here is TerminatingOrFailed (the default in Kubernetes) instead of just Failed (what Talos needs), per the Job spec:
https://github.com/kubernetes/website/blob/dbfbc78c7eb4970fc003a932fcc79eb4a9f7155e/content/en/docs/reference/kubernetes-api/workload-resources/job-v1.md?plain=1#L246
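
For reference, the two values behave roughly as follows on the generated `batch/v1` Job (summarized from the Job API docs linked above):

```yaml
spec:
  # TerminatingOrFailed (the Kubernetes default): a replacement pod is created
  # as soon as the old pod starts terminating, e.g. when it is evicted by a drain.
  # Failed: a replacement pod is created only once the old pod has actually
  # reached the Failed phase, which is the behavior the Talos flow needs.
  podReplacementPolicy: TerminatingOrFailed
```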

brandond (Member) commented on Aug 28, 2024

Can someone point me at the fork in question? I am curious if the talos upgrade job would work as-is without forking if the talosctl upgrade image followed best practices around idempotency instead of unconditionally draining and rebooting the node.

kashalls commented on Aug 28, 2024

Can someone point me at the fork in question? [...]

https://github.com/buroa/system-upgrade-controller/

jfroy commented on Aug 29, 2024

talosctl upgrade itself sends an RPC to Talos's machined, which then unconditionally runs an upgrade sequence. So Talos does violate the assumptions or requirements SUC places on upgrade plans. It may be possible to fix this in Talos, and we can take it up with them. It may also be reasonable to consider bending SUC with something like this PR to accommodate systems that don't track previously applied upgrade requests/sequences/commands.

brandond (Member) commented:
@buroa I would be open to this if the default was the current Kubernetes default of TerminatingOrFailed, as proposed by @kashalls

Other than dependency bumps and example jobs, I'm not seeing anything else that you've changed in your fork. Is that accurate?

brandond reopened this on Aug 29, 2024
buroa (Contributor, Author) commented on Aug 29, 2024

@buroa I would be open to this if the default was the current Kubernetes default of TerminatingOrFailed, as proposed by @kashalls

Other than dependency bumps and example jobs, I'm not seeing anything else that you've changed in your fork. Is that accurate?

I can change that. And that is correct, these are the only real changes required for this to work.
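
With that agreed, usage after this change might look something like the sketch below. The field name and its placement in the Plan spec are assumptions based on the PR title and this discussion, not a confirmed excerpt of the merged schema:

```yaml
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: talos-upgrade              # hypothetical name
  namespace: system-upgrade
spec:
  # Assumed field: defaults to the Kubernetes default (TerminatingOrFailed);
  # Talos users would override it to Failed so that pods evicted during the
  # drain are not replaced (and the node rebooted) over and over.
  podReplacementPolicy: Failed
  # remaining fields (version, concurrency, nodeSelector, upgrade image/args)
  # as in the earlier sketch
```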

buroa force-pushed the buroa/pod-replacement-policy branch from b1e761a to 2b22f62 on August 29, 2024 16:18
buroa force-pushed the buroa/pod-replacement-policy branch from 2b22f62 to 61b4992 on August 29, 2024 16:19
buroa changed the title from "fix: pod replacement policy should be failed" to "feat: allow pod replacement policy override" on Aug 29, 2024
buroa (Contributor, Author) commented on Aug 29, 2024

@brandond Ready for you

brandond merged commit c900c39 into rancher:master on Aug 29, 2024
2 checks passed
lumiere-bot referenced this pull request in coolguy1771/home-ops on Sep 26, 2024: …0.14.0 ) (minor) (#5496)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| docker.io/rancher/system-upgrade-controller | minor | `v0.13.4` -> `v0.14.0` |
| [rancher/system-upgrade-controller](https://redirect.github.com/rancher/system-upgrade-controller) | minor | `v0.13.4` -> `v0.14.0` |

---

### Release Notes

<details>
<summary>rancher/system-upgrade-controller
(rancher/system-upgrade-controller)</summary>

###
[`v0.14.0`](https://redirect.github.com/rancher/system-upgrade-controller/releases/tag/v0.14.0)

[Compare
Source](https://redirect.github.com/rancher/system-upgrade-controller/compare/v0.13.4...v0.14.0)

##### What's Changed

- fix: Drop unneeded reorder surpression by
[@&#8203;SISheogorath](https://redirect.github.com/SISheogorath) in
[https://github.com/rancher/system-upgrade-controller/pull/296](https://redirect.github.com/rancher/system-upgrade-controller/pull/296)
- Update/fix README kustomize command by
[@&#8203;SISheogorath](https://redirect.github.com/SISheogorath) in
[https://github.com/rancher/system-upgrade-controller/pull/297](https://redirect.github.com/rancher/system-upgrade-controller/pull/297)
- Add GHA workflows by
[@&#8203;bfbachmann](https://redirect.github.com/bfbachmann) in
[https://github.com/rancher/system-upgrade-controller/pull/311](https://redirect.github.com/rancher/system-upgrade-controller/pull/311)
- Upgrade outdated golang dependencies by
[@&#8203;harsimranmaan](https://redirect.github.com/harsimranmaan) in
[https://github.com/rancher/system-upgrade-controller/pull/326](https://redirect.github.com/rancher/system-upgrade-controller/pull/326)
- feat: allow pod replacement policy override by
[@&#8203;buroa](https://redirect.github.com/buroa) in
[https://github.com/rancher/system-upgrade-controller/pull/328](https://redirect.github.com/rancher/system-upgrade-controller/pull/328)
- fix: system-upgrade-controller-drainer: add missing delete permission
for pods in clusterrole by
[@&#8203;damdo](https://redirect.github.com/damdo) in
[https://github.com/rancher/system-upgrade-controller/pull/320](https://redirect.github.com/rancher/system-upgrade-controller/pull/320)
- Bump golang.org/x/net from 0.17.0 to 0.23.0 in /pkg/apis by
[@&#8203;dependabot](https://redirect.github.com/dependabot) in
[https://github.com/rancher/system-upgrade-controller/pull/307](https://redirect.github.com/rancher/system-upgrade-controller/pull/307)
- Fix GHA release workflow by
[@&#8203;brandond](https://redirect.github.com/brandond) in
[https://github.com/rancher/system-upgrade-controller/pull/330](https://redirect.github.com/rancher/system-upgrade-controller/pull/330)
- Fix image tag by
[@&#8203;brandond](https://redirect.github.com/brandond) in
[https://github.com/rancher/system-upgrade-controller/pull/331](https://redirect.github.com/rancher/system-upgrade-controller/pull/331)
- Fix artifact permissions by
[@&#8203;brandond](https://redirect.github.com/brandond) in
[https://github.com/rancher/system-upgrade-controller/pull/332](https://redirect.github.com/rancher/system-upgrade-controller/pull/332)

##### New Contributors

- [@&#8203;bfbachmann](https://redirect.github.com/bfbachmann) made
their first contribution in
[https://github.com/rancher/system-upgrade-controller/pull/311](https://redirect.github.com/rancher/system-upgrade-controller/pull/311)
- [@&#8203;harsimranmaan](https://redirect.github.com/harsimranmaan)
made their first contribution in
[https://github.com/rancher/system-upgrade-controller/pull/326](https://redirect.github.com/rancher/system-upgrade-controller/pull/326)
- [@&#8203;damdo](https://redirect.github.com/damdo) made their first
contribution in
[https://github.com/rancher/system-upgrade-controller/pull/320](https://redirect.github.com/rancher/system-upgrade-controller/pull/320)

**Full Changelog**:
rancher/system-upgrade-controller@v0.13.4...v0.14.0

</details>

This PR has been generated by [Renovate Bot](https://redirect.github.com/renovatebot/renovate).