install script breaks networking when using cilium with systemd >= 249 #7736
Comments
Can you describe how the save/restore is breaking cilium on your node? |
I'm not sure if it breaks cilium itself, but the node:
|
We call this out in the docs, see the "Cilium" tab of this section:
Are you saying that this also happens when you re-run the install script when using Cilium? |
I've tested it with those commands and the node networking seems to be working even after executing the k3s installer, but there is a small problem with this approach: after uncordoning the node, cilium stops routing traffic on that node because the interfaces and firewall rules are now missing. That's easily fixable by restarting the cilium pod on that node, but with my workaround this action is not required. Now to my question: is it really necessary to remove all KUBE-* and flannel-* firewall entries after every restart of the k3s service? |
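For context, a read-only way to see which rules such a cleanup would touch on a node is to dump the ruleset and filter on the prefixes, without restoring anything (a sketch; the exact pattern the installer uses may differ):

iptables-save > /tmp/all-rules.txt            # dump the current ruleset, read-only
grep -E 'KUBE|flannel' /tmp/all-rules.txt     # entries a KUBE-*/flannel-* cleanup would remove
grep -vE 'KUBE|flannel' /tmp/all-rules.txt    # what would remain after the save/restore filter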
@rbrtbnfgl @manuelbuil would you mind taking a look at this? |
This was added to ensure that stale rules from previous configurations of kubelet, kube-proxy, kube-router, and flannel are cleaned up properly. I'm confused as to why removing these rules would affect the operation of cilium? |
Are you using this: https://docs.cilium.io/en/v1.13/network/kube-router/#kube-router ? This could be the issue: we delete kube-router rules to remove the network policy configuration done inside K3s, but following this guide, kube-router is used for internal routing. |
If I understand correctly, --disable-network-policy disables the kube-router built into k3s, and I don't have any standalone deployment of it. This is my helm values file for cilium:
|
I verified with a new setup and the iptables commands are not affecting the cilium rules. |
but now the node is gone (I executed those commands through KVM). I've also tried to execute something like this
and it works, but now I have to restart the cilium pod so it can restore its firewall configuration |
But the iptables rules should be the same. Could you check the iptables version? Could you try to run K3s with the flag --prefer-bundled-bin? |
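For reference, one way to pass that flag through the installer (a sketch; the reporter's other server flags are omitted here):

curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="server --prefer-bundled-bin" sh -s -
# INSTALL_K3S_EXEC supplies the command and flags written into the generated k3s service unit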
Sadly adding --prefer-bundled-bin to INSTALL_K3S_EXEC didn't help |
That's strange. |
Yeah, I've also stumbled across a bug in cilium which describes my issue. There is a snapshot release which is supposed to fix it; I'll try to test it today. |
Reading a comment on the various issues, it seems like a bug related to cilium and Ubuntu 22.04. I am using Ubuntu 20.04, which is why I wasn't able to reproduce it. |
I've updated cilium to 1.14.0-snapshot.3 and now it works! Thanks for your support! |
I upgraded cilium to 1.16.1 without kube-proxy, and "iptables-save | grep -v KUBE | iptables-restore" caused a network interruption. I found that the mangle table in particular breaks the network; the solution is to flush it and do a rollout restart of cilium so the rules are added again. But if you do not have access to the console of the machine, you can't do this, because SSH is broken. |
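A minimal sketch of that recovery, assuming console access to the node and that Cilium runs as the usual DaemonSet named cilium in kube-system:

iptables -t mangle -F                                     # flush the mangle table that was left in a broken state
kubectl -n kube-system rollout restart daemonset/cilium   # Cilium re-creates its rules when its pods restart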
@brandond I can confirm this is also needed on upgrades and not only on initial install. Cilium version: v1.14.10-cee.1. Shall we update the documentation? |
Why exactly is it needed on upgrades? |
I can't say for sure, but we had the exact same problem as described in this issue when not executing:
My guess is that it's coming from one of these lines in the installer script:
We have updated our documentation internally. |
We attempt to detect invocation of iptables-save that will not properly round-trip the rules here: Lines 1092 to 1096 in 430a7dc
Can you identify what specifically is being missed so that we can handle it? Note that we don't test K3s with cilium (or any other CNI other than flannel), so it's probably not something that is encountered very often. |
Our iptables rules look like this:
root@host1:/root# iptables-legacy -S
Does this help in any way? |
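One thing worth checking in this situation is whether rules live in both the legacy and nft iptables backends, since rules held by the backend that iptables-save does not read will not round-trip (a diagnostic sketch; binary names may vary by distribution):

iptables --version                       # reports whether the plain iptables frontend is legacy or nf_tables
iptables-legacy-save | grep -c '^-A'     # number of rules visible to the legacy backend
iptables-nft-save | grep -c '^-A'        # number of rules visible to the nft backend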
Hey @brandond @adberger! Have you found out anything about this issue? I had the same behavior today: after trying to update a K3s configuration to disable the default storage and restarting the K3s service, I lost access to the node. OS: 5.10.0-33-amd64 #1 SMP Debian 5.10.226-1 (2024-10-03) x86_64. Do you know how I can solve this? I'm unsure about the stability of the cluster. Thank you so much! |
Nope. If someone wants to figure out what's being dropped, as described at #7736 (comment), we could try to modify the script to detect it. Also, are you sure this is the same issue? Did you for some reason re-run the install script just to change the configuration? |
Thanks @brandond for the quick response! I may have made a mistake. I ran this command to install the cluster:
curl -sfL https://get.k3s.io | sh -s - server \
--flannel-backend=none \
--disable-network-policy \
--disable-kube-proxy \
--disable traefik \
--disable servicelb \
--cluster-init
And the one below was to update it:
curl -sfL https://get.k3s.io | sh -s - server \
--flannel-backend=none \
--disable-network-policy \
--disable-kube-proxy \
--disable traefik \
--disable servicelb \
--disable local-storage \
--cluster-init
I just added --disable local-storage. |
No, that'd re-run the install script. You don't really need to do that just to change the config though. You can just |
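One common way to change flags without re-running the installer is the K3s config file plus a service restart (a sketch; it assumes a systemd-managed install, and the YAML key shown is an assumption about how the flag maps):

vi /etc/rancher/k3s/config.yaml   # e.g. add "disable: local-storage"; flags map to YAML keys
systemctl restart k3s             # restarting the service does not re-run the install script or its iptables save/restore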
Great, I'll test it, thanks a lot! |
Our solution was to remove the iptables binary from the host, since cilium manages iptables from its pod and doesn't need the binary on the host. When iptables is not found while running the k3s script, networking will not break. |
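A sketch of that mitigation on a Debian/Ubuntu host (assumption: nothing else on the host needs the iptables frontend, and Cilium keeps managing rules from inside its pod):

command -v iptables iptables-save iptables-restore   # confirm what the install script would find on the host
apt-get remove iptables                              # with no host binary, the script's save/restore cannot run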
Thanks, @adberger! I will try it. |
Environmental Info:
K3s Version: v1.27.1+k3s1
Node(s) CPU architecture, OS, and Version: Ubuntu 22.04.2 LTS
Linux node 5.15.0-73-generic #80-Ubuntu SMP Mon May 15 15:18:26 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Cluster Configuration:
3 control planes
extra args "--flannel-backend=none --disable-network-policy --disable servicelb --disable traefik --disable local-storage"
CNI: cilium v1.13.3 with native routing (https://docs.cilium.io/en/v1.13/network/concepts/routing/#native-routing)
Describe the bug:
When the flannel CNI is disabled, the "service_enable_and_start()" function still tries to save and restore iptables rules, which may break node networking for some CNI providers. Please add an argument to disable this functionality.
Steps To Reproduce:
Expected behavior:
Actual behavior:
Additional context / logs:
Workaround:
set INSTALL_K3S_SKIP_START to true and start the k3s service manually
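A sketch of this workaround using the flags from the cluster configuration above:

curl -sfL https://get.k3s.io | INSTALL_K3S_SKIP_START=true sh -s - server \
  --flannel-backend=none --disable-network-policy \
  --disable servicelb --disable traefik --disable local-storage
systemctl start k3s   # start the service manually; the installer's iptables save/restore step is skipped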