Kube-proxy doesn't remove stale CNI-HOSTPORT-DNAT rule after Kubernetes upgrade to 1.26 #3440

Open
raelix opened this issue Nov 17, 2023 · 22 comments
@raelix

raelix commented Nov 17, 2023

RKE version:

v1.4.6

Docker version: (docker version,docker info preferred)
20.10.24

Operating system and kernel: (cat /etc/os-release, uname -r preferred)
NAME="Red Hat Enterprise Linux"
VERSION="8.6 (Ootpa)"

Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO)
Openstack

cluster.yml file:

nodes:
    - address: 10.94.1.8
      internal_address: 172.16.10.53
      ssh_key_path: /home/user/.ssh/id_rsa
      user: user
      role:
        - controlplane
        - etcd
        - worker
ignore_docker_version: true
enable_cri_dockerd: true
cluster_name: mycluster
kubernetes_version: v1.26.4-rancher2-1
network:
  plugin: flannel
ingress:
  provider: nginx

Steps to Reproduce:
Source versions -> rke: v1.4.3 - kubernetes_version: v1.23.10-rancher1-1
Destination versions -> rke: v1.4.6 - kubernetes_version: v1.26.4-rancher2-1

After upgrading Kubernetes with RKE from v1.23.10 to v1.26.4, I was no longer able to reach my ingresses through the nginx-ingress-controller, which listens on hostPorts 80 and 443.

I investigated further and found that the CNI-HOSTPORT-DNAT chain still contained the old entry.

Before the upgrade:

[root@rancher user]# iptables -t nat -L CNI-HOSTPORT-DNAT  --line-numbers 
Chain CNI-HOSTPORT-DNAT (2 references)
num  target     prot opt source               destination         
1    CNI-DN-ff3905f57536228de6b29  tcp  --  anywhere             anywhere             /* dnat name: "cbr0" id: "e96c642e169acf789be84e8fbcd0e5c3da1a53c8e8a459227c06bdf423deb482" */ multiport dports http,https

[root@rancher user]# iptables -t nat -L CNI-DN-ff3905f57536228de6b29  --line-numbers 
Chain CNI-DN-ff3905f57536228de6b29 (1 references)
num  target     prot opt source               destination         
1    CNI-HOSTPORT-SETMARK  tcp  --  rancher.internal.com/24  anywhere             tcp dpt:http
2    CNI-HOSTPORT-SETMARK  tcp  --  localhost            anywhere             tcp dpt:http
3    DNAT       tcp  --  anywhere             anywhere             tcp dpt:http to:10.42.0.7:80
4    CNI-HOSTPORT-SETMARK  tcp  --  rancher.internal.com/24  anywhere             tcp dpt:https
5    CNI-HOSTPORT-SETMARK  tcp  --  localhost            anywhere             tcp dpt:https
6    DNAT       tcp  --  anywhere             anywhere             tcp dpt:https to:10.42.0.7:443

This looks good.

After the upgrade:

[root@rancher user]# iptables -t nat -L CNI-HOSTPORT-DNAT  --line-numbers 
Chain CNI-HOSTPORT-DNAT (2 references)
num  target     prot opt source               destination         
1    CNI-DN-ff3905f57536228de6b29  tcp  --  anywhere             anywhere             /* dnat name: "cbr0" id: "e96c642e169acf789be84e8fbcd0e5c3da1a53c8e8a459227c06bdf423deb482" */ multiport dports http,https
2    CNI-DN-4c3eba344b3e2fffe3698  tcp  --  anywhere             anywhere             /* dnat name: "cbr0" id: "aa3202e02a9fefbc97400df0685d864bc3894a580d4b2069542621371e1cfde8" */ multiport dports http,https

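The second entry is the correct one and points to the new pod IP, but the first one should not be there. Since the rules are evaluated in order, traffic to ports 80 and 443 presumably hits the stale first rule and is DNATed to the old pod IP before the correct rule is reached.
It looks like kube-proxy doesn't delete the old entry, which makes it impossible to access the ingresses.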

As a workaround, I had to either reboot the server or manually delete the entry:
iptables -t nat -D CNI-HOSTPORT-DNAT 1
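For anyone applying the manual workaround above on several nodes, a minimal Python sketch can pick out the stale entries instead of eyeballing rule numbers. It assumes the listing format shown above and a set of live pod sandbox IDs obtained separately (for example, the full IDs printed by docker ps --no-trunc -q on this cri-dockerd node). The function names are hypothetical, and the script only prints commands; it does not run them.

```python
import re

# One rule line of "iptables -t nat -L CNI-HOSTPORT-DNAT --line-numbers":
# captures the rule number, the CNI-DN-* target chain and the container ID
# that portmap writes into the rule comment.
RULE_RE = re.compile(r'^(\d+)\s+(CNI-DN-\S+)\s.*?id: "([0-9a-f]+)"')


def stale_rules(listing, live_ids):
    """Return (rule_number, dnat_chain, container_id) for every rule whose
    container ID is not among the live pod sandbox IDs."""
    stale = []
    for line in listing.splitlines():
        m = RULE_RE.match(line.strip())
        if m and m.group(3) not in live_ids:
            stale.append((int(m.group(1)), m.group(2), m.group(3)))
    return stale


def delete_commands(stale):
    """Build the manual workaround commands, highest rule number first:
    deleting rule N renumbers every later rule, so ascending order would
    remove the wrong entries."""
    return [f"iptables -t nat -D CNI-HOSTPORT-DNAT {num}"
            for num, _chain, _cid in sorted(stale, reverse=True)]
```

Like the manual workaround, these commands remove only the jump rule in CNI-HOSTPORT-DNAT; the orphaned CNI-DN-* chain itself stays behind until it is flushed and deleted separately (iptables -F and -X).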

Results:
After the upgrade, I can't reach the ingress controller listening on hostPort.

@manuelbuil manuelbuil self-assigned this Nov 28, 2023
@manuelbuil
Contributor

Thanks for reporting this issue. I'll try to reproduce this. Note that:

  • kube-proxy has nothing to do with the CNI-HOSTPORT-DNAT chain. That chain belongs to the portmap (port-mapping) CNI plugin, which is chained after the default CNI plugin (in your case, flannel). You can check this in /etc/cni/net.d/.
  • What should actually happen is that when the old nginx pod is removed, kubelet must call the CNI plugins with that information: "please remove the networking stuff belonging to this pod". In addition to flannel, the portmap plugin should also get called and hence remove the old pod's rule from CNI-HOSTPORT-DNAT.
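The expected sequence can be illustrated with a toy Python model. This is not real CNI or kubelet code: FlannelToy, PortmapToy, cni_add and cni_del are hypothetical names, and the model assumes (as the CNI spec describes for plugin lists) that ADD calls the chained plugins in order and DEL calls them in reverse order, each time with the same container ID.

```python
class FlannelToy:
    """Toy stand-in for flannel plus host-local IPAM: one pod IP per
    container (sandbox) ID. Hypothetical and heavily simplified."""

    def __init__(self):
        self.ips = {}        # container ID -> pod IP
        self._next_host = 2  # hypothetical allocation start in 10.42.0.0/24

    def add(self, cid):
        self.ips[cid] = f"10.42.0.{self._next_host}"
        self._next_host += 1

    def delete(self, cid):
        # DEL should tolerate unknown IDs, so a missing key is not an error.
        self.ips.pop(cid, None)


class PortmapToy:
    """Toy stand-in for portmap: one CNI-HOSTPORT-DNAT entry per container ID."""

    def __init__(self):
        self.hostport_dnat = {}  # container ID -> host ports

    def add(self, cid):
        self.hostport_dnat[cid] = (80, 443)

    def delete(self, cid):
        self.hostport_dnat.pop(cid, None)


def cni_add(plugins, cid, calls):
    # ADD runs the chained plugins in list order (flannel, then portmap).
    for plugin in plugins:
        calls.append(("ADD", type(plugin).__name__, cid))
        plugin.add(cid)


def cni_del(plugins, cid, calls):
    # DEL runs them in reverse order, each with the same container ID.
    for plugin in reversed(plugins):
        calls.append(("DEL", type(plugin).__name__, cid))
        plugin.delete(cid)
```

In this model, a DEL for the old pod's ID clears both its IP and its hostPort entry, while a DEL for an ID that was never added removes nothing.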

Can you confirm that after doing the upgrade the old nginx pod is removed?

@raelix
Author

raelix commented Nov 28, 2023

Thanks @manuelbuil, yes it is: the old pod is removed and a new one is started, but the entry for the old one is still there.
Deleting the pod again or manually removing the rule fixes the issue, which is a bit weird.

@manuelbuil
Contributor

I am not able to reproduce the issue. Could it be that you ran rke remove at some point? I noticed that rke remove does not remove the CNI-HOSTPORT-DNAT chains.

@raelix
Author

raelix commented Nov 28, 2023

@manuelbuil did you use RHEL 8? I was not able to reproduce it on Oracle 9 or on Ubuntu.
I didn't run rke remove, just rke up.

@manuelbuil
Contributor

Let me try again with something that might be the reason.

BTW, the upgrade you are trying is not really supported, because you are jumping several RKE versions and several Kubernetes minor versions: https://www.suse.com/support/kb/doc/?id=000020061. We must update the docs to discourage these big jumps.

@raelix
Author

raelix commented Nov 28, 2023

@manuelbuil, good point. I also tried upgrading one version at a time, but it happened as well (source version 1.23).
What are you going to try? I can try it too if you want.

@raelix
Author

raelix commented Nov 28, 2023

Here are the kubelet logs for the pod removal and re-creation:

kuberuntime_container.go:709] "Killing container with a grace period" pod="ingress-nginx/nginx-ingress-controller-rgm7x" podUID=6b31ab54-f37e-47cb-8b68-fdabef7ac43c containerName="controller" containerID="docker://df269fffce7b55884298f37253df61942a167bbca0686f53186f93283f7ec810" gracePeriod=3000
kubelet.go:2215] "SyncLoop DELETE" source="api" pods="[ingress-nginx/nginx-ingress-controller-rgm7x]"
kubelet.go:2209] "SyncLoop REMOVE" source="api" pods="[ingress-nginx/nginx-ingress-controller-rgm7x]"
kubelet.go:2199] "SyncLoop ADD" source="api" pods="[ingress-nginx/nginx-ingress-controller-rghd5]"
kubelet.go:2303] "SyncLoop (probe)" probe="readiness" status="ready" pod="ingress-nginx/nginx-ingress-controller-rghd5"

@manuelbuil
Contributor

Unfortunately, I don't have access to a RHEL 8 machine right now, but I can't reproduce it on RHEL 9. I tried changing a couple of things, but I still see the pod being removed correctly. Could you try a newer RKE version and see if you still get the error?

@raelix
Author

raelix commented Nov 28, 2023

Thanks @manuelbuil. I was not able to reproduce it on RHEL 9 either.

I just tested RKE 1.4.11 and the issue is still there. I then tested with Calico, and the issue is not present, so I suspect it could be related to flannel.
Do you have any suggestions I can try in the meantime? Thanks in advance!

@manuelbuil
Contributor

I have just tried on RHEL 8 and still can't reproduce it. I think it must be something in your environment. Could you check whether the old IP is still in /var/lib/cni/networks/cbr0? If it is, could you check the content of the file named after that IP? Perhaps something weird happened and flannel was unable to actually release it. In that case, it is even possible that the network namespace is still there.
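The check requested above can also be scripted. This Python sketch assumes the host-local layout visible in this thread (one file per reserved IP containing the owning container ID, plus lock and last_reserved_ip.* files) and a set of live sandbox IDs gathered separately; read_reservations and orphaned_ips are hypothetical helpers. A reservation whose ID is no longer live would indicate that an IP was not released when its pod was deleted.

```python
import ipaddress
from pathlib import Path


def read_reservations(net_dir="/var/lib/cni/networks/cbr0"):
    """Map each reserved-IP file in a host-local IPAM directory to the
    container ID written in it, skipping the lock file and the
    last_reserved_ip.* bookkeeping files."""
    reservations = {}
    for path in Path(net_dir).iterdir():
        if path.name == "lock" or path.name.startswith("last_reserved_ip"):
            continue
        lines = path.read_text().splitlines()
        # The first line is the container ID. Assumption: newer host-local
        # versions append the interface name on a second line, which would
        # fit the 70-byte files (64-char ID + CRLF + "eth0").
        reservations[path.name] = lines[0].strip() if lines else ""
    return reservations


def orphaned_ips(reservations, live_ids):
    """Reserved IPs whose container ID is not a live pod sandbox,
    sorted numerically rather than as strings."""
    return sorted((ip for ip, cid in reservations.items() if cid not in live_ids),
                  key=ipaddress.ip_address)
```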

@manuelbuil
Contributor

Are you able to reproduce it consistently, or did it just happen once?

@manuelbuil
Contributor

I vaguely remember an issue with very old flannel versions where a race condition sometimes made flannel "forget" to remove pods correctly. The IPs stayed in that directory, and the node eventually exhausted all its IPs and was unable to create new pods. Maybe you are hitting that, since 1.23 must be using an ancient flannel version.

@raelix
Author

raelix commented Nov 28, 2023

Thanks @manuelbuil, I just checked, and the file is not there. These are the results:

After a fresh install:

[root@redhat-8 cloud-user]# cat /var/lib/cni/networks/cbr0/10.42.0.7 
5da22186031931381f9961358dedf1ad905aab6bfcde3b4c2cf21890f556c2f7

And there is just one entry in the chain:

iptables -t nat -L CNI-HOSTPORT-DNAT
Chain CNI-HOSTPORT-DNAT (2 references)
target     prot opt source               destination         
CNI-DN-1dfb697df81c370bbec19  tcp  --  anywhere             anywhere             /* dnat name: "cbr0" id: "5da22186031931381f9961358dedf1ad905aab6bfcde3b4c2cf21890f556c2f7" */ multiport dports http,https

After the upgrade:

The old IP (10.42.0.7) is no longer there:

[root@redhat-8 cloud-user]# ls -ltria /var/lib/cni/networks/cbr0/*
193123737 -rwxr-x---. 1 root root  0 28 nov 10.09 /var/lib/cni/networks/cbr0/lock
193123763 -rw-r--r--. 1 root root 70 28 nov 10.12 /var/lib/cni/networks/cbr0/10.42.0.8
193123764 -rw-r--r--. 1 root root 70 28 nov 10.12 /var/lib/cni/networks/cbr0/10.42.0.9
193123741 -rw-r--r--. 1 root root 70 28 nov 10.13 /var/lib/cni/networks/cbr0/10.42.0.10
193123739 -rw-r--r--. 1 root root 10 28 nov 10.13 /var/lib/cni/networks/cbr0/last_reserved_ip.0
193123727 -rw-r--r--. 1 root root 70 28 nov 10.13 /var/lib/cni/networks/cbr0/10.42.0.13

The new one is:

[root@redhat-8 cloud-user]# cat /var/lib/cni/networks/cbr0/10.42.0.13
88c0dbed427446618212eefc226917f672178717f1d7b8fa4b290e882d41746e

But now there are two entries in iptables, which is causing the issue:

[cloud-user@redhat-8 ~]$ sudo iptables -t nat -L CNI-HOSTPORT-DNAT
Chain CNI-HOSTPORT-DNAT (2 references)
target     prot opt source               destination         
CNI-DN-1dfb697df81c370bbec19  tcp  --  anywhere             anywhere             /* dnat name: "cbr0" id: "5da22186031931381f9961358dedf1ad905aab6bfcde3b4c2cf21890f556c2f7" */ multiport dports http,https
CNI-DN-7a4104b730b66c4894a01  tcp  --  anywhere             anywhere             /* dnat name: "cbr0" id: "88c0dbed427446618212eefc226917f672178717f1d7b8fa4b290e882d41746e" */ multiport dports http,https

Thanks for your help! Is there anything else you suggest I check? I also tried changing the flannel version from the very beginning by using:

system_images:
  flannel: rancher/mirrored-flannel-flannel:v0.23.0

But it didn't help.

@raelix
Author

raelix commented Nov 28, 2023

Unfortunately, I'm able to reproduce it consistently. Note that I spawn a new VM every time, so the machine is brand new. Moreover, even if I delete the pod, only the last entry is updated; the first one remains there unless I reboot or remove it manually.

@manuelbuil
Contributor

Strange, for some reason your portmap CNI plugin is not doing its job correctly when deleting the pod on your RHEL 8.6 OS (BTW, I tried with 8.7 and it worked for me; could you perhaps upgrade?). The portmap code outputs some error logs (https://github.com/containernetworking/plugins/blob/main/plugins/meta/portmap/portmap.go); could you check the kubelet logs to see if there is a related error there?

@raelix
Author

raelix commented Nov 29, 2023

Hi @manuelbuil, I checked the kubelet logs (v=6) and there are no error logs from portmap. I also moved to Oracle 8.7 just to be sure, and it's happening there as well. Could it be related to a sysctl parameter? Or is there a way to get all the logs from portmap? Thanks!

@manuelbuil
Contributor

Could you check the version with /opt/cni/bin/portmap version? The code is pretty simple: https://github.com/containernetworking/plugins/blob/main/plugins/meta/portmap/portmap.go#L375-L404. I expected you to see the log message "could not teardown ipv4 dnat" in the kubelet logs.
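The teardown linked above is keyed by container. As a hedged illustration: the upstream code appears to derive the per-container CNI-DN-* chain name from the network name (cbr0 here) plus the container ID, using a helper in the plugins repository's pkg/utils. The Python below assumes that scheme (SHA-512 of the concatenation, hex-encoded, "CNI-DN-" prefix, truncated to 28 characters); it has not been checked against the exact chain names in this thread, and dnat_chain_name is a hypothetical name.

```python
import hashlib

MAX_CHAIN_LEN = 28  # matches the 28-character CNI-DN-* names in the listings


def dnat_chain_name(network_name, container_id):
    """Sketch of the per-container DNAT chain name, assuming the upstream
    helper hashes network name + container ID with SHA-512 and truncates
    "CNI-DN-" + hex digest to 28 characters."""
    digest = hashlib.sha512((network_name + container_id).encode()).hexdigest()
    return ("CNI-DN-" + digest)[:MAX_CHAIN_LEN]
```

If that assumption holds, a DEL that carries a different container ID derives a different chain name, so teardown cannot reach the original pod's rule.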

@raelix
Author

raelix commented Nov 29, 2023

Yup, I had already searched for that string, but there were no results in the logs of the newly spawned kubelet container.

About the versions:

Before:

CNI portmap plugin v0.8.6

After:

CNI portmap plugin v1.2.0
CNI protocol versions supported: 0.1.0, 0.2.0, 0.3.0, 0.3.1, 0.4.0, 1.0.0

@manuelbuil
Contributor

Does it also happen when you try newer RKE versions? I'm wondering whether it only happens with that specific flannel version.

@raelix
Author

raelix commented Nov 29, 2023

I added some debug messages to portmap and I saw some interesting things:

On a fresh install:

2023/11/29 16:35:35 portmap : Called CMD ADD with {d207c64b815e02784c9079227f113ca4f0bf5022fee4ac20eaae0baa3963b2bf /proc/10231/ns/net eth0 IgnoreUnknown=1;K8S_POD_NAMESPACE=ingress-nginx;K8S_POD_NAME=nginx-ingress-controller-zb7c8;K8S_POD_INFRA_CONTAINER_ID=d207c64b815e02784c9079227f113ca4f0bf5022fee4ac20eaae0baa3963b2bf /opt/cni/bin
# Pod:
ingress-nginx   nginx-ingress-controller-zb7c8
# Rule:
CNI-DN-0bc0ea5e8afa071cc4c15  tcp  --  anywhere             anywhere             /* dnat name: "cbr0" id: "d207c64b815e02784c9079227f113ca4f0bf5022fee4ac20eaae0baa3963b2bf" */ multiport dports http,https

On upgrade (new kubelet container):

2023/11/29 16:38:27 portmap : Called CMD DEL with {934a23b5ca4d8e0c2ff1d3572f03ffdb06dfcd2acbf90d752dab1f1fb5881497  eth0 IgnoreUnknown=1;K8S_POD_NAMESPACE=ingress-nginx;K8S_POD_NAME=nginx-ingress-controller-2hm9c;K8S_POD_INFRA_CONTAINER_ID=934a23b5ca4d8e0c2ff1d3572f03ffdb06dfcd2acbf90d752dab1f1fb5881497 /opt/cni/bin

But this is not a known container; in fact, the new one is:

# Pod
ingress-nginx   nginx-ingress-controller-cbpkn 
# Rule
CNI-DN-f52a6174ca7e4eae96364  tcp  --  anywhere             anywhere             /* dnat name: "cbr0" id: "62202b3f51c119f2a2799d641630e47da6d0ab344b1bc9752db048e294830c73" */ multiport dports http,https

So it looks like DEL is called on a container ID that I do not see at all (I was running k get pods in the background).
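The mismatch described above can be checked mechanically. This Python sketch parses the ad-hoc debug lines added to portmap in this thread; since that is not an upstream log format, the field layout (ContainerID, Netns, IfName, Args, Path separated by single spaces) is inferred from the two samples above, and parse_cni_call and del_matches_a_rule are hypothetical helpers.

```python
import re


def parse_cni_call(line):
    """Parse one line of the ad-hoc portmap debug output:
    'Called CMD <OP> with {ContainerID Netns IfName Args Path ...'.
    Assumes single-space separators, so an empty Netns parses as ""."""
    m = re.search(r"Called CMD (\w+) with \{(.*)$", line)
    if m is None:
        raise ValueError("not a portmap debug line")
    container_id, netns, ifname, args, *_rest = m.group(2).split(" ", 4)
    # CNI_ARGS is a ;-separated list of KEY=VALUE pairs.
    cni_args = dict(kv.split("=", 1) for kv in args.split(";") if "=" in kv)
    return {"op": m.group(1), "container_id": container_id, "netns": netns,
            "ifname": ifname, "args": cni_args}


def del_matches_a_rule(call, rule_ids):
    """True if a DEL call targets a container ID that owns a hostPort rule."""
    return call["op"] == "DEL" and call["container_id"] in rule_ids
```

Applied to the DEL line above with the rule IDs from the fresh install and from the new pod, del_matches_a_rule returns False. The parsed fields also make it visible that the DEL call carries an empty Netns, whereas the ADD call carried /proc/10231/ns/net.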

@manuelbuil
Contributor

Thanks for the logs. However, flannel must have received the correct ID, because the IP is gone. Strange. If you query the Docker containers, do you see that strange ID 934a23b5ca4d8e0c2ff1d3572f03ffdb06dfcd2acbf90d752dab1f1fb5881497?

@raelix
Author

raelix commented Nov 30, 2023

Not at all. I tried a couple of times, and the ID it reports doesn't exist in Docker.
