[Release-1.30] - CNI bin dir changes with K3s version #10929

Closed
brandond opened this issue Sep 23, 2024 · 1 comment

@brandond

Backport fix for CNI bin dir changes with K3s version

aganesh-suse commented Oct 23, 2024

Validated on release-1.30 branch with commit 8e1701d

Environment Details

Infrastructure

  • Cloud
  • Hosted

Node(s) CPU architecture, OS, and Version:

$ cat /etc/os-release
PRETTY_NAME="Ubuntu 24.04 LTS"

$ uname -m
x86_64

Cluster Configuration:

HA: 3 servers / 1 agent

Config.yaml:

token: xxxx
cluster-init: true
write-kubeconfig-mode: "0644"
node-external-ip: 1.1.1.1
node-label:
- k3s-upgrade=server

Manifest used to reproduce the issue, multus_whereabouts_repro.yaml:

apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: multus
  namespace: kube-system
spec:
  repo: https://rke2-charts.rancher.io
  chart: rke2-multus
  targetNamespace: kube-system
  valuesContent: |-
    manifests:
      configMap: true
    config:
      fullnameOverride: multus
      cni_conf:
        confDir: /var/lib/rancher/k3s/agent/etc/cni/net.d
        binDir: /var/lib/rancher/k3s/data/current/bin
        kubeconfig: /var/lib/rancher/k3s/agent/etc/cni/net.d/multus.d/multus.kubeconfig
    rke2-whereabouts:
      fullnameOverride: whereabouts
      enabled: true
      cniConf:
        confDir: /var/lib/rancher/k3s/agent/etc/cni/net.d
        binDir: /var/lib/rancher/k3s/data/current/bin

Manifest used to validate the fix, multus_whereabouts_verify.yaml:

apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: multus
  namespace: kube-system
spec:
  repo: https://rke2-charts.rancher.io
  chart: rke2-multus
  targetNamespace: kube-system
  valuesContent: |-
    manifests:
      configMap: true
    config:
      fullnameOverride: multus
      cni_conf:
        confDir: /var/lib/rancher/k3s/agent/etc/cni/net.d
        binDir: /var/lib/rancher/k3s/data/cni
        kubeconfig: /var/lib/rancher/k3s/agent/etc/cni/net.d/multus.d/multus.kubeconfig
    rke2-whereabouts:
      fullnameOverride: whereabouts
      enabled: true
      cniConf:
        confDir: /var/lib/rancher/k3s/agent/etc/cni/net.d
        binDir: /var/lib/rancher/k3s/data/cni

Testing Steps for issue reproduction:

  1. Copy config.yaml:
$ sudo mkdir -p /etc/rancher/k3s && sudo cp config.yaml /etc/rancher/k3s
  2. Install k3s:
$ curl -sfL https://get.k3s.io | sudo INSTALL_K3S_VERSION='v1.30.4+k3s1' sh -s - server
  3. Apply the manifest, then verify that the multus and whereabouts pods come up and that the binaries are at the locations set in the manifest:
$ kubectl apply -f multus_whereabouts_repro.yaml
  4. Upgrade to the latest version:
$ curl -sfL https://get.k3s.io | sudo INSTALL_K3S_VERSION='v1.30.5+k3s1' sh -s - server
  5. Check that the multus and whereabouts binaries still run:
$ /var/lib/rancher/k3s/data/current/bin/multus
$ /var/lib/rancher/k3s/data/current/bin/whereabouts
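
The check in the last step can also be scripted. The following is a minimal sketch, not part of the original test run; the function name check_cni_bins is hypothetical, and it only tests that each plugin is an executable regular file in the given directory, not that the plugin actually works.

```shell
# check_cni_bins DIR NAME...
# Prints one line per plugin name and returns 0 only if every plugin
# is an executable regular file (or a symlink to one) inside DIR.
check_cni_bins() {
  dir=$1
  shift
  missing=0
  for name in "$@"; do
    if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
      echo "ok: $dir/$name"
    else
      echo "missing: $dir/$name"
      missing=1
    fi
  done
  return "$missing"
}
```

For the repro setup this could be run as `check_cni_bins /var/lib/rancher/k3s/data/current/bin multus whereabouts`, once before and once after the upgrade.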

Testing Steps for validation of fix:

  1. Copy config.yaml:
$ sudo mkdir -p /etc/rancher/k3s && sudo cp config.yaml /etc/rancher/k3s
  2. Install k3s. For validation, the following commit was used:
$ curl -sfL https://get.k3s.io | sudo INSTALL_K3S_COMMIT='5ec454f50e7a45113ddae05e68728ec622b0331f' sh -s - server
  3. Apply the manifest, then verify that the multus and whereabouts pods come up and that the binaries are at the locations set in the manifest:
$ kubectl apply -f multus_whereabouts_verify.yaml
  4. Upgrade to the latest commit:
$ curl -sfL https://get.k3s.io | sudo INSTALL_K3S_COMMIT='8e1701dfa8e6fca31cbb7e3e18e555f665906e1f' sh -s - server
  5. Check that the multus and whereabouts binaries still run:
$ /var/lib/rancher/k3s/data/cni/multus
$ /var/lib/rancher/k3s/data/cni/whereabouts

Replication Results:

  • k3s version used for replication:
$ k3s -v
k3s version v1.30.5+k3s1 (9b586704)
go version go1.22.6

Pre-upgrade:

$ /var/lib/rancher/k3s/data/current/bin/multus 
meta-plugin that delegates to other CNI plugins
CNI protocol versions supported: 0.1.0, 0.2.0, 0.3.0, 0.3.1, 0.4.0, 1.0.0, 1.1.0

$ /var/lib/rancher/k3s/data/current/bin/whereabouts
whereabouts v0.8.0-8c381170 linux/amd64
CNI protocol versions supported: 0.1.0, 0.2.0, 0.3.0, 0.3.1, 0.4.0, 1.0.0, 1.1.0

Post-upgrade:

$ /var/lib/rancher/k3s/data/current/bin/multus 
bash: line 1: /var/lib/rancher/k3s/data/current/bin/multus: No such file or directory

$ /var/lib/rancher/k3s/data/current/bin/whereabouts
bash: line 1: /var/lib/rancher/k3s/data/current/bin/whereabouts: No such file or directory
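
The missing binaries are consistent with data/current being a symlink to a per-version directory: once the upgrade repoints it, anything that was installed through current/bin stays behind in the old directory. A minimal simulation of that effect follows. It runs entirely in a temporary directory and does not touch a real K3s install; the function name simulate_upgrade and the directory names old-hash and new-hash are placeholders for illustration, not real K3s paths.

```shell
# Simulate the data directory layout in a throwaway location.
simulate_upgrade() {
  root=$(mktemp -d)
  mkdir -p "$root/old-hash/bin" "$root/new-hash/bin" "$root/cni"

  # Before the upgrade, "current" points at the old version's directory.
  ln -s "$root/old-hash" "$root/current"

  # A plugin installed through current/bin really lands in old-hash/bin.
  touch "$root/current/bin/multus"
  # The same plugin installed into the version-independent cni directory.
  touch "$root/cni/multus"

  # The upgrade repoints "current" at the new version's directory.
  rm "$root/current"
  ln -s "$root/new-hash" "$root/current"

  if [ -e "$root/current/bin/multus" ]; then
    echo "current/bin/multus: present"
  else
    echo "current/bin/multus: missing"
  fi
  if [ -e "$root/cni/multus" ]; then
    echo "cni/multus: present"
  else
    echo "cni/multus: missing"
  fi
  rm -rf "$root"
}
```

Calling `simulate_upgrade` prints `current/bin/multus: missing` followed by `cni/multus: present`, which mirrors the difference between the replication and validation results.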

Validation Results:

  • k3s version used for validation:
$ k3s -v
k3s version v1.30.5+k3s-8e1701df (8e1701df)
go version go1.22.6

Pre-upgrade:

$ /var/lib/rancher/k3s/data/cni/multus 
meta-plugin that delegates to other CNI plugins
CNI protocol versions supported: 0.1.0, 0.2.0, 0.3.0, 0.3.1, 0.4.0, 1.0.0, 1.1.0

$ /var/lib/rancher/k3s/data/cni/whereabouts
whereabouts v0.8.0-8c381170 linux/amd64
CNI protocol versions supported: 0.1.0, 0.2.0, 0.3.0, 0.3.1, 0.4.0, 1.0.0, 1.1.0

Post-upgrade:

$ /var/lib/rancher/k3s/data/cni/multus 
meta-plugin that delegates to other CNI plugins
CNI protocol versions supported: 0.1.0, 0.2.0, 0.3.0, 0.3.1, 0.4.0, 1.0.0, 1.1.0

$ /var/lib/rancher/k3s/data/cni/whereabouts
whereabouts v0.8.0-8c381170 linux/amd64
CNI protocol versions supported: 0.1.0, 0.2.0, 0.3.0, 0.3.1, 0.4.0, 1.0.0, 1.1.0

Other verifications:

config.toml has the CNI bin and conf directory settings:

$ sudo cat /var/lib/rancher/k3s/agent/etc/containerd/config.toml | grep cni
[plugins."io.containerd.grpc.v1.cri".cni]
  bin_dir = "/var/lib/rancher/k3s/data/cni"
  conf_dir = "/var/lib/rancher/k3s/agent/etc/cni/net.d"
$ sudo ls -la /var/lib/rancher/k3s/data/ 
total 28
drwxr-xr-x 5 root root 4096 Oct 22 22:09 .
drwxr-xr-x 5 root root 4096 Oct 22 22:00 ..
-rw------- 1 root root    0 Oct 22 22:00 .lock
drwxr-xr-x 4 root root 4096 Oct 22 22:08 1f5aa17d7bf2e23e3d38790a4a252f21b246031ee4831640110d1ddaf8834f1c
drwxr-xr-x 2 root root 4096 Oct 22 22:09 cni
lrwxrwxrwx 1 root root   90 Oct 22 22:09 current -> /var/lib/rancher/k3s/data/1f5aa17d7bf2e23e3d38790a4a252f21b246031ee4831640110d1ddaf8834f1c
drwxr-xr-x 4 root root 4096 Oct 22 22:00 dd47053553b78e6475d25db3116109d4f23c6110062da49c1afb7976f20692ff
lrwxrwxrwx 1 root root   90 Oct 22 22:00 previous -> /var/lib/rancher/k3s/data/dd47053553b78e6475d25db3116109d4f23c6110062da49c1afb7976f20692ff
$ sudo ls -la /var/lib/rancher/k3s/data/cni 
total 159908
drwxr-xr-x 2 root root     4096 Oct 22 22:09 .
drwxr-xr-x 5 root root     4096 Oct 22 22:09 ..
-rwxr-xr-x 1 root root  5044864 Oct 22 22:07 bandwidth
-rwxr-xr-x 1 root root  5480992 Oct 22 22:07 bridge
lrwxrwxrwx 1 root root       98 Oct 22 22:09 cni -> /var/lib/rancher/k3s/data/1f5aa17d7bf2e23e3d38790a4a252f21b246031ee4831640110d1ddaf8834f1c/bin/cni
-rwxr-xr-x 1 root root 10813312 Oct 22 22:07 dhcp
-rwxr-xr-x 1 root root  5177248 Oct 22 22:07 dummy
-rwxr-xr-x 1 root root  5509312 Oct 22 22:07 firewall
lrwxrwxrwx 1 root root       98 Oct 22 22:09 flannel -> /var/lib/rancher/k3s/data/1f5aa17d7bf2e23e3d38790a4a252f21b246031ee4831640110d1ddaf8834f1c/bin/cni
-rwxr-xr-x 1 root root  5120000 Oct 22 22:07 host-device
-rwxr-xr-x 1 root root  4614752 Oct 22 22:07 host-local
-rwxr-xr-x 1 root root  5185440 Oct 22 22:07 ipvlan
-rwxr-xr-x 1 root root  2736912 Oct 22 22:07 loopback
-rwxr-xr-x 1 root root  5210048 Oct 22 22:07 macvlan
-rwxr-xr-x 1 root root 39183128 Oct 22 22:07 multus
-rwxr-xr-x 1 root root  5078848 Oct 22 22:07 portmap
-rwxr-xr-x 1 root root  5329472 Oct 22 22:07 ptp
-rwxr-xr-x 1 root root  2893264 Oct 22 22:07 sbr
-rwxr-xr-x 1 root root  2428240 Oct 22 22:07 static
-rwxr-xr-x 1 root root  5232128 Oct 22 22:07 tap
-rwxr-xr-x 1 root root  2798512 Oct 22 22:07 tuning
-rwxr-xr-x 1 root root  5185440 Oct 22 22:07 vlan
-rwxr-xr-x 1 root root  3001104 Oct 22 22:07 vrf
-rwxr-xr-x 1 root root 37671392 Oct 22 22:07 whereabouts
$ sudo ls -lrt /var/lib/rancher/k3s/agent/etc/cni/net.d 
total 16
drwxr-xr-x 2 root root 4096 Oct 22 22:07 whereabouts.d
drwxr-xr-x 2 root root 4096 Oct 22 22:07 multus.d
-rw------- 1 root root  623 Oct 22 22:07 00-multus.conflist
-rw-r--r-- 1 root root  406 Oct 22 22:09 10-flannel.conflist
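
To compare the containerd setting against the binDir value in the HelmChart manifest without reading the grep output by eye, the value can be extracted with a small filter. This is a rough sketch: the function name cni_bin_dir is hypothetical, and it is a line-based match on the first bin_dir key rather than a real TOML parser, so it assumes the key is written on one line with a double-quoted value as in the output above.

```shell
# cni_bin_dir
# Reads a containerd config.toml on stdin and prints the value of the
# first line of the form: bin_dir = "..."
cni_bin_dir() {
  sed -n 's/^[[:space:]]*bin_dir[[:space:]]*=[[:space:]]*"\(.*\)"[[:space:]]*$/\1/p' | head -n 1
}
```

Example use: `sudo cat /var/lib/rancher/k3s/agent/etc/containerd/config.toml | cni_bin_dir`. On the validated setup this should match the binDir configured for multus and whereabouts.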

No fatal errors on any server or agent:

$ journalctl -xeu k3s | grep 'fatal'

$ kubectl get nodes
time="2024-10-22T22:11:56Z" level=debug msg="Asset dir /var/lib/rancher/k3s/data/1f5aa17d7bf2e23e3d38790a4a252f21b246031ee4831640110d1ddaf8834f1c"
time="2024-10-22T22:11:56Z" level=debug msg="Running /var/lib/rancher/k3s/data/1f5aa17d7bf2e23e3d38790a4a252f21b246031ee4831640110d1ddaf8834f1c/bin/kubectl [kubectl get nodes]"
NAME               STATUS   ROLES                       AGE     VERSION
ip-172-31-11-165   Ready    control-plane,etcd,master   8m16s   v1.30.5+k3s-8e1701df
ip-172-31-13-52    Ready    <none>                      6m56s   v1.30.5+k3s-8e1701df
ip-172-31-13-64    Ready    control-plane,etcd,master   11m     v1.30.5+k3s-8e1701df
ip-172-31-8-85     Ready    control-plane,etcd,master   10m     v1.30.5+k3s-8e1701df

$ kubectl get pods -A
time="2024-10-22T22:11:56Z" level=debug msg="Asset dir /var/lib/rancher/k3s/data/1f5aa17d7bf2e23e3d38790a4a252f21b246031ee4831640110d1ddaf8834f1c"
time="2024-10-22T22:11:56Z" level=debug msg="Running /var/lib/rancher/k3s/data/1f5aa17d7bf2e23e3d38790a4a252f21b246031ee4831640110d1ddaf8834f1c/bin/kubectl [kubectl get pods -A]"
NAMESPACE     NAME                                      READY   STATUS      RESTARTS   AGE
kube-system   coredns-7b98449c4-v7pwj                   1/1     Running     0          10m
kube-system   helm-install-multus-c9s6v                 0/1     Completed   0          4m36s
kube-system   helm-install-traefik-crd-7dp95            0/1     Completed   0          10m
kube-system   helm-install-traefik-grb2c                0/1     Completed   1          10m
kube-system   local-path-provisioner-595dcfc56f-d8b65   1/1     Running     0          10m
kube-system   metrics-server-cdcc87586-884qh            1/1     Running     0          10m
kube-system   multus-c44fd                              1/1     Running     0          4m31s
kube-system   multus-cfjn9                              1/1     Running     0          4m31s
kube-system   multus-tfh7w                              1/1     Running     0          4m31s
kube-system   multus-xtj4p                              1/1     Running     0          4m31s
kube-system   svclb-traefik-bd13b742-2jvtw              2/2     Running     0          10m
kube-system   svclb-traefik-bd13b742-lmrgg              2/2     Running     0          6m55s
kube-system   svclb-traefik-bd13b742-pvsgq              2/2     Running     0          9m59s
kube-system   svclb-traefik-bd13b742-rscsg              2/2     Running     0          8m15s
kube-system   traefik-d7c9c5778-tbf8j                   1/1     Running     0          10m
kube-system   whereabouts-5s5qb                         1/1     Running     0          4m31s
kube-system   whereabouts-7d55l                         1/1     Running     0          4m31s
kube-system   whereabouts-fgj7k                         1/1     Running     0          4m31s
kube-system   whereabouts-mdm74                         1/1     Running     0          4m31s

Verification for the CNI symlink "file exists" fatal error:

To reproduce: upgrade from commit 737f594 to commit 5ec454f.
To validate: upgrade from commit 737f594 to commit 8e1701d.
Test steps for this: #10869 (comment)
On a repro setup, the following fatal error appears in the logs:

$ journalctl -xeu k3s | grep 'fatal' 
Oct 23 16:22:46 ip-172-31-8-85 k3s[86749]: time="2024-10-23T16:22:46Z" level=fatal msg="extracting data: symlink /var/lib/rancher/k3s/data/dd47053553b78e6475d25db3116109d4f23c6110062da49c1afb7976f20692ff/bin/cni /var/lib/rancher/k3s/data/cni/cni: file exists"

On the validated setup, there are no fatal errors on any server or agent:

$ journalctl -xeu k3s | grep 'fatal' 
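
When this check is repeated across several nodes, a count is easier to compare than empty grep output. The sketch below is an assumption-labelled helper, not something used in the original validation: the name fatal_count is hypothetical, and it matches the level=fatal field seen in the K3s log line above rather than the bare word "fatal".

```shell
# fatal_count
# Reads log lines on stdin and prints how many contain level=fatal.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the
# function's exit status at 0 in that case.
fatal_count() {
  grep -c 'level=fatal' || true
}
```

Example use: `journalctl -u k3s --no-pager | fatal_count`. A result of 0 on every server and agent corresponds to the validated state.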

github-project-automation bot moved this from To Test to Done Issue in K3s Development on Oct 23, 2024