DNS resolution not working after backup/restore #10811

Closed
janekmichalik opened this issue Sep 4, 2024 · 2 comments

Comments

janekmichalik commented Sep 4, 2024

Environmental Info:
K3s Version:

k3s -v
k3s version v1.29.3+k3s1 (8aecc26b)
go version go1.21.8

Node(s) CPU architecture, OS, and Version:

uname -a
Linux 10-55-252-54 5.4.0-176-generic #196-Ubuntu SMP Fri Mar 22 16:46:39 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration: 1 node (master)

Describe the bug:

I have followed the guide from https://docs.k3s.io/datastore/backup-restore#backup-and-restore-with-sqlite.
After the restore process, some of my pods do not come up, because their init containers are stuck checking whether a specific service is up (a bad address error).

mkdir backup_platform_state
cd backup_platform_state

systemctl stop k3s

# backup
cp -a /var/lib/rancher/k3s/server/db .
cp -a /var/lib/rancher/k3s/server/token .
systemctl restart k3s

# restore
systemctl stop k3s
rm -rf /var/lib/rancher/k3s/server/db
mv db /var/lib/rancher/k3s/server/
mv token /var/lib/rancher/k3s/server/
systemctl restart k3s

Steps To Reproduce:

  • Installed K3s:

Expected behavior:

DNS resolution works after the restore, with no need to restart the coredns pod.

Actual behavior:

DNS resolution is not working: some of my init containers, which call a specified service to check whether it is ready, fail with a bad address error. To make it work again I need to restart the coredns pod.

nc: bad address 'service-1'
wait...
nc: bad address 'service-1'
wait...
nc: bad address 'service-1'
wait...
nc: bad address 'service-1'
wait...
nc: bad address 'service-1'
wait...

kubectl -n test get svc | grep service
service-1                       ClusterIP   10.43.186.252   <none>        2181/TCP,2888/TCP,3888/TCP
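For context, the stuck init containers run a readiness loop along these lines (a hypothetical sketch: `wait_for_service`, the service name, and the port are illustrative, and a busybox-style `nc` is assumed, matching the `nc: bad address` output quoted above):

```shell
#!/bin/sh
# Illustrative init-container check: loop until the service's DNS name
# resolves and the port accepts a TCP connection, printing "wait..."
# between attempts. When DNS is broken, nc fails with "bad address"
# and the container never gets past this loop.
wait_for_service() {
  host="$1"; port="$2"
  until nc -z "$host" "$port" 2>/dev/null; do
    echo "wait..."
    sleep 2
  done
}

# wait_for_service service-1 2181   # blocks until service-1:2181 is reachable
```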

Additional context / logs:

The only relevant logs I found in the coredns pod:

[ERROR] plugin/kubernetes: pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1.Namespace: the server is currently unable to handle the request (get namespaces)
[INFO] plugin/kubernetes: Trace[1776613576]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169 (04-Sep-2024 09:52:23.840) (total time: 43271ms):
Trace[1776613576]: ---"Objects listed" error:<nil> 43271ms (09:53:07.112)
Trace[1776613576]: [43.271393766s] [43.271393766s] END
brandond commented Sep 4, 2024

If you're testing by just stopping K3s on an existing node, and replacing the DB file, you should also run k3s-killall.sh to force an immediate restart of the pods - unless you want to wait for them to go unhealthy and get recreated.
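Folding that suggestion into the original restore steps, the sequence would look roughly like this (a sketch only; `k3s-killall.sh` is installed alongside K3s by the standard install script, and paths assume the default data directory):

```shell
# Restore on a single-node, SQLite-backed K3s server (sketch).
systemctl stop k3s
k3s-killall.sh                 # also tears down running pods and containers,
                               # so they are recreated immediately after the
                               # restore instead of waiting to go unhealthy
rm -rf /var/lib/rancher/k3s/server/db
mv db /var/lib/rancher/k3s/server/
mv token /var/lib/rancher/k3s/server/
systemctl restart k3s
```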

@brandond brandond closed this as completed Sep 4, 2024
@janekmichalik (Author)

@brandond it works, thanks. A shame it's not mentioned in the docs.
