DNS resolution not working after backup/restore #10811

Closed
janekmichalik opened this issue Sep 4, 2024 · 2 comments

Comments

janekmichalik commented Sep 4, 2024

Environmental Info:
K3s Version:

k3s -v
k3s version v1.29.3+k3s1 (8aecc26b)
go version go1.21.8

Node(s) CPU architecture, OS, and Version:

uname -a
Linux 10-55-252-54 5.4.0-176-generic #196-Ubuntu SMP Fri Mar 22 16:46:39 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration: 1 node (master)

Describe the bug:

I have followed the guide from https://docs.k3s.io/datastore/backup-restore#backup-and-restore-with-sqlite.
After the restore process, some of my pods do not come up, because their init containers are stuck checking whether a specific service is up (a bad address error).

mkdir backup_platform_state
cd backup_platform_state

systemctl stop k3s

# backup
cp -a /var/lib/rancher/k3s/server/db .
cp -a /var/lib/rancher/k3s/server/token .
systemctl restart k3s

# restore
systemctl stop k3s
rm -rf /var/lib/rancher/k3s/server/db
mv db /var/lib/rancher/k3s/server/
mv token /var/lib/rancher/k3s/server/
systemctl restart k3s

Steps To Reproduce:

  • Installed K3s:

Expected behavior:

DNS resolution works after the restore, with no need to restart the coredns pod.

Actual behavior:

DNS resolution is not working: some of my init containers, which call a specified service to check whether it is ready, fail with a bad address error. To make it work again I need to restart the coredns pod.

nc: bad address 'service-1'
wait...
nc: bad address 'service-1'
wait...
nc: bad address 'service-1'
wait...
nc: bad address 'service-1'
wait...
nc: bad address 'service-1'
wait...

kubectl -n test get svc | grep service
service-1                       ClusterIP   10.43.186.252   <none>        2181/TCP,2888/TCP,3888/TCP
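For context, the stuck init containers run a readiness loop along these lines (a hypothetical sketch: `wait_for_service`, the service name, and the port are illustrative, and a busybox-style `nc` is assumed, matching the `nc: bad address` output quoted above):

```shell
#!/bin/sh
# Illustrative init-container check: loop until the service's DNS name
# resolves and the port accepts a TCP connection, printing "wait..."
# between attempts. When DNS is broken, nc fails with "bad address"
# and the container never gets past this loop.
wait_for_service() {
  host="$1"; port="$2"
  until nc -z "$host" "$port" 2>/dev/null; do
    echo "wait..."
    sleep 2
  done
}

# wait_for_service service-1 2181   # blocks until service-1:2181 is reachable
```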

Additional context / logs:

The only relevant logs I found in the coredns pod:

[ERROR] plugin/kubernetes: pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1.Namespace: the server is currently unable to handle the request (get namespaces)
[INFO] plugin/kubernetes: Trace[1776613576]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169 (04-Sep-2024 09:52:23.840) (total time: 43271ms):
Trace[1776613576]: ---"Objects listed" error:<nil> 43271ms (09:53:07.112)
Trace[1776613576]: [43.271393766s] [43.271393766s] END
brandond commented Sep 4, 2024

If you're testing by just stopping K3s on an existing node, and replacing the DB file, you should also run k3s-killall.sh to force an immediate restart of the pods - unless you want to wait for them to go unhealthy and get recreated.
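Folding that suggestion into the original restore steps, the sequence would look roughly like this (a sketch only; `k3s-killall.sh` is installed alongside K3s by the standard install script, and paths assume the default data directory):

```shell
# Restore on a single-node, SQLite-backed K3s server (sketch).
systemctl stop k3s
k3s-killall.sh                 # also tears down running pods and containers,
                               # so they are recreated immediately after the
                               # restore instead of waiting to go unhealthy
rm -rf /var/lib/rancher/k3s/server/db
mv db /var/lib/rancher/k3s/server/
mv token /var/lib/rancher/k3s/server/
systemctl restart k3s
```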

@brandond brandond closed this as completed Sep 4, 2024
@janekmichalik (Author)

@brandond it works, thanks. A shame it's not mentioned in the docs.
