
k3s service does not start #11234

Closed
ridervka opened this issue Nov 5, 2024 · 4 comments


ridervka commented Nov 5, 2024

Environmental Info:
K3s Version:
v1.28.6+k3s2

Node(s) CPU architecture, OS, and Version:
3 nodes with similar characteristics
Linux ds89290 5.15.0-124-generic #134-Ubuntu SMP Fri Sep 27 20:20:17 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:
3 nodes, all with the roles control-plane,master

Describe the bug:
After running the command `k3s crictl rmi --prune`, the k3s service does not start.

Steps To Reproduce:
Run the command `k3s crictl rmi --prune`, then restart the machine.

Additional context / logs:

On the problem node, `systemctl status k3s` shows:

```
k3s.service - Lightweight Kubernetes
Loaded: loaded (/etc/systemd/system/k3s.service; enabled; vendor preset: enabled)
Active: activating (start) since Tue 2024-11-05 12:37:28 UTC; 25min ago
Docs: https://k3s.io
Process: 726 ExecStartPre=/bin/sh -xc ! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service 2>/dev/null (code=exited, status=0/S>
Process: 752 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
Process: 768 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
Main PID: 777 (k3s-server)
Tasks: 12
Memory: 164.8M
CPU: 972ms
CGroup: /system.slice/k3s.service
└─777 "/usr/local/bin/k3s server" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "

Nov 05 12:43:36 ds89290 k3s[777]: time="2024-11-05T12:43:36Z" level=error msg="failed to ping connection: driver: bad connection"
Nov 05 12:45:38 ds89290 k3s[777]: time="2024-11-05T12:45:38Z" level=error msg="failed to ping connection: driver: bad connection"
Nov 05 12:47:39 ds89290 k3s[777]: time="2024-11-05T12:47:39Z" level=error msg="failed to ping connection: driver: bad connection"
```
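The three errors above arrive about two minutes apart while the unit is still in `activating (start)`, which fits a server that keeps retrying its datastore rather than one that crashed. A minimal sketch for checking that pattern over a longer excerpt (for example the output of `journalctl -u k3s`), assuming every relevant line carries the `time="…" level=… msg="…"` fields shown above; `error_intervals` is a hypothetical helper, not part of k3s:

```python
import re
from datetime import datetime

# Matches the key=value fields k3s writes after the journald prefix (format assumed from the output above).
LINE_RE = re.compile(r'time="(?P<ts>[^"]+)" level=(?P<level>\w+) msg="(?P<msg>[^"]*)"')


def error_intervals(lines, needle="failed to ping connection"):
    """Seconds between consecutive error-level lines whose message contains `needle`."""
    stamps = []
    for line in lines:
        m = LINE_RE.search(line)
        if m and m.group("level") == "error" and needle in m.group("msg"):
            # The timestamps above are UTC with a trailing "Z" and no fractional seconds.
            stamps.append(datetime.strptime(m.group("ts"), "%Y-%m-%dT%H:%M:%SZ"))
    return [(later - earlier).total_seconds() for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    sample = [
        'Nov 05 12:43:36 ds89290 k3s[777]: time="2024-11-05T12:43:36Z" level=error msg="failed to ping connection: driver: bad connection"',
        'Nov 05 12:45:38 ds89290 k3s[777]: time="2024-11-05T12:45:38Z" level=error msg="failed to ping connection: driver: bad connection"',
        'Nov 05 12:47:39 ds89290 k3s[777]: time="2024-11-05T12:47:39Z" level=error msg="failed to ping connection: driver: bad connection"',
    ]
    print(error_intervals(sample))  # [122.0, 121.0]
```

A steady interval over many lines points at a persistent datastore problem; a single error followed by normal startup would not.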

On the other two nodes, `systemctl status k3s` shows:

```
k3s.service - Lightweight Kubernetes
Loaded: loaded (/etc/systemd/system/k3s.service; disabled; vendor preset: enabled)
Active: active (running) since Mon 2024-11-04 12:47:15 UTC; 24h ago
Docs: https://k3s.io
Main PID: 3715692 (k3s-server)
Tasks: 514
Memory: 1.6G
CPU: 4min 51.908s
CGroup: /system.slice/k3s.service
├─ 3513 /var/lib/rancher/k3s/data/13f9723ffde84ba41d08658d407a523bcf32698f179c9ab30cc0534e1e5d2c1a/bin/containerd-shim-runc-v2>
├─ 3703 /var/lib/rancher/k3s/data/13f9723ffde84ba41d08658d407a523bcf32698f179c9ab30cc0534e1e5d2c1a/bin/containerd-shim-runc-v2>
├─ 3782 /var/lib/rancher/k3s/data/13f9723ffde84ba41d08658d407a523bcf32698f179c9ab30cc0534e1e5d2c1a/bin/containerd-shim-runc-v2>
├─ 3793 /var/lib/rancher/k3s/data/13f9723ffde84ba41d08658d407a523bcf32698f179c9ab30cc0534e1e5d2c1a/bin/containerd-shim-runc-v2>
├─ 3850 /var/lib/rancher/k3s/data/13f9723ffde84ba41d08658d407a523bcf32698f179c9ab30cc0534e1e5d2c1a/bin/containerd-shim-runc-v2>
├─ 3929 /var/lib/rancher/k3s/data/13f9723ffde84ba41d08658d407a523bcf32698f179c9ab30cc0534e1e5d2c1a/bin/containerd-shim-runc-v2>
...
```

Member

brandond commented Nov 5, 2024

What are you using as the database for this cluster? You've provided very little in terms of error messages, but what you're showing here suggests that the node you restarted is unable to connect to the database. This has nothing to do with the prune command you ran prior to restarting. Check the node's connection to the database.
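A quick way to act on that from the restarted node is to separate "cannot reach the endpoint at all" from "something is listening but not behaving like PostgreSQL". A minimal stdlib sketch, assuming the datastore is PostgreSQL: it opens a TCP connection and sends the protocol's `SSLRequest` preamble, to which a PostgreSQL server answers with a single `S` or `N` byte. `probe_postgres` is a hypothetical helper, and a `postgres` result only proves that a protocol endpoint answers, not that queries succeed.

```python
import socket
import struct

# SSLRequest code from the PostgreSQL frontend/backend protocol (1234 << 16 | 5679).
SSL_REQUEST_CODE = 80877103


def probe_postgres(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify an endpoint as 'unreachable', 'no-answer', 'closed', 'unexpected' or 'postgres'."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Refused, unroutable, or timed out before the TCP handshake finished.
        return "unreachable"
    with sock:
        try:
            # SSLRequest: Int32 message length (8) followed by the Int32 request code.
            sock.sendall(struct.pack("!ii", 8, SSL_REQUEST_CODE))
            reply = sock.recv(1)
        except OSError:
            # TCP connected, but the peer reset the connection or stayed silent past the timeout.
            return "no-answer"
    if reply == b"":
        # Accepted and then closed, e.g. a proxy with no healthy backend to forward to.
        return "closed"
    return "postgres" if reply in (b"S", b"N") else "unexpected"
```

Run it with the host and port from the server's `--datastore-endpoint`; anything other than `postgres` narrows the problem to the network or whatever sits in front of the database.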

Author

ridervka commented Nov 5, 2024

Thanks for your reply! Yes, there is indeed something wrong with my Postgres database: my requests are being rejected with a timeout.

```
k3s=> \l
FATAL: query_wait_timeout
server closed the connection unexpectedly
This probably means the server terminated abnormally before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
```

I have Postgres with Patroni installed, together with HAProxy. The k3s cluster has 3 nodes, and Postgres with Patroni is installed on each of them. Here is the configuration of my k3s server:

```
k3s server --flannel-backend=none --disable-network-policy --disable=traffic --datastore-endpoint=postgres://k3s:*******@1.1.1.1:5000/k3s --token=K10f69d40251***************333333333aesfew4c25fadb4d::server:edef9afdasd3f32f32fgead751799a4 --server=https://127.0.0.1:6445
```
@brandond
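To run checks against exactly what the server is configured to use, the datastore target can be read from the server command line. A minimal stdlib sketch; `datastore_target` is a hypothetical helper that accepts both `--datastore-endpoint=URL` and `--datastore-endpoint URL`, assumes port 5432 when a PostgreSQL URL has no port, and leaves the password out of its result so the output is safe to paste into an issue. It does not look at endpoints set through a config file or environment variable.

```python
import shlex
from urllib.parse import urlsplit

FLAG = "--datastore-endpoint"


def datastore_target(cmdline: str) -> dict:
    """Return scheme, host, port, database and user from a k3s --datastore-endpoint flag."""
    args = shlex.split(cmdline)
    for i, arg in enumerate(args):
        if arg.startswith(FLAG + "="):
            endpoint = arg[len(FLAG) + 1:]
        elif arg == FLAG and i + 1 < len(args):
            endpoint = args[i + 1]
        else:
            continue
        url = urlsplit(endpoint)
        # 5432 is PostgreSQL's standard port; other schemes get no assumed default.
        default_port = 5432 if url.scheme in ("postgres", "postgresql") else None
        # The password (url.password) is deliberately not returned.
        return {
            "scheme": url.scheme,
            "host": url.hostname,
            "port": url.port or default_port,
            "database": url.path.lstrip("/"),
            "user": url.username,
        }
    raise ValueError("no --datastore-endpoint flag found")
```

For the command above it yields host `1.1.1.1`, port `5000` and database `k3s`, which can then be fed to a connectivity check.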

Member

brandond commented Nov 5, 2024

I can't really help you with that. You'll need to figure out why K3s can't connect to the database, wherever and however it is hosted. Have you checked whether Postgres and Patroni are working properly following the restart?
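One way to answer that question from any node is Patroni's REST API, whose `GET /cluster` endpoint lists every member with its role and state. A minimal stdlib sketch, assuming the API listens on plain HTTP without authentication (8008 is the port used in Patroni's example configurations); `patroni_members` and `has_running_leader` are hypothetical helpers. In the usual Patroni plus HAProxy layout, HAProxy sends clients to the current leader, so finding no member with role `leader` in state `running` would be consistent with the timeouts reported above.

```python
import json
from urllib.request import urlopen


def patroni_members(api_base: str, timeout: float = 3.0) -> list:
    """Return (name, role, state) tuples from Patroni's GET /cluster response."""
    with urlopen(api_base.rstrip("/") + "/cluster", timeout=timeout) as resp:
        data = json.load(resp)
    return [(m.get("name"), m.get("role"), m.get("state")) for m in data.get("members", [])]


def has_running_leader(members: list) -> bool:
    """True if some member is the leader and reports state 'running'."""
    return any(role == "leader" and state == "running" for _, role, state in members)
```

For example, call `patroni_members("http://<node-ip>:8008")` against each of the three nodes (`<node-ip>` is a placeholder) and compare the answers; members that disagree about the leader are worth investigating before rebooting anything else.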

@brandond brandond closed this as completed Nov 5, 2024
@github-project-automation github-project-automation bot moved this from New to Done Issue in K3s Development Nov 5, 2024
Author

ridervka commented Nov 5, 2024

I rebooted only the problematic node and, as you know, k3s did not start after that. I'm still afraid to reboot the other two nodes: the k3s service is running there, and after a reboot the same thing may happen as with the problematic node. Thank you for your help, I wish you good health! @brandond
P.S. My colleague will soon check whether he can solve the problem.
P.P.S. If I manage to solve the problem, I will post the answer here.

Labels
None yet
Projects
Status: Done Issue
Development

No branches or pull requests

2 participants