-
Ok, it seems that I have to correct my bonus bug: k3s was actually rescheduling while I was writing the issue before. However, now, after I've added the third node, k3s keeps the 6 app pods on the first two nodes and does not distribute two back to the third... I'm checking this with ...
Edit: Still after a while, k3s doesn't schedule anything back to the third node.
Edit2: Still no change after ~10 minutes.
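A common way to check the spread, assuming plain kubectl access (the app label is a placeholder):

```sh
# Show which node each app pod landed on (replace the label selector with your app's)
kubectl get pods -l app=my-app -o wide

# Confirm the rejoined node is Ready
kubectl get nodes
```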
-
Ok, re the redistribution, I had to introduce ... Re automatic node deletion after failure and after some time: is there anything in k3s allowing this?
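For reference, removing a node that will not return is a single kubectl call; the node name below is a placeholder:

```sh
# Remove the dead node from the cluster so its pods can be rescheduled elsewhere;
# "n3" is a placeholder for the failed node's name.
kubectl delete node n3
```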
-
Ok, now with the node removed it works. So, why does k3s not reschedule right away when one node goes down, without the need to remove it? Is it expected in k8s that some cloud controller manager (CCM) would remove the node and add a new one automatically? I don't have a CCM, hence I ask.
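For context on the timing: by default Kubernetes adds tolerations for the node.kubernetes.io/not-ready and node.kubernetes.io/unreachable taints with tolerationSeconds: 300 to every pod, so nothing is evicted from a down node for roughly five minutes. A minimal sketch of tightening that per pod, with an illustrative 60-second value:

```yaml
# Pod spec fragment: evict this pod from a NotReady/unreachable node after 60s
# instead of the 300s default added by the DefaultTolerationSeconds admission plugin.
spec:
  tolerations:
    - key: "node.kubernetes.io/unreachable"
      operator: "Exists"
      effect: "NoExecute"
      tolerationSeconds: 60
    - key: "node.kubernetes.io/not-ready"
      operator: "Exists"
      effect: "NoExecute"
      tolerationSeconds: 60
```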
-
Now, I've added a new node which successfully joined the existing nodes, but the pods are not spreading equally despite having this:
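As an assumption (the actual constraint isn't quoted here), a typical pod topology spread constraint for this scenario might look like the sketch below; note that spread constraints only influence where new pods are scheduled, and Kubernetes never moves already-running pods onto a freshly joined node.

```yaml
# Hypothetical Deployment pod-spec fragment: prefer an even spread across nodes.
# With whenUnsatisfiable: ScheduleAnyway this is a soft rule, and it is only
# evaluated when a pod is scheduled, never for pods that are already running.
spec:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          app: my-app   # placeholder label
```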
-
K3s has no way of knowing whether a node is down temporarily (maintenance, outage, etc.) or gone for good; you are responsible for deleting nodes from the cluster if they will not return. If you are using cloud infrastructure, you can use a Kubernetes cloud provider to delete nodes from the cluster when they are removed from the cloud infrastructure provider. If you are using managed etcd, you should be aware of etcd quorum requirements, which may prevent you from joining new servers to the cluster before old ones have been removed.

K3s does not have any special pod scheduling behavior; it behaves the same as any other Kubernetes distribution, both in regard to rescheduling pods away from nodes that are down and in re-balancing pods when new nodes are added. For the latter, note that Kubernetes does not natively re-schedule pods to balance resource utilization; for that you might look at something like the descheduler: https://github.com/kubernetes-sigs/descheduler/blob/master/README.md
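As a rough illustration of the descheduler route (the thresholds are arbitrary and the policy API has changed across descheduler releases), a v1alpha1 policy that evicts duplicate pods and rebalances over-utilized nodes might look like:

```yaml
# Hypothetical descheduler policy (v1alpha1): evict duplicates that landed on the
# same node and move load off over-utilized nodes so the scheduler can re-place it
# on under-utilized ones. The threshold numbers are illustrative only.
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemoveDuplicates":
    enabled: true
  "LowNodeUtilization":
    enabled: true
    params:
      nodeResourceUtilizationThresholds:
        thresholds:
          "cpu": 20
          "memory": 20
          "pods": 20
        targetThresholds:
          "cpu": 50
          "memory": 50
          "pods": 50
```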
-
k3s version v1.21.0+k3s1
go version go1.16.2
Linux n2 5.4.0-73-generic #82-Ubuntu SMP Wed Apr 14 17:39:42 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
3 servers
Describe the bug:
I try to simulate a node failure by simply uninstalling and then reinstalling k3s on the third node.
Steps To Reproduce:
Install k3s on 3 nodes with https://gist.github.com/lkj4/5334042a0311784dbdacfad50907f463
(You need to run terraform apply first, which outputs the IPs of the created nodes!)
Then, uninstall k3s from the third node using k3s' uninstall script.
Then, install k3s again on the third node. You can just run the above gist again.
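A minimal sketch of that step, assuming the third node runs as a server and the standard get.k3s.io installer (the server URL and token are placeholders, not values from the gist):

```sh
# On the third node: wipe k3s completely (server nodes; agents use k3s-agent-uninstall.sh)
/usr/local/bin/k3s-uninstall.sh

# Rejoin the existing cluster as a server; <first-server-ip> and <token> are placeholders
curl -sfL https://get.k3s.io | sh -s - server \
  --server https://<first-server-ip>:6443 \
  --token <token>
```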
Expected behavior:
Node joins again.
Actual behavior:
Additional context / logs:
Bonus bug: Once the last node was removed, I'd expect k3s to reschedule the pods to the two remaining nodes, but k3s doesn't do anything.