Replies: 4 comments 4 replies
-
Have you checked the etcd pod for restarts, or looked at the etcd pod logs? You might also consider looking at the apiserver pod logs.
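For reference, a quick way to do those checks looks something like this (the pod names assume the node is called rke2-master-prod, as in the attached log; adjust to your setup):

```sh
# Check whether the etcd and kube-apiserver static pods have been restarting
kubectl -n kube-system get pods -o wide | grep -E 'etcd|kube-apiserver'

# Tail the current logs, plus the previous container's logs if it restarted
kubectl -n kube-system logs etcd-rke2-master-prod
kubectl -n kube-system logs etcd-rke2-master-prod --previous
kubectl -n kube-system logs kube-apiserver-rke2-master-prod

# RKE2 itself also logs etcd/apiserver health problems to the journal
journalctl -u rke2-server --since "1 hour ago"
```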
-
Moved the etcd database to a dedicated NVMe SSD and the issue is resolved.
-
@kenho811 how did you move the etcd database to a dedicated NVMe SSD? I am facing the same problem. Can you share the solution step by step?
-
Hi! Like @kenho811, I was having the same issue. I have a three-node cluster on Ubuntu 22.04 running Kubernetes 1.30 with RKE2, also on Proxmox, with one master node running etcd and two worker nodes. This is a development cluster, so IOPS are not as heavy as on a prod one. The problem I was having: after installing the Prometheus stack, which generates a lot of IOPS, the cluster started to fail every time there was an upgrade or a new installation (i.e. whenever etcd was a bit busy). I was also using an HDD for the whole cluster; after mounting an SSD onto the etcd directory the problem was resolved. Here are the steps for how I solved it:
Then restart the kube-system pods that were having issues when etcd failed: controller-manager, scheduler, etc.
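In case it helps anyone, here is a rough sketch of what that can look like on an RKE2 server node. These are not the exact commands from above: /dev/nvme0n1 is an example device name, and the etcd path is the RKE2 default; adjust both to your environment.

```sh
# 1. Stop RKE2 on the server node so etcd is not writing during the copy
systemctl stop rke2-server

# 2. Format the SSD and copy the existing etcd data onto it
#    (/dev/nvme0n1 is an example device, adjust to your disk)
mkfs.ext4 /dev/nvme0n1
mount /dev/nvme0n1 /mnt
cp -a /var/lib/rancher/rke2/server/db/etcd/. /mnt/
umount /mnt

# 3. Mount the SSD over the etcd data directory and make it persistent
mount /dev/nvme0n1 /var/lib/rancher/rke2/server/db/etcd
echo '/dev/nvme0n1 /var/lib/rancher/rke2/server/db/etcd ext4 defaults 0 2' >> /etc/fstab

# 4. Start RKE2 again and recreate the affected static pods
systemctl start rke2-server
kubectl -n kube-system get pods -o name \
  | grep -E 'kube-controller-manager|kube-scheduler' \
  | xargs kubectl -n kube-system delete
```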
-
I installed RKE2 on a Debian Bookworm VM.
I notice that every now and then, my kube-apiserver becomes unhealthy. I checked the Kubernetes logs and noticed this. Apparently it is related to an etcd failure, but I am not too sure how to debug this. Full log attached below.
kube-apiserver-rke2-master-prod.17d2eca167fb7aca.txt
=========
Observation
I observe that the warning event occurs mostly when new pods are being created.
I have an Airflow Scheduler which schedules new Kubernetes Pods every now and then.
Below is a screenshot of the events in my cluster.
When I suddenly schedule a lot of pods, the WARNING event occurs.
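One way to check whether the disk is what's holding etcd back during those bursts is the usual fio fsync benchmark for etcd, run on the same mount point as the etcd data directory (the path below assumes the RKE2 default):

```sh
# Benchmark fdatasync latency on the disk backing etcd (RKE2 default path assumed)
mkdir -p /var/lib/rancher/rke2/server/db/etcd/fio-test
fio --rw=write --ioengine=sync --fdatasync=1 \
    --directory=/var/lib/rancher/rke2/server/db/etcd/fio-test \
    --size=22m --bs=2300 --name=etcd-disk-check
rm -rf /var/lib/rancher/rke2/server/db/etcd/fio-test

# In the output, look at the fsync/fdatasync percentiles: etcd's guidance is that
# the 99th percentile should stay below roughly 10 ms, which an HDD often cannot meet.
```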