RKE2 Cluster Nodes Enter NotReady State Under Heavy Workload #6841
Replies: 1 comment
-
I suspect you're running all of this on top of a single disk? Is it SSD, or something else? Etcd is highly sensitive to disk IO latency. If you're using a single disk - even an SSD - for etcd, container images, Longhorn, and your workload - and it is all shared storage because these are VMs - it is highly likely that etcd is crashing or timing out due to insufficient disk throughput or excessive latency. You should put etcd and Longhorn on separate physical disks so that they do not compete for IO. Ideally you'd want a trio of disks per node: one for the OS and Kubernetes, one for etcd, and one for Longhorn.
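If you want to confirm the disk theory before re-carving your storage, a quick check is to benchmark fdatasync latency on the filesystem backing etcd and to grep the etcd logs for slow-disk warnings. A minimal sketch, assuming fio is available on the node and that etcd's data sits under the default RKE2 path (both are assumptions about your setup):

```sh
# Benchmark fdatasync latency on the filesystem that backs etcd's data
# directory (path assumes a default RKE2 install; adjust if yours differs).
# etcd's guidance is that the 99th percentile of fdatasync latency should
# stay below roughly 10ms.
mkdir -p /var/lib/rancher/rke2/server/db/fio-test
fio --rw=write --ioengine=sync --fdatasync=1 \
    --directory=/var/lib/rancher/rke2/server/db/fio-test \
    --size=22m --bs=2300 --name=etcd-disk-check
rm -rf /var/lib/rancher/rke2/server/db/fio-test

# Look for slow-disk symptoms in the etcd logs (RKE2 runs etcd as a static
# pod in kube-system, typically named etcd-<node-name>).
kubectl -n kube-system logs etcd-<node-name> | grep -iE 'took too long|heartbeat'
```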
-
I’m experiencing issues with a 4-node VM cluster running RKE2. The setup includes 3 control plane nodes and 1 worker node, each with 8 virtual CPUs, 32 GB RAM, and 500 GB hard disk. On top of this cluster, I have Longhorn, Calico, and an application running.
When heavy workload pods are deployed for the application, some or all of the RKE2 nodes transition to a NotReady state. This makes the kube-apiserver unreachable and prevents the use of kubectl commands. The system becomes unresponsive, and even after a node recovers, some pods remain in an error state.
Rebooting the affected node temporarily resolves the issue, but this isn’t a practical solution for customers. Ideally, under heavy workload conditions, the system should handle resource constraints by issuing warnings, evicting pods, or pausing operations until resources become available, similar to what happens in vanilla Kubernetes.
Any insights or suggestions on how to improve resource management and handle such scenarios more gracefully would be appreciated.
I have tried the following workarounds so far, but something still seems to be missing.
1) Applied resource-related settings in the RKE2 config file /etc/rancher/rke2/config.yaml with the following configuration:
kubelet:
  extra-args:
    eviction-hard: memory.available<500Mi,nodefs.available<10%
    eviction-soft: memory.available<1Gi,nodefs.available<15%
    eviction-soft-grace-period: memory.available=1m,nodefs.available=1m
    eviction-max-pod-grace-period: 30
    cpu-cfs-quota: "true"
    cpu-cfs-quota-period: "100ms"
api-server:
  extra-args:
    memory-requests: "512Mi"
    memory-limits: "1Gi"
    cpu-requests: "500m"
    cpu-limits: "1"
controller-manager:
  extra-args:
    memory-requests: "256Mi"
    memory-limits: "512Mi"
    cpu-requests: "250m"
    cpu-limits: "500m"
scheduler:
  extra-args:
    memory-requests: "256Mi"
    memory-limits: "512Mi"
    cpu-requests: "250m"
    cpu-limits: "500m"
etcd:
  extra-args:
    quota-backend-bytes: "2Gi"
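For reference, a minimal sketch of the same intent using the kubelet-arg and etcd-arg list keys that RKE2's config.yaml uses for passing component arguments. The reserved-resource lines are illustrative additions rather than something I have applied, the api-server/controller-manager/scheduler entries are left out because I'm not sure those request/limit values map to real flags on those binaries, and as far as I can tell etcd's quota-backend-bytes expects a plain byte count rather than a "2Gi" suffix:

```yaml
# /etc/rancher/rke2/config.yaml -- sketch only, values mirror the attempt above
kubelet-arg:
  - "eviction-hard=memory.available<500Mi,nodefs.available<10%"
  - "eviction-soft=memory.available<1Gi,nodefs.available<15%"
  - "eviction-soft-grace-period=memory.available=1m,nodefs.available=1m"
  - "eviction-max-pod-grace-period=30"
  - "cpu-cfs-quota=true"
  - "cpu-cfs-quota-period=100ms"
  # Illustrative: reserve headroom so the kubelet and system daemons survive load
  - "system-reserved=cpu=500m,memory=1Gi"
  - "kube-reserved=cpu=500m,memory=1Gi"
etcd-arg:
  - "quota-backend-bytes=2147483648"   # 2 GiB expressed in bytes
```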
2) Implemented resource quotas and limits in the application and Longhorn namespaces
Namespace Resource Quotas: Use ResourceQuotas to cap the total resource usage in each namespace, so that no single namespace can consume all available resources.
Pod Resource Limits: Configure resources.requests and resources.limits in the pod specifications to control how much CPU and memory each pod can use (sketched below).
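A minimal sketch of the ResourceQuota and LimitRange pair this refers to, with illustrative namespace names and values:

```yaml
# Illustrative per-namespace quota plus default container requests/limits.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: app-quota
  namespace: my-app              # hypothetical application namespace
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 24Gi
---
apiVersion: v1
kind: LimitRange
metadata:
  name: app-defaults
  namespace: my-app
spec:
  limits:
    - type: Container
      defaultRequest:            # applied when a container sets no requests
        cpu: 250m
        memory: 256Mi
      default:                   # applied when a container sets no limits
        cpu: "1"
        memory: 1Gi
```

With the LimitRange in place, pods that omit requests and limits still get defaults, which keeps BestEffort pods from starving the node under load.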
What am I missing, and what could be done at the RKE2 level to handle such heavy workload scenarios more gracefully?