Node drain blocked with pods stuck in Terminating state during RKE2 rolling updates
During rolling upgrades, the control-plane node's Machine is removed from the etcd cluster as soon as the machine is rolled out, here:
// If etcd leadership is on machine that is about to be deleted, move it to the newest member available.
etcdLeaderCandidate := controlPlane.Machines.Newest()
if err := r.workloadCluster.ForwardEtcdLeadership(ctx, machineToDelete, etcdLeaderCandidate); err != nil {
    logger.Error(err, "Failed to move leadership to candidate machine", "candidate", etcdLeaderCandidate.Name)
    return ctrl.Result{}, err
}
if err := r.workloadCluster.RemoveEtcdMemberForMachine(ctx, machineToDelete); err != nil {
    logger.Error(err, "Failed to remove etcd member for machine")
    return ctrl.Result{}, err
}

logger = logger.WithValues("machine", machineToDelete)
if err := r.Client.Delete(ctx, machineToDelete); err != nil && !apierrors.IsNotFound(err) {
    logger.Error(err, "Failed to delete control plane machine")
    r.recorder.Eventf(rcp, corev1.EventTypeWarning, "FailedScaleDown",
        "Failed to delete control plane Machine %s for cluster %s/%s control plane: %v", machineToDelete.Name, cluster.Namespace, cluster.Name, err)
    return ctrl.Result{}, err
}
The issue is that in RKE2 deployments, kubelet is configured to use the local API server (127.0.0.1:443), which in turn relies on the local etcd pod. But once this node is removed from the etcd cluster, kubelet can no longer reach the API, and the node cannot be drained properly: all pods remain stuck in the Terminating state from the Kubernetes perspective.
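To illustrate that dependency chain, here is a minimal, hypothetical diagnostic sketch (not part of the issue): it builds a client from the kubelet's kubeconfig on an RKE2 node, whose path we assume to be /var/lib/rancher/rke2/agent/kubelet.kubeconfig, and issues a request that has to go through the local api-server and etcd. Once the local etcd member has been removed, a probe like this fails or times out, which is exactly what the kubelet experiences.

// Hypothetical diagnostic only: probe the API the same way the kubelet does,
// through the local endpoint configured in its kubeconfig.
package main

import (
    "context"
    "fmt"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    // Assumed default RKE2 location of the kubelet kubeconfig; it points the
    // kubelet at the local api-server rather than at a VIP.
    cfg, err := clientcmd.BuildConfigFromFlags("", "/var/lib/rancher/rke2/agent/kubelet.kubeconfig")
    if err != nil {
        panic(err)
    }
    cs, err := kubernetes.NewForConfig(cfg)
    if err != nil {
        panic(err)
    }
    // This request goes api-server -> etcd; while the local etcd member is gone
    // and the api-server is failing, it returns an error instead of a list.
    _, err = cs.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{Limit: 1})
    fmt.Println("API reachable:", err == nil, "error:", err)
}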
We should probably avoid removing the etcd member this early during rolling upgrades. We could instead rely on the periodic reconcileEtcdMembers, which ensures that the number of etcd members is in sync with the number of machines/nodes; this way, the etcd member would only be removed after the node has been properly drained and removed from the cluster by the CAPI controller:
// Ensures the number of etcd members is in sync with the number of machines/nodes.
// NOTE: This is usually required after a machine deletion.
if err := r.reconcileEtcdMembers(ctx, controlPlane); err != nil {
    return ctrl.Result{}, err
}
It seems to us that the longer a drain takes, the more likely this failure becomes. Failures due to this behavior (which looks like a bug to us) are a frequent cause of CI failures in Sylva project pipelines, so this problem is quite "hot" for us.
If we look at the node, we can see that kubelet has stopped reporting its status:
SURE-9012
See issue for more context information
Issue description:
Node drain blocked with pods stuck in Terminating state during RKE2 rolling updates (same description as above). The relevant code paths are:
=> https://github.com/rancher/cluster-api-provider-rke2/blob/v0.6.0/controlplane/internal/controllers/scale.go#L154-L166
=> https://github.com/rancher/cluster-api-provider-rke2/blob/v0.6.0/controlplane/internal/controllers/rke2controlplane_controller.go#L511-L515
If we look at the node, we can see that kubelet has stopped reporting its status:
Looking at the kubelet logs, we see that it starts failing to reach the API at 21:51:41:
This can be explained by the fact that the api-server is in a crash loop, failing to reach etcd:
etcd is unavailable because the local etcd member has been removed from the cluster by the CAPI controller (as described in #1420 (closed)).
=> https://gitlab.com/sylva-projects/sylva-core/-/issues/1420
On kubeadm we don't have the same issue, as kubelet uses the VIP to reach the api-server.
This issue may have been hidden by the drainTimeout that was previously set on nodes: sylva-projects/sylva-elements/helm-charts/sylva-capi-cluster!421 (merged)
=> https://gitlab.com/sylva-projects/sylva-elements/helm-charts/sylva-capi-cluster/-/merge_requests/421
Additional notes:
#431
https://gitlab.com/sylva-projects/sylva-core/-/issues/1595