control-plane-only nodes cannot start if join server is unreachable #11333

brandond · 2024-11-17T21:03:23Z

Environmental Info:
K3s Version:
n/a

Node(s) CPU architecture, OS, and Version:
n/a

Cluster Configuration:
cluster with dedicated etcd-only and control-plane-only servers

Describe the bug:
Control-plane nodes are uniquely disadvantaged in their startup, and will fail to start if the server address is unreachable. Etcd nodes can always start up, as they can reconcile against their local datastore. Agents run a local load-balancer that caches server addresses across restarts, so they can always start up as long as at least one of the previously-known servers is up.

Steps To Reproduce:

1. Create a cluster with 3 etcd nodes, 2 control-plane nodes, and 1 agent. Use the first etcd node as the --server address for the other nodes.
2. Stop k3s on the first etcd node.
3. Note that etcd nodes can be restarted successfully, and agents can be restarted successfully, but control-plane nodes cannot. They fail with:

level=fatal msg="starting kubernetes: preparing server: failed to get CA certs: Get \"https://ETCD-SERVER-1:6443/cacerts\": dial tcp ETCD-SERVER-1:6443: connect: connection refused"

Expected behavior:
All nodes can be restarted as long as at least one previously-known server is available.

Actual behavior:
Control-plane nodes fail to start if their join server is unavailable, even when other previously-known servers are up.

Additional context / logs:

Our docs say to use a fixed registration address that is backed by multiple servers, but that guidance is not always followed.

All other configurations are able to start up as long as at least one previously-discovered server is available; this should work for control-plane nodes as well.

brandond added the kind/enhancement An improvement to existing functionality label Nov 17, 2024

brandond added this to the 2024-12 Release Cycle milestone Nov 17, 2024

brandond self-assigned this Nov 17, 2024

brandond added this to K3s Development Nov 17, 2024

github-project-automation bot moved this to New in K3s Development Nov 17, 2024

brandond changed the title ~~control-plane-only nodes cannot start if registrations address is unreachable~~ control-plane-only nodes cannot start if join server is unreachable Nov 17, 2024

brandond moved this from New to Next Up in K3s Development Nov 17, 2024

snasovich mentioned this issue Dec 11, 2024

[BUG] RKE2 clusters with split role etcd/cp nodes may fail to reconcile after Rancher upgrade rancher/rancher#48387

Open
