Cluster Configuration:
cluster with dedicated etcd-only and control-plane-only servers
Describe the bug:
Control-plane nodes are uniquely disadvantaged at startup: they fail to start if the server address is unreachable. Etcd nodes can always start up, as they can reconcile against their local datastore. Agents run a local load-balancer that caches server addresses across restarts, so they can always start up as long as at least one of the previously-known servers is up.
Steps To Reproduce:
1. Create a cluster with 3 etcd nodes, 2 control-plane nodes, and 1 agent. Use the first etcd node as the `--server` address for the other nodes.
2. Stop k3s on the first etcd node.
3. Note that etcd nodes can be restarted successfully, and agents can be restarted successfully, but control-plane nodes cannot. They fail with:
   level=fatal msg="starting kubernetes: preparing server: failed to get CA certs: Get \"https://ETCD-SERVER-1:6443/cacerts\": dial tcp ETCD-SERVER-1:6443: connect: connection refused"
Expected behavior:
All nodes can be restarted as long as at least one previously-known server is available.
Actual behavior:
Control-plane nodes fail to start if their join server is unavailable.
Additional context / logs:
Our docs say to use a fixed registration address that is backed by multiple servers, but that guidance is not always followed.
All other configurations are able to start up as long as at least one previously-discovered server is available; this should work for control-plane nodes as well.
brandond changed the title from "control-plane-only nodes cannot start if registrations address is unreachable" to "control-plane-only nodes cannot start if join server is unreachable" on Nov 17, 2024
Environmental Info:
K3s Version:
n/a
Node(s) CPU architecture, OS, and Version:
n/a