control-plane-only nodes cannot start if join server is unreachable #11333

Open
brandond opened this issue Nov 17, 2024 · 0 comments
Assignees
Labels
kind/enhancement An improvement to existing functionality

brandond commented Nov 17, 2024

Environmental Info:
K3s Version:
n/a

Node(s) CPU architecture, OS, and Version:
n/a

Cluster Configuration:
cluster with dedicated etcd-only and control-plane-only servers

Describe the bug:
Control-plane nodes are uniquely disadvantaged at startup: they fail to start if the server address is unreachable. Etcd nodes can always start up, as they can reconcile against their local datastore. Agents run a local load-balancer that caches server addresses across restarts, so they can always start up as long as at least one of the previously-known servers is up.

Steps To Reproduce:

  1. Create a cluster with 3 etcd-only servers, 2 control-plane-only servers, and 1 agent. Use the first etcd node as the --server address for the other nodes.
  2. Stop k3s on the first etcd node.
  3. Note that etcd nodes can be restarted successfully, and agents can be restarted successfully, but control-plane nodes cannot. They fail with:
    level=fatal msg="starting kubernetes: preparing server: failed to get CA certs: Get \"https://ETCD-SERVER-1:6443/cacerts\": dial tcp ETCD-SERVER-1:6443: connect: connection refused"

Expected behavior:
All nodes can be restarted as long as at least one previously-known server is available.

Actual behavior:
Control-plane nodes fail to start if their join server is unavailable.

Additional context / logs:

Our docs say to use a fixed registration address that is backed by multiple servers, but that guidance is not always followed.

All other configurations are able to start up with at least one previously-discovered server available; this should work for control-plane nodes as well.

@brandond brandond added the kind/enhancement An improvement to existing functionality label Nov 17, 2024
@brandond brandond added this to the 2024-12 Release Cycle milestone Nov 17, 2024
@brandond brandond self-assigned this Nov 17, 2024
@brandond brandond changed the title control-plane-only nodes cannot start if registrations address is unreachable control-plane-only nodes cannot start if join server is unreachable Nov 17, 2024
@brandond brandond moved this from New to Next Up in K3s Development Nov 17, 2024