Restored backup results in wrong control plane node IP #3498
Labels
kind/bug
sig/cluster-management
What happened?
I'm trying to build a recovery procedure for our clusters (3 control plane nodes at Hetzner), following the manual cluster recovery guide at https://docs.kubermatic.com/kubeone/v1.9/guides/manual-cluster-recovery/
Every attempt ends with one control plane node unable to schedule jobs. The node overview looks like this (some columns removed for readability):
Notice that the internal IP of cp1 is wrong and clashes with cp2. Since the cloud provider assigns IPs dynamically, they came out differently this time, and 10.0.0.5 was actually the OLD IP of cp1 before the restore.
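The clash can also be seen directly with kubectl (cp1 is the node name as in the overview above):

# INTERNAL-IP is 10.0.0.5 for both cp1 and cp2
kubectl get nodes -o wide

# Print only the InternalIP reported by cp1
kubectl get node cp1 -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}'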
However, the last task of the etcd restore process did provide the correct NEW IP, 10.0.0.4, for cp1.
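The addresses registered in etcd can be cross-checked from inside the etcd pod; the cert paths below are the standard kubeadm locations, and the pod name etcd-cp1 is an assumption based on the node name:

# List etcd members with their peer URLs; after the restore, the member for cp1
# should advertise the new address 10.0.0.4
kubectl -n kube-system exec etcd-cp1 -- etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  member list -w table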
The following command
kubeone apply -y -m kubeone.yaml -t tf.json -c credentials.yml -v
hangs indefinitely while waiting for machine-controller to start up. Here as well, all jobs on cp1 show the wrong IP.
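To see where the apply hangs, I check the machine-controller deployment and the workloads scheduled on cp1 (deployment name and namespace are the usual KubeOne defaults, so treat them as assumptions):

# machine-controller never becomes ready
kubectl -n kube-system get deployment machine-controller
kubectl -n kube-system get pods -o wide | grep machine-controller

# as noted above, workloads on cp1 still show the old IP
kubectl get pods -A -o wide --field-selector spec.nodeName=cp1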
The only way out of this situation is to throw away cp1 again and reprovision it, so that it JOINS the other two nodes instead of being initialized as the first control plane.
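A rough sketch of that reprovisioning, with the Terraform resource address as a placeholder for my Hetzner control plane server:

# Remove the broken node from the cluster
kubectl drain cp1 --ignore-daemonsets --delete-emptydir-data
kubectl delete node cp1
# (if cp1's etcd member is still registered, it needs to be removed as well)

# Recreate the server so the next kubeone run JOINS it instead of initializing it
terraform apply -replace='hcloud_server.control_plane[0]'
terraform output -json > tf.json

# Re-run kubeone against the fresh machine
kubeone apply -y -m kubeone.yaml -t tf.json -c credentials.yml -v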
Expected behavior
I would expect KubeOne to initialize the first control plane with the correct IP address when working from a restored etcd snapshot.
What KubeOne version are you using?
Provide your KubeOneCluster manifest here (if applicable)
What cloud provider are you running on?
Hetzner
What operating system are you running in your cluster?
Ubuntu 24.04
Additional Information
We could also try to reassign the same IP addresses to the nodes via Terraform changes, but there is no guarantee the cloud provider can hand out the same IPs again. We would therefore prefer to keep the dynamic assignment at the cloud provider level.