
Support multi-control plane - HA clusters #17909

Merged
merged 34 commits into kubernetes:master on Mar 6, 2024

Conversation

@prezha (Contributor) commented Jan 7, 2024

fixes #17908
fixes #17509
fixes #17680
closes #17681
fixes #17685
fixes #7366

summary

improvements introduced in this PR

  • support Kubernetes HA cluster topology in minikube with multiple stacked control-plane nodes and kube-vip load-balancing between the apiservers, via the new minikube start --ha flag and the new APIServerHAVIP ClusterConfig/KubernetesConfig param (the last IP in the nodes' allocated subnet is reserved as the VIP; see the sketch after this list)
  • enable add/stop/start/delete of non-primary control-plane nodes (see "further potential improvements" below for details)
  • introduce two new statuses for minikube profile list to indicate specific HA cluster states: Degraded (two control-plane nodes) and HAppy (three or more control-plane nodes)
  • keep two coredns instances for HA clusters (see "further potential improvements" below for details)
  • add backup/restore of Kubernetes & CNI configs for HA clusters with ephemeral VM (ISO-based) nodes, essential for HA to come back up after a cluster restart (see "further potential improvements" below for details)
  • refactor to increase code readability & consistency and simplify flow, remove legacy/deprecated handling/params/flags, correct existing logic to support HA, fix some bugs, reduce some unnecessary slowdowns & flakiness, etc.
  • add TestHA and reduce some flakiness in the existing TestMultiNode integration tests (incl. fixing a bug with adding a new node after a node was deleted)
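
As a rough illustration of the VIP reservation mentioned in the first bullet, here is a minimal sketch of computing the last usable address of the nodes' subnet; the helper name is hypothetical and this is not minikube's actual implementation:

```go
// Minimal sketch, assuming the VIP is simply the last usable IPv4 address of
// the nodes' subnet (eg, 192.168.49.254 for 192.168.49.0/24).
package main

import (
	"fmt"
	"net"
)

// lastUsableIP returns the highest usable host address in an IPv4 CIDR.
func lastUsableIP(cidr string) (net.IP, error) {
	_, ipnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return nil, err
	}
	ip := ipnet.IP.To4()
	if ip == nil {
		return nil, fmt.Errorf("this sketch only handles IPv4 CIDRs: %s", cidr)
	}
	vip := make(net.IP, len(ip))
	for i := range ip {
		vip[i] = ip[i] | ^ipnet.Mask[i] // set all host bits => broadcast address
	}
	vip[len(vip)-1]-- // step back one from the broadcast address
	return vip, nil
}

func main() {
	vip, err := lastUsableIP("192.168.49.0/24")
	if err != nil {
		panic(err)
	}
	fmt.Println(vip) // 192.168.49.254, matching the profile list output below
}
```

For a /24 like the docker network in the examples below, this yields 192.168.49.254, which matches the IP shown in the profile lists once the cluster becomes HA.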

potential further "minikube ha cluster improvements" for other PRs (we can have a separate tracking issue for these)

  • test against k8s v1.29
  • make a PR against the Kubernetes docs to state that minikube can be used to try & test HA cluster topology (refs: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/ha-topology/ and https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/)
  • add support for the "v1beta4" kubeadm config, when released (refs: https://kubernetes.io/docs/reference/config-api/kubeadm-config.v1beta4/ and release v1beta4 kubeadm#2890)
  • support "losing" the primary control-plane node (ie, allowing it to be stopped/deleted - think: storage-provisioner, which is currently "bound" to the primary cp node only, and also some other current checks that rely on the primary cp being "healthy")
    • in the meantime: add a warning to 'minikube node delete' for the primary control-plane node (in general, not just for HA)
  • CoreDNS - decrease code overhead and increase DNS service availability & reliability:
    • switch to coredns-as-addon - ref: deploy our custom coredns addon #17008
      • idea: convert the deployment to a daemonset but only for control-plane nodes (eg, using a nodeSelector), so there will be as many coredns instances as control-plane nodes (good for both single- and ha/multi-control-plane node clusters)
  • restarting a cluster without changing flags/params should "just work", and be way faster than the initial cluster creation - ie, we should avoid re-applying/overwriting changes and restarting CRs (eg, containerd can take an additional ~10s, docker ~5s, etc.) - could, eg, expand on the configs backup()/restore() introduced in this PR (required for HA)
    • speed up ha/multinode cluster start (create and restart): handle nodes concurrently (see the sketch after this list)
    • revisit whether adding a minikube node add --control-plane flag to support upgrading a non-ha to an ha cluster would be beneficial for our users
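
For the concurrency item above, a minimal sketch of what handling nodes concurrently could look like, using errgroup; startNode is a hypothetical stand-in for minikube's per-node provisioning logic, not its current (sequential) code:

```go
package main

import (
	"context"
	"fmt"

	"golang.org/x/sync/errgroup"
)

func startNode(ctx context.Context, name string) error {
	fmt.Println("starting node", name) // real code would provision the node here
	return nil
}

// startNodesConcurrently starts all nodes in parallel; the first error
// cancels the shared context and is returned from Wait.
func startNodesConcurrently(ctx context.Context, nodes []string) error {
	g, ctx := errgroup.WithContext(ctx)
	for _, n := range nodes {
		n := n // capture the loop variable (pre-Go 1.22 semantics)
		g.Go(func() error { return startNode(ctx, n) })
	}
	return g.Wait()
}

func main() {
	nodes := []string{"ha-docker", "ha-docker-m02", "ha-docker-m03"}
	if err := startNodesConcurrently(context.Background(), nodes); err != nil {
		panic(err)
	}
}
```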

some examples after these changes (more are covered in the TestHA integration tests)

$ minikube start --ha -p ha-docker

$ minikube profile list

|-----------|-----------|---------|--------------|------|---------|---------|-------|--------|
|  Profile  | VM Driver | Runtime |      IP      | Port | Version | Status  | Nodes | Active |
|-----------|-----------|---------|--------------|------|---------|---------|-------|--------|
| ha-docker | docker    | docker  | 192.168.49.2 | 8443 | v1.28.4 | Stopped |     1 |        |
|-----------|-----------|---------|--------------|------|---------|---------|-------|--------|

$ minikube profile list

|-----------|-----------|---------|----------------|------|---------|----------|-------|--------|
|  Profile  | VM Driver | Runtime |       IP       | Port | Version |  Status  | Nodes | Active |
|-----------|-----------|---------|----------------|------|---------|----------|-------|--------|
| ha-docker | docker    | docker  | 192.168.49.254 | 8443 | v1.28.4 | Degraded |     2 |        |
|-----------|-----------|---------|----------------|------|---------|----------|-------|--------|

$ minikube profile list

|-----------|-----------|---------|----------------|------|---------|--------|-------|--------|
|  Profile  | VM Driver | Runtime |       IP       | Port | Version | Status | Nodes | Active |
|-----------|-----------|---------|----------------|------|---------|--------|-------|--------|
| ha-docker | docker    | docker  | 192.168.49.254 | 8443 | v1.28.4 | HAppy  |     3 |        |
|-----------|-----------|---------|----------------|------|---------|--------|-------|--------|

$ kubectl get nodes -owide

NAME            STATUS   ROLES           AGE   VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION    CONTAINER-RUNTIME
ha-docker       Ready    control-plane   80s   v1.28.4   192.168.49.2   <none>        Ubuntu 22.04.3 LTS   6.6.9-1-default   docker://24.0.7
ha-docker-m02   Ready    control-plane   61s   v1.28.4   192.168.49.3   <none>        Ubuntu 22.04.3 LTS   6.6.9-1-default   docker://24.0.7
ha-docker-m03   Ready    control-plane   26s   v1.28.4   192.168.49.4   <none>        Ubuntu 22.04.3 LTS   6.6.9-1-default   docker://24.0.7

$ kubectl get all -A -owide

NAMESPACE     NAME                                        READY   STATUS    RESTARTS      AGE   IP             NODE            NOMINATED NODE   READINESS GATES
kube-system   pod/coredns-5dd5756b68-t75p6                1/1     Running   1 (61s ago)   75s   10.244.0.3     ha-docker       <none>           <none>
kube-system   pod/coredns-5dd5756b68-tctss                1/1     Running   1 (61s ago)   75s   10.244.0.2     ha-docker       <none>           <none>
kube-system   pod/etcd-ha-docker                          1/1     Running   0             86s   192.168.49.2   ha-docker       <none>           <none>
kube-system   pod/etcd-ha-docker-m02                      1/1     Running   0             71s   192.168.49.3   ha-docker-m02   <none>           <none>
kube-system   pod/etcd-ha-docker-m03                      1/1     Running   0             36s   192.168.49.4   ha-docker-m03   <none>           <none>
kube-system   pod/kindnet-ckfnd                           1/1     Running   0             76s   192.168.49.2   ha-docker       <none>           <none>
kube-system   pod/kindnet-kf6t5                           1/1     Running   0             71s   192.168.49.3   ha-docker-m02   <none>           <none>
kube-system   pod/kindnet-lsrgw                           1/1     Running   0             36s   192.168.49.4   ha-docker-m03   <none>           <none>
kube-system   pod/kube-apiserver-ha-docker                1/1     Running   0             86s   192.168.49.2   ha-docker       <none>           <none>
kube-system   pod/kube-apiserver-ha-docker-m02            1/1     Running   0             70s   192.168.49.3   ha-docker-m02   <none>           <none>
kube-system   pod/kube-apiserver-ha-docker-m03            1/1     Running   0             35s   192.168.49.4   ha-docker-m03   <none>           <none>
kube-system   pod/kube-controller-manager-ha-docker       1/1     Running   0             86s   192.168.49.2   ha-docker       <none>           <none>
kube-system   pod/kube-controller-manager-ha-docker-m02   1/1     Running   0             70s   192.168.49.3   ha-docker-m02   <none>           <none>
kube-system   pod/kube-controller-manager-ha-docker-m03   1/1     Running   0             35s   192.168.49.4   ha-docker-m03   <none>           <none>
kube-system   pod/kube-proxy-8zwms                        1/1     Running   0             71s   192.168.49.3   ha-docker-m02   <none>           <none>
kube-system   pod/kube-proxy-kz52h                        1/1     Running   0             36s   192.168.49.4   ha-docker-m03   <none>           <none>
kube-system   pod/kube-proxy-rnq7m                        1/1     Running   0             76s   192.168.49.2   ha-docker       <none>           <none>
kube-system   pod/kube-scheduler-ha-docker                1/1     Running   0             86s   192.168.49.2   ha-docker       <none>           <none>
kube-system   pod/kube-scheduler-ha-docker-m02            1/1     Running   0             70s   192.168.49.3   ha-docker-m02   <none>           <none>
kube-system   pod/kube-scheduler-ha-docker-m03            1/1     Running   0             35s   192.168.49.4   ha-docker-m03   <none>           <none>
kube-system   pod/kube-vip-ha-docker                      1/1     Running   1 (61s ago)   89s   192.168.49.2   ha-docker       <none>           <none>
kube-system   pod/kube-vip-ha-docker-m02                  1/1     Running   0             61s   192.168.49.3   ha-docker-m02   <none>           <none>
kube-system   pod/kube-vip-ha-docker-m03                  1/1     Running   0             34s   192.168.49.4   ha-docker-m03   <none>           <none>
kube-system   pod/storage-provisioner                     1/1     Running   1 (45s ago)   85s   192.168.49.2   ha-docker       <none>           <none>

NAMESPACE     NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE   SELECTOR
default       service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP                  90s   <none>
kube-system   service/kube-dns     ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   86s   k8s-app=kube-dns

NAMESPACE     NAME                        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE   CONTAINERS    IMAGES                                          SELECTOR
kube-system   daemonset.apps/kindnet      3         3         3       3            3           <none>                   86s   kindnet-cni   docker.io/kindest/kindnetd:v20230809-80a64d96   app=kindnet
kube-system   daemonset.apps/kube-proxy   3         3         3       3            3           kubernetes.io/os=linux   86s   kube-proxy    registry.k8s.io/kube-proxy:v1.28.4              k8s-app=kube-proxy

NAMESPACE     NAME                      READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS   IMAGES                                    SELECTOR
kube-system   deployment.apps/coredns   2/2     2            2           86s   coredns      registry.k8s.io/coredns/coredns:v1.10.1   k8s-app=kube-dns

NAMESPACE     NAME                                 DESIRED   CURRENT   READY   AGE   CONTAINERS   IMAGES                                    SELECTOR
kube-system   replicaset.apps/coredns-5dd5756b68   2         2         2       76s   coredns      registry.k8s.io/coredns/coredns:v1.10.1   k8s-app=kube-dns,pod-template-hash=5dd5756b68

$ minikube node add --control-plane --alsologtostderr -v=7 -p ha-docker

$ minikube profile list

|-----------|-----------|---------|----------------|------|---------|--------|-------|--------|
|  Profile  | VM Driver | Runtime |       IP       | Port | Version | Status | Nodes | Active |
|-----------|-----------|---------|----------------|------|---------|--------|-------|--------|
| ha-docker | docker    | docker  | 192.168.49.254 | 8443 | v1.28.4 | HAppy  |     4 |        |
|-----------|-----------|---------|----------------|------|---------|--------|-------|--------|

$ kubectl get nodes -owide

NAME            STATUS   ROLES           AGE     VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION    CONTAINER-RUNTIME
ha-docker       Ready    control-plane   5m32s   v1.28.4   192.168.49.2   <none>        Ubuntu 22.04.3 LTS   6.6.9-1-default   docker://24.0.7
ha-docker-m02   Ready    control-plane   5m13s   v1.28.4   192.168.49.3   <none>        Ubuntu 22.04.3 LTS   6.6.9-1-default   docker://24.0.7
ha-docker-m03   Ready    control-plane   4m38s   v1.28.4   192.168.49.4   <none>        Ubuntu 22.04.3 LTS   6.6.9-1-default   docker://24.0.7
ha-docker-m04   Ready    control-plane   59s     v1.28.4   192.168.49.5   <none>        Ubuntu 22.04.3 LTS   6.6.9-1-default   docker://24.0.7

$ kubectl config view

apiVersion: v1
clusters:
- cluster:
    certificate-authority: /home/prezha/.minikube/ca.crt
    extensions:
    - extension:
        last-update: Sun, 07 Jan 2024 21:16:00 GMT
        provider: minikube.sigs.k8s.io
        version: v1.32.0
      name: cluster_info
    server: https://192.168.49.254:8443
  name: ha-docker
contexts:
- context:
    cluster: ha-docker
    extensions:
    - extension:
        last-update: Sun, 07 Jan 2024 21:16:00 GMT
        provider: minikube.sigs.k8s.io
        version: v1.32.0
      name: context_info
    namespace: default
    user: ha-docker
  name: ha-docker
current-context: ha-docker
kind: Config
preferences: {}
users:
- name: ha-docker
  user:
    client-certificate: /home/prezha/.minikube/profiles/ha-docker/client.crt
    client-key: /home/prezha/.minikube/profiles/ha-docker/client.key

$ minikube ssh -p ha-docker -- 'sudo /var/lib/minikube/binaries/v1.28.4/kubectl --kubeconfig=/var/lib/minikube/kubeconfig exec -ti pod/etcd-ha-docker -n kube-system -- /bin/sh -c "ETCDCTL_API=3 etcdctl member list --write-out=table --cacert=/var/lib/minikube/certs/etcd/ca.crt --cert=/var/lib/minikube/certs/etcd/server.crt --key=/var/lib/minikube/certs/etcd/server.key"'

+------------------+---------+---------------+---------------------------+---------------------------+------------+
|        ID        | STATUS  |     NAME      |        PEER ADDRS         |       CLIENT ADDRS        | IS LEARNER |
+------------------+---------+---------------+---------------------------+---------------------------+------------+
| 3682368a4840cc8d | started | ha-docker-m03 | https://192.168.49.4:2380 | https://192.168.49.4:2379 |      false |
| 5c0f6d84fda7c661 | started | ha-docker-m02 | https://192.168.49.3:2380 | https://192.168.49.3:2379 |      false |
| aec36adc501070cc | started |     ha-docker | https://192.168.49.2:2380 | https://192.168.49.2:2379 |      false |
| caf5013453d352bf | started | ha-docker-m04 | https://192.168.49.5:2380 | https://192.168.49.5:2379 |      false |
+------------------+---------+---------------+---------------------------+---------------------------+------------+

@k8s-ci-robot added the cncf-cla: yes label (Indicates the PR's author has signed the CNCF CLA) on Jan 7, 2024
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: prezha

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved (Indicates a PR has been approved by an approver from all required OWNERS files) and size/XXL (Denotes a PR that changes 1000+ lines, ignoring generated files) labels on Jan 7, 2024
@prezha (Contributor, Author) commented Jan 7, 2024

/ok-to-test

@k8s-ci-robot added the ok-to-test label (Indicates a non-member PR verified by an org member that is safe to test) on Jan 7, 2024

@prezha (Contributor, Author) commented Jan 8, 2024

The TestValidatePorts unit test failure will be fixed in PR #17906.

@neolit123 (Member) commented Jan 8, 2024

> support kubernetes ha cluster topology in minikube with multiple stacked control-plane nodes and kube-vip for load-balancing between multiple apiservers using new minikube start --ha flag and new APIServerHAVIP

there is one gotcha with kubeadm 1.29 and kube-vip
https://kubernetes.slack.com/archives/C01RXPHDATB/p1703093377135499
https://kubernetes.slack.com/archives/C01RXPHDATB/p1703061589086269
https://kubernetes.slack.com/archives/C01RXPHDATB/p1704643142829679
kube-vip/kube-vip#684

as discussed, the fix can be added on the kube-vip side.

@medyagh (Member) commented Jan 8, 2024

Thank you for this PR, I look forward to reviewing it! Thank you, @prezha.

@prezha (Contributor, Author) commented Jan 9, 2024

> support kubernetes ha cluster topology in minikube with multiple stacked control-plane nodes and kube-vip for load-balancing between multiple apiservers using new minikube start --ha flag and new APIServerHAVIP
>
> there is one gotcha with kubeadm 1.29 and kube-vip https://kubernetes.slack.com/archives/C01RXPHDATB/p1703093377135499 https://kubernetes.slack.com/archives/C01RXPHDATB/p1703061589086269 https://kubernetes.slack.com/archives/C01RXPHDATB/p1704643142829679 kube-vip/kube-vip#684
>
> as discussed, the fix can be added on the kube-vip side.

thanks, @neolit123, for pointing that out, we'll look into the details

@minikube-pr-bot

This comment has been minimized.

@prezha (Contributor, Author) commented Jan 9, 2024

Taking into account the comment above, the latest commit contains a workaround for kube-vip and k8s v1.29 [refs: changelog and issue].
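
For context on the gotcha: kubeadm v1.29 started generating an RBAC-limited admin.conf plus a separate cluster-admin super-admin.conf, which breaks kube-vip during bootstrap if it authenticates via admin.conf. Below is a minimal sketch of the kind of version gate such a workaround implies; the function name and the exact paths/logic are illustrative assumptions, not necessarily what the commit actually does:

```go
// Illustrative sketch only (assumed logic, not minikube's actual code):
// pick which kubeconfig the kube-vip static pod mounts while the first
// control-plane node bootstraps.
package main

import (
	"fmt"

	"golang.org/x/mod/semver"
)

func kubeVipKubeconfig(kubernetesVersion string) string {
	// From v1.29.0 on, admin.conf is RBAC-limited during kubeadm init, so
	// kube-vip would need the cluster-admin super-admin.conf instead.
	if semver.Compare(kubernetesVersion, "v1.29.0") >= 0 {
		return "/etc/kubernetes/super-admin.conf"
	}
	return "/etc/kubernetes/admin.conf"
}

func main() {
	fmt.Println(kubeVipKubeconfig("v1.28.4")) // /etc/kubernetes/admin.conf
	fmt.Println(kubeVipKubeconfig("v1.29.0")) // /etc/kubernetes/super-admin.conf
}
```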

testrun

$ minikube start --ha --kubernetes-version=v1.29.0 --alsologtostderr -v=7 --driver=docker -p ha-docker-v129-2

$ kubectl --context=ha-docker-v129-2 get nodes -owide

NAME                   STATUS   ROLES           AGE   VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION    CONTAINER-RUNTIME
ha-docker-v129-2       Ready    control-plane   14m   v1.29.0   192.168.67.2   <none>        Ubuntu 22.04.3 LTS   6.6.9-1-default   docker://24.0.7
ha-docker-v129-2-m02   Ready    control-plane   14m   v1.29.0   192.168.67.3   <none>        Ubuntu 22.04.3 LTS   6.6.9-1-default   docker://24.0.7
ha-docker-v129-2-m03   Ready    control-plane   14m   v1.29.0   192.168.67.4   <none>        Ubuntu 22.04.3 LTS   6.6.9-1-default   docker://24.0.7

$ kubectl --context=ha-docker-v129-2 get all -A -owide

NAMESPACE     NAME                                               READY   STATUS    RESTARTS      AGE   IP             NODE                   NOMINATED NODE   READINESS GATES
kube-system   pod/coredns-76f75df574-5k5v8                       1/1     Running   1 (14m ago)   15m   10.244.0.3     ha-docker-v129-2       <none>           <none>
kube-system   pod/coredns-76f75df574-7d64r                       1/1     Running   1 (14m ago)   15m   10.244.0.2     ha-docker-v129-2       <none>           <none>
kube-system   pod/etcd-ha-docker-v129-2                          1/1     Running   0             15m   192.168.67.2   ha-docker-v129-2       <none>           <none>
kube-system   pod/etcd-ha-docker-v129-2-m02                      1/1     Running   0             15m   192.168.67.3   ha-docker-v129-2-m02   <none>           <none>
kube-system   pod/etcd-ha-docker-v129-2-m03                      1/1     Running   0             14m   192.168.67.4   ha-docker-v129-2-m03   <none>           <none>
kube-system   pod/kindnet-8skwg                                  1/1     Running   0             15m   192.168.67.3   ha-docker-v129-2-m02   <none>           <none>
kube-system   pod/kindnet-d79xl                                  1/1     Running   0             14m   192.168.67.4   ha-docker-v129-2-m03   <none>           <none>
kube-system   pod/kindnet-xqlt4                                  1/1     Running   0             15m   192.168.67.2   ha-docker-v129-2       <none>           <none>
kube-system   pod/kube-apiserver-ha-docker-v129-2                1/1     Running   0             15m   192.168.67.2   ha-docker-v129-2       <none>           <none>
kube-system   pod/kube-apiserver-ha-docker-v129-2-m02            1/1     Running   0             15m   192.168.67.3   ha-docker-v129-2-m02   <none>           <none>
kube-system   pod/kube-apiserver-ha-docker-v129-2-m03            1/1     Running   0             14m   192.168.67.4   ha-docker-v129-2-m03   <none>           <none>
kube-system   pod/kube-controller-manager-ha-docker-v129-2       1/1     Running   0             15m   192.168.67.2   ha-docker-v129-2       <none>           <none>
kube-system   pod/kube-controller-manager-ha-docker-v129-2-m02   1/1     Running   0             15m   192.168.67.3   ha-docker-v129-2-m02   <none>           <none>
kube-system   pod/kube-controller-manager-ha-docker-v129-2-m03   1/1     Running   0             14m   192.168.67.4   ha-docker-v129-2-m03   <none>           <none>
kube-system   pod/kube-proxy-2qj2b                               1/1     Running   0             14m   192.168.67.4   ha-docker-v129-2-m03   <none>           <none>
kube-system   pod/kube-proxy-h8zdd                               1/1     Running   0             15m   192.168.67.2   ha-docker-v129-2       <none>           <none>
kube-system   pod/kube-proxy-n5fs2                               1/1     Running   0             15m   192.168.67.3   ha-docker-v129-2-m02   <none>           <none>
kube-system   pod/kube-scheduler-ha-docker-v129-2                1/1     Running   0             15m   192.168.67.2   ha-docker-v129-2       <none>           <none>
kube-system   pod/kube-scheduler-ha-docker-v129-2-m02            1/1     Running   0             15m   192.168.67.3   ha-docker-v129-2-m02   <none>           <none>
kube-system   pod/kube-scheduler-ha-docker-v129-2-m03            1/1     Running   0             14m   192.168.67.4   ha-docker-v129-2-m03   <none>           <none>
kube-system   pod/kube-vip-ha-docker-v129-2                      1/1     Running   0             15m   192.168.67.2   ha-docker-v129-2       <none>           <none>
kube-system   pod/kube-vip-ha-docker-v129-2-m02                  1/1     Running   0             15m   192.168.67.3   ha-docker-v129-2-m02   <none>           <none>
kube-system   pod/kube-vip-ha-docker-v129-2-m03                  1/1     Running   0             14m   192.168.67.4   ha-docker-v129-2-m03   <none>           <none>
kube-system   pod/storage-provisioner                            1/1     Running   1 (14m ago)   15m   192.168.67.2   ha-docker-v129-2       <none>           <none>

NAMESPACE     NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE   SELECTOR
default       service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP                  15m   <none>
kube-system   service/kube-dns     ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   15m   k8s-app=kube-dns

NAMESPACE     NAME                        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE   CONTAINERS    IMAGES                                          SELECTOR
kube-system   daemonset.apps/kindnet      3         3         3       3            3           <none>                   15m   kindnet-cni   docker.io/kindest/kindnetd:v20230809-80a64d96   app=kindnet
kube-system   daemonset.apps/kube-proxy   3         3         3       3            3           kubernetes.io/os=linux   15m   kube-proxy    registry.k8s.io/kube-proxy:v1.29.0              k8s-app=kube-proxy

NAMESPACE     NAME                      READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS   IMAGES                                    SELECTOR
kube-system   deployment.apps/coredns   2/2     2            2           15m   coredns      registry.k8s.io/coredns/coredns:v1.11.1   k8s-app=kube-dns

NAMESPACE     NAME                                 DESIRED   CURRENT   READY   AGE   CONTAINERS   IMAGES                                    SELECTOR
kube-system   replicaset.apps/coredns-76f75df574   2         2         2       15m   coredns      registry.k8s.io/coredns/coredns:v1.11.1   k8s-app=kube-dns,pod-template-hash=76f75df574

$ minikube profile list

|------------------|-----------|---------|----------------|------|---------|----------|-------|--------|
|     Profile      | VM Driver | Runtime |       IP       | Port | Version |  Status  | Nodes | Active |
|------------------|-----------|---------|----------------|------|---------|----------|-------|--------|
| ha-docker-v129-2 | docker    | docker  | 192.168.67.254 | 8443 | v1.29.0 | HAppy    |     3 |        |
|------------------|-----------|---------|----------------|------|---------|----------|-------|--------|

TestHA passed - logs:

TestHA-v1.29.0.txt

//cc: @neolit123


@prezha (Contributor, Author) commented Jan 10, 2024

I wonder why prow/pull-minikube-build doesn't like go 1.21:
https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/minikube/17909/pull-minikube-build/1744866330323980288#1:build-log.txt%3A191-193

go 1.21

GO_VERSION ?= 1.21.5

@spowelljr (Member)

@prezha The issue is that the prow image was built with Go 1.20.6 but the slices package was added in Go 1.21, I'll build a new version of the prow

https://pkg.go.dev/path:
"The path package should only be used for paths separated by forward slashes, such as the paths in URLs. This package does not deal with Windows paths with drive letters or backslashes; to manipulate operating system paths, use the path/filepath package."

- the user's OS can be, eg, Windows => use 'filepath' to reference local paths
- KIC/ISO use a Linux OS => use 'path' to reference "internal" paths (independently of the user's OS)
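
A minimal, runnable sketch of that distinction (the concrete paths are illustrative):

```go
package main

import (
	"fmt"
	"path"
	"path/filepath"
)

func main() {
	// Host-side paths: path/filepath honours the user's OS separator
	// (backslash on Windows, forward slash elsewhere).
	hostCert := filepath.Join(".minikube", "profiles", "ha-docker", "client.crt")
	fmt.Println(hostCert)

	// Guest-side (KIC container / ISO VM) paths are always Linux-style,
	// regardless of the host OS, so plain path is the right tool.
	guestManifest := path.Join("/etc", "kubernetes", "manifests", "kube-vip.yaml")
	fmt.Println(guestManifest) // /etc/kubernetes/manifests/kube-vip.yaml
}
```
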
@k8s-ci-robot added and then removed the needs-rebase label (Indicates a PR cannot be merged because it has merge conflicts with HEAD) on Jan 10, 2024
@prezha (Contributor, Author) commented Jan 10, 2024

> @prezha The issue is that the prow image was built with Go 1.20.6 but the slices package was added in Go 1.21, I'll build a new version of the prow

@spowelljr thanks for clarifying!

@medyagh (Member) commented Feb 15, 2024

@prezha would you please take a look at the HA tests failing on all the drivers (except QEMU)?



These are the flake rates of all failed tests.

| Environment | Failed Tests | Flake Rate (%) |
|---|---|---|
| Hyper-V_Windows | TestFunctional/parallel (gopogh) | n/a |
| Hyper-V_Windows | TestHA/serial/AddWorkerNode (gopogh) | n/a |
| Hyper-V_Windows | TestHA/serial/DeployApp (gopogh) | n/a |
| Hyper-V_Windows | TestHA/serial/HAppyAfterClusterStart (gopogh) | n/a |
| Hyper-V_Windows | TestHA/serial/PingHostFromPods (gopogh) | n/a |
| Hyper-V_Windows | TestHA/serial/StartCluster (gopogh) | n/a |
| KVM_Linux_crio | TestHA/serial/RestartClusterKeepsNodes (gopogh) | n/a |
| KVM_Linux_crio | TestHA/serial/RestartSecondaryNode (gopogh) | n/a |
| KVM_Linux_crio | TestHA/serial/StopCluster (gopogh) | n/a |
| KVM_Linux_crio | TestHA/serial/StopSecondaryNode (gopogh) | n/a |

Review comment on test/integration/ha_test.go (outdated, resolved)
@prezha (Contributor, Author) commented Feb 25, 2024

> @prezha would you please take a look at the HA tests failing on all the drivers (except QEMU)?

@medyagh I'm not sure where you see that: all the HA tests run against the previous commit (cc4dc8c) passed, except for Hyper-V_Windows, partially KVM_Linux_crio (VM node shutdown timeout), and QEMU_macOS (probably an unrelated issue: 'Failed to connect to "/var/run/socket_vmnet": Connection refused')?


Review comment on test/integration/ha_test.go (outdated, resolved)
@medyagh (Member) commented Mar 4, 2024

@prezha can you please take a look at the KVM test failures for the HA tests? https://storage.googleapis.com/minikube-builds/logs/17909/33398/KVM_Linux_containerd.html

@medyagh (Member) commented Mar 4, 2024

/retest-this-please


@minikube-pr-bot

kvm2 driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 17909) |
+----------------+----------+---------------------+
| minikube start | 50.7s    | 51.3s               |
| enable ingress | 25.0s    | 23.8s               |
+----------------+----------+---------------------+

Times for minikube start: 51.9s 49.3s 53.2s 49.5s 49.7s
Times for minikube (PR 17909) start: 50.5s 53.6s 51.8s 49.5s 50.9s

Times for minikube ingress: 26.2s 24.0s 26.0s 22.7s 26.0s
Times for minikube (PR 17909) ingress: 23.7s 22.6s 23.5s 22.5s 26.6s

docker driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 17909) |
+----------------+----------+---------------------+
| minikube start | 23.5s    | 23.9s               |
| enable ingress | 20.6s    | 20.2s               |
+----------------+----------+---------------------+

Times for minikube start: 24.7s 23.8s 24.9s 21.6s 22.6s
Times for minikube (PR 17909) start: 25.1s 21.8s 22.1s 25.9s 24.9s

Times for minikube ingress: 20.8s 20.8s 20.8s 19.8s 20.8s
Times for minikube (PR 17909) ingress: 20.3s 20.3s 19.8s 20.3s 20.3s

docker driver with containerd runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 17909) |
+----------------+----------+---------------------+
| minikube start | 22.1s    | 23.6s               |
| enable ingress | 34.3s    | 33.2s               |
+----------------+----------+---------------------+

Times for minikube start: 21.6s 20.3s 23.8s 23.5s 21.2s
Times for minikube (PR 17909) start: 23.9s 23.8s 24.9s 24.4s 20.8s

Times for minikube ingress: 48.3s 32.3s 30.3s 30.3s 30.3s
Times for minikube (PR 17909) ingress: 45.8s 29.8s 29.8s 30.8s 29.8s

@minikube-pr-bot

These are the flake rates of all failed tests.

| Environment | Failed Tests | Flake Rate (%) |
|---|---|---|
| Hyperkit_macOS | TestMutliControlPlane/serial/AddSecondaryNode (gopogh) | n/a |
| Hyperkit_macOS | TestMutliControlPlane/serial/AddWorkerNode (gopogh) | n/a |
| Hyperkit_macOS | TestMutliControlPlane/serial/CopyFile (gopogh) | n/a |
| Hyperkit_macOS | TestMutliControlPlane/serial/DegradedAfterClusterRestart (gopogh) | n/a |
| Hyperkit_macOS | TestMutliControlPlane/serial/DegradedAfterControlPlaneNodeStop (gopogh) | n/a |
| Hyperkit_macOS | TestMutliControlPlane/serial/DegradedAfterSecondaryNodeDelete (gopogh) | n/a |
| Hyperkit_macOS | TestMutliControlPlane/serial/DeleteSecondaryNode (gopogh) | n/a |
| Hyperkit_macOS | TestMutliControlPlane/serial/DeployApp (gopogh) | n/a |
| Hyperkit_macOS | TestMutliControlPlane/serial/HAppyAfterClusterStart (gopogh) | n/a |
| Hyperkit_macOS | TestMutliControlPlane/serial/HAppyAfterSecondaryNodeAdd (gopogh) | n/a |
| Hyperkit_macOS | TestMutliControlPlane/serial/HAppyAfterSecondaryNodeRestart (gopogh) | n/a |
| Hyperkit_macOS | TestMutliControlPlane/serial/NodeLabels (gopogh) | n/a |
| Hyperkit_macOS | TestMutliControlPlane/serial/PingHostFromPods (gopogh) | n/a |
| Hyperkit_macOS | TestMutliControlPlane/serial/RestartCluster (gopogh) | n/a |
| Hyperkit_macOS | TestMutliControlPlane/serial/RestartClusterKeepsNodes (gopogh) | n/a |
| Hyperkit_macOS | TestMutliControlPlane/serial/RestartSecondaryNode (gopogh) | n/a |
| Hyperkit_macOS | TestMutliControlPlane/serial/StartCluster (gopogh) | n/a |
| Hyperkit_macOS | TestMutliControlPlane/serial/StopCluster (gopogh) | n/a |
| Hyperkit_macOS | TestMutliControlPlane/serial/StopSecondaryNode (gopogh) | n/a |
| KVM_Linux_crio | TestMutliControlPlane/serial/RestartClusterKeepsNodes (gopogh) | n/a |
| KVM_Linux_crio | TestMutliControlPlane/serial/RestartSecondaryNode (gopogh) | n/a |
| KVM_Linux_crio | TestMutliControlPlane/serial/StopCluster (gopogh) | n/a |
| KVM_Linux_crio | TestMutliControlPlane/serial/StopSecondaryNode (gopogh) | n/a |
| Docker_Linux_containerd_arm64 | TestStartStop/group/old-k8s-version/serial/SecondStart (gopogh) | 0.00 (chart) |
| KVM_Linux_crio | TestFunctional/parallel/NodeLabels (gopogh) | 0.63 (chart) |
| KVM_Linux_crio | TestFunctional/serial/KubectlGetPods (gopogh) | 0.63 (chart) |
| KVM_Linux_crio | TestFunctional/serial/MinikubeKubectlCmd (gopogh) | 0.63 (chart) |
| KVM_Linux_crio | TestFunctional/serial/MinikubeKubectlCmdDirectly (gopogh) | 0.63 (chart) |
| KVM_Linux_crio | TestFunctional/serial/SoftStart (gopogh) | 0.63 (chart) |
| Docker_Linux_crio_arm64 | TestScheduledStopUnix (gopogh) | 1.23 (chart) |

More tests... Continued...

Too many tests failed - See test logs for more details.

To see the flake rates of all tests by environment, click here.

@medyagh (Member) commented Mar 6, 2024

Do you mind adding some details to the PR description on how the control-plane load balancing works (for future reference)?

@medyagh changed the title from "support kubernetes ha cluster topology in minikube (multi-control plane)" to "Support multi-control plane - HA clusters" on Mar 6, 2024
@medyagh (Member) commented Mar 6, 2024

Thank you @prezha for this PR, this has been a long-awaited feature for minikube, and I look forward to us ironing out the follow-ups to this PR and fixing the rough edges before the release, ahead of KubeCon.

Great work

@medyagh merged commit 8ec4c89 into kubernetes:master on Mar 6, 2024
25 of 38 checks passed
Labels
approved (Indicates a PR has been approved by an approver from all required OWNERS files), cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA), ok-to-test (Indicates a non-member PR verified by an org member that is safe to test), size/XXL (Denotes a PR that changes 1000+ lines, ignoring generated files)
7 participants