diff --git a/docs/modules/ROOT/pages/explanations/decisions/managed-machine-sets-cloudscale.adoc b/docs/modules/ROOT/pages/explanations/decisions/managed-machine-sets-cloudscale.adoc new file mode 100644 index 00000000..113262cf --- /dev/null +++ b/docs/modules/ROOT/pages/explanations/decisions/managed-machine-sets-cloudscale.adoc @@ -0,0 +1,66 @@ += Managed MachineSets for Cloudscale Provider + +== Problem + +We created a cloudscale Machine-API provider for OpenShift 4 as decided in https://kb.vshn.ch/appuio-cloud/explanation/decisions/machine-api.html[Custom Machine API Provider]. +The provider allows to managed MachineSets for all kind of nodes in an OpenShift 4 cluster. +The provider runs on the control plane nodes but we have not yet found feasible way to run it on bootstrap nodes. +Some configuration is still Puppet or Terraform based. + +=== Goals + +* Frictionless management of nodes +* Shorten cluster installation time + +=== Non-goals + +* More general centralized management of OpenShift 4 clusters + +== Proposals + +=== Option 1: Manage worker nodes + +We only manage worker nodes with the Machine-API provider. +After installing the control-plane nodes, the infra nodes and any additional nodes (for example, storage nodes) create a MachineSet for the worker nodes. + +This allows us to have the required customer requested AutoScale feature and helps us by being able to easily replace failing worker nodes. +It does not help us with replacing other node types, such as infra nodes. + +=== Option 2: Manage all nodes except control plane nodes + +We manage all nodes except the control plane nodes with the Machine-API provider. +After installing the control-plane nodes, the infra nodes and any additional nodes (for example, storage nodes) are scaled up from a MachineSet. + +This allows us to have the required customer requested AutoScale feature and helps us by being able to easily replace failing nodes. + +Control plane nodes are not managed by the Machine-API provider because they are not expected to be replaced often. +Control plane nodes need some configuration in the VSHN DNS zone and can't be that easily replaced anyways. +There is no easy and intuitive way to bootstrap the control plane nodes with the Machine-API provider since the provider itself is running on the control plane nodes. + +Some caution has to be taken to follow correct node replacement procedures such as updating the router backend configuration for infra nodes or rebalancing the storage nodes. + +Router backend configuration will need to be automated, independently of this issue, as soon as we rollout the new cloudscale load balancers. + +=== Option 3: Manage all nodes + +We manage all nodes with the Machine-API provider. +The control plane nodes are managed by the Machine-API provider as well. +We find a way to run the provider on the bootstrap nodes or on the engineers device. + +This allows us to have the required customer requested AutoScale feature and helps us by being able to easily replace failing nodes. + +Replacing control plane nodes has been tested and just works, thanks to PodDisruptionBudgets in the OpenShift 4 distribution. +Some caution has to be taken to update internal VSHN DNS zone configuration to the new control plane nodes. + +We most likely can replace the DNS zone configuration after we introduced the new cloudscale load balancers. + +== Decision + +We decided to go with option 2: Manage all nodes except control plane nodes. + +== Rationale + +We decided to go with option 2 because it allows us to have the required customer requested AutoScale feature and helps us by being able to easily replace most types of failing nodes. +Since the provider is farley new we want to start with a smaller scope and expand it later on. +Setting up control plane nodes with a provider is not straight forward. +With the introduction of the new cloudscale load balancers we might revisit this decision. diff --git a/docs/modules/ROOT/partials/nav.adoc b/docs/modules/ROOT/partials/nav.adoc index a07b3769..8460588d 100644 --- a/docs/modules/ROOT/partials/nav.adoc +++ b/docs/modules/ROOT/partials/nav.adoc @@ -245,6 +245,7 @@ * Decisions ** xref:oc4:ROOT:explanations/decisions/machine-api.adoc[] +** xref:oc4:ROOT:explanations/decisions/managed-machine-sets-cloudscale.adoc[] ** xref:oc4:ROOT:explanations/decisions/maintenance-trigger.adoc[] ** xref:oc4:ROOT:explanations/decisions/maintenance-alerts.adoc[] ** xref:oc4:ROOT:explanations/decisions/syn-argocd-sharing.adoc[]