OCP 4.1 deployment on Azure Cloud using User provisioned Infrastructure

Architecture:

When using this method, you can:

  • Specify the number of masters and workers you want to provision
  • Change Network Security Group rules in order to lock down the ingress access to the cluster
  • Change Infrastructure component names
  • Add tags

This Terraform-based approach will split the VMs across 3 Azure Availability Zones and will use 2 zone-redundant Load Balancers (1 public-facing to serve the OCP routers and the api, and 1 private to serve api-int).

Please see the topology diagram: Openshift Container Platform 4.1 Topology on Azure.

Deployment can be split into 4 steps:

  • Create the Control Plane (masters) and surrounding infrastructure (LB, DNS, VNet, etc.)
  • Destroy Bootstrap VM
  • Set the default Ingress controller to type HostNetwork
  • Create Compute (worker) nodes

Prereqs:

This method uses the following tools:

  • terraform >= 0.12
  • openshift-cli
  • git
  • jq (optional)
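
A quick way to confirm the tools are in place (a minimal check, not part of the original workflow):

$> terraform version   # should report v0.12.x or newer
$> oc version          # openshift-cli
$> git --version
$> jq --version        # optional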

NOTE: A Free Trial account is not enough; a Pay As You Go subscription with an increased vCPU quota is recommended:
https://blogs.msdn.microsoft.com/girishp/2015/09/20/increasing-core-quota-limits-in-azure/

Preparation

  1. Prepare Azure Cloud for Openshift installation:
    https://github.com/openshift/installer/tree/master/docs/user/azure

You need to follow this Installation section as well:
https://github.com/openshift/installer/blob/master/docs/user/azure/install.md#setup-your-red-hat-enterprise-linux-coreos-images
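
The installer documentation linked above describes the RHCOS image upload in detail; as a rough az CLI sketch (the rhcos_images resource group and rhcostestimage image name below match the example azure_image_id used later in terraform.tfvars, while the storage account name and VHD URL are placeholders you must supply):

$> az group create --name rhcos_images --location <Azure region>
$> az storage account create --resource-group rhcos_images --name <storage account> --sku Standard_LRS
$> az storage container create --account-name <storage account> --name vhd
$> az storage blob copy start --account-name <storage account> --destination-container vhd \
     --destination-blob rhcos.vhd --source-uri "<RHCOS VHD URL from the installer docs>"
# the blob copy runs asynchronously; wait for it to complete before creating the image
$> az image create --resource-group rhcos_images --name rhcostestimage --os-type Linux \
     --source "https://<storage account>.blob.core.windows.net/vhd/rhcos.vhd"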

  2. Clone this repository:
  $> git clone https://github.com/JuozasA/ocp4-azure-upi.git
  $> cd ocp4-azure-upi
  3. Initialize Terraform working directories (current and worker):
$> terraform init
$> cd worker
$> terraform init
$> cd ../
  4. Download the openshift-install binary and get the pull-secret from:
    https://cloud.redhat.com/openshift/install/azure/installer-provisioned

  5. Copy the openshift-install binary to the /usr/local/bin directory:

cp openshift-install /usr/local/bin/
  6. Generate the install config files:
$> openshift-install create install-config --dir=ignition-files
? SSH Public Key /home/user_id/.ssh/id_rsa.pub
? Platform azure
? azure subscription id xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
? azure tenant id xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
? azure service principal client id xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
? azure service principal client secret xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
? Region <Azure region>
? Base Domain example.com
? Cluster Name <cluster name. this will be used to create subdomain, e.g. test.example.com>
? Pull Secret [? for help]

6.1. Edit the install-config.yaml file to set the number of compute, or worker, replicas to 0, as shown in the following compute stanza:

compute:
- hyperthreading: Enabled
  name: worker
  platform: {}
  replicas: 0
  7. Generate manifests:
$> openshift-install create manifests --dir=ignition-files

7.1. Remove the files that define the control plane machines:

$> rm -f ignition-files/openshift/99_openshift-cluster-api_master-machines-*

7.2. Remove the Kubernetes manifest files that define the worker machines:

$> rm -f ignition-files/openshift/99_openshift-cluster-api_worker-machineset-*

Because you create and manage the worker machines yourself, you do not need to initialize these machines.

  8. Obtain the Ignition config files:
$> openshift-install create ignition-configs --dir=ignition-files
  9. Extract the infrastructure name from the Ignition config file metadata by running one of the following commands:
$> jq -r .infraID ignition-files/metadata.json
$> egrep -o 'infraID.*,' ignition-files/metadata.json
  10. Open the terraform.tfvars file and fill in the variables:
azure_subscription_id = ""
azure_client_id = ""
azure_client_secret = ""
azure_tenant_id = ""
azure_bootstrap_vm_type = "Standard_D4s_v3" <- Size of the bootstrap VM
azure_master_vm_type = "Standard_D4s_v3" <- Size of the Master VMs
azure_master_root_volume_size = 64 <- Disk size for Master VMs
azure_image_id = "/resourceGroups/rhcos_images/providers/Microsoft.Compute/images/rhcostestimage" <- Location of coreos image
azure_region = "uksouth" <- Azure region (the one you've selected when creating install-config)
azure_base_domain_resource_group_name = "ocp-cluster" <- Resource group for base domain and rhcos vhd blob.
cluster_id = "openshift-lnkh2" <- infraID parameter extracted from metadata.json (step 9.)
base_domain = "example.com"
machine_cidr = "10.0.0.0/16" <- Address range which will be used for VMs
master_count = 3 <- number of masters
  11. Open worker/terraform.tfvars and fill in the information there as well.

Start OCP v4.1 Deployment

You can either run the upi-ocp-install.sh script or run the steps manually:

  1. Run the installation script:
$> ./upi-ocp-install.sh

After the Control Plane is deployed, the script will replace the default Ingress Controller of type LoadBalancerService with one of type HostNetwork. This disables the creation of a public-facing Azure Load Balancer and allows custom Network Security Group rules which won't be overwritten by Kubernetes.

Once this is done, the script continues with the Compute node deployment.

  2. Manual approach:

2.1. Initialize Terraform directory:

terraform init

2.2. Run Terraform Plan and check what resources will be provisioned:

terraform plan

2.3. Once ready, run Terraform apply to provision Control plane resources:

terraform apply -auto-approve

2.4. Once the Terraform job is finished, run openshift-install. It will wait until bootstrapping is finished.

openshift-install wait-for bootstrap-complete --dir=ignition-files

2.5. Once bootstrapping is finished, export the kubeconfig environment variable and replace the default Ingress Controller object with one that has an endpointPublishingStrategy of type HostNetwork. This disables the creation of a public-facing Azure Load Balancer and allows custom Network Security Group rules which won't be overwritten by Kubernetes.

export KUBECONFIG=$(pwd)/ignition-files/auth/kubeconfig
oc delete ingresscontroller default -n openshift-ingress-operator
oc create -f ingresscontroller-default.yaml
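
For reference, the ingresscontroller-default.yaml shipped in this repository is the authoritative manifest; a minimal HostNetwork IngressController of this kind could also be created inline, roughly as follows (a sketch, assuming only the endpointPublishingStrategy needs to be set):

oc create -f - <<'EOF'
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  endpointPublishingStrategy:
    type: HostNetwork
EOF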

2.6. Since we don't need the bootstrap VM anymore, we can remove it:

terraform destroy -target=module.bootstrap -auto-approve

2.7. Now we can continue with Compute node provisioning:

cd worker
terraform init 
terraform plan
terraform apply -auto-approve
cd ../

2.8. Since we are provisioning Compute nodes manually, we need to approve kubelet CSRs:

# Approve pending kubelet CSRs until there is an approved CSR for each worker node
worker_count=`cat worker/terraform.tfvars | grep worker_count | awk '{print $3}'`
while [ $(oc get csr | grep worker | grep Approved | wc -l) != $worker_count ]; do
	oc get csr -o json | jq -r '.items[] | select(.status == {} ) | .metadata.name' | xargs oc adm certificate approve
	sleep 3
done

2.9. Check openshift-ingress service type (it should be type: ClusterIP):

oc get svc -n openshift-ingress
 NAME                      TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                   AGE
 router-internal-default   ClusterIP   172.30.72.53   <none>        80/TCP,443/TCP,1936/TCP   37m

2.10. Wait for installation to be completed. Run openshift-install command:

openshift-install wait-for install-complete --dir=ignition-files
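
Optionally, a few sanity checks once the installer reports completion (standard oc commands, not part of the scripts in this repository):

export KUBECONFIG=$(pwd)/ignition-files/auth/kubeconfig
oc get nodes                               # all master and worker nodes should be Ready
oc get clusteroperators                    # all cluster operators should become Available
cat ignition-files/auth/kubeadmin-password # initial kubeadmin password for the web console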

Scale Up

To add an additional worker node, use the Terraform scripts in the scaleup directory.

  1. Fill in the required information in the Terraform vars file:
azure_subscription_id = ""
azure_client_id = ""
azure_client_secret = ""
azure_tenant_id = ""
azure_worker_vm_type = "Standard_D2s_v3"
azure_worker_root_volume_size = 64
azure_image_id = "/resourceGroups/rhcos_images/providers/Microsoft.Compute/images/rhcostestimage"
azure_region = "uksouth"
cluster_id = "openshift-lnkh2"
  2. Run the terraform init and terraform apply commands:
$> cd scaleup
$> terraform init
$> terraform apply

It will ask you to provide the Azure Availability Zone number where you would like to deploy the new node, and the worker node number (if it is the 4th node, the number is 3, since indexing starts from 0 rather than 1).

  3. Approve server certificates for the nodes

To allow the Kube API server to communicate with the kubelet running on the nodes (for logs, rsh, etc.), the administrator needs to approve the CSR generated by each kubelet.

You can approve all Pending CSR requests using:

oc get csr -o json | jq -r '.items[] | select(.status == {} ) | .metadata.name' | xargs oc adm certificate approve

Change Ingress controller type on an already provisioned cluster

Works for IPI and UPI

  1. Create a Load Balancer which will serve the routers and add DNS records to forward *.apps and *.apps.<clustername> to the Load Balancer frontend, or use the existing Public LB (for the control plane) and configure it to forward traffic from ports 443 and 80 to the worker nodes.

You can check the Kubernetes-created Load Balancer for a configuration example, but the Health Check probe should be TCP on port 80 or 443 instead of the "NodePort"/healthz probe; see the sketch below.
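
For illustration only (the resource group, load balancer, frontend, and backend pool names below are placeholders, and the exact layout depends on your cluster), adding TCP probes/rules to an existing Azure Load Balancer and pointing the wildcard DNS record at its frontend IP could look roughly like this:

az network lb probe create --resource-group <rg> --lb-name <lb> --name http-probe --protocol tcp --port 80
az network lb rule create --resource-group <rg> --lb-name <lb> --name apps-http \
    --protocol Tcp --frontend-port 80 --backend-port 80 --probe-name http-probe \
    --frontend-ip-name <frontend ip> --backend-pool-name <worker backend pool>
az network lb rule create --resource-group <rg> --lb-name <lb> --name apps-https \
    --protocol Tcp --frontend-port 443 --backend-port 443 --probe-name http-probe \
    --frontend-ip-name <frontend ip> --backend-pool-name <worker backend pool>
az network dns record-set a add-record --resource-group <base domain rg> --zone-name <base domain> \
    --record-set-name '*.apps.<clustername>' --ipv4-address <LB frontend IP>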

  2. Run disable-loadbalancer-service.sh:
./disable-loadbalancer-service.sh
  3. Check that the router service has changed to ClusterIP and that the kubernetes LB has been destroyed, e.g.:
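
A quick way to verify (assuming az CLI access to the cluster's subscription):

oc get svc -n openshift-ingress   # the router service should be of type ClusterIP
az network lb list -o table       # the LB created by Kubernetes for the router should no longer be listed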