When using this method, you can:
- Specify the number of masters and workers you want to provision
- Change Network Security Group rules to lock down ingress access to the cluster
- Change Infrastructure component names
- Add tags
This Terraform-based approach will split VMs across 3 Azure Availability Zones and will use 2 Zone Redundant Load Balancers (1 public-facing to serve the OCP routers and api, and 1 private to serve api-int).
Please see the topology below:
Deployment can be split into 4 steps:
- Create Control Plane (masters) and Surrounding Infrastructure (LB,DNS,VNET etc.)
- Destroy Bootstrap VM
- Set the default Ingress controller to type HostNetwork
- Create Compute (worker) nodes
This method uses the following tools:
- terraform >= 0.12
- openshift-cli
- git
- jq (optional)
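A quick way to sanity-check that these tools are present (version output formats differ between releases):
$> terraform version    # should report 0.12 or later
$> oc version --client
$> git --version
$> jq --version         # optional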
NOTE: A Free Trial account is not sufficient; Pay As You Go with an increased vCPU quota is recommended:
https://blogs.msdn.microsoft.com/girishp/2015/09/20/increasing-core-quota-limits-in-azure/
- Prepare Azure Cloud for OpenShift installation:
https://github.com/openshift/installer/tree/master/docs/user/azure
You need to follow this Installation section as well:
https://github.com/openshift/installer/blob/master/docs/user/azure/install.md#setup-your-red-hat-enterprise-linux-coreos-images
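That document covers uploading the RHCOS VHD and creating a managed image from it. A rough Azure CLI sketch is shown below; the resource group and image name match the examples used later in this README, while the storage account name and VHD URL are placeholders you need to adjust (authentication flags omitted):
$> az group create --name rhcos_images --location uksouth
$> az storage account create --resource-group rhcos_images --name <storage account> --sku Standard_LRS
$> az storage container create --account-name <storage account> --name vhd
$> az storage blob copy start --account-name <storage account> --destination-container vhd --destination-blob rhcos.vhd --source-uri "<rhcos vhd url>"
$> az image create --resource-group rhcos_images --name rhcostestimage --os-type Linux --source "https://<storage account>.blob.core.windows.net/vhd/rhcos.vhd"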
- Clone this repository
$> git clone https://github.com/JuozasA/ocp4-azure-upi.git
$> cd ocp4-azure-upi
- Initialize Terraform working directories (current and worker):
$> terraform init
$> cd worker
$> terraform init
$> cd ../
- Download the openshift-install binary and get the pull secret from:
https://cloud.redhat.com/openshift/install/azure/installer-provisioned
- Copy the openshift-install binary to the /usr/local/bin directory:
$> cp openshift-install /usr/local/bin/
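For example, if you downloaded the Linux client tarball, extracting and installing it looks like this (the exact archive name depends on the release; sudo assumed for writing to /usr/local/bin):
$> tar -xzf openshift-install-linux.tar.gz
$> sudo cp openshift-install /usr/local/bin/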
- Generate install config files:
$> openshift-install create install-config --dir=ignition-files
? SSH Public Key /home/user_id/.ssh/id_rsa.pub
? Platform azure
? azure subscription id xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
? azure tenant id xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
? azure service principal client id xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
? azure service principal client secret xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
? Region <Azure region>
? Base Domain example.com
? Cluster Name <cluster name. this will be used to create subdomain, e.g. test.example.com>
? Pull Secret [? for help]
6.1. Edit the install-config.yaml file to set the number of compute, or worker, replicas to 0, as shown in the following compute stanza:
compute:
- hyperthreading: Enabled
  name: worker
  platform: {}
  replicas: 0
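If you have yq (v4) installed, the same edit can be scripted; this is only a convenience sketch, the repository itself does not require yq:
$> yq -i '.compute[0].replicas = 0' ignition-files/install-config.yaml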
- Generate manifests:
$> openshift-install create manifests --dir=ignition-files
7.1. Remove the files that define the control plane machines:
$> rm -f ignition-files/openshift/99_openshift-cluster-api_master-machines-*
7.2. Remove the Kubernetes manifest files that define the worker machines:
$> rm -f ignition-files/openshift/99_openshift-cluster-api_worker-machineset-*
Because you create and manage the worker machines yourself, you do not need to initialize these machines.
- Obtain the Ignition config files:
$> openshift-install create ignition-configs --dir=ignition-files
- To extract the infrastructure name from the Ignition config file metadata, run one of the following commands:
$> jq -r .infraID ignition-files/metadata.json
$> egrep -o 'infraID.*,' ignition-files/metadata.json
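For example, to keep the value in a shell variable for later use (INFRA_ID is just an illustrative name; the value goes into the cluster_id variable below):
$> export INFRA_ID=$(jq -r .infraID ignition-files/metadata.json)
$> echo $INFRA_ID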
- Open terraform.tfvars file and fill in the variables:
azure_subscription_id = ""
azure_client_id = ""
azure_client_secret = ""
azure_tenant_id = ""
azure_bootstrap_vm_type = "Standard_D4s_v3" <- Size of the bootstrap VM
azure_master_vm_type = "Standard_D4s_v3" <- Size of the Master VMs
azure_master_root_volume_size = 64 <- Disk size for Master VMs
azure_image_id = "/resourceGroups/rhcos_images/providers/Microsoft.Compute/images/rhcostestimage" <- Location of coreos image
azure_region = "uksouth" <- Azure region (the one you've selected when creating install-config)
azure_base_domain_resource_group_name = "ocp-cluster" <- Resource group for base domain and rhcos vhd blob.
cluster_id = "openshift-lnkh2" <- infraID parameter extracted from metadata.json (step 9.)
base_domain = "example.com"
machine_cidr = "10.0.0.0/16" <- Address range which will be used for VMs
master_count = 3 <- number of masters
- Open worker/terraform.tfvars and fill in information there as well.
You can either run the upi-ocp-install.sh script or run the steps manually:
- Run the installation script:
$> ./upi-ocp-install.sh
After the Control Plane is deployed, the script will replace the default Ingress Controller of type LoadBalancerService with one of type HostNetwork. This will disable the creation of the public-facing Azure Load Balancer and will allow custom Network Security Group rules which won't be overwritten by Kubernetes.
Once this is done, it will continue with the Compute node deployment.
- Manual approach:
2.1. Initialize Terraform directory:
terraform init
2.2. Run Terraform Plan and check what resources will be provisioned:
terraform plan
2.3. Once ready, run Terraform apply to provision Control plane resources:
terraform apply -auto-approve
2.4. Once the Terraform job is finished, run openshift-install. It will wait until bootstrapping is complete:
openshift-install wait-for bootstrap-complete --dir=ignition-files
2.5. Once bootstrapping is finished, export the KUBECONFIG environment variable and replace the default Ingress Controller object with one having endpointPublishingStrategy of type HostNetwork. This will disable the creation of the public-facing Azure Load Balancer and will allow custom Network Security Group rules which won't be overwritten by Kubernetes.
export KUBECONFIG=$(pwd)/ignition-files/auth/kubeconfig
oc delete ingresscontroller default -n openshift-ingress-operator
oc create -f ingresscontroller-default.yaml
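If you prefer to create the object inline instead of from the bundled file, a minimal equivalent definition looks roughly like the sketch below (ingresscontroller-default.yaml in this repository may set additional fields; remaining values are left to operator defaults):
cat <<EOF | oc create -f -
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  endpointPublishingStrategy:
    type: HostNetwork
EOF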
2.6. Since we don't need the bootstrap VM anymore, we can remove it:
terraform destroy -target=module.bootstrap -auto-approve
2.7. Now we can continue with Compute node provisioning:
cd worker
terraform init
terraform plan
terraform apply -auto-approve
cd ../
2.8. Since we are provisioning Compute nodes manually, we need to approve kubelet CSRs:
# Read the expected number of workers from worker/terraform.tfvars
worker_count=$(grep worker_count worker/terraform.tfvars | awk '{print $3}')
# Keep approving pending CSRs until every worker CSR has been approved
while [ "$(oc get csr | grep worker | grep -c Approved)" != "$worker_count" ]; do
  oc get csr -o json | jq -r '.items[] | select(.status == {}) | .metadata.name' | xargs -r oc adm certificate approve
  sleep 3
done
2.9. Check openshift-ingress service type (it should be type: ClusterIP):
oc get svc -n openshift-ingress
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
router-internal-default ClusterIP 172.30.72.53 <none> 80/TCP,443/TCP,1936/TCP 37m
2.10. Wait for the installation to complete by running the openshift-install command:
openshift-install wait-for install-complete --dir=ignition-files
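Once it completes, you can sanity-check the cluster with standard OpenShift client commands (not specific to this repository):
oc get nodes
oc get clusteroperators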
To add an additional worker node, use the Terraform scripts in the scaleup directory.
- Fill in the remaining information in the Terraform variables file:
azure_subscription_id = ""
azure_client_id = ""
azure_client_secret = ""
azure_tenant_id = ""
azure_worker_vm_type = "Standard_D2s_v3"
azure_worker_root_volume_size = 64
azure_image_id = "/resourceGroups/rhcos_images/providers/Microsoft.Compute/images/rhcostestimage"
azure_region = "uksouth"
cluster_id = "openshift-lnkh2"
- Run the terraform init and terraform apply commands:
$> cd scaleup
$> terraform init
$> terraform apply
It will ask you to provide the Azure Availability Zone number where you would like to deploy the new node, as well as the worker node number (if it is the 4th node, the number is 3, since indexing starts from 0 rather than 1).
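The prompts can also be answered non-interactively with -var flags; the variable names below are hypothetical and must match the ones actually declared in the scaleup module:
$> terraform apply -var 'availability_zone=2' -var 'node_number=3'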
- Approving server certificates for nodes
To allow the Kube APIServer to communicate with the kubelet running on each node (for logs, rsh, etc.), the administrator needs to approve the CSR generated by each kubelet.
You can approve all Pending CSR requests using:
oc get csr -o json | jq -r '.items[] | select(.status == {} ) | .metadata.name' | xargs oc adm certificate approve
This works for both IPI and UPI.
- You need to create a Load Balancer which will serve the routers and add DNS records forwarding *.apps and *.apps.<clustername> to the Load Balancer frontend, or use the existing public LB (for the control plane) and configure it to forward traffic on ports 443 and 80 to the worker nodes.
You can check the kubernetes Load Balancer for a configuration example, but the health check probe will be TCP on ports 80 or 443 instead of "NodePort"/healthz. An illustrative DNS command is shown below.
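For example, a wildcard A record pointing at the Load Balancer frontend could be added with the Azure CLI (resource group, zone, and record names below are placeholders based on the examples in this README):
az network dns record-set a add-record --resource-group ocp-cluster --zone-name example.com --record-set-name '*.apps.<clustername>' --ipv4-address <load balancer frontend IP>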
- Run disable-loadbalancer-service.sh:
./disable-loadbalancer-service.sh
- Check if the router service is changed to ClusterIP and the kubernetes LB is destroyed.
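To verify, you can check the service type and list the remaining Azure Load Balancers (resource names depend on your deployment):
oc get svc router-internal-default -n openshift-ingress -o jsonpath='{.spec.type}'
az network lb list --output table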