The osdu - Kubernetes Architecture solution template is intended to provision managed Kubernetes resources like AKS and other core OSDU cloud managed services like Cosmos DB, Blob Storage and Key Vault.
We decided to split these configuration files out into separate Terraform environment templates to mitigate the risk of Terraform accidentally deleting stateful resource types, and to provide a mechanism for duplicating environments to support concepts such as data partitioning or multiple AKS clusters.
Technical design specifications
GitOps design specifications.
Cloud administrators who are well versed in both Cobalt templating and Kubernetes.
- Azure Subscription
- An available Service Principal with API Permissions granted with Admin Consent within Azure app registration. The required Azure Active Directory Graph app role is Application.ReadWrite.OwnedBy
- Terraform and Go are locally installed
- Azure Storage Account is set up to store Terraform state
- Local environment variables are set up
- Deployment Service Principal is granted Owner level role assignment for the target Azure subscription
- Enroll as an Azure subscriber. The free trial subscription does not support enough cores to run this tutorial.
- Terraform common_resources environment module is provisioned to your Azure Environment
- Terraform data_resources environment module is provisioned to your Azure Environment
- Install the required common tools (kubectl, helm, and terraform). See also Required Tools. Note: this tutorial currently uses Terraform 0.12.28.
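Because the tutorial is pinned to a specific Terraform release, it can help to confirm the installed version before running any templates. The following is a minimal sketch, not part of the repository: the function name terraform_version_matches is hypothetical, and it assumes the first line printed by terraform version has the form "Terraform v<version>".

```shell
# Succeeds when the first line of `terraform version` output names the expected release.
# $1: expected version (for example 0.12.28)
# $2: first line printed by `terraform version`
terraform_version_matches() {
  [ "$2" = "Terraform v$1" ]
}

# Typical use (requires terraform on PATH):
#   terraform_version_matches 0.12.28 "$(terraform version | head -n 1)" \
#     || echo "unexpected Terraform version" >&2
```

Passing the version line as an argument keeps the comparison separate from the call to terraform itself.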
Azure environment cost ballpark estimate. This is subject to change and is driven from the resource pricing tiers configured when the template is deployed.
This document assumes one is running a current version of Ubuntu. Windows users can install the Ubuntu Terminal from the Microsoft Store. The Ubuntu Terminal enables Linux command-line utilities, including bash, ssh, and git that will be useful for the following deployment. Note: You will need the Windows Subsystem for Linux installed to use the Ubuntu Terminal on Windows.
Ensure that the required tools are installed in your environment. Alternatively, there are scripts that will install helm, terraform and kubectl. In this case, use setup_kubernetes_tools.sh and setup_terraform.sh. The scripts install the tools into /usr/local/bin.
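To see which of the required tools still need installing, a quick PATH check can be sketched as follows. The helper name missing_tools is hypothetical; it relies only on the POSIX command -v builtin.

```shell
# Print the name of every listed tool that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Typical use: an empty result means everything is installed.
#   missing_tools kubectl helm terraform az
```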
For information specific to your operating system, see the Azure CLI install guide. You can also use this script if running on a Unix based machine.
We will deploy the Bedrock environment using the empty repo and then add a Kubernetes manifest that defines a simple Web application. The change to the repo will automatically update the deployment.
To prepare the Flux manifest repository, we must:
- Create the Flux Manifest Repository
- Generate an RSA Key Pair to use as the Manifest Repository Deploy Key
- Grant Deploy Key access to the Manifest Repository
Create an empty git repository with a name that clearly signals that the repo is used for the Flux manifests. For example, bedrock-deploy-demo.
Flux requires that the git repository have at least one commit. Initialize the repo with an empty commit.
git commit --allow-empty -m "Initializing the Flux Manifest Repository"
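For a repository that starts out locally, the initialization step can be sketched as a small function. This is an illustration only: the function name init_manifest_repo is hypothetical, it assumes a git identity (user.name and user.email) is already configured, and adding the remote and pushing are left out because they depend on where the repo is hosted.

```shell
# Create a local repository and give it the single empty commit Flux needs.
# $1: directory for the new repository (for example bedrock-deploy-demo)
init_manifest_repo() {
  repo_dir=$1
  git init --quiet "$repo_dir" || return 1
  git -C "$repo_dir" commit --quiet --allow-empty \
    -m "Initializing the Flux Manifest Repository"
}

# Typical use:
#   init_manifest_repo bedrock-deploy-demo
```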
Generate the deploy key using ssh-keygen. The public portion of the key pair will be uploaded to GitHub as a deploy key.
AZURE_VAULT=<common_vault_name>
KEY_NAME=gitops-ssh-key
# Generate gitops-ssh-key
ssh-keygen -b 4096 -t rsa -f $KEY_NAME
# Save gitops-ssh-key
az keyvault secret set --vault-name $AZURE_VAULT -n "${KEY_NAME}" -f "${KEY_NAME}"
az keyvault secret set --vault-name $AZURE_VAULT -n "${KEY_NAME}-pub" -f "${KEY_NAME}.pub"
# Show Public gitops-ssh-key
az keyvault secret show --vault-name $AZURE_VAULT -n "${KEY_NAME}-pub" --query value -otsv
This will create public and private keys for the Flux repository. We will assign the public key under the following heading: Adding the Repository Key. The private key is stored on the machine originating the deployment.
The Terraform scripts use this node key to set up login credentials on the nodes in the AKS cluster. We will use this key when setting up the Terraform deployment variables. Generate the Node Key:
AZURE_VAULT=<common_vault_name>
KEY_NAME=node-ssh-key
# Generate node-ssh-key
ssh-keygen -b 4096 -t rsa -f $KEY_NAME
# Save node-ssh-key
az keyvault secret set --vault-name $AZURE_VAULT -n "${KEY_NAME}" -f "${KEY_NAME}"
az keyvault secret set --vault-name $AZURE_VAULT -n "${KEY_NAME}-pub" -f "${KEY_NAME}.pub"
# Show Public node-ssh-key
az keyvault secret show --vault-name $AZURE_VAULT -n "${KEY_NAME}-pub" --query value -otsv
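The gitops and node key blocks follow the same pattern, so the upload step could be wrapped in one helper. The sketch below is an assumption about how you might structure it, not something the templates provide: the function name upload_key_pair is hypothetical, and it uses only the az keyvault secret set flags already shown above.

```shell
# Store an existing SSH key pair in Key Vault as <name> and <name>-pub.
# $1: Key Vault name
# $2: key file name in the current directory (for example node-ssh-key)
upload_key_pair() {
  vault=$1
  key_name=$2
  if [ ! -f "$key_name" ] || [ ! -f "$key_name.pub" ]; then
    echo "key pair $key_name not found; run ssh-keygen first" >&2
    return 1
  fi
  az keyvault secret set --vault-name "$vault" -n "$key_name" -f "$key_name" >/dev/null &&
    az keyvault secret set --vault-name "$vault" -n "${key_name}-pub" -f "${key_name}.pub" >/dev/null
}

# Typical use:
#   upload_key_pair "<common_vault_name>" gitops-ssh-key
#   upload_key_pair "<common_vault_name>" node-ssh-key
```

Checking for both files first avoids storing only half of a pair in the vault.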
The public key of the RSA key pair previously created needs to be added as a deploy key. Note: If you do not own the repository, you will have to fork it before proceeding.
Use the contents of the Secret as shown above.
Next, in your Azure DevOps Project, follow these steps to add your public SSH key to your ADO environment.
Follow these steps if you wish to deploy manually without pipeline support.
Download the required keys from the common Key Vault
AZURE_VAULT=<common_vault_name>
az keyvault secret show --vault-name $AZURE_VAULT -n "node-ssh-key-pub" --query value -otsv > ~/.ssh/node-ssh-key.pub
az keyvault secret show --vault-name $AZURE_VAULT -n "gitops-ssh-key" --query value -otsv > ~/.ssh/gitops-ssh-key
chmod 644 ~/.ssh/node-ssh-key.pub
chmod 600 ~/.ssh/gitops-ssh-key
Update your .env file with the paths to your public and private SSH keys for Node and GitOps repo access.
TF_VAR_ssh_public_key_file=/home/$USER/.ssh/node-ssh-key.pub
TF_VAR_gitops_ssh_key_file=/home/$USER/.ssh/gitops-ssh-key
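Before running Terraform, you may want to confirm that both variables are set and point at readable files. The following is a minimal sketch; the helper name check_file_var is hypothetical and it assumes the .env values have already been exported into the current shell.

```shell
# Fail if the named environment variable is unset or does not point to a readable file.
# $1: name of the variable to check (not its value)
check_file_var() {
  var_name=$1
  eval "var_value=\${$var_name:-}"
  if [ -z "$var_value" ]; then
    echo "$var_name is not set" >&2
    return 1
  fi
  if [ ! -r "$var_value" ]; then
    echo "$var_name points to an unreadable file: $var_value" >&2
    return 1
  fi
}

# Typical use:
#   check_file_var TF_VAR_ssh_public_key_file
#   check_file_var TF_VAR_gitops_ssh_key_file
```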
Follow the directions in the common_resources environment.
Follow the directions in the data_resources environment.
Follow the directions in the service_resources environment.
Follow these steps if you wish to set up and deploy using ADO Pipelines.
- Configure an ARM Resources Service Connection with name osdu-infrastructure for the desired subscription.
  ADO -> Project Settings -> Service Connection -> New service connection -> Azure Resource Manager -> Service principal (automatic)
  - Scope should be to the desired Subscription but do not apply scope to a Resource Group
  - Locate the Service Principal created () in Azure Active Directory and elevate the principal capability by adding in 2 API Permissions
    - Azure Active Directory Graph - Application.ReadWrite.OwnedBy
    - Microsoft Graph - Application.ReadWrite.OwnedBy
  These 2 APIs require Grant Admin Consent
The infrastructure assumes that you bring your own Elasticsearch instance at version 6.8.x. Its access information must be stored in a common Key Vault so that the ADO Libraries can access it.
AZURE_VAULT="<your_keyvault>"
az keyvault secret set --vault-name "$AZURE_VAULT" --name "elastic-endpoint-osdu-r3-env" --value "<your_es_endpoint>"
az keyvault secret set --vault-name "$AZURE_VAULT" --name "elastic-username-osdu-r3-env" --value "<your_es_username>"
az keyvault secret set --vault-name "$AZURE_VAULT" --name "elastic-password-osdu-r3-env" --value "<your_es_password>"
# This command will extract all Key Vault Secrets
for i in $(az keyvault secret list --vault-name "$AZURE_VAULT" --query "[].id" -otsv)
do
  # ${i##*/} keeps only the secret name, the part of the id after the last slash
  echo "export ${i##*/}=\"$(az keyvault secret show --vault-name "$AZURE_VAULT" --id "$i" --query value -otsv)\""
done
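The loop above derives each name with the ${i##*/} expansion, which strips everything up to the last slash of the secret id. Secret names such as elastic-endpoint-osdu-r3-env contain hyphens, which are not valid in shell variable names, so the printed export lines cannot be evaluated as they are. One possible workaround, sketched here under the assumption that replacing hyphens with underscores is acceptable for your use, is a small conversion helper (the name secret_id_to_var_name is hypothetical):

```shell
# Turn a Key Vault secret id into a shell-safe variable name.
# $1: secret id, for example https://<vault>.vault.azure.net/secrets/<name>
secret_id_to_var_name() {
  secret_name=${1##*/}                       # keep the text after the last slash
  printf '%s\n' "$secret_name" | tr '-' '_'  # hyphens are invalid in variable names
}

# Typical use inside the loop:
#   echo "export $(secret_id_to_var_name "$i")=..."
```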
The Elastic endpoint provided should include https and the appropriate port number. An http endpoint will only work if services are properly configured to bypass the https requirement.
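A rough format check on the endpoint before storing it in Key Vault can catch a missing scheme or port early. The sketch below is an illustration only: the function name check_elastic_endpoint is hypothetical, and it validates the shape of the string, not whether the endpoint is reachable.

```shell
# Succeed only if the endpoint starts with https:// and carries a numeric port.
# $1: endpoint, for example https://<host>:<port>
check_elastic_endpoint() {
  endpoint=$1
  case $endpoint in
    https://*) ;;
    *) echo "endpoint must start with https://" >&2; return 1 ;;
  esac
  host_port=${endpoint#https://}   # drop the scheme
  host_port=${host_port%%/*}       # drop any trailing path
  case $host_port in
    *:*) ;;
    *) echo "endpoint is missing a port number" >&2; return 1 ;;
  esac
  port=${host_port##*:}
  case $port in
    ''|*[!0-9]*) echo "port is not numeric: $port" >&2; return 1 ;;
  esac
}
```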
- Set up and configure the ADO Library Infrastructure Pipeline Variables

  | Variable | Value |
  |----------|-------|
  | AGENT_POOL | Hosted Ubuntu 1604 |
  | BUILD_ARTIFACT_NAME | osdu-infrastructure |
  | SERVICE_CONNECTION_NAME | osdu-infrastructure |
  | TF_VAR_elasticsearch_secrets_keyvault_name | osducommon<your_unique>-kv |
  | TF_VAR_elasticsearch_secrets_keyvault_resource_group | osdu-common-<your_unique> |
  | TF_VAR_remote_state_account | osducommon<your_unique> |
  | TF_VAR_remote_state_container | remote-state-container |

- Set up and configure the ADO Library Infrastructure Pipeline Variables - env

  | Variable | Value |
  |----------|-------|
  | ARM_SUBSCRIPTION_ID | <your_subscription_id> |
  | TF_VAR_aks_agent_vm_count | 3 |
  | TF_VAR_common_resources_workspace_name | cr-env |
  | TF_VAR_cosmosdb_replica_location | eastus2 |
  | TF_VAR_data_resources_workspace_name | dr-env |
  | TF_VAR_elasticsearch_version | 6.8.12 |
  | TF_VAR_gitops_branch | master |
  | TF_VAR_gitops_ssh_url | git@<your_flux_manifest_repo> |
  | TF_VAR_resource_group_location | centralus |

  You can specify whichever region locations you wish. Change the Elastic version as required.

- Set up and configure the ADO Library Infrastructure Pipeline Secrets - env

  | Variable | Value |
  |----------|-------|
  | elastic-endpoint-osdu-r3-env | ********* |
  | elastic-username-osdu-r3-env | ********* |
  | elastic-password-osdu-r3-env | ********* |

  These should be linked Secrets from the Azure Key Vault osducommon<your_unique>-kv
- Set up 2 Secure Files
- azure-aks-gitops-ssh-key
- azure-aks-node-ssh-key.pub
These files were created above.
- Add a Pipeline osdu-infrastructure-r3-common --> azure-pipeline-common.yml and execute it.
- Add a Pipeline osdu-infrastructure-r3-data --> azure-pipeline-data.yml and execute it.
- Add a Pipeline osdu-infrastructure-r3-services --> azure-pipeline-services.yml and execute it.
- Once Infrastructure is deployed grant admin_consent to the Service Principal.
After terraform apply finishes for the cluster_resources, there is one critical output artifact: the Kubernetes config file for the deployed cluster that is generated and saved in the output directory. The default file is output/bedrock_kube_config. The following steps use this file to interact with the deployed Bedrock AKS cluster.
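One way to point kubectl at the generated file is the standard KUBECONFIG environment variable. The sketch below assumes the default output path named above; the function name use_kubeconfig is hypothetical.

```shell
# Export KUBECONFIG for the generated cluster config, failing if the file is absent.
# $1: optional path to the kubeconfig (defaults to output/bedrock_kube_config)
use_kubeconfig() {
  config_path=${1:-output/bedrock_kube_config}
  if [ ! -f "$config_path" ]; then
    echo "kubeconfig not found: $config_path" >&2
    return 1
  fi
  KUBECONFIG=$config_path
  export KUBECONFIG
}

# Typical use, from the directory where terraform apply was run:
#   use_kubeconfig && kubectl get namespaces
```

A relative path is resolved against the current directory, so pass an absolute path if you plan to change directories afterwards.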
Alternatively, the context can be set with the following command: az aks get-credentials -g <ResourceGroup> -n <ClusterName>
Once the context is properly set, one of the first things we can do is list all namespaces used within the cluster:
$ kubectl get namespaces
NAME              STATUS   AGE
agic              Active   19d
cert-manager      Active   5d15h
default           Active   19d
flux              Active   19d
keda              Active   12d
kube-node-lease   Active   19d
kube-public       Active   19d
kube-system       Active   19d
kvsecrets         Active   19d
osdu              Active   19d
podidentity       Active   19d
Or view all the pods running by default within the cluster:
$ kubectl get po --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
agic agic-ingress-azure-7b88b4b69f-p9n5w 1/1 Running 0 19d
cert-manager jetstack-cert-manager-567bb678c7-sjmct 1/1 Running 0 5d15h
cert-manager jetstack-cert-manager-cainjector-695d847cdd-l2rv6 1/1 Running 0 5d15h
cert-manager jetstack-cert-manager-webhook-5b895bb689-dvwvd 1/1 Running 0 5d15h
flux flux-6899458bb8-qghrq 1/1 Running 9 19d
flux flux-memcached-8647794c5f-slsvr 1/1 Running 0 19d
keda keda-operator-5895ff46b9-fh5xn 1/1 Running 0 7d20h
keda keda-operator-metrics-apiserver-6774776dbc-jwg7q 1/1 Running 0 12d
kube-system azure-cni-networkmonitor-745vs 1/1 Running 0 19d
kube-system azure-cni-networkmonitor-9kq6c 1/1 Running 0 19d
kube-system azure-cni-networkmonitor-dt7ch 1/1 Running 0 19d
kube-system azure-ip-masq-agent-6kv6v 1/1 Running 0 19d
kube-system azure-ip-masq-agent-p6zxn 1/1 Running 0 19d
kube-system azure-ip-masq-agent-vw7fr 1/1 Running 0 19d
kube-system azure-npm-f9qz7 1/1 Running 0 14d
kube-system azure-npm-j6qdv 1/1 Running 0 14d
kube-system azure-npm-vkghz 1/1 Running 0 14d
kube-system coredns-869cb84759-69nmv 1/1 Running 0 19d
kube-system coredns-869cb84759-fvpf8 1/1 Running 0 19d
kube-system coredns-autoscaler-5b867494f-wvt6q 1/1 Running 11 19d
kube-system dashboard-metrics-scraper-7dbbb6996d-4v6m4 1/1 Running 0 19d
kube-system kube-proxy-4qmfg 1/1 Running 0 19d
kube-system kube-proxy-b66qd 1/1 Running 0 19d
kube-system kube-proxy-wwts4 1/1 Running 0 19d
kube-system kubernetes-dashboard-5596bdb9f-cx4bw 1/1 Running 8 19d
kube-system metrics-server-6cd7558856-lvqrg 1/1 Running 0 19d
kube-system omsagent-4g9qf 1/1 Running 0 19d
kube-system omsagent-j8v77 1/1 Running 0 19d
kube-system omsagent-rs-764c6f8d8-54fjd 1/1 Running 0 19d
kube-system omsagent-zg2wh 1/1 Running 0 19d
kube-system tunnelfront-7cfc889c77-gh9jv 2/2 Running 1 19d
kvsecrets kvsecrets-csi-secrets-store-provider-azure-ddnwn 1/1 Running 0 19d
kvsecrets kvsecrets-csi-secrets-store-provider-azure-j2m7x 1/1 Running 0 19d
kvsecrets kvsecrets-csi-secrets-store-provider-azure-sszrt 1/1 Running 0 19d
kvsecrets kvsecrets-secrets-store-csi-driver-92n8k 3/3 Running 0 19d
kvsecrets kvsecrets-secrets-store-csi-driver-pnx8x 3/3 Running 0 19d
kvsecrets kvsecrets-secrets-store-csi-driver-vmg48 3/3 Running 0 19d
podidentity aad-pod-identity-mic-57dbd9f4fb-bd4vr 1/1 Running 12 19d
podidentity aad-pod-identity-mic-57dbd9f4fb-zj6lb 1/1 Running 0 19d
podidentity aad-pod-identity-nmi-fmjlk 1/1 Running 0 19d
podidentity aad-pod-identity-nmi-qk2t4 1/1 Running 0 19d
podidentity aad-pod-identity-nmi-vtxmg 1/1 Running 0 19d
Note that there is a namespace flux. As previously mentioned, Flux is managing the deployment of all of the resources into the cluster. Taking a look at the description for the flux pod flux-6899458bb8-qghrq, we see the following:
$ kubectl describe po/flux-6899458bb8-qghrq --namespace=flux
Name: flux-6899458bb8-qghrq
Namespace: flux
Priority: 0
Node: aks-default-27739138-vmss000001/10.10.2.35
Start Time: Thu, 06 Aug 2020 12:31:42 -0500
Labels: app=flux
pod-template-hash=6899458bb8
release=flux
Annotations: <none>
Status: Running
IP: 10.10.2.49
IPs:
IP: 10.10.2.49
Controlled By: ReplicaSet/flux-6899458bb8
Containers:
flux:
Container ID: docker://f812201502ed64166239537ba34aad9c0dad1f338eefd7a93f83aee02b35d177
Image: docker.io/fluxcd/flux:1.18.0
Image ID: docker-pullable://fluxcd/flux@sha256:8fcf24dccd7774b87a33d87e42fa0d9233b5c11481c8414fe93a8bdc870b4f5b
Port: 3030/TCP
Host Port: 0/TCP
Args:
--log-format=fmt
--ssh-keygen-dir=/var/fluxd/keygen
--k8s-secret-name=flux-ssh
--memcached-hostname=flux-memcached
--sync-state=git
--memcached-service=
[email protected]:v3/osdu-demo/OSDU_Rx/k8-gitops-manifests
--git-branch=dascholl
--git-path=providers/azure/hld-registry
--git-readonly=false
--git-user=Weave Flux
[email protected]
--git-verify-signatures=false
--git-set-author=false
--git-poll-interval=10s
--git-timeout=20s
--sync-interval=10s
--git-ci-skip=false
--git-label=flux-sync
--automation-interval=5m
--registry-rps=200
--registry-burst=125
--registry-trace=false
--sync-garbage-collection=true
State: Running
Started: Sat, 22 Aug 2020 20:39:16 -0500
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 20 Aug 2020 22:40:42 -0500
Finished: Sat, 22 Aug 2020 20:39:16 -0500
Ready: True
Restart Count: 9
Requests:
cpu: 50m
memory: 64Mi
Liveness: http-get http://:3030/api/flux/v6/identity.pub delay=5s timeout=5s period=10s #success=1 #failure=3
Readiness: http-get http://:3030/api/flux/v6/identity.pub delay=5s timeout=5s period=10s #success=1 #failure=3
Environment:
KUBECONFIG: /root/.kubectl/config
Mounts:
/etc/fluxd/ssh from git-key (ro)
/etc/kubernetes/azure.json from acr-credentials (ro)
/root/.kubectl from kubedir (rw)
/var/fluxd/keygen from git-keygen (rw)
/var/run/secrets/kubernetes.io/serviceaccount from flux-token-9vg9f (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
kubedir:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: flux-kube-config
Optional: false
git-key:
Type: Secret (a volume populated by a Secret)
SecretName: flux-ssh
Optional: false
git-keygen:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit: <unset>
acr-credentials:
Type: HostPath (bare host directory volume)
Path: /etc/kubernetes/azure.json
HostPathType:
flux-token-9vg9f:
Type: Secret (a volume populated by a Secret)
SecretName: flux-token-9vg9f
Optional: false
QoS Class: Burstable
Node-Selectors: beta.kubernetes.io/os=linux
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events: <none>
Flux automation makes it easy to upgrade services or infrastructure. In this example, Flux watches the repo we set up previously under the heading Set Up Flux Manifest Repository. Now we add a simple Web application to the running deployment by pushing a .yaml manifest to the repo. The .yaml specification describes a Service named default-service and a Deployment. It specifies the source Docker image (image: neilpeterson/aks-helloworld:v1) and how many replicas to run (replicas: 1).
When the .yaml file is complete we will push it to the repo, or simply drop it on GitHub. Flux polls the repo for changes and will deploy the new service replicas as defined by this manifest.
Create the following .yaml file and name it default-service.yaml.
apiVersion: v1
kind: Service
metadata:
  name: default-service
  namespace: osdu
  labels:
    app: default-service
spec:
  type: ClusterIP
  ports:
  - port: 80
  selector:
    app: default-service
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: default-service
  namespace: osdu
spec:
  selector:
    matchLabels:
      app: default-service
  replicas: 1
  template:
    metadata:
      labels:
        app: default-service
    spec:
      containers:
      - name: default-service
        image: neilpeterson/aks-helloworld:v1
        ports:
        - containerPort: 80
        env:
        - name: TITLE
          value: "Azure OSDU Platform - (AKS)"
To see the changes as Flux picks them up and deploys them, you can monitor the logs from the flux container.
$ kubectl logs -f flux-6899458bb8-qghrq --namespace=flux
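The pod name flux-6899458bb8-qghrq is specific to the cluster shown here and will differ in your deployment. A sketch of looking it up by label instead, assuming the Flux pod carries the app=flux label seen in the describe output above (the function name flux_pod_name is hypothetical):

```shell
# Print the name of the first pod in the flux namespace labelled app=flux.
flux_pod_name() {
  kubectl get pods --namespace=flux --selector=app=flux \
    --output=jsonpath='{.items[0].metadata.name}'
}

# Typical use:
#   kubectl logs -f "$(flux_pod_name)" --namespace=flux
```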
Now, push or drop the default-service.yaml file to the empty repo created under the previous heading Set Up Flux Manifest Repository. You can click Upload files on the GitHub repo page and drop the .yaml file.
The Flux logs then show that it has connected to the repo and created the new service and deployment: "kubectl apply -f -" took=1.591333771s err=null output="service/default-service created\ndeployment.apps/default-service created".
Now you can view the service deployed in the osdu namespace.
$ kubectl get services --namespace=osdu
NAME              TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE
default-service   ClusterIP   10.0.212.75   <none>        80/TCP    117s
$ kubectl get deployments --namespace=osdu
NAME              READY   UP-TO-DATE   AVAILABLE   AGE
default-service   1/1     1            1           2m11s
$ kubectl get pods --namespace=osdu
NAME                               READY   STATUS    RESTARTS   AGE
default-service-86cd47b748-sc7bv   1/1     Running   0          2m19s
Finally we can connect directly to the pod and validate the service is properly running.
$ kubectl port-forward default-service-86cd47b748-sc7bv 8080:80
Forwarding from 127.0.0.1:8080 -> 80
Forwarding from [::1]:8080 -> 80
View the service in a browser at http://localhost:8080
To delete the service, remove default-service.yaml from the manifest repo and Flux will perform the uninstall.
Perform the following procedure from the devops directory.
- Download helm-config.yaml, which will configure OSDU on Azure.
wget https://raw.githubusercontent.com/Azure/osdu-infrastructure/master/devops/helm-config.yaml -O config.yaml
- Edit the newly downloaded file (saved locally as config.yaml) and fill out the sections azure and ingress.
- Install the osdu on azure charts.
# Full Install
helm install -f config.yaml osdu-current osdu-azure
- Uninstall the osdu on azure charts.
helm uninstall osdu-current
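Before running helm install, a quick check that the edited file still contains both sections can catch an accidental deletion. This is a sketch under an assumption the text does not confirm: that azure and ingress appear as top-level YAML keys at the start of a line. The function names are hypothetical, and the check only confirms the keys exist, not that their values are filled in.

```shell
# Succeed if the file has a line starting with "<key>:".
# $1: path to the YAML file, $2: key name
has_top_level_key() {
  grep -q "^$2:" "$1"
}

# Print each required section (azure, ingress) that is missing from the file.
# $1: path to the YAML file
missing_config_sections() {
  for section in azure ingress; do
    has_top_level_key "$1" "$section" || printf '%s\n' "$section"
  done
}

# Typical use: an empty result means both sections are present.
#   missing_config_sections config.yaml
```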
Kubernetes includes a web dashboard that can be used for basic management operations. This dashboard lets you view basic health status and metrics for your applications, create and deploy services, and edit existing applications.
$ kubectl create clusterrolebinding kubernetes-dashboard --clusterrole=cluster-admin --serviceaccount=kube-system:kubernetes-dashboard
$ az aks browse --resource-group myResourceGroup --name myAKSCluster
The behavior of this command has been altered by the following extension: aks-preview
Merged "devint-aks-mgf9wjxt-osdu-r2-aks" as current context in /tmp/tmps6_a6amm
Proxy running on http://127.0.0.1:8001/
Press CTRL+C to close the tunnel...
Please confirm that you've completed the terraform apply step before running the integration tests, as we're validating the active terraform workspace.
Integration tests can be run using the following command:
go test -v $(go list ./... | grep "integration")
Optionally, execute the following command to tear down your deployment and delete your resources.
# Destroy resources and tear down deployment. Only do this if you want to destroy your deployment.
terraform destroy
Copyright © Microsoft Corporation
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.