This repository contains a sample appliance that also serves as documentation on how to build an appliance for use with Azimuth Cluster-as-a-Service (CaaS).
The appliance uses Terraform to provision resources (e.g. servers, security groups, ports, volumes) and Ansible to configure them, taking advantage of Ansible's support for dynamic inventory to bridge the two components.
- Sample appliance architecture
- Azimuth CaaS Operator
- Ansible variables
- Cluster metadata
- Resource provisioning
- Persistent state
- Cluster patching
- Cluster outputs
This sample appliance is a simple HTTP service, consisting of one or more backends with a load-balancer in front.
The load-balancer is an Nginx server that is configured to forward traffic to the backends. The backends are also Nginx servers, and are configured to render a single page that contains the IP of the host on the internal network. Hence when hitting the service multiple times, the end user should see a different IP address reported each time (when more than one backend is configured!).
The load-balancer is exposed using a floating IP which is reported to the user using the cluster outputs.
Azimuth CaaS appliances are driven using the Azimuth CaaS Operator. In practice, this makes very little difference to how an appliance is written, other than imposing some small constraints on the layout of your repository:
- Roles should be in a `roles` directory at the top level of the repository, unless specified otherwise using `roles_path` in a custom `ansible.cfg`.
- If a `requirements.yml` file defining roles and collections is required, it must be either in the top level of the repository or in the `roles` directory.
- If a custom `ansible.cfg` is required, it should be at the top level of the repository.
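Laid out according to these constraints, and using the file names from this sample appliance, a repository looks roughly like this (optional files marked as such):

```
ansible.cfg             # optional custom Ansible configuration (top level)
requirements.yml        # optional roles/collections to install (top level or in roles/)
sample-appliance.yml    # the playbook referenced by the cluster template
roles/
  cluster_infra/        # role that provisions the infrastructure with Terraform
  ...
ui-meta/
  sample-appliance.yml  # the cluster metadata file (see below)
```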
Variables that vary from site to site but are fixed for all deployments of the appliance at a particular site can be set in the `extra_vars` of the `ClusterTemplate` resource. For example, when deployed in Azimuth, this appliance would require `cluster_image` to be set to the ID of an Ubuntu 20.04 image on the target cloud.
Azimuth deals with the mechanics of setting up the required resources using the `azimuth_caas_cluster_templates_overrides` variable of the azimuth-ops Ansible collection. For example, to use this appliance in an Azimuth deployment, the following configuration would be used:
```yaml
azimuth_caas_cluster_templates_overrides:
  sample-appliance:
    # The git URL of the appliance
    gitUrl: https://github.com/azimuth-cloud/azimuth-sample-appliance.git
    # The branch, tag or commit id to use
    gitVersion: main
    # The name of the playbook to use
    playbook: sample-appliance.yml
    # The URL of the metadata file
    uiMetaUrl: https://raw.githubusercontent.com/azimuth-cloud/azimuth-sample-appliance/main/ui-meta/sample-appliance.yml
    # Dict of extra variables for the appliance
    extraVars:
      cluster_image: "<ID of an Ubuntu 20.04 image>"
```
If you are using `community_images_image_ids` to manage your images as part of the Azimuth deployment, you can use the ID of one of the uploaded images directly:

```yaml
cluster_image: "{{ community_images_image_ids.ubuntu_2004_20220411 }}"
```
When invoking an appliance, Azimuth passes a number of Ansible variables. These fall into the following groups:
- System variables: Variables derived by Azimuth providing information about the environment in which the appliance is being deployed.
- User-provided variables: Variables provided by the user using the form in the Azimuth user interface. These are controlled by the cluster metadata file.
The following system variables are provided by Azimuth:
| Variable name | Description |
|---|---|
| `cluster_id` | The ID of the cluster. Should be used in the Terraform state key. |
| `cluster_name` | The name of the cluster as given by the user. |
| `cluster_type` | The name of the cluster type. |
| `cluster_user_ssh_public_key` | The SSH public key of the user that deployed the cluster. |
| `cluster_deploy_ssh_public_key` | A cluster-specific SSH public key generated by the CaaS operator. |
| `cluster_ssh_private_key_file` | The path to a file containing the private key corresponding to `cluster_deploy_ssh_public_key`. This is consumed by the `azimuth_cloud.terraform.infra` role. |
| `cluster_network` | The name of the project internal network onto which cluster nodes should be placed. |
| `cluster_floating_network` | The name of the floating network where floating IPs can be allocated. |
| `cluster_upgrade_system_packages` | This variable is set when a PATCH operation is requested. If given and `true`, it indicates that system packages should be upgraded; if not given, it should be assumed to be `false`. The mechanism for achieving this is appliance-specific, but it is expected to be a disruptive operation (e.g. rebuilding nodes). If not given or set to `false`, disruptive operations should be avoided where possible. |
| `cluster_state` | This variable is set when a DELETE operation is requested. If given and set to `absent`, all cluster resources should be deleted; otherwise cluster resources should be updated as normal. |
Each CaaS appliance has a playbook (which may call other playbooks, roles, etc.) and a corresponding cluster metadata file. The cluster metadata file provides information about the cluster such as the human-readable name, logo URL and description. It also defines the variables that should be collected from the user, including how they should be validated and rendered in the form that the user sees in the Azimuth UI.
The cluster metadata file for this sample appliance is at `ui-meta/sample-appliance.yml`. It is heavily documented to describe the available options.
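As a rough sketch only - the file itself is the authoritative reference, and the parameter shown here is hypothetical - a cluster metadata file looks something like this:

```yaml
name: sample-appliance          # internal name of the cluster type
label: Sample Appliance         # human-readable name shown in the Azimuth UI
logo_url: https://example.com/logo.png
description: >-
  One or more Nginx backends behind an Nginx load-balancer.
parameters:
  # Variables collected from the user via the form in the Azimuth UI
  - name: backend_count
    label: Number of backends
    kind: integer
    default: 2
    required: true
usage_template: |-
  Usage instructions shown to the user once the cluster is deployed
  (a Jinja template that can reference the cluster outputs).
```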
It is recommended that resources for an appliance are provisioned using Terraform and then adopted into the in-memory inventory using the `add_host` module. This appliance uses Terraform to provision security groups, servers and the floating IP and its corresponding association; however, it is possible to provision any of the resources supported by the Terraform OpenStack provider.

It is possible to implement this process yourself, but Azimuth provides a role - `azimuth_cloud.terraform.infra` - that simplifies it. The role can be used as long as your Terraform outputs conform to a particular specification, and an example of how to use it can be seen in this appliance. For more details, take a look at the role documentation and defaults.
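Hand-rolled, the adoption step looks roughly like the following sketch, where the `terraform_outputs` variable, the `backend_servers` output and the `backends` group are assumed names for illustration rather than part of the role's interface:

```yaml
# Illustrative only: adopt servers exposed as a Terraform output into the
# in-memory inventory so that later plays can configure them over SSH.
- name: Adopt provisioned backends into the in-memory inventory
  ansible.builtin.add_host:
    name: "{{ item.name }}"
    groups: backends
    ansible_host: "{{ item.internal_ip }}"
    ansible_ssh_private_key_file: "{{ cluster_ssh_private_key_file }}"
  loop: "{{ terraform_outputs.backend_servers.value }}"
```

Later plays can then target `hosts: backends` to configure the adopted servers.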
If your appliance requires persistent state, for example a database or Ansible local facts, it is recommended that this state be placed on a volume whose lifecycle is tied to the cluster rather than to any individual machine. This ensures that the state is preserved even if individual machines are replaced, e.g. during a patch operation.

Creating and attaching the relevant volumes can be done in the Terraform for your appliance.
One of the operations available to users in the Azimuth UI is a "patch". This operation gives the user control over when packages are updated on their cluster, since updating packages is potentially disruptive.
Patching can be done in two ways:
- In-place, using the OS package manager (e.g. `yum update -y`)
- By replacing the machines with new ones based on an updated image
The second option is preferred, and is implemented by this sample appliance in the `cluster_infra` role. It ensures more consistency, since images can be built and tested in advance before being rolled out to production, and it allows the use of "fat" images where the majority of the required packages are built into the image - this can speed up cluster provisioning significantly.
When a patch is requested, the image specified in the `cluster_image` variable is used - this will force machines to be re-created if the referenced image has been updated. For all other updates, the image from the previous execution is used. This ensures that all machines in a cluster have a consistent image, even if they are created at different times (e.g. when scaling the number of workers).
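The exact mechanism lives in the `cluster_infra` role, but the decision can be sketched roughly as follows, where `cluster_previous_image` is a hypothetical stand-in for wherever the previously used image ID is recorded (e.g. in the Terraform state):

```yaml
# Illustrative sketch: pick cluster_image when a patch is requested (or on
# the first deployment), otherwise stick with the image from the previous run.
- name: Decide which image to deploy
  ansible.builtin.set_fact:
    cluster_deploy_image: >-
      {{ cluster_image
         if (cluster_upgrade_system_packages | default(false) | bool)
            or cluster_previous_image is not defined
         else cluster_previous_image }}
```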
When a cluster playbook executes successfully, the last task is able to return outputs that can be presented in the Azimuth UI, in particular via the `usage_template` from the cluster metadata. To do this, use a `debug` task with the variable `outputs` set to a dictionary of outputs.
For example, this appliance uses the cluster outputs to return the allocated floating IP.
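A minimal version of such a task might look like this, where the output key and the variable holding the floating IP are illustrative names rather than the appliance's actual ones:

```yaml
# Illustrative sketch: expose the allocated floating IP as a cluster output
# so that it can be rendered via the usage_template in the Azimuth UI.
- name: Return cluster outputs
  ansible.builtin.debug:
    var: outputs
  vars:
    outputs:
      floating_ip: "{{ cluster_floating_ip_address }}"
```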