This repo contains a Terraform module that provisions the AWS infrastructure needed to train, deploy and re-train machine learning models hosted on AWS.

Warning: This repo is a basic template for MLOps resources on AWS. Please apply appropriate security enhancements for your project before using it in production.
```hcl
module "MLOps" {
  source                                    = "github.com/crederauk/terraform-aws-mlops-module?ref=<MODULE_VERSION>"
  resource_naming_prefix                    = "your-app"
  data_s3_bucket                            = "your-bucket-name"
  data_location_s3                          = "/your_s3_folder/your_data.csv"
  model_target_variable                     = "y"
  tuning_metric                             = "AUC"
  retrain_model_bool                        = true
  retraining_schedule                       = "cron(0 8 1 * ? *)"
  algorithm_choice                          = "classification"
  sagemaker_training_notebook_instance_type = "ml.m4.xlarge"
  inference_instance_count                  = 1
  preprocessing_script_path                 = "terraform/preprocess_data.py"
  tags = {
    my-tag-key = "my-tag-value"
  }
}
```
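The `preprocessing_script_path` input points at an optional, user-supplied data cleaning script. As an illustration only (the function name and cleaning rule below are assumptions, not part of the module's contract), such a script might drop rows whose target column is empty before training:

```python
import csv
import io

def drop_missing_target(csv_text: str, target: str = "y") -> list[dict]:
    """Drop rows where the model's target column is empty (hypothetical cleaning rule)."""
    return [row for row in csv.DictReader(io.StringIO(csv_text)) if row[target]]

raw = "x,y\n1,0\n2,\n3,1\n"
print(len(drop_missing_target(raw)))  # prints 2
```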
## Requirements

| Name | Version |
|------|---------|
| terraform | >= 1.0 |
| archive | 2.4.0 |
| aws | >= 4.0 |
| local | >= 2.4 |
| random | >= 3.6 |

## Providers

No providers.
## Modules

| Name | Source | Version |
|------|--------|---------|
| ecr | ./modules/ecr | n/a |
| retraining_job | ./modules/glue | n/a |
| s3 | ./modules/s3 | n/a |
| sagemaker | ./modules/sagemaker | n/a |

## Resources

No resources.
## Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| algorithm_choice | Machine learning problem type, e.g. classification, regression, clustering, anomaly, time_series. | string | n/a | yes |
| data_location_s3 | The path to a file in the data S3 bucket within which training data is located, in the format `/<folder>/<file>`. If the file is in the root of the bucket, this should be set to `/<file>` only. | string | n/a | yes |
| data_s3_bucket | The name of an S3 bucket within which training data is located. | string | n/a | yes |
| data_s3_bucket_encryption_key_arn | The ARN of the KMS key with which training data is encrypted in S3, if such a key exists. | string | `""` | no |
| inference_instance_count | The initial number of instances serving the model endpoint. | number | `1` | no |
| inference_instance_type | The instance type created for serving the model. Must be a valid EC2 instance type. | string | `"ml.t2.medium"` | no |
| model_target_variable | The dependent variable (or 'label') that the model aims to predict. This should be a column name in the dataset. | string | n/a | yes |
| preprocessing_script_path | Path to an optional user-provided script containing custom data cleaning logic. | string | `null` | no |
| resource_naming_prefix | Naming prefix applied to all resources created by this module. | string | n/a | yes |
| retrain_model_bool | Whether the retraining pipeline should be added. | bool | `false` | no |
| retraining_schedule | Cron expression for the model retraining frequency, in the AWS format. See https://docs.aws.amazon.com/lambda/latest/dg/services-cloudwatchevents-expressions.html for details. | string | `""` | no |
| sagemaker_training_notebook_instance_type | The SageMaker notebook instance type created for training the model. Must be a valid EC2 instance type. | string | `"ml.t2.medium"` | no |
| tags | Tags applied to your resources. | map(string) | `{}` | no |
| tuning_metric | The metric to optimise during hyperparameter tuning. | string | n/a | yes |
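The `retraining_schedule` input uses AWS's six-field cron syntax (minutes, hours, day-of-month, month, day-of-week, year) rather than the standard five-field Unix form; for example, `cron(0 8 1 * ? *)` fires at 08:00 UTC on the first day of every month. A rough format check could be sketched as follows (the helper name is an assumption, not part of the module):

```python
import re

def looks_like_aws_cron(expr: str) -> bool:
    # AWS schedule expressions wrap six space-separated fields in cron(...):
    # minutes hours day-of-month month day-of-week year
    match = re.fullmatch(r"cron\(([^)]*)\)", expr.strip())
    return bool(match) and len(match.group(1).split()) == 6

print(looks_like_aws_cron("cron(0 8 1 * ? *)"))  # prints True
```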
## Outputs

| Name | Description |
|------|-------------|
| config_bucket | Config S3 bucket Terraform object. |
| ecr | The ECR repository module outputs. Contains both 'repository' and 'encryption_key' attributes, which are the ECR repository and KMS encryption key Terraform objects respectively. |
| ecr_repository | The ECR repository Terraform object. |
| glue | The Glue module outputs. Contains both 'retraining_job' and 'retraining_role' attributes, which are the Glue retraining job and IAM role Terraform objects respectively. |
| glue_retraining_role | The Glue retraining job IAM role Terraform object. |
| model_bucket | Model S3 bucket Terraform object. |
| s3_encryption_key | S3 encryption KMS key Terraform object. |
| sagemaker_endpoint_name | SageMaker model endpoint name. |
| sagemaker_model_name | SageMaker model name. |
| sagemaker_notebook_instance | SageMaker notebook instance Terraform object. |
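Once deployed, the endpoint named by the `sagemaker_endpoint_name` output can be invoked at inference time. A minimal sketch, assuming a CSV-serialised feature row and that `boto3` and AWS credentials are available at call time (the helper name and content type are assumptions, not part of the module):

```python
def invoke_endpoint(endpoint_name: str, features: list[float]) -> str:
    """Send one CSV-encoded feature row to a SageMaker endpoint; return the raw response body."""
    import boto3  # imported lazily; requires AWS credentials when called

    client = boto3.client("sagemaker-runtime")
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=",".join(str(f) for f in features),
    )
    return response["Body"].read().decode("utf-8")

print(callable(invoke_endpoint))  # prints True
```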
The following resources created when using this module are not tracked in your Terraform state file, so running `terraform destroy` will not delete them:

- SageMaker model
- SageMaker endpoint
- SageMaker endpoint configuration

To destroy these resources, we recommend adding these commands to your CI/CD pipeline:

```sh
aws sagemaker delete-model --model-name <demo-regression-model>
aws sagemaker delete-endpoint-config --endpoint-config-name <demo-regression-model-config>
aws sagemaker delete-endpoint --endpoint-name <demo-regression-model>
```

Before running them, you will need to add your AWS credentials to the environment, if you have not done so already:

```
aws-access-key-id: <aws-access-key-id>
aws-secret-access-key: <aws-secret-access-key>
aws-region: <region>
```
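If your pipeline step is Python-based rather than shell-based, the same teardown can be sketched with `boto3` (the function name and the `-config` suffix are assumptions mirroring the CLI example above; AWS credentials are required at call time):

```python
def delete_sagemaker_resources(model_name: str) -> None:
    """Delete the endpoint, endpoint configuration and model left outside Terraform state."""
    import boto3  # imported lazily; requires AWS credentials when called

    sagemaker = boto3.client("sagemaker")
    sagemaker.delete_endpoint(EndpointName=model_name)
    sagemaker.delete_endpoint_config(EndpointConfigName=f"{model_name}-config")
    sagemaker.delete_model(ModelName=model_name)

print(callable(delete_sagemaker_resources))  # prints True
```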