Terraform module that allows training, deploying and re-training of machine learning models on AWS

crederauk/terraform-aws-mlops-module

AWS-MLOps-module

This repo contains a Terraform module that provisions the AWS resources and supporting cloud infrastructure needed to train, deploy and re-train machine learning models hosted on AWS.

Warning: This repo is a basic template for MLOps resources on AWS. Please apply appropriate security enhancements for your project in production.

High-Level Solution Architecture

[Diagram: high-level solution architecture]

Example Usage

module "MLOps" {
  source = "github.com/crederauk/terraform-aws-mlops-module?ref=<MODULE_VERSION>"

  resource_naming_prefix                    = "your-app"
  data_s3_bucket                            = "your-bucket-name"
  data_location_s3                          = "/your_s3_folder/your_data.csv"
  model_target_variable                     = "y"
  tuning_metric                             = "AUC"
  retrain_model_bool                        = true
  retraining_schedule                       = "cron(0 8 1 * ? *)"
  algorithm_choice                          = "classification"
  sagemaker_training_notebook_instance_type = "ml.m4.xlarge"
  inference_instance_count                  = 1
  preprocessing_script_path                 = "terraform/preprocess_data.py"

  tags = {
    my-tag-key = "my-tag-value"
  }
}

Requirements

| Name      | Version |
|-----------|---------|
| terraform | >= 1.0  |
| archive   | 2.4.0   |
| aws       | >= 4.0  |
| local     | >= 2.4  |
| random    | >= 3.6  |
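
As an illustration, the version constraints listed above could be pinned in the calling configuration roughly as follows. This is a sketch only: the `hashicorp/...` provider source addresses are an assumption, since the requirements list gives only the provider names.

```hcl
terraform {
  required_version = ">= 1.0"

  required_providers {
    # Source addresses are assumed to be the official HashiCorp providers.
    archive = {
      source  = "hashicorp/archive"
      version = "2.4.0"
    }
    aws = {
      source  = "hashicorp/aws"
      version = ">= 4.0"
    }
    local = {
      source  = "hashicorp/local"
      version = ">= 2.4"
    }
    random = {
      source  = "hashicorp/random"
      version = ">= 3.6"
    }
  }
}
```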

Providers

No providers.

Modules

| Name           | Source              | Version |
|----------------|---------------------|---------|
| ecr            | ./modules/ecr       | n/a     |
| retraining_job | ./modules/glue      | n/a     |
| s3             | ./modules/s3        | n/a     |
| sagemaker      | ./modules/sagemaker | n/a     |

Resources

No resources.

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| algorithm_choice | Machine learning problem type, e.g. classification, regression, clustering, anomaly, time_series | string | n/a | yes |
| data_location_s3 | The path to a file in the data S3 bucket within which training data is located. Should be in the format //. If the file is in the root of the bucket, this should be set to / only. | string | n/a | yes |
| data_s3_bucket | The name of the S3 bucket within which training data is located. | string | n/a | yes |
| data_s3_bucket_encryption_key_arn | The ARN of the KMS key with which training data is encrypted in S3, if such a key exists. | string | "" | no |
| inference_instance_count | The initial number of instances serving the model endpoint | number | 1 | no |
| inference_instance_type | The instance type to be created for serving the model. Must be a valid SageMaker instance type (ml.* family) | string | "ml.t2.medium" | no |
| model_target_variable | The dependent variable (or 'label') that the model aims to predict. This should be a column name in the dataset. | string | n/a | yes |
| preprocessing_script_path | Path to a user-provided script containing custom data cleaning logic | string | null | no |
| resource_naming_prefix | Naming prefix to be applied to all resources created by this module | string | n/a | yes |
| retrain_model_bool | Boolean indicating whether the retraining pipeline should be added | bool | false | no |
| retraining_schedule | Cron expression for the model retraining frequency in the AWS format. See https://docs.aws.amazon.com/lambda/latest/dg/services-cloudwatchevents-expressions.html for details | string | "" | no |
| sagemaker_training_notebook_instance_type | The SageMaker notebook instance type to be created for training the model. Must be a valid SageMaker instance type (ml.* family) | string | "ml.t2.medium" | no |
| tags | Tags applied to your resources | map(string) | {} | no |
| tuning_metric | The metric to optimise when tuning hyperparameters | string | n/a | yes |
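
For comparison with the full example under Example Usage, a call that sets only the required inputs, plus the optional KMS key for encrypted training data, might look like the sketch below. The module label `mlops_minimal` and the variable `data_kms_key_arn` are hypothetical names introduced for illustration; all other inputs fall back to their documented defaults, so no retraining pipeline is added.

```hcl
# Hypothetical variable: ARN of the KMS key that encrypts the training data.
variable "data_kms_key_arn" {
  description = "ARN of the KMS key used to encrypt the training data in S3."
  type        = string
  default     = ""
}

module "mlops_minimal" {
  source = "github.com/crederauk/terraform-aws-mlops-module?ref=<MODULE_VERSION>"

  # Required inputs (values reused from the usage example).
  resource_naming_prefix = "your-app"
  data_s3_bucket         = "your-bucket-name"
  data_location_s3       = "/your_s3_folder/your_data.csv"
  model_target_variable  = "y"
  algorithm_choice       = "classification"
  tuning_metric          = "AUC"

  # Optional: only needed when the training data is encrypted with a KMS key.
  data_s3_bucket_encryption_key_arn = var.data_kms_key_arn

  # retrain_model_bool defaults to false, so retraining_schedule is left unset.
}
```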

Outputs

| Name | Description |
|------|-------------|
| config_bucket | Config S3 bucket Terraform object |
| ecr | The ECR repository module outputs. Contains both 'repository' and 'encryption_key' attributes, which are the ECR repository and KMS encryption key Terraform objects respectively. |
| ecr_repository | The ECR repository Terraform object. |
| glue | The Glue module outputs. Contains both 'retraining_job' and 'retraining_role' attributes, which are the Glue retraining job and IAM role Terraform objects respectively. |
| glue_retraining_role | The Glue retraining job IAM role Terraform object. |
| model_bucket | Model S3 bucket Terraform object |
| s3_encryption_key | S3 encryption KMS key Terraform object |
| sagemaker_endpoint_name | SageMaker model endpoint name |
| sagemaker_model_name | SageMaker model name |
| sagemaker_notebook_instance | SageMaker notebook instance Terraform object |
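
The string outputs can be re-exported from the calling configuration so that other tooling can read them. The sketch below assumes the module is instantiated as `module "MLOps"`, as in the usage example; the root-level output names `endpoint_name` and `model_name` are illustrative choices.

```hcl
# Re-export the endpoint and model names from the root configuration.
output "endpoint_name" {
  description = "Name of the SageMaker endpoint created by the MLOps module."
  value       = module.MLOps.sagemaker_endpoint_name
}

output "model_name" {
  description = "Name of the SageMaker model created by the MLOps module."
  value       = module.MLOps.sagemaker_model_name
}
```

After `terraform apply`, running `terraform output -raw endpoint_name` should print the endpoint name for use in scripts.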

Destroying Resources

The following resources created through this module are not tracked in your Terraform state file, so running "terraform destroy" will not delete them:

  • SageMaker model
  • SageMaker endpoint
  • Endpoint configuration

To destroy these resources, we recommend adding the following commands to your CI/CD pipeline:

aws sagemaker delete-model --model-name < demo-regression-model >
aws sagemaker delete-endpoint-config --endpoint-config-name < demo-regression-model-config >
aws sagemaker delete-endpoint --endpoint-name < demo-regression-model >    

Before running these commands, add your AWS credentials to the environment if you have not already done so:

aws-access-key-id: < aws-access-key-id >
aws-secret-access-key: < aws-secret-access-key >
aws-region: < region >
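
As an alternative to a separate pipeline step, the same commands could in principle be attached to a destroy-time provisioner in the calling configuration. The following is a hedged sketch, not part of the module: it assumes the AWS CLI and credentials are available wherever Terraform runs, that the `hashicorp/null` provider is installed, and that commands run in a POSIX shell. The variable `endpoint_config_name` is hypothetical, because the module does not expose the endpoint configuration name as an output.

```hcl
# Hypothetical variable: the module does not output the endpoint config name.
variable "endpoint_config_name" {
  description = "Name of the SageMaker endpoint configuration to delete on destroy."
  type        = string
}

resource "null_resource" "sagemaker_cleanup" {
  # Destroy-time provisioners may only reference `self`, so the names are
  # captured in triggers when this resource is created.
  triggers = {
    endpoint_name        = module.MLOps.sagemaker_endpoint_name
    endpoint_config_name = var.endpoint_config_name
    model_name           = module.MLOps.sagemaker_model_name
  }

  provisioner "local-exec" {
    when       = destroy
    on_failure = continue # do not block the destroy if a resource is already gone
    command    = <<-EOT
      aws sagemaker delete-endpoint --endpoint-name "${self.triggers.endpoint_name}"
      aws sagemaker delete-endpoint-config --endpoint-config-name "${self.triggers.endpoint_config_name}"
      aws sagemaker delete-model --model-name "${self.triggers.model_name}"
    EOT
  }
}
```

Note that the provisioner also runs whenever this resource is replaced, for example if one of the trigger values changes, so it should be used with care.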
