This repository contains samples for Amazon Elastic Kubernetes Service (EKS) and AWS Neuron, the software development kit (SDK) that enables machine learning (ML) inference and training workloads on the AWS ML accelerator chips Inferentia and Trainium.
The samples in this repository demonstrate patterns for running inference and distributed training on EKS with Inferentia and Trainium. They can be used as-is or modified to support additional models and use cases.
Samples are organized by use case below:
Link | Description | Instance Type |
---|---|---|
BERT pretraining | End-to-end workflow for creating an EKS cluster with 2 trn1.32xlarge nodes and running BERT phase 1 pretraining (64-worker DataParallel) | Trn1 |
MLP training | Introductory workflow for creating an EKS cluster with 1 node and running a simple MLP training job | Trn1 |
Llama 3.1 8B finetuning with Ray+PTL | End-to-end workflow for creating a Ray cluster with 2 trn1.32xlarge nodes on EKS and running Llama 3.1 8B finetuning | Trn1 |
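
The 64-worker figure in the BERT pretraining row follows from the hardware. As a rough sketch, assuming each trn1.32xlarge instance exposes 32 NeuronCores (16 Trainium chips with 2 NeuronCores each) and the job launches one data-parallel worker per NeuronCore, two nodes give 64 workers. The function and constant names below are illustrative and are not part of the samples.

```python
# Illustrative arithmetic only; these names are hypothetical, not from the samples.
TRAINIUM_CHIPS_PER_TRN1_32XLARGE = 16  # assumption: trn1.32xlarge hardware spec
NEURON_CORES_PER_CHIP = 2              # assumption: first-generation Trainium


def data_parallel_workers(num_nodes: int) -> int:
    """Return the worker count when one worker runs per NeuronCore."""
    cores_per_node = TRAINIUM_CHIPS_PER_TRN1_32XLARGE * NEURON_CORES_PER_CHIP
    return num_nodes * cores_per_node


if __name__ == "__main__":
    # 2 nodes x 32 NeuronCores per node
    print(data_parallel_workers(2))
```

The same arithmetic gives 32 workers for a single trn1.32xlarge node, if a job is configured to use every NeuronCore on the instance.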
Link | Description | Instance Type |
---|---|---|
SD inference | Stable Diffusion (SD) inference workflow for creating an inference endpoint exposed through an Application Load Balancer (ALB), with nodes provisioned by a Karpenter NodePool | Inf2 |
If you encounter issues with any of the samples in this repository, please open an issue via the GitHub Issues feature.
Please refer to the CONTRIBUTING document for details on contributing additional samples to this repository.
For a history of changes to this repository, please refer to the Change Log.