AWS Neuron EKS Samples

This repository contains samples for Amazon Elastic Kubernetes Service (EKS) and AWS Neuron, the software development kit (SDK) that enables machine learning (ML) inference and training workloads on the AWS ML accelerator chips Inferentia and Trainium.

The samples in this repository demonstrate patterns for running inference and distributed training on EKS with Inferentia and Trainium. You can use them as-is or modify them to support additional models and use cases.

Samples are organized by use case below:

Training

| Sample | Description | Instance Type |
|---|---|---|
| BERT pretraining | End-to-end workflow for creating an EKS cluster with 2 trn1.32xlarge nodes and running BERT phase 1 pretraining (64-worker DataParallel) | Trn1 |
| MLP training | Introductory workflow for creating a single-node EKS cluster and running a simple MLP training job | Trn1 |
| Llama 3.1 8B finetuning with Ray+PTL | End-to-end workflow for creating a Ray cluster with 2 trn1.32xlarge nodes on EKS and running Llama 3.1 8B finetuning | Trn1 |

Inference

| Sample | Description | Instance Type |
|---|---|---|
| SD inference | Workflow for creating an SD inference endpoint exposed through an Application Load Balancer (ALB), with nodes provisioned by a Karpenter NodePool | Inf2 |

Getting Help

If you encounter issues with any of the samples in this repository, please open a GitHub issue.

Contributing

Please refer to the CONTRIBUTING document for details on contributing additional samples to this repository.

Release Notes

Please refer to the Change Log.