Neuron 2.21 beta introduces support for AWS Trainium2 and Trn2 instances, including the trn2.48xlarge instance type and u-trn2 UltraServer. The release showcases Llama 3.1 405B model inference using NxD Inference on a single trn2.48xlarge instance, and FUJI 70B model training using the AXLearn library across eight trn2.48xlarge instances.
NxD Inference, a new PyTorch-based library for deploying large language models and multi-modality models, is introduced in this release. It integrates with vLLM and enables PyTorch model onboarding with minimal code changes. The release also adds support for AXLearn training for JAX models.
The new Neuron Profiler 2.0 introduced in this release offers system and device-level profiling, timeline annotations, and container integration. The profiler supports distributed workloads and provides trace export capabilities for Perfetto visualization.
The documentation has been updated to include architectural details about Trainium2 and NeuronCore-v3, along with specifications and topology information for the trn2.48xlarge instance type and u-trn2 UltraServer.
Note:
This release (Neuron 2.21 Beta) was only tested with Trn2 instances. The next release (Neuron 2.21) will support all instances (Inf1, Inf2, Trn1, and Trn2).
For access to this release (Neuron 2.21 Beta) contact your account manager.