Releases: aws/aws-ofi-nccl
AWS OFI NCCL v1.9.0
This release is intended only for use on AWS P* instances. A general release that supports other Libfabric networks will be made in the near future. This release requires Libfabric v1.18.0 or later and supports NCCL v2.21.5-1 while maintaining backward compatibility with older NCCL versions (NCCL v2.4.8 and later).
New Features:
- Support for the v8 plugin interface introduced in NCCL v2.20. This enables the use of the user memory registration feature recently introduced in NCCL.
- Update the tuner component to support the v2 ext-tuner interface introduced in NCCL v2.21.
- Relax ordering constraints for control messages to reduce head-of-line blocking under congestion.
Bug Fixes:
- Increase the maximum number of communicators to 256K (from 4K), supporting larger all-to-all groups.
- Improve logging in some corner case error conditions.
The plugin has been tested with the following Libfabric providers using the tests bundled in the source code and the nccl-tests suite:
- efa
Checksum (sha512) for the release tarball:
7c86650f2f275b97bd08ff66b24ae8fef593269c068ec543259903d0eec80a0fe4153a3f171700e7e3dcb3b809a1d6aba82d5e7dc52ec138eacd7353629d1bc0 aws-ofi-nccl-1.9.0-aws.tar.gz
AWS OFI NCCL v1.8.1
This is a bugfix release that requires Libfabric v1.18.0 or later and supports NCCL v2.19.4-1 while maintaining backward compatibility with older NCCL versions (NCCL v2.4.8 and later).
Bug Fixes:
- Fix an issue with the ID pool's reference counting and allocation.
- Improved error propagation for failed NCCL requests, allowing applications to fail early instead of blocking on requests that can never be completed.
The plugin has been tested with the following Libfabric providers using the tests bundled in the source code and the nccl-tests suite:
- efa
Checksum (sha512) for the release tarball:
4ee21380176d5a76e4af0233ac44d1d46f92fd34941ecfaa104b7567a16cc84503c0abe59e540d36d79675bb3cc443979ed319f39582e301814d0653ea184508 aws-ofi-nccl-1.8.1-aws.tar.gz
AWS OFI NCCL v1.8.0
This release is intended only for use on AWS P* instances. A general release that supports other Libfabric networks will be made in the near future. This release requires Libfabric v1.18.0 or later and supports NCCL v2.19.4-1 while maintaining backward compatibility with older NCCL versions (NCCL v2.4.8 and later).
New Features:
- A tuner component for the plugin that picks the optimal NCCL algorithm and protocol at a given scale and message size.
- Improved communicator and memory region identifier management.
- Migrated from the CUDA Runtime API to functionally equivalent CUDA Driver API calls in preparation for dma-buf support for memory registration. With this change, the plugin uses the same mechanism as NCCL to interact with the CUDA subsystem.
- No longer force a flush operation for network operations when running on H100 GPUs, even with older NCCL versions (< v2.19.1).
- Improvements to internal device-agnostic APIs.
- Support for NCCL v7 ext-net plugin interface introduced in NCCL v2.19.3.
- Support for Ubuntu 22.04 LTS distribution.
Bug Fixes:
- Set the maximum NVLS tree chunk size to 512 KiB to recover from a performance regression introduced in NCCL v2.19.4, using a parameter introduced in NCCL v2.20.3.
- Prevent possible invocation of CUDA calls in Libfabric by requiring Libfabric v1.18.0 or newer.
- Fix debug prints that reported incorrect device IDs during initialization.
- Fixes to MAX_COMM computation.
- Better handling of NVLS enablement when NCCL is statically linked into applications.
- Fixes to internal API return codes.
- Configuration system fixes for Neuron builds.
- Fix plugin environment parsing to be case-insensitive.
- Miscellaneous fixes that address memory leaks, NULL dereferences, and compiler warnings.
- Updates and improvements to the project documentation.
Testing:
This release has been tested extensively with NCCL v2.19.4-1 for functionality and performance. It has also been lightly tested with NCCL v2.20.3-1, which was released earlier this week. It was tested with Libfabric versions up to v1.19.0.
Checksum (sha512) for the release tarball:
7bad7995e99649dc3ae4c46b2b0011225134703050ae83ab837cd46a7ff979079809cbd117e50cf5169428dd397ab099fea6249d12f891bff94b2d5579b0c0d9 aws-ofi-nccl-1.8.0-aws.tar.gz
AWS OFI NCCL v1.7.4
This release is intended only for use on AWS P* instances. A general release that supports other Libfabric networks will be made in the near future. This release includes the following changes:
New Features:
- Hard fail if GPUDirect RDMA initialization fails on an EC2 instance that should support GPUDirect RDMA (such as P4d.24xlarge or P5.48xlarge), rather than falling back to host copy buffers at significantly reduced performance. Setting the environment variable OFI_NCCL_DISABLE_GDR_REQUIRED_CHECK=1 disables this behavior.
- Change the threshold at which the RDMA transport switches from round robin to striping from 8 KiB to 256 KiB, improving the efficiency of large message transfers.
Bug Fixes:
- Fixed debugging output in some initialization failure cases.
- Request the FI_LOCAL_COMM feature from Libfabric, as flush and eager copies are both implemented via local communication.
- Fix initialization when using the Libfabric TCP provider.
- Improve documentation on using the plugin with AWS's Elastic Fabric Adapter (EFA).
- Improve handling of Neuron device detection when the plugin is used with Trainium instances.
- Fix segfault in error case of freelist memory growth.
- The test programs that only support 2 ranks now fail with a useful error message if run with another number of ranks.
This release has been tested on P3dn, P4d/P4de, and P5 using the EFA provider in Libfabric.
AWS OFI NCCL v1.7.3
This release is intended only for use on AWS P* instances. A general release that supports other Libfabric networks will be made in the near future. This release includes the following changes:
- Do not disable LL and LL128 protocols on P5 instances.
- Add support for g5.48xlarge instance types.
- Fix a block in use leak in the freelist implementation.
- Do not disable NVLS support when running with NCCL v2.18.5 or later.
- Fix a bug in handling retry errors from Libfabric in the RDMA transport (P5 instance types).
This release has been tested on P3dn, P4d/P4de, and P5 using the EFA provider in Libfabric.
AWS OFI NCCL v1.7.2
This release is intended only for use on AWS P* instances. A general release that supports other Libfabric networks will be made in the near future. This release includes the following changes:
- Fix compilation against CUDA versions prior to 11.3.
- Fix allocation of free lists to avoid accidentally registering user data, which can cause corruption on fork() with older Linux kernels.
- Fix memory leak with registered bounce buffers.
- Fix improper usage of optlen in call to fi_getopt().
- Numerous memory cleanup fixes.
This release has been tested on P3dn, P4d/P4de, and P5 using the EFA provider in Libfabric.
AWS OFI NCCL v1.7.1
This release is part of enabling AWS's P5 platform. It is not recommended for other platforms at this time; we will release a general 1.7.x series in the near future.
This release removes the direct dependency on libcudart.so and dynamically loads the shared library at runtime, similar to the behaviors of NCCL and Libfabric.
This release has been tested on P3dn, P4d/P4de, and P5 using the EFA provider in Libfabric.
AWS OFI NCCL v1.7.0
This release is part of enabling AWS's P5 instance type. It has no useful features for other platforms.
This release requires Libfabric v1.11.0 or later and supports NCCL v2.17.1-1 while maintaining backward compatibility with older NCCL versions (NCCL v2.4.8 and later). It was tested with Libfabric versions up to Libfabric v1.17.1.
The plugin has been tested with the following Libfabric providers using the unit tests bundled in the source code and the nccl-tests suite:
- efa
- tcp
AWS OFI NCCL v1.7.0rc1-aws
Pre-release of the next 1.7.0 release series, which will (initially) target only the AWS EFA platform.
AWS OFI NCCL v1.6.0
This release requires Libfabric v1.11.0 or later and supports NCCL v2.17.1-1 while maintaining backward compatibility with older NCCL versions (NCCL v2.4.8 and later). It was tested with Libfabric versions up to Libfabric v1.17.1.
The plugin has been tested with the following Libfabric providers using the unit tests bundled in the source code and the nccl-tests suite:
- efa
- tcp; ofi_rxm