Releases: nv-legate/legate
v24.11.01
This is a closed-source release, governed by the following EULA: https://docs.nvidia.com/legate/24.11/eula.pdf.
Linux x86 and ARM conda packages with multi-node support (based on UCX or GASNet) are available at https://anaconda.org/legate/legate (GASNet-based packages are under the gex label).
Documentation for this release can be found at https://docs.nvidia.com/legate/24.11/.
New features
- Bug fixes for release 24.11.00
v24.11.00
This is a closed-source release, governed by the following EULA: https://docs.nvidia.com/legate/24.11/eula.pdf.
Linux x86 and ARM conda packages with multi-node support (based on UCX or GASNet) are available at https://anaconda.org/legate/legate (GASNet-based packages are under the gex label).
Documentation for this release can be found at https://docs.nvidia.com/legate/24.11/.
New features
- Provide an MPI wrapper that the user can compile against their local MPI installation and integrate with an existing build of Legate. This is useful when a user needs to use an MPI installation different from the one Legate was compiled against.
- Add support for using GASNet as the networking backend, which is useful on platforms not currently supported by UCX, e.g. Slingshot11. Provide scripts for the user to compile GASNet on their local machine and integrate it with an existing build of Legate.
- Automatic machine configuration; Legate will now detect the available hardware resources at startup, and no longer needs to be provided information such as the amount of memory to allocate.
- Print more information on what data is taking up memory when Legate encounters an out-of-memory error.
- Support scalar parameters, default arguments and reduction privileges in Python tasks.
- Add support for a concurrent_task_barrier, useful in preventing NCCL deadlocks.
- Allow tasks to specify that CUDA context synchronization at task exit can be skipped, reducing latency.
- Experimental support for distributed hdf5 and zarr I/O.
- Experimental support for single-CPU/GPU fast-path task execution (skipping the tasking runtime dependency analysis).
- Experimental implementation of a "bloated" instance prefetching API, which instructs the runtime to create instances encompassing multiple slices of a store ahead of time, potentially reducing intermediate memory usage.
Known issues
The GPUDirectStorage backend of the hdf5 I/O module (off by default, and enabled with LEGATE_IO_USE_VFD_GDS=1) is not currently working (enabling it will result in a crash). We are working on a fix.
Legate's auto-configuration heuristics will attempt to split CPU cores and system memory evenly across all instantiated OpenMP processors, not accounting for the actual core count and memory limits of each NUMA domain. In cases where the number of OpenMP groups does not evenly divide the number of NUMA domains, this bug may cause unsatisfiable core and memory allocations, resulting in error messages such as:
not enough cores in NUMA domain 0 (72 < 284)
reservation ('OMP0 proc 1d00000000000005 (worker 8)') cannot be satisfied
insufficient memory in NUMA node 4 (102533955584 > 102005473280 bytes) - skipping allocation
These issues should only affect performance if you are actually running computations on the OpenMP cores (rather than using the GPUs for computation). You can always adjust the automatically derived configuration values through LEGATE_CONFIG; see https://docs.nvidia.com/legate/latest/usage.html#resource-allocation.
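The arithmetic behind the first error message above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Legate's actual heuristic: the function name even_split_requests, the reserved_cores parameter, and the rule that each OpenMP group is pinned to a single NUMA domain are all hypothetical, chosen so that the numbers resemble the "72 < 284" message.

```python
def even_split_requests(cores_per_domain, num_omp_groups, reserved_cores=0):
    """Toy model (hypothetical, not Legate's code) of an even core split.

    Assumptions: all cores not reserved for other purposes are divided
    evenly across the OpenMP groups, and each group must then fit inside
    one NUMA domain.
    """
    usable = sum(cores_per_domain) - reserved_cores
    per_group = usable // num_omp_groups

    problems = []
    for group in range(num_omp_groups):
        # Assumed placement: groups are assigned to domains round-robin.
        domain = group % len(cores_per_domain)
        available = cores_per_domain[domain]
        if per_group > available:
            problems.append((domain, available, per_group))
    return per_group, problems


# Four NUMA domains of 72 cores each, 4 cores assumed reserved elsewhere.
domains = [72, 72, 72, 72]

# One OpenMP group: the even split asks for more cores than one domain has.
per_group, problems = even_split_requests(domains, 1, reserved_cores=4)
for domain, available, requested in problems:
    print(f"not enough cores in NUMA domain {domain} ({available} < {requested})")

# Four OpenMP groups: each request fits inside its domain.
per_group_ok, problems_ok = even_split_requests(domains, 4, reserved_cores=4)
```

In this model the request only becomes satisfiable once the number of groups lines up with the number of NUMA domains, which is the kind of mismatch that overriding the derived values through LEGATE_CONFIG is meant to work around.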
v24.06.01
This is a patch release, and includes the following fixes:
- Fix for #945
- Fix for StanfordLegion/legion#1719
- Fix cuda package dependencies
This is a closed-source release, governed by the following EULA: https://docs.nvidia.com/legate/24.06/eula.pdf.
x86 conda packages with multi-node support (based on UCX) are available at https://anaconda.org/legate/legate-core.
Documentation for this release can be found at https://docs.nvidia.com/legate/24.06/.
v24.06.00
This release re-implements the Legate API in C++, which significantly reduces the overhead of the control code. This release also introduces the following major features:
- As a result of the C++ re-implementation of the API, entire Legate programs can now be written in C++ (previously the control code had to be written in Python).
- The Legate Array API, which extends Legate Stores with support for struct-type and nullable containers, and even containers of variable-length elements (e.g. string containers and sparse array representations).
- An implementation of STL algorithms based on the Legate API, which allows users to easily express common parallelism patterns without needing to write custom tasks.
- Support for writing leaf tasks in Python (previously only leaf task implementations in C++ were supported)
- Integration with NSight Systems (initial support)
This release bumps the minimum supported CUDA version to 12.0.
This is a closed-source release, governed by the following EULA: https://docs.nvidia.com/legate/24.06/eula.pdf.
x86 conda packages with multi-node support (based on UCX) are available at https://anaconda.org/legate/legate-core.
Documentation for this release can be found at https://docs.nvidia.com/legate/24.06/.
v23.11.00
This release focuses on bugfixes and documentation improvements, in particular a formally documented support matrix.
Conda packages for this release are available at https://anaconda.org/legate/legate-core.
What's Changed
🛠️ Improvements
- Use repository variables as possible. by @mag1cp1n in #839
- Expand ranges when reading thread_siblings_list by @vzhurba01 in #849
- Use testdata to remove duplicate test dictionary by @vzhurba01 in #851
- Add a launcher option to the tester by @marcinz in #825
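For context on the thread_siblings_list item above: Linux reports sibling CPUs in a compact list format that mixes single IDs and ranges (for example "0-3,8-11"). The snippet below is a minimal sketch of what expanding such ranges means; it is not the code from #849, and the helper name expand_cpu_list is hypothetical.

```python
def expand_cpu_list(text):
    """Expand a CPU list such as "0-3,8-11" into individual CPU IDs.

    Hypothetical helper for illustration only; it assumes well-formed
    input made of comma-separated IDs and inclusive "lo-hi" ranges.
    """
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate an empty string or a trailing comma
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))  # ranges are inclusive
        else:
            cpus.append(int(part))
    return cpus


siblings_ranges = expand_cpu_list("0-3,8-11\n")
siblings_pairs = expand_cpu_list("0,4")
```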
🐛 Bug Fixes
- Avoid gc infinite loop at runtime destruction time by @manopapad in #842
- Add missing 12.0 CUDA libraries to env generation script by @manopapad in #850
- Set Mypy version downloaded in CI by @Jacobfaib in #859
- Remove numpy from conda build dependencies. by @bdice in #855
- Control ucx presence in install_info more carefully by @bryevdv in #882
📖 Documentation
- Document support matrix by @manopapad in #852
- API reference for resource scoping by @magnatelee in #857
- Suggest using mamba over conda by @manopapad in #881
New Contributors
- @mag1cp1n made their first contribution in #839
- @vzhurba01 made their first contribution in #849
- @bdice made their first contribution in #855
- @trivialfis made their first contribution in #861
Full Changelog: v23.09.00...v23.11.00
v23.09.00
This release includes a number of bug fixes for multi-process execution, and quality-of-life improvements to the build system and driver script.
Conda packages for this release are available at https://anaconda.org/legate/legate-core.
What's Changed
🛠️ Improvements
- Add support for custom wrappers by @bryevdv in #813
- Make NCCL warm-up optional by @magnatelee in #815
- Enable symbols on REALM_BACKTRACE through libdw by @manopapad in #742
- Clean up reduction store init using the new future map reduction API by @magnatelee in #821
- Use Legion with CMake's native CUDA language by @trxcllnt in #828
- Auto-detect multi-node based on env vars by @bryevdv in #832
📖 Documentation
🐛 Bug Fixes
- Pre-seed random number generators deterministically, to guard against control replication violations by @ipdemes in #809
- Enable shard-local future creation for IO by @ipdemes in #835
- Respect user-supplied PYTHONPATH by @bryevdv in #836
- Use unordered detach operations by @ipdemes in #823
- Fix oversubscription support in sharding functors by @ipdemes in #819
- Respect the type of passed storage in create_store by @manopapad in #834
New Contributors
- @ajschmidt8 made their first contribution in #826
Full Changelog: v23.07.00...v23.09.00
v23.07.00
This release introduces support for resource scoping annotations, which allow parts of the program to be assigned to a subset of the available processors/GPUs. This release also includes some more examples of writing legate libraries, improved logging and safety checks, and a refactoring of legate.core's internals.
Conda packages for this release are available at https://anaconda.org/legate/legate-core.
What's Changed
🚀 New Features
- Add per-library loggers at the python level by @manopapad in #639
- Resource scoping by @magnatelee in #457
🛠️ Improvements
- Add support for Python 3.11 (#608) by @marcinz in #615
- Rename variables and functions to make them clearer by @magnatelee in #627
- Use subsumption checks for instance policies by @magnatelee in #626
- Add provenance information to nvtx ranges by @manopapad in #654
- Use parent frame indiscriminately in nested provenance by @manopapad in #666
- Safe vector accesses in examples by @magnatelee in #681
- Task variant registry by @magnatelee in #675
- Mapper refactoring by @magnatelee in #676
- adding flag for valgrind by @ipdemes in #686
- Core type system by @magnatelee in #697
- Revise CMAKE helper functions to support custom Python paths. by @csadorf in #702
- Error-out if multi-rank run is started on build w/o networking by @manopapad in #734
- Add specialized constructors and safety checks to legate::Scalar by @magnatelee in #736
- Stop tracking callbacks by @magnatelee in #748
- Add --ranks-per-node option to tester by @bryevdv in #749
- Add support for test timeouts by @bryevdv in #756
- Add support for --gasnet-system by @bryevdv in #758
- Mapper unification by @magnatelee in #763
- Add simple --last-failed option by @bryevdv in #762
- Opt-out validation for C++ accessor types and dimension by @RAMitchell in #745
- More error checking for stores by @magnatelee in #784
- Use stable UIDs for common fixed-size array types by @magnatelee in #785
📖 Documentation
- Update info on using standard python interpreter by @manopapad in #628
- Disambiguate some flags in BUILD.md by @manopapad in #641
- Guard against attaching to non-contiguous buffers by @manopapad in #653
- Fix documentation issues by @marcinz in #655
- Note new minimum CUDA requirements for conda packages by @manopapad in #673
- Document read-only / env-only settings by @bryevdv in #684
- Document a case where the communicators list may be empty by @manopapad in #708
- Reduction example by @magnatelee in #660
- IO example by @magnatelee in #633
🐛 Bug Fixes
- Tutorial editable install fix by @jjwilke in #610
- Make lgpatch UX consistent with driver by @bryevdv in #617
- More robust nsys --sample flag with --nsys-extra by @jjwilke in #618
- Fix example build tests by @jjwilke in #646
- Don't use traceback.walk_stack(None) by @manopapad in #661
- Skip provenance from NVTX range if empty by @manopapad in #657
- Make legate::is_floating_point hold for float16 by @magnatelee in #692
- Fix the mapping of Futures in the BaseMapper by @manopapad in #671
- Add a missing include to cmake for legate helper functions by @marcinz in #693
- Fix CMake template directories to use current_dir for subfolders by @jjwilke in #688
- Not all task.futures are backing Stores by @manopapad in #700
- Fix off-by-one errors in resource scoping code by @manopapad in #714
- Fix a "file-not-found" bug during repeated editable installs by @manopapad in #716
- Minor fix for type construction in Scalar by @magnatelee in #719
- Make tree_reduce reuse the existing partition by @magnatelee in #699
- Fix bugs in corner cases of tree_reduce by @magnatelee in #731
- Make sure local fields are not enabled for any Python interpreter by @magnatelee in #730
- Fixes for resource scoping by @magnatelee in #726
- Don't automatically close dlopen'ed .so's of Legate libs by @manopapad in #733
- Fix error w/ disable mpi setting by @bryevdv in #743
- Fix the broken unit test for machine objects by @magnatelee in #747
- site.getsitepackages() returns a list of paths, not a path by @ericniebler in #767
- avoid undefined behavior in Span::end by @ericniebler in #772
- Set lib_dir explicitly to lib/, even on RHEL by @manopapad in #766
- Collective fix by @ipdemes in #687
- Constrain OpenBLAS version, to work around legion#1500 by @manopapad in #782
- avoid using nvtx domain separator @ in nvtx ranges by @jjwilke in #790
- Pin host compilers to 11.* during environment generation by @m3vaz in #791
New Contributors
- @csadorf made their first contribution in #702
- @ericniebler made their first contribution in #767
- @RAMitchell made their first contribution in #745
Full Changelog: v23.03.00...v23.07.00
v23.03.00
This is the beta release of Legate Core.
This release focuses on making it easier for developers to get started building libraries on top of Legate Core, including features like updated API documentation, helper CMake functions for bootstrapping new Legate library projects, and a new "Hello World" library example that demonstrates the use of fundamental Legate API calls.
This release also adds support for using the standard python interpreter for running Legate programs (in addition to using the custom legate driver script).
Conda packages for this release are available at https://anaconda.org/legate/legate-core.
What's Changed
🐛 Bug Fixes
- Mappers should skip collective views with no suitable instance by @magnatelee in #559
- don't use sys.argv for plain python init by @bryevdv in #569
- Add --numamem to the tester by @magnatelee in #576
- Add nvml dependency in the conda build script to get the headers for realm by @m3vaz in #586
- Fix is_complete_for check by @manopapad in #587
- Fixes for running cuNumeric CI multi-node by @manopapad in #597
- Fix ucx:tls_host default value by @SeyedMir in #592
- Fix a bug in the new registration callback API by @magnatelee in #603
🚀 New Features
- Default python interpreter support for Legate by @eddy16112 in #539
- Build helper functions for legate projects, legate-hello example by @jjwilke in #571
🛠️ Improvements
- Update the architectures built in conda package by @marcinz in #545
- NVTX: Use RangePush and Domain by @evanramos-nvidia in #293
- Refactoring changes by @magnatelee in #581
- Fix C++ warnings, virtual destructor bugs, and style issues by @jjwilke in #591
- Add CTK stubs dir to implicit link directories by @trxcllnt in #599
- Pin Legion to specific commit sha by default by @trxcllnt in #593
- Add support for Python 3.11 by @m3vaz in #608
📖 Documentation
- Update Build.md to add the missing dependency, rust by @natsukium in #565
- Document DeferredBuffer.destroy() lifetime issues in CUDA tasks by @manopapad in #566
- API reference by @magnatelee in #563
- More informative OOM message by @manopapad in #604
New Contributors
- @evanramos-nvidia made their first contribution in #293
- @natsukium made their first contribution in #565
Full Changelog: v23.01.00...v23.03.00
v23.01.00
This release adds initial support for using the UCX Realm networking backend (for more efficient multi-node communication) and using Legion's new "collective views" feature (for improved scheduling of reduction operations). Both of these features are currently in preview mode, and not enabled by default. They are planned to become the default by the next release, following further verification and tuning.
This release improves the build experience for developers, with fixes to corner cases in the cmake configuration, a rewrite of the build documentation, and a script for generating complete conda environments for development, covering all supported platforms.
This release also introduces improvements in user interface (improved jupyter support, more CLI options for debugging and profiling), memory usage (through better instance management in the mapper) and the Legate programming model (allowing libraries to add custom profiler annotations, and use arbitrary communicator libraries in their tasks).
Conda packages for this release are available at https://anaconda.org/legate/legate-core.
What's Changed
🐛 Bug Fixes
- Legion bug WAR: don't instantiate futures on framebuffer by @manopapad in #413
- Handle conflicts for library-level args by @bryevdv in #416
- Fix Transform class hierarchy by @manopapad in #427
- Handle scalar outputs correctly in manual tasks by @magnatelee in #432
- Explicitly build Legion if legion_dir or legion_src_dir is not provided by @trxcllnt in #411
- Fix GPU shard computation by @bryevdv in #433
- Only set default CMake generator if Ninja is available: Issue #374 by @jjwilke in #379
- Fix an issue with editable installs by @bryevdv in #434
- Allow only one of --legion-dir and --legion-src-dir by @jjwilke in #387
- legate/util: fix a mypy error on MacOS by @rohany in #438
- Improvements to legate.jupyter by @bryevdv in #425
- Fix for cunumeric#668 by @manopapad in #453
- Only keep traceback reprs, to avoid cycles by @bryevdv in #447
- Fix returned legion paths for editable install with separate legion b… by @jjwilke in #442
- Make install.py reconfigure editable installs when build type changes by @trxcllnt in #455
- fix for -ll:networks none, we will init MPI if it has not been initialized by @eddy16112 in #465
- Construct region-backed 0D stores in a correct way by @magnatelee in #450
- Pass a sufficiently high default value for gasnet's ibv-max-hcas by @manopapad in #477
- Make overlap check tight by @manopapad in #479
- Conda env script fixes by @manopapad in #481
- Fix some typos by @manopapad in #485
- fix several reference cycle / leak related bugs by @rohany in #488
- legate/core: fix FutureMap leak in communicator shutdown by @rohany in #495
- src/core/mapping: adjust indirect copy mapping for GPUs by @rohany in #499
- Don't access stream pools unless we're on GPUs by @magnatelee in #503
- Update env gen script so OS type works for mac by @m3vaz in #523
- Don't check for collective behavior when we have WRITE privilege by @manopapad in #526
- All NCCL ranks on the same node must get the same NCCL_IB_HCA by @manopapad in #528
- legate/core/_legion: add default new argument to dep part functions by @rohany in #527
- Don't turn on Legate debug checks on debug-rel builds by @manopapad in #533
- src/core: guard against missing projection functors in collective check by @rohany in #534
- Erase cached reduction instances that cannot be acquired by @magnatelee in #536
🚀 New Features
- Support for library specific annotations by @magnatelee in #464
- Cycle detection check by @manopapad in #361
- Implementing logic for reuse of reduction instances by @ipdemes in #511
- Use collective views when mapping by @ipdemes in #466
🛠️ Improvements
- Driver verbose only for rank 0 or "none" launcher by @bryevdv in #403
- Consolidate driver and test driver codebases by @bryevdv in #397
- On mapping failure retry after tightening non-RO reqs by @manopapad in #423
- More changes for provenance by @magnatelee in #417
- Allow launcher_extra to split quoted values by @bryevdv in #444
- Add script to generate conda envs by @bryevdv in #367
- Mapper improvements by @magnatelee in #452
- Support for concurrent launches by @magnatelee in #459
- legate/core/types: add missing to_pandas_type on Complex types by @rohany in #467
- Add --cprofile driver option by @bryevdv in #475
- Optimize scalar extraction by @magnatelee in #472
- Refactor CPU collective communicator by @eddy16112 in #468
- Refactoring changes by @magnatelee in #478
- Regenerate install_info.py on every build by @trxcllnt in #486
- Update create_buffer to use socket memories whenever available by @magnatelee in #487
- Check for cycles involving Futures after runtime shutdown by @manopapad in #496
- Fix for 509 by @magnatelee in #510
- Improve build documentation by @manopapad in #517
- Pass CMAKE_GENERATOR to scikit-build by @trxcllnt in #529
- Change the default CPU architecture to haswell. by @marcinz in #538
- Build rust legion_prof by @trxcllnt in #535
- adding logic for collective instances to the legate_select_sources by @ipdemes in #532
- Add support for building Legion with the UCX backend by @SeyedMir in #516
New Contributors
Full Changelog: v22.10.00...v23.01.00
v22.10.00
Release 22.10 contains several improvements to memory management, which recycle memory from garbage-collected Legate stores more eagerly so that it can be reused for new stores. Another big change in this release is a new build infrastructure based on CMake and scikit-build for the Legate ecosystem, which is a big leap over the previous ad-hoc build system. The release also includes two useful debugging features: 1) provenance tracking for tasks and other operator kinds issued by client libraries and 2) detailed logging for client library mappers.
Conda packages for this release are available at https://anaconda.org/legate/legate-core.
What's Changed
🐛 Bug Fixes
- Fix target_memory setting for task futures by @manopapad in #327
- avoiding division by 0 in slicing by @ipdemes in #319
- Use the correct key when adding types to a library TypeSystem by @manopapad in #333
- fix partitioning of empty regions by @ipdemes in #337
- Correctly inline map 0d stores by @magnatelee in #340
- Fix error and warning message when --launcher is missing by @manopapad in #341
- make sure communicators are destroyed one by one by @eddy16112 in #345
- Specify an upper bound for return value sizes for core::extract_scalar by @magnatelee in #346
- Set an upper bound for allocations for the serdez type used in the core by @magnatelee in #348
- fix cpu communicator for omp by @eddy16112 in #352
- Use Legion primitives to coordinate accesses to the shared instance manager by @magnatelee in #355
- Synchronize instance manager accesses from mapper calls other than map_task by @magnatelee in #358
- Follow-on changes to #353 by @magnatelee in #360
- Scalar store fix by @magnatelee in #365
- InstanceManager segfault fixes by @manopapad in #368
- Fix typos in solver.py by @magnatelee in #366
- legate_core_cpp.cmake: add missing barrier header file in export by @rohany in #389
- Use Python GC to release Legion handles from destroyed RegionManagers by @magnatelee in #391
- legate/driver: fix driver legion_module path by @rohany in #394
- Legion bug WAR: don't instantiate futures on framebuffer by @manopapad in #409
- Revive dead region managers on field allocations by @magnatelee in #418
🚀 New Features
- Support for mapper logging by @magnatelee in #356
- Provenance tracking by @magnatelee in #370
- Add Fill operation by @manopapad in #369
- add jupyter config for legate by @eddy16112 in #309
🛠️ Improvements
- Make stores have an explicit bottom in their transform stacks by @magnatelee in #320
- Update conda env files to match cunumeric by @manopapad in #324
- An internal method to force initialize communicators by @magnatelee in #328
- Use empty buffers to create empty output stores by @magnatelee in #330
- Two improvements to error handling by @magnatelee in #336
- Skip conduit check when binding by @manopapad in #342
- Make numactl optional by @manopapad in #343
- Remove deprecated option --no-tensor by @manopapad in #344
- Silence shard registration warnings by @manopapad in #347
- Instance manager improvements by @magnatelee in #350
- A custom task wrapper for efficient handling of return values by @magnatelee in #353
- Refactoring to make the runtime object singleton by @magnatelee in #363
- Turn off the precise stacktrace capturing by default by @magnatelee in #362
- Add CMake build for C++ and scikit-build infrastructure for Python package installation by @jjwilke in #323
- Support building with GASNet-Ex and MPI backends by @manopapad in #384
- Better store management by @magnatelee in #364
- Modularize the legate driver by @bryevdv in #371
- Add a pool of region managers with LRU eviction by @magnatelee in #392
- Adjust consensus match frequency based on field sizes by @magnatelee in #402
- On mapping failure retry after tightening non-RO reqs by @manopapad in #424
📖 Documentation
New Contributors
Full Changelog: v22.08.00...v22.10.00