Releases · StanfordLegion/legion

28 Mar 21:33

elliottslaughter

legion-22.03.0

bf6ce45

Version 22.03.0 (March 27, 2022)

Build
- Minimum supported cmake version is now 3.7. (Some optional features continue to require even newer versions.)
Realm
- Numerous bug fixes in the gasnetex network layer
- CUDA and HIP support now allows direct specification of which GPUs to use via the -ll:gpu_ids command-line option
- Added support for copy paths using CUDA IPC between GPUs on the same physical node
- For applications using CUDA without the runtime API hijack AND only submitting work to the default CUDA stream, -cuda:legacysync 1 reduces the overhead of detecting the completion of device-side work launched by a task
- Realm reduction copies may now indicate exclusive access to the destination instance, improving performance by allowing simple load/store instead of atomic operations
- Custom reduction operations (including Legion's built-in ones) can provide HIP implementations, permitting in-place reductions in HIP device memory
Regent
- Support for custom serialization of types in task parameters and results
- New experimental timing library under std/timing

05 Jan 17:16

streichler

legion-21.12.0

e144311

Version 21.12.0 (December 31, 2021)

Realm
- Performance improvements for multi-dimensional copies, especially
  inter-process transfers
- Support for loading CUDA driver (if present) at runtime instead of
  link time, allowing same binary to be used on systems with and without
  CUDA-capable GPUs (enabled with -DLegion_CUDA_DYNAMIC_LOAD=ON in
  cmake build)
- A separate Memory is now created per process for external (system)
  memory instances. This memory has no capacity for creating instances
  and can confuse applications or Legion mappers that assume exactly
  one Memory of kind SYSTEM_MEM exists. Old behavior can be obtained
  with -ll:ext_sysmem 0, but this can fail for configurations that
  register system memory with the network and/or GPUs
- The MemoryQuery now supports a has_capacity predicate to restrict
  results to just memories with sufficient total (not current!) capacity
  to allocate an instance of a specified size
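
The distinction between total and current capacity in the has_capacity note above can be illustrated with a small sketch. This is plain Python, not the Realm API: MemoryInfo, filter_has_capacity, and can_allocate_now are hypothetical names invented for the illustration, and the byte counts are made up.

```python
from dataclasses import dataclass


@dataclass
class MemoryInfo:
    """Hypothetical stand-in for a Realm Memory (not a real Realm type)."""
    name: str
    total_bytes: int  # capacity the memory was created with
    used_bytes: int   # bytes currently held by live instances


def filter_has_capacity(memories, min_bytes):
    """Mimic a has_capacity-style predicate: compares TOTAL capacity only."""
    return [m for m in memories if m.total_bytes >= min_bytes]


def can_allocate_now(memory, size):
    """Separate check against the CURRENT free space (illustrative only)."""
    return memory.total_bytes - memory.used_bytes >= size


memories = [
    MemoryInfo("sysmem", total_bytes=1024, used_bytes=1000),
    # Models the zero-capacity external memory described in the notes.
    MemoryInfo("ext_sysmem", total_bytes=0, used_bytes=0),
]

candidates = filter_has_capacity(memories, 512)
# "sysmem" passes the total-capacity filter even though it is nearly full.
names = [m.name for m in candidates]
fits_now = can_allocate_now(candidates[0], 512)
```

A filter like this is enough to exclude the zero-capacity external memory, but a memory that passes it can still fail an allocation when it is mostly full.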
Build
- Cmake allows control of max nodes (-DLegion_MAX_NUM_NODES=...) and
  max processors/node (-DLegion_MAX_NUM_PROCS=...) supported by
  Legion build
- Added dependency tracking to make-based builds

02 Oct 04:20

elliottslaughter

legion-21.09.0

5a991b7

Version 21.09.0 (September 28, 2021)

Realm
- Numerous bug fixes in the gasnetex network layer
- Support for HIP memory type registration with GASNet (with GASNet version 2021.9.0+)
- Arguments to spawned tasks may now be arbitrarily large (network-specific limits have been eliminated)
Regent
- Improved support for dynamic checks on index launches with potential interference between different region arguments
- Extensive fixes for separate compilation. This mode has now been verified to work with large-scale applications
- Removed long-obsolete support for __demand(__external)
Pygion
- Add support for layout constraints

25 Jun 15:37

streichler

legion-21.06.0

30e00fa

Version 21.06.0 (June 24, 2021)

Build
- Version information is now compiled into Realm and Legion. This takes
  the form of a string (e.g. "legion-21.06.0") rather than anything
  that can be compared (i.e. no semantic versioning here). Compile-time
  defines REALM_VERSION and LEGION_VERSION are available as well as
  run-time calls Realm::Runtime::get_library_version and
  Legion::Runtime::get_library_version.
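
Because the version is a plain string, any ordering has to be imposed by the caller. The following is a minimal sketch of one way to do that in a build or test script, assuming the string follows the "legion-YY.MM.P" shape seen in the release names. That shape is an assumption, not something the notes guarantee, and parse_legion_version and at_least are hypothetical helper names.

```python
import re

# Assumed shape "legion-YY.MM.P" (e.g. "legion-21.06.0"), inferred from the
# release names; other strings are treated as unparseable.
_VERSION_RE = re.compile(r"^legion-(\d+)\.(\d+)\.(\d+)$")


def parse_legion_version(version):
    """Return (year, month, patch) as ints, or None if the shape differs."""
    match = _VERSION_RE.match(version)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def at_least(version, minimum):
    """True only if the string parses and is >= the minimum tuple."""
    parsed = parse_legion_version(version)
    return parsed is not None and parsed >= minimum


parsed = parse_legion_version("legion-21.06.0")
new_enough = at_least("legion-22.03.0", (21, 12, 0))
```

Returning None for unrecognized strings keeps the caller from silently comparing a version that does not follow the assumed pattern.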
Regent
- Support for dynamic checks on projection functors, enabling a
  much larger class of loops to be supported as index launches
- Support for local tasks (i.e., without going through the
  runtime) via __demand(__local)
Realm
- Windows (MSVC) builds are now tested in CI and therefore more likely
  to work
- Realm runtime can now be shut down and reinitialized in the same process.
  (Exception: GASNet-based network layers do not support this.)
- Registration of host memory with CUDA driver is skipped for host
  memories larger than 1GB by default due to CUDA driver overhead.
  This threshold can be increased (or decreased) with -cuda:hostreg
Tools
- New Rust implementation of Legion Prof is 5-15x faster than the
  original (even with PyPy). For more details, see:
  https://legion.stanford.edu/profiling/#rust-legion-prof

30 Mar 20:42

streichler

legion-21.03.0

0cf9ddd

Version 21.03.0 (March 30, 2021)

Build
- Cmake can build an embedded copy of GASNet as part of the Legion build
  with -DLegion_EMBED_GASNet=ON
Regent
- Contains three breaking changes to the Regent calling convention:
  - Reductions are now aggregated into region requirements and
    sorted by the index of the first field in the field space
    among the set of fields for each reduction.
  - Task arguments may be passed through either args or
    local_args for index launched tasks. (Previously Regent
    only used local_args.)
  - Region values passed via args to an index-launched task may
    be bogus. Instead the region requirement should be used to
    obtain the original region.
- Support for constant time index launches. These are enabled
  automatically, but can be forced on or off with __demand or
  __forbid with __constant_time_launches. This should
  improve scalability at extreme node counts.
- Support for rescape and remit to generate metaprogrammed
  code more easily.
- Experimental support for separate compilation via -fseparate 1
  allows Regent programs to be compiled in parts (potentially in
  parallel). Note that separate compilation currently cannot be
  used with Bishop and requires one of either parallel or
  incremental compilation if regentlib.start is used (does not
  apply to regentlib.saveobj or regentlib.save_tasks).
Legion
- In the control replication branch, users will find a new implementation
  of Legion's physical analysis that uses heuristics to select which
  sub-trees should be used for performing the analysis. Disjoint and
  complete partitions are especially helpful in aiding the runtime.
- There is a new implementation of the index space math inside of the
  runtime that now soundly and precisely detects congruences between
  index space math operations. This fixes a long-running class of bugs
  that would cause memory explosions in the physical analysis.
- In the control replication branch users can now map future values into
  memories the same as they do with regions. This means that future
  payloads can be placed directly on devices like GPUs. Similarly, the
  runtime now accepts future data from tasks that resides in any
  memory in the machine, including device memories.
- Both the master and control replication branches have support for
  index space attach operations.
- Expensive transitive reductions on traces are now computed in the
  background, allowing trace replays to begin immediately
  with only partial optimizations.
Realm
- Custom reduction operations (including Legion's built-in ones) can
  provide CUDA implementations, permitting in-place reductions in
  CUDA device memory
- Support for CUDA managed memory (via -ll:msize) that is coherent for
  both host and device access. Includes support for __managed__
  variables (only single-GPU if using CUDA runtime hijack mode)
- Event::wait may be called outside of Realm tasks, having the same
  thread-blocking behavior as Event::external_wait
- Experimental support for AMD HIP. Note that testing coverage is
  incomplete, and breakages may occur in between releases. For more
  details, see: #1028

01 Jan 15:55

streichler

legion-20.12.0

ad82a1c

Version 20.12.0 (December 28, 2020)

Build
- Legion and Realm now require a compiler with (at least) c++11 support
- Python scripts (e.g. legion_prof and legion_spy) require Python 3.5
Realm
- Improved performance of inter-node instance copies when data is not
  contiguous in source and/or destination
- Improved responsiveness of utility processors by not using them for
  background work by default
- Experimental support for building on Windows with MSVC
- Improved performance (and correctness) when running CUDA tasks without
  the runtime hijack enabled
- Added gasnetex network layer that uses GASNet-EX's native API (instead
  of the legacy GASNet-1 API support). Requires GASNet version 2020.11.0
  or newer. For more details, see: #986
Legion
- The mapping interface no longer requires the runtime to return valid
  instances for empty regions (e.g. regions with no points in their index space)
Tools
- Legion Spy now has support for arbitrary number of dimensions
Examples
- examples/nccl gives a simple example of using NCCL with Legion

30 Jun 22:47

streichler

legion-20.06.0

62d9a51

Version 20.06.0 (June 29, 2020)

Regent
- Support for std/format module for type-safe formatted printing
- Support for documentation with LDoc
- Support for __future operator to import a C API future
Legion
- Support for inlining tasks into leaf contexts
- Support for global registration callbacks inside of tasks
- Added semantic tags for source file and line location
- Support for multi-region accessors for region requirements with
  co-location constraints
- Changes to semantics of deletion for index spaces, field spaces, and
  logical regions. For details, see: #812
- Support for creating field spaces with initial fields
Realm
- Subgraphs can be used to capture a template of Realm operations
  that will be executed repeatedly. Subgraph definitions include
  support for "interpolating" values into individual operations'
  arguments on each instantiation of the subgraph template
- create_weighted_subspaces supports size_t weights for precise
  control over the size of each subspace
- Added support for omp critical constructs and dynamic loop
  schedules in OpenMP tasks
- Added support for cudaStreamLegacy and cudaStreamPerThread in
  CUDA tasks
- Realm logs now include a timestamp (relative to runtime init)
  by default. This behavior can be disabled with -logtime 0
- Performance improvements for copies/fills of 3D instances spaces in
  GPU device memory
- Added ability to compute a set of "covering rectangles" for sparse
  index spaces, allowing more compact representation in memory
- Added MultiAffineAccessor for accessing compact instances
- Added ability to delete a ProcessorGroup

31 Mar 22:54

elliottslaughter

legion-20.03.0

f3f4e7d

Version 20.03.0 (March 31, 2020)

Regent
- Behavior change: __fields and __physical now both require explicit field names, i.e., __fields(r.{x, y}) rather than __fields(r). This makes the behavior less ambiguous and helps to avoid bugs
- Added complete and incomplete keywords that can be used to mark partitions as such
- Added support for setting mapper ID and tag via t:set_mapper_id() and t:set_mapping_tag_id()
- Initial support for predicated execution of if and while statements
- Fixed several bugs and memory leaks, and improved compile times
Legion
- Introduction of Fortran bindings for Legion
- Support for creating deferred index spaces from future values
- Support for construction of partitions from a map of domains or from a future map
- Support for reducing a future map to a single future asynchronously
Realm
- Support for Kokkos parallel launch constructs in Realm (and therefore Legion) tasks. Currently supported Kokkos execution spaces are: Serial, OpenMP, CUDA. Application data remains in logical regions, but accessors can be converted to Kokkos (unmanaged) Views if needed. See the kokkos_interop example
- Introduction of experimental MPI-based network layer, enabled with REALM_NETWORKS=mpi (make) or -DRealm_NETWORKS=mpi (cmake). Use REALM_NETWORKS=gasnet1 (or USE_GASNET=1, which still works) for the GASNet-based network layer (which works with GASNet-1 or GASNet-EX)
- CUDA Runtime API interposer (a.k.a. "hijack") can now be disabled with USE_CUDART_HIJACK=0 (make) or -DLegion_HIJACK_CUDART=OFF (cmake). This can reduce the effectiveness of task-parallelism for CUDA tasks, so use only if needed
- More control over GPU selection via: -cuda:skipgpus N which leaves the first N GPUs available for other uses, -cuda:skipbusy which skips over busy GPUs, and -cuda:minavailmem M which skips GPUs with less than M device memory available
- Reduction in memory usage of Realm internal data structures
Tools
- There is now a generic launcher script for running Python code with Legion that will execute an arbitrary Python program in the top-level task of a Legion program. This script mirrors the interface to CPython as closely as possible.
- Legion Spy now supports verification and rendering of indirection copies
- Legion Prof supports instance layout constraints related to dimension ordering and field alignment
- Legion Prof contains a menu option for viewing ready state of operations

01 Jan 00:32

streichler

legion-19.12.0

ba44f7b

Version 19.12.0 (December 31, 2019)

Build
- Both builds (Make and CMake) now generate legion_defines.h and
  realm_defines.h. By default these headers are generated in
  the source directory (Make) or build directory (CMake). This
  means that languages such as Regent and Python no longer
  require MAX_DIM to be specified explicitly
Regent
- Support for CUDA 10
- Support for field polymorphic tasks
- Substantially improved the generality of the index launch
  optimization. Task arguments of the form p[i+k] may now be
  used, where k is a variable defined outside of the loop
- Add flag -foverride-demand-index-launch which can be used to
  force loops to be index launched in cases where the compiler
  cannot prove the disjointness of read-write region
  arguments
- Added reductions for complex64
- The scripts install.py and setup_env.py now use CMake to
  build Terra by default, which should improve portability on
  most machines
- The behavior of -fcuda 1 has changed: this flag will now issue
  an error if CUDA cannot be enabled (e.g. because the build
  does not support CUDA, or because the machine has no
  GPUs). Omitting this flag will now enable CUDA if it is
  available (and will not error if it is not available).
  The behavior of -fopenmp 1 has changed similarly.
- The behavior of __demand(__cuda) has changed. This will now
  issue an error if a loop is not eligible for the CUDA
  transformation, regardless of whether CUDA is actually
  available on the current machine or not. The behavior of
  __demand(__openmp) has changed similarly.
- The annotation __allow(__cuda) is now permitted, and permits
  (but does not require) tasks to be optimized with CUDA.
- Experimental support for 2D kernel launch in the CUDA code generation
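
The new -fcuda behavior described above (and, by the notes, -fopenmp) amounts to a small decision table. Here is a minimal Python sketch of that table, not Regent's actual implementation: resolve_backend_flag is a hypothetical name, and treating an explicit 0 as "disabled" is an assumption, since the notes only describe the flag set to 1 and the flag omitted.

```python
def resolve_backend_flag(flag, backend_available):
    """Decide whether a backend (e.g. CUDA) ends up enabled.

    flag: None when the option is omitted, otherwise 0 or 1.
    backend_available: whether the build and machine support the backend.
    """
    if flag is None:
        # Omitted: enable if available, and do not error otherwise.
        return backend_available
    if flag == 1:
        # Explicitly requested: unavailability is now an error.
        if not backend_available:
            raise RuntimeError("backend requested but cannot be enabled")
        return True
    # Assumption (not stated in the notes): an explicit 0 disables the backend.
    return False


omitted_on_gpu_machine = resolve_backend_flag(None, True)
omitted_without_gpu = resolve_backend_flag(None, False)
```

The practical consequence is that scripts passing -fcuda 1 unconditionally can now fail on machines without GPUs, while scripts that omit the flag pick up CUDA wherever it is available.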
Python
- Add support for copies
- Copies and fills now support multiple fields
- Tasks (including index launches) now support setting the mapper
  ID and tag
Legion
- A major overhaul of the Legion physical analysis to use an
  approach based on bounding volume hierarchies. The change is
  not visible to users, but will likely impact performance. Most
  programs will get faster; programs that create many partitions
  frequently on the fly may get slower. The latter case will be fixed
  in an upcoming release.
- Added support for indirect copy operations such as gather and
  scatter onto existing copy launchers
Realm
- Event::subscribe allows polling via Event::has_triggered to
  (eventually) succeed
- Addition of CompletionQueue objects that allow multiple unordered
  Event triggers to be efficiently handled by a single consumer
- Support for omp_get_level, omp_in_parallel, and
  omp_set_num_threads in tasks running on OpenMP processors
- Support for unstructured scatter and/or gather in copies. (Handling
  structured cases as well as fills/reductions remains a work in
  progress.)
- Removed all calls to Event::wait from inside other Realm API calls.
  Applications now must make sure that index spaces and instance
  metadata are valid before use. For details, see: #465

10 Sep 18:01

elliottslaughter

legion-19.09.0

78c4225

Version 19.09.0 (September 9, 2019)

Regent
- __demand(__index_launch) has been added as an alternative to __demand(__parallel) on for loops that avoids confusion with the auto-parallelizer. __demand(__parallel) on for loops is deprecated and now issues a warning; in a future release this warning will be upgraded to an error. For details, see: #520
- Multi-field expansion is deprecated and now issues an error. The error can be temporarily downgraded to a warning, but it is advised that users migrate codes away from this syntax as it will become a hard error in a future release. For details, see: #501
Legion
- Support for a built-in collection of reduction operators including sum, product, max, and min over a variety of types for CPUs and GPUs
Realm
- Assorted bug, performance, and memory leak fixes
- Fills to attached HDF5 instances are orders of magnitude faster
- Support for reusing HDF5 file handles with -hdf5:openfiles option
- Control which rank opens an HDF5 file with a rank=nnn: filename prefix
Build System
- Makefile-based flow attempts to detect CUDA location and GASNet conduit if they are not specified
- Makefile-based flow defaults to building CUDA fat binaries, but can still be overridden with the GPU_ARCH setting, which now accepts SM arch numbers (e.g. "70") as well as names (e.g. "volta")
