Version 19.12.0 (December 31, 2019)
streichler released this 01 Jan 00:32
- Build
- Both builds (Make and CMake) now generate `legion_defines.h` and
`realm_defines.h`. By default these headers are generated in
the source directory (Make) or build directory (CMake). This
means that languages such as Regent and Python no longer
require MAX_DIM to be specified explicitly
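To illustrate why a generated header removes the need to pass MAX_DIM by hand, a language binding could read the value from the header text itself. The sketch below parses `#define` lines from a sample string. The sample contents and the macro names `LEGION_MAX_DIM`, `LEGION_USE_CUDA`, and `LEGION_USE_HDF5` are assumptions for illustration only; the actual generated files may use different names and entries.

```python
import re

# Hypothetical header text; the real generated legion_defines.h may use
# different macro names and contain different entries.
SAMPLE_HEADER = """\
/* auto-generated (illustrative sample) */
#define LEGION_MAX_DIM 3
#define LEGION_USE_CUDA
// #define LEGION_USE_HDF5
"""

# Matches an active "#define NAME [VALUE]" line (commented-out lines do not match).
_DEFINE = re.compile(r"^\s*#\s*define\s+(\w+)(?:\s+(.*?))?\s*$")


def parse_defines(text):
    """Return {macro_name: value_string} for every active #define line."""
    defines = {}
    for line in text.splitlines():
        match = _DEFINE.match(line)
        if match:
            name, value = match.group(1), match.group(2)
            # Macros defined without a value map to the empty string.
            defines[name] = value if value else ""
    return defines


defines = parse_defines(SAMPLE_HEADER)
max_dim = int(defines["LEGION_MAX_DIM"])
print(max_dim, sorted(defines))
```

A binding that reads the header this way always sees the same dimension limit the runtime was built with, which is the practical benefit the note describes.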
- Regent
- Support for CUDA 10
- Support for field polymorphic tasks
- Substantially improved the generality of the index launch
optimization. Task arguments of the form p[i+k] may now be
used, where k is a variable defined outside of the loop
- Add flag `-foverride-demand-index-launch` which can be used to
force loops to be index launched in cases where the compiler
cannot prove the disjointness of read-write region
arguments
- Added reductions for complex64
- The scripts `install.py` and `setup_env.py` now use CMake to
build Terra by default, which should improve portability on
most machines
- The behavior of `-fcuda 1` has changed: this flag will now issue
an error if CUDA cannot be enabled (e.g. because the build
does not support CUDA, or because the machine has no
GPUs). Omitting this flag will now enable CUDA if it is
available (and will not error if it is not available).
The behavior of `-fopenmp 1` has changed similarly.
- The behavior of `__demand(__cuda)` has changed. This will now
issue an error if a loop is not eligible for the CUDA
transformation, regardless of whether CUDA is actually
available on the current machine or not. The behavior of
`__demand(__openmp)` has changed similarly.
- The annotation `__allow(__cuda)` is now permitted, and permits
(but does not require) tasks to be optimized with CUDA.
- Experimental support for 2D kernel launch in the CUDA code generation
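The `-fcuda` and annotation rules above amount to a small decision table, which the following Python sketch models. It is not Regent compiler code: the function names, the exception type, and the string labels `"demand"` and `"allow"` are hypothetical, and the treatment of `-fcuda 0` as "disabled" is an assumption that these notes do not state.

```python
class CudaUnavailableError(RuntimeError):
    """Raised when CUDA was explicitly requested but cannot be enabled."""


def resolve_cuda(flag, cuda_available):
    """Decide whether CUDA code generation is enabled.

    flag: None if -fcuda was omitted, otherwise the integer passed to it.
    cuda_available: True if the build supports CUDA and a GPU is present.
    """
    if flag is None:
        # Flag omitted: use CUDA if available, never raise an error.
        return cuda_available
    if flag == 1:
        # Explicit request: failing to enable CUDA is an error.
        if not cuda_available:
            raise CudaUnavailableError("-fcuda 1 given but CUDA cannot be enabled")
        return True
    if flag == 0:
        # Assumption (not stated in these notes): 0 disables CUDA.
        return False
    raise ValueError("unsupported -fcuda value: %r" % (flag,))


def apply_cuda_annotation(annotation, eligible):
    """Return True if the annotated code should get the CUDA transformation.

    annotation: "demand" models __demand(__cuda), "allow" models __allow(__cuda).
    eligible: True if the code qualifies for the CUDA transformation.
    Note that CUDA availability is deliberately not an input here.
    """
    if annotation == "demand":
        if not eligible:
            raise ValueError("__demand(__cuda): not eligible for CUDA")
        return True
    if annotation == "allow":
        # Permitted but not required: ineligible code is simply left alone.
        return eligible
    raise ValueError("unknown annotation: %r" % (annotation,))


print(resolve_cuda(None, False), apply_cuda_annotation("allow", False))
```

The same table would apply to `-fopenmp 1` and `__demand(__openmp)`, which the notes say changed similarly.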
- Python
- Add support for copies
- Copies and fills now support multiple fields
- Tasks (including index launches) now support setting the mapper
ID and tag
- Legion
- A major overhaul of the Legion physical analysis to use an
approach based on bounding volume hierarchies. The change is
not visible to users, but will likely impact performance. Most
programs will get faster; programs that create many partitions
frequently on the fly may get slower. The latter case will be fixed
in an upcoming release.
- Added support for indirect copy operations such as gather and
scatter onto existing copy launchers
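For readers unfamiliar with the terms, the element-wise meaning of gather and scatter can be sketched with plain Python lists. This shows only the semantics of an indirect copy; it does not use or model the Legion copy launcher API, and the function names are illustrative.

```python
def gather(src, indices):
    """Gather: dst[i] = src[indices[i]] (indirection on the source side)."""
    return [src[j] for j in indices]


def scatter(src, indices, dst):
    """Scatter: dst[indices[i]] = src[i] (indirection on the destination side).

    Returns a new list; positions not named in indices keep their old value.
    """
    out = list(dst)
    for i, j in enumerate(indices):
        out[j] = src[i]
    return out


gathered = gather([10, 20, 30, 40], [3, 0, 0])
scattered = scatter([1, 2, 3], [2, 0, 3], [0, 0, 0, 0])
print(gathered, scattered)
```

In a gather the same source element may be read several times, while in a scatter repeated destination indices would make the result depend on write order.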
- Realm
- `Event::subscribe` allows polling via `Event::has_triggered` to
(eventually) succeed
- Addition of `CompletionQueue` objects that allow multiple unordered
`Event` triggers to be efficiently handled by a single consumer
- Support for `omp_get_level`, `omp_in_parallel`, and
`omp_set_num_threads` in tasks running on OpenMP processors
- Support for unstructured scatter and/or gather in copies. (Handling
structured cases as well as fills/reductions remains a work in
progress.)
- Removed all calls to `Event::wait` from inside other Realm API calls.
Applications now must make sure that index spaces and instance
metadata are valid before use. For details, see: #465
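The completion queue idea mentioned above (many events that trigger in no particular order, drained by one consumer) can be illustrated by analogy with the Python standard library. This is only a sketch of the pattern using `queue.Queue` and threads; it does not call or mirror the Realm `CompletionQueue` interface, and `run_producers` is a hypothetical helper.

```python
import queue
import random
import threading
import time


def run_producers(num_events, seed=0):
    """Trigger num_events simulated events from separate threads and let a
    single consumer collect their IDs, in completion order, from one queue."""
    completed = queue.Queue()  # stands in for a completion queue
    rng = random.Random(seed)
    delays = [rng.uniform(0.0, 0.01) for _ in range(num_events)]

    def trigger(event_id, delay):
        time.sleep(delay)         # simulate work finishing at an arbitrary time
        completed.put(event_id)   # report the trigger to the shared queue

    threads = [
        threading.Thread(target=trigger, args=(i, d))
        for i, d in enumerate(delays)
    ]
    for t in threads:
        t.start()

    # Single consumer: one blocking pop per expected event, instead of
    # waiting on each event individually in a fixed order.
    order = [completed.get() for _ in range(num_events)]
    for t in threads:
        t.join()
    return order


order = run_producers(8)
print(order)
```

The consumer handles each event as soon as it completes, so one slow event does not delay the processing of events that finished earlier.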