Add kernel fusing using RAJA #167

Open. Wants to merge 38 commits into base branch develop.
38 commits
c9ac694 (davidbeckingsale, Dec 11, 2020) Add TransactionFuseable class
5ff40e2 (davidbeckingsale, Jan 5, 2021) Fixup handling fuseable/regular transactions
5cc28c3 (davidbeckingsale, Mar 4, 2021) Adding new API for fusing
beccc9f (nselliott, Apr 14, 2021) Add guard for non-umpire builds
cdf1111 (nselliott, Sep 30, 2021) Add #define to signal existance of KernelFuser
049bfd0 (davidbeckingsale, Oct 15, 2021) Merge branch 'master' into feature/fuse-comm
6d600c8 (davidbeckingsale, Oct 15, 2021) Make CoarsenCopyTransaction fuseable
c0a037d (nselliott, Nov 11, 2021) Merge branch 'master' into feature/nselliott/kernel-fuser
91f0636 (nselliott, Nov 11, 2021) Start to turn on workgroups for KernelFuser, and add placeholder methods
31b3798 (nselliott, Nov 18, 2021) Add to the implementation of KernelFuser
c249b86 (nselliott, Nov 18, 2021) Add kernel fuser allocator to AllocatorDatabase
e91952a (nselliott, Nov 19, 2021) Add checks to do fuser launch and synchronize only when needed.
3ab15d8 (nselliott, Nov 19, 2021) Add kernel fuser cleanup
eaf6fd2 (nselliott, Nov 24, 2021) Change KernelFuser to a singleton and begin adding it to some ArrayData
d43092e (nselliott, Nov 30, 2021) Add enqueue calls to some of the for_alls, add fuser in more places
f426913 (nselliott, Dec 14, 2021) Change name of virtual methods for fuseable operations in PatchData,
f8a5518 (nselliott, Dec 22, 2021) Add missing initialization in ArrayData
4fe3d2c (nselliott, Jan 7, 2022) Make KernelFuser a true no-op in non-RAJA builds.
c57914d (nselliott, Jan 8, 2022) Rearrange split message receives in AsyncCommPeer to avoid CUDA
b6fee9f (nselliott, Feb 3, 2022) Add cmake option for setting number of threads for RAJA WorkGroup policy
d9c4ce3 (nselliott, Feb 3, 2022) Merge branch 'feature/fuse-comm' of github.com:LLNL/SAMRAI into featu…
899a9fa (nselliott, Feb 8, 2022) Add logic to avoid synchronize calls when it is known that no kernels
9b08ece (nselliott, Mar 15, 2022) Merge branch 'master' into feature/fuse-comm-merge
7c24184 (nselliott, Apr 4, 2022) Add methods for applications to indicate need for synchronization
78c618d (nselliott, Apr 4, 2022) Clarify some documentation comments
2b1a9e1 (nselliott, Apr 13, 2022) Add option to set a synchronize between refine and postprocessRefine
1eb8cf5 (nselliott, Apr 14, 2022) Add KernelFuserStages as a singleton to hold and use KernelFuser
5a62605 (nselliott, Apr 15, 2022) Add ScheduleKernelFuser and change KernelFuserStages to
879e75e (nselliott, Apr 25, 2022) Add StagedKernelFusers launch/cleanup in RefineSchedule
67ef461 (nselliott, Apr 26, 2022) Stop some synchronize calls between the communicate and refine steps
8a8a98e (nselliott, May 3, 2022) Change tbox::Schedule to use StagedKernelFusers
f2230e9 (nselliott, May 3, 2022) Add optional synchronization around boundary conditions.
8df7260 (nselliott, May 13, 2022) Add StagedKernelFusers calls to PatchLevel allocate`
6f6c6bd (nselliott, May 18, 2022) Add cuda dependency in some tests
6397677 (nselliott, May 24, 2022) Add kernel fusion calls to PatchLevel deallocate and CoarsenSchedule
41a65a6 (nselliott, Jul 7, 2022) Revise StagedKernelFusers, remove isActive, add check on whether it
cb6fa43 (nselliott, Jul 21, 2022) Remove stray printf
4837d89 (nselliott, Nov 10, 2022) Small fixes to work with RAJA 2022.03
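Commits e91952a and 899a9fa (whose message is truncated above) describe skipping the fuser launch and the synchronize call when they are not needed. The following is a minimal host-only sketch of that bookkeeping, assuming a fuser that runs its queued loops on launch() and only needs to synchronize after a launch has happened. `SketchFuser` and all of its members are hypothetical; this is not the interface of SAMRAI's tbox::KernelFuser, and a device build would launch fused kernels and wait on the device rather than run plain host loops.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical host-only stand-in for a kernel fuser. It is NOT SAMRAI's
// tbox::KernelFuser; it only models "launch and synchronize only when needed".
class SketchFuser {
public:
   // Defer a loop body over the half-open range [begin, end).
   void enqueue(int begin, int end, std::function<void(int)> body) {
      d_queue.push_back({begin, end, std::move(body)});
   }

   // Run every queued loop in one pass; a no-op if nothing was enqueued.
   void launch() {
      if (d_queue.empty()) {
         return;   // nothing queued: skip the launch entirely
      }
      for (auto& item : d_queue) {
         for (int i = item.begin; i < item.end; ++i) {
            item.body(i);
         }
      }
      ++d_launch_count;
      d_launched_since_sync = true;
      d_queue.clear();
   }

   // Synchronize only if a launch happened since the last synchronize.
   void synchronize() {
      if (!d_launched_since_sync) {
         return;   // no outstanding work: skip the synchronize
      }
      // A device build would wait for the fused kernel here; this host
      // model only records that a synchronize was actually required.
      ++d_sync_count;
      d_launched_since_sync = false;
   }

   std::size_t launchCount() const { return d_launch_count; }
   std::size_t syncCount() const { return d_sync_count; }

private:
   struct Item {
      int begin;
      int end;
      std::function<void(int)> body;
   };
   std::vector<Item> d_queue;
   std::size_t d_launch_count = 0;
   std::size_t d_sync_count = 0;
   bool d_launched_since_sync = false;
};
```

With nothing enqueued, launch() and synchronize() both return immediately; after one launch, a second synchronize() is again a no-op.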
3 changes: 1 addition & 2 deletions CMakeLists.txt
@@ -59,8 +59,6 @@ option(ENABLE_SAMRAI_TESTS "Enable SAMRAI Test Programs" On)
option(ENABLE_PERF_TESTS "Enable Performance Tests." Off)
set(NUM_PERF_PROCS 8 CACHE INT "Number of processors for performance tests.")
option(ENABLE_CHECK_ASSERTIONS "Enable assertion checking." On)
option(ENABLE_CHECK_DEV_ASSERTIONS "Enable SAMRAI developer assertion checking." Off)
option(ENABLE_CHECK_DIM_ASSERTIONS "Enable assertion checking for dimensions." Off)
option(ENABLE_BOX_COUNTING "Turns on box telemetry." Off)
option(ENABLE_DEPRECATED "Build with deprecated features." On)
option(ENABLE_TIMERS "Enable SAMRAI timers." On)
@@ -72,6 +70,7 @@ set(CUDA_ARCH "sm_70" CACHE STRING "Compute architecture to pass to CUDA builds"
set(CMAKE_CUDA_FLAGS "" CACHE STRING "")
set(CMAKE_INSTALL_LIBDIR lib)
#set(CMAKE_INSTALL_RPATH_USE_LINK_PATH Off CACHE Bool "Rpath uses Link path")
set(SAMRAI_RAJA_WORKGROUP_THREADS 512 CACHE INT "Number of workgroup threads")

include(GNUInstallDirs)

6 changes: 5 additions & 1 deletion config/SAMRAI_config.h.cmake.in
@@ -340,6 +340,8 @@
/* Maximum dimension allowed */
#define SAMRAI_MAXIMUM_DIMENSION @SAMRAI_MAXIMUM_DIMENSION@

#define SAMRAI_RAJA_WORKGROUP_THREADS @SAMRAI_RAJA_WORKGROUP_THREADS@

/* Define to 1 if you have the ANSI C header files. */
#undef STDC_HEADERS

@@ -358,7 +360,9 @@
/* Configure for compiling on BGL family of machines */
#undef __BGL_FAMILY__


#ifdef HAVE_RAJA
#define SAMRAI_HAVE_KERNEL_FUSER
#endif

namespace SAMRAI {
static const unsigned short MAX_DIM_VAL = SAMRAI_MAXIMUM_DIMENSION;
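The configured header defines `SAMRAI_HAVE_KERNEL_FUSER` only when `HAVE_RAJA` is set, and substitutes the CMake cache value of `SAMRAI_RAJA_WORKGROUP_THREADS` (512 by default in the CMakeLists.txt hunk). The sketch below stands in for application code and shows one way to guard fuser-specific paths at compile time. The fallback `#define` exists only so the snippet compiles without SAMRAI's generated config header, and the helper names `kernelFuserAvailable` and `workgroupThreads` are hypothetical.

```cpp
// In a real build this macro comes from the generated SAMRAI_config.h.
// The fallback below is only for compiling this sketch on its own.
#ifndef SAMRAI_RAJA_WORKGROUP_THREADS
#define SAMRAI_RAJA_WORKGROUP_THREADS 512
#endif

#include <cstddef>

// Hypothetical helper: report whether this build was configured with the
// kernel fuser (SAMRAI_HAVE_KERNEL_FUSER is defined only for RAJA builds).
constexpr bool kernelFuserAvailable()
{
#ifdef SAMRAI_HAVE_KERNEL_FUSER
   return true;
#else
   return false;
#endif
}

// The thread count is a compile-time constant, so it can be used as a
// template argument and checked with static_assert.
constexpr std::size_t workgroupThreads = SAMRAI_RAJA_WORKGROUP_THREADS;
static_assert(workgroupThreads > 0,
              "SAMRAI_RAJA_WORKGROUP_THREADS must be positive");
```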
4 changes: 4 additions & 0 deletions source/SAMRAI/hier/CMakeLists.txt
@@ -140,6 +140,10 @@ if (ENABLE_MPI)
set (hier_depends ${hier_depends} mpi)
endif ()

if (ENABLE_CUDA)
set (hier_depends ${hier_depends} cuda)
endif ()

blt_add_library(
NAME SAMRAI_hier
SOURCES ${hier_sources}
138 changes: 138 additions & 0 deletions source/SAMRAI/hier/ForAll.h
@@ -19,6 +19,7 @@
#include "SAMRAI/hier/Box.h"
#include "SAMRAI/hier/Index.h"
#include "SAMRAI/tbox/ExecutionPolicy.h"
#include "SAMRAI/tbox/KernelFuser.h"

#include <type_traits>
#include <tuple>
@@ -145,8 +146,37 @@ struct for_all<1> {
RAJA::make_tuple(make_range(ifirst, ilast, 0)),
body);
}

template <typename Policy, typename LoopBody,
typename std::enable_if<std::is_base_of<tbox::policy::base, Policy>::value, int>::type = 0>
inline static void eval(tbox::KernelFuser* fuser, const hier::Index& ifirst, const hier::Index& ilast, LoopBody body)
{
if (fuser == nullptr) {
RAJA::kernel<typename tbox::detail::policy_traits<Policy>::Policy1d>(
RAJA::make_tuple(make_range(ifirst, ilast, 0)),
body);
} else {
fuser->enqueue(ifirst(0), ilast(0), body);
}
}

template <typename Policy, typename LoopBody,
typename std::enable_if<!std::is_base_of<tbox::policy::base, Policy>::value, int>::type = 0>
inline static void eval(tbox::KernelFuser* fuser, const hier::Index& ifirst, const hier::Index& ilast, LoopBody body)
{
if (fuser == nullptr) {
RAJA::kernel<Policy>(
RAJA::make_tuple(make_range(ifirst, ilast, 0)),
body);
} else {
fuser->enqueue(ifirst(0), ilast(0), body);
}
}
};
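Each fuser-aware overload follows the same rule: with a null `fuser` the loop runs immediately, otherwise the body is handed to `fuser->enqueue(begin, end, body)` and runs later, when the fuser is launched. The host-only sketch below models that control flow with `std::function`; `DeferredQueue` and `sketch_for_all` are hypothetical names, and a real device loop would be launched through RAJA rather than a plain `for`.

```cpp
#include <functional>
#include <vector>

// Hypothetical queue standing in for tbox::KernelFuser in this sketch.
struct DeferredQueue {
   std::vector<std::function<void()>> work;

   template <typename LoopBody>
   void enqueue(int begin, int end, LoopBody body) {
      // Capture the range and a copy of the body; nothing executes yet.
      work.emplace_back([begin, end, body]() {
         for (int i = begin; i < end; ++i) body(i);
      });
   }

   // Execute every deferred loop, then empty the queue.
   void launch() {
      for (auto& w : work) w();
      work.clear();
   }
};

// Mirrors the shape of for_all(fuser, begin, end, body): run now if there is
// no fuser, otherwise defer. The range is half-open, [begin, end).
template <typename LoopBody>
void sketch_for_all(DeferredQueue* fuser, int begin, int end, LoopBody body)
{
   if (fuser == nullptr) {
      for (int i = begin; i < end; ++i) body(i);
   } else {
      fuser->enqueue(begin, end, body);
   }
}
```

Batching many small loops into one launch is the usual reason for kernel fusion: the per-launch overhead is paid once rather than once per loop.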


// 2D and 3D don't use the fuser for anything, pending support for
// multidimensional loops in KernelFuser.
template <>
struct for_all<2> {
template <typename Policy, typename LoopBody,
@@ -168,6 +198,28 @@ struct for_all<2> {
make_range(ifirst, ilast, 1)),
body);
}

template <typename Policy, typename LoopBody,
typename std::enable_if<std::is_base_of<tbox::policy::base, Policy>::value, int>::type = 0>
inline static void eval(tbox::KernelFuser* fuser, const hier::Index& ifirst, const hier::Index& ilast, LoopBody body)
{
NULL_USE(fuser);
RAJA::kernel<typename tbox::detail::policy_traits<Policy>::Policy2d>(
RAJA::make_tuple(make_range(ifirst, ilast, 0),
make_range(ifirst, ilast, 1)),
body);
}

template <typename Policy, typename LoopBody,
typename std::enable_if<!std::is_base_of<tbox::policy::base, Policy>::value, int>::type = 0>
inline static void eval(tbox::KernelFuser* fuser, const hier::Index& ifirst, const hier::Index& ilast, LoopBody body)
{
NULL_USE(fuser);
RAJA::kernel<Policy>(
RAJA::make_tuple(make_range(ifirst, ilast, 0),
make_range(ifirst, ilast, 1)),
body);
}
};

template <>
@@ -193,6 +245,30 @@ struct for_all<3> {
make_range(ifirst, ilast, 2)),
body);
}

template <typename Policy, typename LoopBody,
typename std::enable_if<std::is_base_of<tbox::policy::base, Policy>::value, int>::type = 0>
inline static void eval(tbox::KernelFuser* fuser, const hier::Index& ifirst, const hier::Index& ilast, LoopBody body)
{
NULL_USE(fuser);
RAJA::kernel<typename tbox::detail::policy_traits<Policy>::Policy3d>(
RAJA::make_tuple(make_range(ifirst, ilast, 0),
make_range(ifirst, ilast, 1),
make_range(ifirst, ilast, 2)),
body);
}

template <typename Policy, typename LoopBody,
typename std::enable_if<!std::is_base_of<tbox::policy::base, Policy>::value, int>::type = 0>
inline static void eval(tbox::KernelFuser* fuser, const hier::Index& ifirst, const hier::Index& ilast, LoopBody body)
{
NULL_USE(fuser);
RAJA::kernel<Policy>(
RAJA::make_tuple(make_range(ifirst, ilast, 0),
make_range(ifirst, ilast, 1),
make_range(ifirst, ilast, 2)),
body);
}
};

} // namespace detail
@@ -205,20 +281,52 @@ inline void for_all(int begin, int end, LoopBody body)
RAJA::forall<typename tbox::detail::policy_traits<Policy>::Policy>(RAJA::RangeSegment(begin, end), body);
}

template <typename Policy, typename LoopBody,
typename std::enable_if<std::is_base_of<tbox::policy::base, Policy>::value, int>::type = 0>
inline void for_all(tbox::KernelFuser* fuser, int begin, int end, LoopBody body)
{
if (fuser == nullptr) {
RAJA::forall<typename tbox::detail::policy_traits<Policy>::Policy>(RAJA::RangeSegment(begin, end), body);
} else {
fuser->enqueue(begin, end, body);
}
}

template <typename Policy, typename LoopBody,
typename std::enable_if<!std::is_base_of<tbox::policy::base, Policy>::value, int>::type = 0>
inline void for_all(int begin, int end, LoopBody body)
{
RAJA::forall<Policy>(RAJA::RangeSegment(begin, end), body);
}

template <typename Policy, typename LoopBody,
typename std::enable_if<!std::is_base_of<tbox::policy::base, Policy>::value, int>::type = 0>
inline void for_all(tbox::KernelFuser* fuser, int begin, int end, LoopBody body)
{
if (fuser == nullptr) {
RAJA::forall<Policy>(RAJA::RangeSegment(begin, end), body);
} else {
fuser->enqueue(begin, end, body);
}
}
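The paired overloads use `std::enable_if` on `std::is_base_of<tbox::policy::base, Policy>` so that a SAMRAI policy tag is translated through `tbox::detail::policy_traits`, while any other type is passed to RAJA unchanged. The sketch reproduces only that selection mechanism, with hypothetical tag types (`policy_base`, `parallel_tag`, `raw_policy`) and a hypothetical `select_policy` that returns a string naming the branch taken instead of launching a loop.

```cpp
#include <string>
#include <type_traits>

// Hypothetical stand-ins for tbox::policy::base and a SAMRAI policy tag.
struct policy_base {};
struct parallel_tag : policy_base {};

// Hypothetical backend policy that does not derive from policy_base.
struct raw_policy {};

// Hypothetical traits mapping a SAMRAI tag to a backend policy.
template <typename Tag> struct sketch_policy_traits;
template <> struct sketch_policy_traits<parallel_tag> {
   static std::string name() { return "translated:parallel"; }
};

// Viable only when Policy derives from policy_base: translate via traits.
template <typename Policy,
          typename std::enable_if<std::is_base_of<policy_base, Policy>::value, int>::type = 0>
std::string select_policy()
{
   return sketch_policy_traits<Policy>::name();
}

// Viable otherwise: the type is assumed to already be a backend policy.
template <typename Policy,
          typename std::enable_if<!std::is_base_of<policy_base, Policy>::value, int>::type = 0>
std::string select_policy()
{
   return "passthrough";
}
```

The defaulted `int` non-type template parameter is what lets the two overloads coexist: for any given `Policy`, substitution fails in exactly one of them.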

// does NOT include end
template <typename LoopBody>
inline void parallel_for_all(int begin, int end, LoopBody body)
{
for_all<tbox::policy::parallel>(begin, end, body);
}

template <typename LoopBody>
inline void parallel_for_all(tbox::KernelFuser* fuser, int begin, int end, LoopBody body)
{
if (fuser == nullptr) {
for_all<tbox::policy::parallel>(begin, end, body);
} else {
for_all<tbox::policy::parallel>(fuser, begin, end, body);
}
}

template <typename LoopBody>
inline void host_parallel_for_all(int begin, int end, LoopBody body)
{
@@ -231,12 +339,25 @@ inline void for_all(const hier::Box& box, const int dim, LoopBody body)
for_all<Policy>(box.lower()(dim), box.upper()(dim) + 1, body);
}


template <typename Policy, typename LoopBody>
inline void for_all(tbox::KernelFuser* fuser, const hier::Box& box, const int dim, LoopBody body)
{
for_all<Policy>(fuser, box.lower()(dim), box.upper()(dim) + 1, body);
}

template <typename LoopBody>
inline void parallel_for_all(const hier::Box& box, const int dim, LoopBody body)
{
for_all<tbox::policy::parallel>(box.lower()(dim), box.upper()(dim) + 1, body);
}

template <typename LoopBody>
inline void parallel_for_all(tbox::KernelFuser* fuser, const hier::Box& box, const int dim, LoopBody body)
{
for_all<tbox::policy::parallel>(fuser, box.lower()(dim), box.upper()(dim) + 1, body);
}
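The single-dimension box overloads pass `box.lower()(dim)` and `box.upper()(dim) + 1` to the integer overloads, because box bounds are inclusive while the integer ranges exclude `end`. A minimal sketch of that conversion, using a hypothetical `SketchBoxDim` in place of one dimension of `hier::Box`:

```cpp
// Hypothetical one-dimensional slice of a box with inclusive bounds, like
// hier::Box's lower()(dim) and upper()(dim).
struct SketchBoxDim {
   int lower;
   int upper;   // inclusive
};

// Visit every cell of the inclusive interval by converting it to the
// half-open range [lower, upper + 1) that the integer overloads expect.
template <typename LoopBody>
void sketch_for_all_dim(const SketchBoxDim& box, LoopBody body)
{
   const int begin = box.lower;
   const int end = box.upper + 1;   // one past the last cell
   for (int i = begin; i < end; ++i) {
      body(i);
   }
}
```

Forgetting the `+ 1` would silently skip the last cell along that dimension.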

template <typename LoopBody>
inline void host_parallel_for_all(const hier::Box& box, const int dim, LoopBody body)
{
@@ -250,12 +371,29 @@ inline void for_all(const hier::Box& box, LoopBody body)
detail::for_all<arg_count>::template eval<Policy>(box.lower(), box.upper(), body);
}

template <typename Policy, typename LoopBody>
inline void for_all(tbox::KernelFuser* fuser, const hier::Box& box, LoopBody body)
{
if (fuser == nullptr) {
for_all<Policy,LoopBody>(box, body);
} else {
constexpr int arg_count = detail::function_traits<LoopBody>::argument_count;
detail::for_all<arg_count>::template eval<Policy>(fuser, box.lower(), box.upper(), body);
}
}
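The box overloads pick `detail::for_all<N>` from the number of parameters the loop body takes, via `detail::function_traits<LoopBody>::argument_count`, so a one-index lambda becomes a 1D loop and a three-index lambda a 3D loop; only the 1D specialization currently forwards to the fuser. Below is a minimal sketch of how such a trait can be written for non-generic lambdas. It is an assumption about the technique, not a copy of SAMRAI's `function_traits`, and `sketch_function_traits` and `sketch_loop_dim` are hypothetical names.

```cpp
#include <cstddef>

// Primary template: inspect the call operator of a lambda or functor.
template <typename F>
struct sketch_function_traits
   : sketch_function_traits<decltype(&F::operator())> {};

// Specialization for a const call operator (ordinary, non-mutable lambdas).
template <typename C, typename R, typename... Args>
struct sketch_function_traits<R (C::*)(Args...) const> {
   static constexpr std::size_t argument_count = sizeof...(Args);
};

// Specialization for a non-const call operator (mutable lambdas, functors).
template <typename C, typename R, typename... Args>
struct sketch_function_traits<R (C::*)(Args...)> {
   static constexpr std::size_t argument_count = sizeof...(Args);
};

// Hypothetical dispatcher: returns the loop dimension that would be chosen.
template <typename LoopBody>
int sketch_loop_dim(const LoopBody&)
{
   return static_cast<int>(sketch_function_traits<LoopBody>::argument_count);
}
```

Generic lambdas (`auto` parameters) have a templated call operator, so `&F::operator()` cannot be taken and this style of trait does not apply to them.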

template <typename LoopBody>
inline void parallel_for_all(const hier::Box& box, LoopBody body)
{
for_all<tbox::policy::parallel>(box, body);
}

template <typename LoopBody>
inline void parallel_for_all(tbox::KernelFuser* fuser, const hier::Box& box, LoopBody body)
{
for_all<tbox::policy::parallel>(fuser, box, body);
}

template <typename LoopBody>
inline void host_parallel_for_all(const hier::Box& box, LoopBody body)
{
24 changes: 24 additions & 0 deletions source/SAMRAI/hier/PatchData.cpp
@@ -31,6 +31,30 @@ PatchData::~PatchData()
{
}

void
PatchData::copyFuseable(
const PatchData& src,
const BoxOverlap& overlap)
{
copy(src, overlap);
}

void
PatchData::packStreamFuseable(
tbox::MessageStream& stream,
const BoxOverlap& overlap) const
{
packStream(stream, overlap);
}

void
PatchData::unpackStreamFuseable(
tbox::MessageStream& stream,
const BoxOverlap& overlap)
{
unpackStream(stream, overlap);
}
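`PatchData` gives each fuseable hook a default that simply forwards to the existing pure virtual (`copyFuseable` calls `copy`, `packStreamFuseable` calls `packStream`, `unpackStreamFuseable` calls `unpackStream`), so existing patch data types keep working unchanged and only types with kernel-based copies need to override. The sketch below models that pattern with hypothetical classes (`SketchPatchData`, `HostOnlyData`, `FusingData`) and a plain `int` standing in for `BoxOverlap`; the counters exist only to show which path ran.

```cpp
// Hypothetical, simplified base mirroring PatchData's copy/copyFuseable pair.
class SketchPatchData {
public:
   virtual ~SketchPatchData() = default;

   // Required operation, analogous to PatchData::copy.
   virtual void copy(const SketchPatchData& src, int overlap) = 0;

   // Fuseable variant whose default forwards to copy(), analogous to
   // PatchData::copyFuseable.
   virtual void copyFuseable(const SketchPatchData& src, int overlap)
   {
      copy(src, overlap);
   }

   int immediateCopies = 0;
   int fusedCopies = 0;
};

// Relies on the default: fuseable calls fall back to an immediate copy.
class HostOnlyData : public SketchPatchData {
public:
   void copy(const SketchPatchData&, int) override { ++immediateCopies; }
};

// Overrides the hook; a real implementation would enqueue its copy loop
// on a kernel fuser here instead of launching it right away.
class FusingData : public SketchPatchData {
public:
   void copy(const SketchPatchData&, int) override { ++immediateCopies; }
   void copyFuseable(const SketchPatchData&, int) override { ++fusedCopies; }
};
```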

/*
*************************************************************************
*
38 changes: 38 additions & 0 deletions source/SAMRAI/hier/PatchData.h
@@ -20,6 +20,15 @@
#include "SAMRAI/tbox/Utilities.h"

namespace SAMRAI {

/*
* Forward declaration of KernelFuser (currently commented out). A forward
* declaration is used instead of including KernelFuser.h, because that header
* pulls in RAJA and requires CUDA.
*/
//namespace tbox {
//class KernelFuser;
//}

namespace hier {

/**
@@ -160,6 +169,11 @@ class PatchData
const PatchData& src,
const BoxOverlap& overlap) = 0;

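/**
* Copy data from the source into the destination using the designated
* overlap descriptor, as a fuseable operation. The default implementation
* calls copy().
*/
virtual void
copyFuseable(
const PatchData& src,
const BoxOverlap& overlap);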

/**
* Copy data from the source into the destination using the designated
* overlap descriptor. The overlap description will have been computed
@@ -206,6 +220,18 @@ class PatchData
tbox::MessageStream& stream,
const BoxOverlap& overlap) const = 0;

/**
* Pack data lying on the specified index set into the output stream as a
* fuseable operation, so that subclasses can defer their packing kernels to
* a KernelFuser. The default implementation calls packStream(). See the
* abstract stream virtual base class for more information about the packing
* operators defined for streams.
*/
virtual void
packStreamFuseable(
tbox::MessageStream& stream,
const BoxOverlap& overlap) const;

/**
* Unpack data from the message stream into the specified index set.
* See the abstract stream virtual base class for more information about
@@ -216,6 +242,18 @@ class PatchData
tbox::MessageStream& stream,
const BoxOverlap& overlap) = 0;

/**
* Unpack data from the message stream into the specified index set as a
* fuseable operation, so that subclasses can defer their unpacking kernels
* to a KernelFuser. The default implementation calls unpackStream(). See
* the abstract stream virtual base class for more information about the
* unpacking operators defined for streams.
*/
virtual void
unpackStreamFuseable(
tbox::MessageStream& stream,
const BoxOverlap& overlap);

/**
* Checks that class version and restart file version are equal. If so,
* reads in the data members common to all patch data types from restart
2 changes: 2 additions & 0 deletions source/SAMRAI/hier/PatchLevel.cpp
@@ -9,7 +9,9 @@
************************************************************************/
#include "SAMRAI/hier/PatchLevel.h"

#include "SAMRAI/tbox/Collectives.h"
#include "SAMRAI/tbox/MathUtilities.h"
#include "SAMRAI/tbox/StagedKernelFusers.h"
#include "SAMRAI/tbox/TimerManager.h"
#include "SAMRAI/hier/BaseGridGeometry.h"
#include "SAMRAI/hier/BoxContainer.h"