-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add kernel fusing using RAJA #167
base: develop
Are you sure you want to change the base?
Conversation
davidbeckingsale
commented
Mar 4, 2021
•
edited
Loading
edited
- Make CoarsenCopyTransaction inherit from FuseableTransaction
- Flesh out KernelFuser
- Add KernelFuser argument to loop API
Fuseable transactions still need to be fused.
source/SAMRAI/tbox/ExecutionPolicy.h
Outdated
@@ -111,6 +111,11 @@ struct policy_traits<policy::parallel> { | |||
>; | |||
|
|||
using ReductionPolicy = RAJA::cuda_reduce; | |||
|
|||
using WorkGroupPolicy = RAJA::WorkGroupPolicy< | |||
RAJA::cuda_work_async<1024>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you provide a way to set the workgroup block size in case people run into cuda linking issues?
source/SAMRAI/tbox/Schedule.C
Outdated
} | ||
d_recv_fuser->launch(); | ||
#if defined(HAVE_RAJA) | ||
parallel_synchronize(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this synchronization between fused and non-fused necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is necesary, what about if d_recv_sets_fuseable[sender]
is empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shouldn't be needed if it's empty and the above loop is a no-op. I think also the launch() call is unnecessary in that case.
source/SAMRAI/hier/PatchData.h
Outdated
const PatchData& src, | ||
const BoxOverlap& overlap) = 0; | ||
|
||
virtual void | ||
copy( | ||
const PatchData& src, | ||
const BoxOverlap& overlap, | ||
tbox::KernelFuser& fuser); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As it stands I'll have to implement both if I want to use fusion. I guess I can have one implementation for both that takes the fuser pointer and use or not use it under an abstraction layer to keep things single source. I'll need to use some macros to maintain support for older versions of samrai but that was pretty much inevitable.
source/SAMRAI/hier/PatchData.h
Outdated
packStream( | ||
tbox::MessageStream& stream, | ||
const BoxOverlap& overlap, | ||
tbox::KernelFuser& fuser); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not const?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, looks like this should be const.
Could you add a feature availability macro in your config file so its easy to check if the fuser exists? |
in pdat classes that take KernelFuser.
config/SAMRAI_config.h.cmake.in
Outdated
@@ -358,7 +358,9 @@ | |||
/* Configure for compiling on BGL family of machines */ | |||
#undef __BGL_FAMILY__ | |||
|
|||
|
|||
#ifdef HAVE_RAJA | |||
#define HAVE_KERNEL_FUSER |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SAMRAI_HAVE_KERNEL_FUSER?
source/SAMRAI/tbox/Schedule.C
Outdated
} | ||
TBOX_ASSERT(mi->first == send_coms[icom].getPeerRank()); | ||
#if defined(HAVE_RAJA) | ||
parallel_synchronize(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fuser cleanup call
and add needed guards for non-cuda builds
have been launched
StagedKernelFusers
launched kernels.