Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add kernel fusing using RAJA #167

Open
wants to merge 38 commits into
base: develop
Choose a base branch
from
Open

Conversation

davidbeckingsale
Copy link
Member

@davidbeckingsale davidbeckingsale commented Mar 4, 2021

  • Make CoarsenCopyTransaction inherit from FuseableTransaction
  • Flesh out KernelFuser
  • Add KernelFuser argument to loop API

@@ -111,6 +111,11 @@ struct policy_traits<policy::parallel> {
>;

using ReductionPolicy = RAJA::cuda_reduce;

using WorkGroupPolicy = RAJA::WorkGroupPolicy<
RAJA::cuda_work_async<1024>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you provide a way to set the workgroup block size in case people run into cuda linking issues?

}
d_recv_fuser->launch();
#if defined(HAVE_RAJA)
parallel_synchronize();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this synchronization between fused and non-fused necessary?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is necesary, what about if d_recv_sets_fuseable[sender] is empty?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't be needed if it's empty and the above loop is a no-op. I think also the launch() call is unnecessary in that case.

Comment on lines 169 to 176
const PatchData& src,
const BoxOverlap& overlap) = 0;

virtual void
copy(
const PatchData& src,
const BoxOverlap& overlap,
tbox::KernelFuser& fuser);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it stands I'll have to implement both if I want to use fusion. I guess I can have one implementation for both that takes the fuser pointer and use or not use it under an abstraction layer to keep things single source. I'll need to use some macros to maintain support for older versions of samrai but that was pretty much inevitable.

packStream(
tbox::MessageStream& stream,
const BoxOverlap& overlap,
tbox::KernelFuser& fuser);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not const?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, looks like this should be const.

@MrBurmark
Copy link
Member

Could you add a feature availability macro in your config file so its easy to check if the fuser exists?

@@ -358,7 +358,9 @@
/* Configure for compiling on BGL family of machines */
#undef __BGL_FAMILY__


#ifdef HAVE_RAJA
#define HAVE_KERNEL_FUSER
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SAMRAI_HAVE_KERNEL_FUSER?

}
TBOX_ASSERT(mi->first == send_coms[icom].getPeerRank());
#if defined(HAVE_RAJA)
parallel_synchronize();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fuser cleanup call

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants