Merge pull request #535 from RL-S/patch-5
Parallel dispatch, remove duplicates: one word and one clause
crtrott authored Jun 18, 2024
2 parents d32a50c + 9f034a5 commit e00d86d
Showing 1 changed file with 4 additions and 4 deletions.
8 changes: 4 additions & 4 deletions docs/source/ProgrammingGuide/ParallelDispatch.md
@@ -20,7 +20,7 @@ Important notes on syntax:

A _functor_ is one way to define the body of a parallel loop. It is a class or struct<sup>1</sup> with a public `operator()` instance method. That method's arguments depend on both which parallel operation you want to execute (for, reduce, or scan), and on the loop's execution policy (e.g., range or team). For an example of a functor see the section in this chapter for each type of parallel operation. In the most common case of a [`parallel_for()`](../API/core/parallel-dispatch/parallel_for), it takes an integer argument which is the for loop's index. Other arguments are possible; see [Chapter 8 - Hierarchical Parallelism](HierarchicalParallelism).

-The `operator()` method must be const, and must be marked with the `KOKKOS_FUNCTION` or `KOKKOS_INLINE_FUNCTION` macro. For some backends (such as CUDA and HIP) this macro is necessary to mark mark your method as suitable for running on both accelerator devices and the host. If not building with any backends requiring markup, `KOKKOS_INLINE_FUNCTION` expands to `inline`, and `KOKKOS_FUNCTION` is unnecessary but harmless. Here is an example of the signature of such a method:
+The `operator()` method must be const, and must be marked with the `KOKKOS_FUNCTION` or `KOKKOS_INLINE_FUNCTION` macro. For some backends (such as CUDA and HIP) this macro is necessary to mark your method as suitable for running on both accelerator devices and the host. If not building with any backends requiring markup, `KOKKOS_INLINE_FUNCTION` expands to `inline`, and `KOKKOS_FUNCTION` is unnecessary but harmless. Here is an example of the signature of such a method:

```c++
KOKKOS_INLINE_FUNCTION void operator() (...) const;
@@ -36,9 +36,9 @@ The entire parallel operation (for, reduce, or scan) shares the same instance of
The 2011 version of the C++ standard ("C++11") provides a new language construct, the _lambda_, also called "anonymous function" or "closure." Kokkos lets users supply parallel loop bodies as either functors (see above) or lambdas. Lambdas work like automatically generated functors. Just like a class, a lambda may have state. The only difference is that with a lambda, the state comes in from the environment. (The name "closure" means that the function "closes over" state from the environment.) Just like with functors, lambdas must bring in state by "value" (copy), not by reference or pointer.
-By default, lambdas capture nothing (as the default capture specifier `[]` specifies). This is not likely to be useful, since [`parallel_for()`](../API/core/parallel-dispatch/parallel_for) generally works by its side effects. Thus, we recommend using the ``capture by value'' specifier `[=]` by default. You may also explicitly specify variables to capture, but they must be captured by value. We prefer that for the outermost level of parallelism (see [Chapter 8](HierarchicalParallelism)), you use the `KOKKOS_LAMBDA` macro instead of the capture clause.
-If CUDA is disabled, this just turns into the usual capture-by-value clause `[=]`. That captures variables from the surrounding scope by value. Do NOT capture them by reference! If CUDA is enabled, this macro may have a special definition
-that makes the lambda work correctly with CUDA. Compare to the `KOKKOS_INLINE_FUNCTION` macro, which has a special meaning if CUDA is enabled. If you do not plan to build with CUDA, you may use `[=]` explicitly, but we find using the macro easier than remembering the capture clause syntax.
+By default, lambdas capture nothing (as the default capture specifier `[]` specifies). This is not likely to be useful, since [`parallel_for()`](../API/core/parallel-dispatch/parallel_for) generally works by its side effects. Because Kokkos reserves the right to make copies of the closure, and its operations are potentially asynchronous users must ``capture by value'' to be semantically correct. We recommend doing so via the KOKKOS_LAMBDA macro for the outermost level of parallelism (see [Chapter 8](HierarchicalParallelism)).
+For some backends, this just turns into the usual capture-by-value clause `[=]`. That captures variables from the surrounding scope by value. Do NOT capture them by reference! For other backends (e.g. CUDA and HIP), this macro may have a special definition
+that makes the lambda work correctly, same as the `KOKKOS_INLINE_FUNCTION` macro.
It is a violation of Kokkos semantics to capture by reference `[&]` for two reasons. First Kokkos might give the lambda to an execution space which can not access the stack of the dispatching thread. Secondly, capturing by reference allows the programmer to violate the const semantics of the lambda. For correctness and portability reasons lambdas and functors are treated as const objects inside the parallel code section. Capturing by reference allows a circumvention of that const property, and enables many more possibilities of writing non-threads-safe code.