From a1c70e1276a9eab6d73e7f6cdf17cac2fa48f96e Mon Sep 17 00:00:00 2001
From: Istvan Kiss
Date: Wed, 10 Jul 2024 16:04:21 +0200
Subject: [PATCH] Prepare for 2nd round review

---
 docs/how-to/cooperative_groups.rst            | 19 ++++++++++-------
 .../cooperative_groups_reference.rst          | 21 +++++++------------
 2 files changed, 19 insertions(+), 21 deletions(-)

diff --git a/docs/how-to/cooperative_groups.rst b/docs/how-to/cooperative_groups.rst
index 6a207aabcf..6733afbe15 100644
--- a/docs/how-to/cooperative_groups.rst
+++ b/docs/how-to/cooperative_groups.rst
@@ -30,10 +30,10 @@ The **multi grid** is an abstraction of potentially multiple simultaneous launch

 For further information, check the :ref:`inherent thread model `. For details on memory model, check the :ref:`memory model description `

-Cooperative group simple example
-================================
+Cooperative groups simple example
+=================================

-You'll find the code difference to the original block model in the following examples.
+The difference to the original block model in the ``reduce_sum`` device function is the following.

 .. tab-set::
   .. tab-item:: Original Block
@@ -98,6 +98,8 @@ You'll find the code difference to the original block model in the following exa
         // ...
       }

+The ``reduce_sum`` function call and input data initialization difference to the original block model is the following.
+
 .. tab-set::
   .. tab-item:: Original Block
     :sync: original-block
@@ -133,6 +135,7 @@ You'll find the code difference to the original block model in the following exa

         // ...

+        // Initialize the thread_block
         thread_block thread_block_group = this_thread_block();
         // Perform reduction
         output = reduce_sum(thread_block_group, workspace, input);
@@ -151,7 +154,7 @@ Group types are based on the levels of synchronization and data sharing among th
 Thread-block group
 ------------------

-Represents an intra-workgroup cooperative group type where the participating threads within the group are the same threads that participated in the currently executing ``workgroup``.
+Represents an intra-workgroup cooperative groups type where the participating threads within the group are the same threads that participated in the currently executing ``workgroup``.

 .. code-block:: cpp

@@ -168,7 +171,7 @@ The ``group_index()`` , ``thread_index()`` , ``thread_rank()`` , ``size()``, ``c
 Grid group
 ------------

-Represents an inter-workgroup cooperative group type where the group's participating threads span multiple workgroups running the same kernel on the same device. Use the cooperative launch API to synchronize across the grid.
+Represents an inter-workgroup cooperative groups type where the group's participating threads span multiple workgroups running the same kernel on the same device. Use the cooperative launch API to synchronize across the grid.

 .. code-block:: cpp

@@ -186,7 +189,7 @@ are public of the ``grid_group`` class. For further details, check the :ref:`gri
 Multi-grid group
 ------------------

-Represents an inter-device cooperative group type where the participating threads within the group span multiple devices that run the same kernel on the devices. All the multi-grid group APIs require that you have used the appropriate launch API.
+Represents an inter-device cooperative groups type where the participating threads within the group span multiple devices that run the same kernel on the devices. All the multi-grid group APIs require that you have used the appropriate launch API.

 .. code-block:: cpp

@@ -207,7 +210,7 @@ Thread-block tile
 ------------------

 This constructs a templated class derived from ``thread_group``. The template defines the tile
-size of the new thread group at compile time.
+size of the new thread group at compile time. This group type also supports sub-wave level intrinsics.

 .. code-block:: cpp

@@ -254,7 +257,7 @@ The ``thread_rank()`` , ``size()``, ``cg_type()``, ``is_valid()``, ``sync()``, `
 Synchronization
 ===============

-With each group type, the synchronization requires using the correct cooperative group
+With each group type, the synchronization requires using the correct cooperative groups
 launch API.

 **Check the kernel launch capability**
diff --git a/docs/reference/cooperative_groups_reference.rst b/docs/reference/cooperative_groups_reference.rst
index 53c2aab0cd..d02dd2ae95 100644
--- a/docs/reference/cooperative_groups_reference.rst
+++ b/docs/reference/cooperative_groups_reference.rst
@@ -22,10 +22,10 @@ The following host side functions used for cooperative kernel launches.

 .. doxygenfunction:: hipModuleLaunchCooperativeKernelMultiDevice

-Cooperative group classes
-=========================
+Cooperative groups classes
+==========================

-The following cooperative group classes can be used on the device side.
+The following cooperative groups classes can be used on the device side.

 .. _thread_group_ref:

@@ -47,12 +47,7 @@ The following cooperative group classes can be used on the device side.
 .. doxygenclass:: cooperative_groups::multi_grid_group
    :members:

-.. _tiled_group_ref:
-
-.. doxygenclass:: cooperative_groups::tiled_group
-   :members:
-
-.. _thread_block_tile_type_ref:
+.. _thread_block_tile_ref:

 .. doxygenclass:: cooperative_groups::thread_block_tile
    :members:
@@ -62,8 +57,8 @@ The following cooperative group classes can be used on the device side.
 .. doxygenclass:: cooperative_groups::coalesced_group
    :members:

-Cooperative group construct functions
-=====================================
+Cooperative groups construct functions
+======================================

 The following functions are used to construct different group types instances on the device side.

@@ -83,8 +78,8 @@ The following functions are used to construct different group types instances on

 .. doxygenfunction:: cooperative_groups::binary_partition(const thread_block_tile& tgrp, bool pred)

-Cooperative group exposed API functions
-=======================================
+Cooperative groups exposed API functions
+========================================

 The following functions are the exposed API for different group types instances on the device side.