Skip to content

Commit

Permalink
PR feedback
Browse files Browse the repository at this point in the history
  • Loading branch information
neon60 committed Aug 1, 2024
1 parent 87d9b0a commit 5c4d394
Show file tree
Hide file tree
Showing 6 changed files with 934 additions and 7 deletions.

This file was deleted.

Large diffs are not rendered by default.

Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
34 changes: 28 additions & 6 deletions docs/how-to/cooperative_groups.rst
Original file line number Diff line number Diff line change
Expand Up @@ -21,19 +21,18 @@ The API is accessible in the ``cooperative_groups`` namespace after the ``hip_c
Cooperative groups thread model
===============================

The thread hierarchy abstraction of cooperative groups is in
:numref:`coop_thread_hierarchy`.
The thread hierarchy abstraction of cooperative groups are in :ref:`grid hierarchy <coop_thread_top_hierarchy>` and :ref:`block hierarchy <coop_thread_bottom_hierarchy>`.

.. _coop_thread_hierarchy:
.. _coop_thread_top_hierarchy:

.. figure:: ../data/how-to/cooperative_groups/thread_hierarchy_coop.svg
.. figure:: ../data/how-to/cooperative_groups/thread_hierarchy_coop_top.svg
:alt: Diagram depicting nested rectangles of varying color. The outermost one
titled "Grid", inside sets of different sized rectangles layered on
one another titled "Block". Each "Block" containing sets of uniform
rectangles layered on one another titled "Warp". Each of the "Warp"
titled rectangles filled with downward pointing arrows inside.

Cooperative group thread hierarchy.
Cooperative group thread hierarchy in grids.

The **multi grid** is an abstraction of potentially multiple simultaneous launches of the same kernel over multiple devices (Deprecated since 5.0). The **grid** in cooperative groups is a single dispatch of kernels for execution like the original grid.

Expand All @@ -47,8 +46,19 @@ The **block** is the same as the :ref:`inherent_thread_model` block entity.

Explicit warp-level thread handling is absent from the Cooperative Groups API. In order to exploit the known hardware SIMD width on which built-in functionality translates to simpler logic, you can use the group partitioning part of the API, such as ``tiled_partition``.

.. _coop_thread_bottom_hierarchy:

.. figure:: ../data/how-to/cooperative_groups/thread_hierarchy_coop_bottom.svg
:alt: The new level between block thread and threads.

Cooperative group thread hierarchy in blocks.

The cooperative groups API introduce a new level between block thread and threads. The :ref:`thread-block tile <coop_thread_block_tile>` give the opportunity to have tiles in the thread block, while the :ref:`coalesced group <coop_coalesced_groups>` holds the active threads of the parent group. These groups further discussed in the :ref:`groups types <coop_group_types>` section.

For details on memory model, check the :ref:`memory model description <memory_hierarchy>`.

.. _coop_group_types:

Group types
===========

Expand Down Expand Up @@ -109,6 +119,8 @@ The ``num_grids()`` , ``grid_rank()`` , ``thread_rank()``, ``size()``, ``cg_type
and ``sync()`` member functions are public of the ``multi_grid_group`` class. For
further details check the :ref:`multi_grid_group references <multi_grid_group_ref>` .

.. _coop_thread_block_tile:

Thread-block tile
------------------

Expand All @@ -135,10 +147,20 @@ Constructed via:

The ``thread_rank()`` , ``size()``, ``cg_type()``, ``is_valid()``, ``sync()``, ``meta_group_rank()``, ``meta_group_size()``, ``shfl()``, ``shfl_down()``, ``shfl_up()``, ``shfl_xor()``, ``ballot()``, ``any()``, ``all()``, ``match_any()`` and ``match_all()`` member functions are public of the ``thread_block_tile`` class. For further details, check the :ref:`thread_block_tile references <thread_block_tile_ref>` .

.. _coop_coalesced_groups:

Coalesced groups
------------------

Threads (64 threads on CDNA and 32 threads on RDNA or NVIDIA GPUs) in a warp cannot execute different instructions simultaneously, so conditional branches are executed serially within the warp. When threads encounter a conditional branch, they can diverge, resulting in some threads being disabled, if they do not meet the condition to execute that branch. The active threads referred as coalesced, and coalesced group represents an active thread group within a warp.
Threads (64 threads on CDNA and 32 threads on RDNA) in a warp cannot execute different instructions simultaneously, so conditional branches are executed serially within the warp. When threads encounter a conditional branch, they can diverge, resulting in some threads being disabled, if they do not meet the condition to execute that branch. The active threads referred as coalesced, and coalesced group represents an active thread group within a warp.

.. note::

The NVIDIA GPU's independent thread scheduling presents the appearance that threads on different branches execute concurrently.

.. warning::

AMD GPUs do not support independent thread scheduling. Some CUDA application can rely on this feature and the ported HIP version on AMD GPUs can deadlock, when they try to make use of independent thread scheduling.

This group type also supports sub-wave level intrinsics.

Expand Down

0 comments on commit 5c4d394

Please sign in to comment.