diff --git a/docs/data/programming_model/understand/cdna3_cu_dark.png b/docs/data/programming_model/understand/cdna3_cu_dark.png
deleted file mode 100644
index 3fada0d43f..0000000000
Binary files a/docs/data/programming_model/understand/cdna3_cu_dark.png and /dev/null differ
diff --git a/docs/data/programming_model/understand/rdna3_cu.drawio b/docs/data/programming_model/understand/rdna3_cu.drawio
deleted file mode 100644
index e69de29bb2..0000000000
diff --git a/docs/data/hardware_implementation/cdna2_gcd.png b/docs/data/understand/hardware_implementation/cdna2_gcd.png
similarity index 100%
rename from docs/data/hardware_implementation/cdna2_gcd.png
rename to docs/data/understand/hardware_implementation/cdna2_gcd.png
diff --git a/docs/data/hardware_implementation/cdna3_cu.png b/docs/data/understand/hardware_implementation/cdna3_cu.png
similarity index 100%
rename from docs/data/hardware_implementation/cdna3_cu.png
rename to docs/data/understand/hardware_implementation/cdna3_cu.png
diff --git a/docs/data/hardware_implementation/compute_unit.drawio b/docs/data/understand/hardware_implementation/compute_unit.drawio
similarity index 100%
rename from docs/data/hardware_implementation/compute_unit.drawio
rename to docs/data/understand/hardware_implementation/compute_unit.drawio
diff --git a/docs/data/hardware_implementation/compute_unit.svg b/docs/data/understand/hardware_implementation/compute_unit.svg
similarity index 100%
rename from docs/data/hardware_implementation/compute_unit.svg
rename to docs/data/understand/hardware_implementation/compute_unit.svg
diff --git a/docs/data/hardware_implementation/rdna3_cu.png b/docs/data/understand/hardware_implementation/rdna3_cu.png
similarity index 100%
rename from docs/data/hardware_implementation/rdna3_cu.png
rename to docs/data/understand/hardware_implementation/rdna3_cu.png
diff --git a/docs/data/programming_model/understand/cdna2_gcd.png b/docs/data/understand/programming_model/cdna2_gcd.png
similarity index 100%
rename from docs/data/programming_model/understand/cdna2_gcd.png
rename to docs/data/understand/programming_model/cdna2_gcd.png
diff --git a/docs/data/programming_model/understand/cdna3_cu.png b/docs/data/understand/programming_model/cdna3_cu.png
similarity index 100%
rename from docs/data/programming_model/understand/cdna3_cu.png
rename to docs/data/understand/programming_model/cdna3_cu.png
diff --git a/docs/data/programming_model/understand/rdna3_cu.png b/docs/data/understand/programming_model/rdna3_cu.png
similarity index 100%
rename from docs/data/programming_model/understand/rdna3_cu.png
rename to docs/data/understand/programming_model/rdna3_cu.png
diff --git a/docs/data/programming_model/understand/simt.drawio b/docs/data/understand/programming_model/simt.drawio
similarity index 100%
rename from docs/data/programming_model/understand/simt.drawio
rename to docs/data/understand/programming_model/simt.drawio
diff --git a/docs/data/programming_model/understand/simt.svg b/docs/data/understand/programming_model/simt.svg
similarity index 100%
rename from docs/data/programming_model/understand/simt.svg
rename to docs/data/understand/programming_model/simt.svg
diff --git a/docs/data/programming_model/reference/memory_hierarchy.drawio b/docs/data/understand/programming_model_reference/memory_hierarchy.drawio
similarity index 100%
rename from docs/data/programming_model/reference/memory_hierarchy.drawio
rename to docs/data/understand/programming_model_reference/memory_hierarchy.drawio
diff --git a/docs/data/programming_model/reference/memory_hierarchy.svg b/docs/data/understand/programming_model_reference/memory_hierarchy.svg
similarity index 100%
rename from docs/data/programming_model/reference/memory_hierarchy.svg
rename to docs/data/understand/programming_model_reference/memory_hierarchy.svg
diff --git a/docs/data/programming_model/reference/thread_hierarchy.drawio b/docs/data/understand/programming_model_reference/thread_hierarchy.drawio
similarity index 100%
rename from docs/data/programming_model/reference/thread_hierarchy.drawio
rename to docs/data/understand/programming_model_reference/thread_hierarchy.drawio
diff --git a/docs/data/programming_model/reference/thread_hierarchy.svg b/docs/data/understand/programming_model_reference/thread_hierarchy.svg
similarity index 100%
rename from docs/data/programming_model/reference/thread_hierarchy.svg
rename to docs/data/understand/programming_model_reference/thread_hierarchy.svg
diff --git a/docs/data/programming_model/reference/thread_hierarchy_coop.drawio b/docs/data/understand/programming_model_reference/thread_hierarchy_coop.drawio
similarity index 100%
rename from docs/data/programming_model/reference/thread_hierarchy_coop.drawio
rename to docs/data/understand/programming_model_reference/thread_hierarchy_coop.drawio
diff --git a/docs/data/programming_model/reference/thread_hierarchy_coop.svg b/docs/data/understand/programming_model_reference/thread_hierarchy_coop.svg
similarity index 100%
rename from docs/data/programming_model/reference/thread_hierarchy_coop.svg
rename to docs/data/understand/programming_model_reference/thread_hierarchy_coop.svg
diff --git a/docs/understand/hardware_implementation.rst b/docs/understand/hardware_implementation.rst
index f95d8fc6b4..8ee3e0e08c 100644
--- a/docs/understand/hardware_implementation.rst
+++ b/docs/understand/hardware_implementation.rst
@@ -46,7 +46,7 @@ The amount of warps that can reside concurrently on a CU, known as occupancy, is determined by the warp's resource usage of registers and shared memory.
-.. figure:: ../data/hardware_implementation/compute_unit.svg
+.. figure:: ../data/understand/hardware_implementation/compute_unit.svg
 :alt: Diagram depicting the general structure of a compute unit of an AMD GPU.
@@ -110,9 +110,9 @@ The general structure of CUs stays mostly as it is in GCN architectures.
 The most prominent change is the addition of matrix ALUs, which can greatly improve the performance of algorithms involving matrix multiply-accumulate operations for
-:doc:`int8, float16, bfloat16 or float32`.
+:doc:`int8, float16, bfloat16 or float32`.
-.. figure:: ../data/hardware_implementation/cdna3_cu.png
+.. figure:: ../data/understand/hardware_implementation/cdna3_cu.png
 :alt: Block diagram showing the structure of a CDNA3 compute unit. It includes Shader Cores, the Matrix Core Unit, a Local Data Share used for sharing memory between threads in a block, an L1 Cache and a Scheduler. The
@@ -136,7 +136,7 @@ It also adds an extra layer of cache to the WGP, shared by the CUs within it. This cache is referred to as L1 cache, promoting the per-CU cache to an L0 cache.
-.. figure:: ../data/hardware_implementation/rdna3_cu.png
+.. figure:: ../data/understand/hardware_implementation/rdna3_cu.png
 :alt: Block diagram showing the structure of an RDNA3 Compute Unit. It consists of four SIMD units, each including a vector and scalar register file, with the corresponding scalar and vector ALUs. All four SIMDs
@@ -152,7 +152,7 @@ For hardware implementation's sake, multiple CUs are grouped together into a Shader Engine or Compute Engine, typically sharing some fixed function units or memory subsystem resources.
-.. figure:: ../data/hardware_implementation/cdna2_gcd.png
+.. figure:: ../data/understand/hardware_implementation/cdna2_gcd.png
 :alt: Block diagram showing four Compute Engines each with 28 Compute Units inside. These four Compute Engines share one block of L2 Cache. Around them are four Memory Controllers. To the top and bottom of all these are
diff --git a/docs/understand/programming_model.rst b/docs/understand/programming_model.rst
index 092cf6796c..4307226064 100644
--- a/docs/understand/programming_model.rst
+++ b/docs/understand/programming_model.rst
@@ -30,7 +30,7 @@ AMD block diagrams, or as streaming multiprocessor (SM).
 .. _rdna3_cu:
-.. figure:: ../data/programming_model/understand/rdna3_cu.png
+.. figure:: ../data/understand/programming_model/rdna3_cu.png
 :alt: Block diagram showing the structure of an RDNA3 Compute Unit. It consists of four SIMD units, each including a vector and scalar register file, with the corresponding scalar and vector ALUs. All four SIMDs
@@ -41,7 +41,7 @@ AMD block diagrams, or as streaming multiprocessor (SM).
 .. _cdna3_cu:
-.. figure:: ../data/programming_model/understand/cdna3_cu.png
+.. figure:: ../data/understand/programming_model/cdna3_cu.png
 :alt: Block diagram showing the structure of a CDNA3 compute unit. It includes Shader Cores, the Matrix Core Unit, a Local Data Share used for sharing memory between threads in a block, an L1 Cache and a Scheduler. The
@@ -56,7 +56,7 @@ memory subsystem resources.
 .. _cdna2_gcd:
-.. figure:: ../data/programming_model/understand/cdna2_gcd.png
+.. figure:: ../data/understand/programming_model/cdna2_gcd.png
 :alt: Block diagram showing four Compute Engines each with 28 Compute Units inside. These four Compute Engines share one block of L2 Cache. Around them are four Memory Controllers. To the top and bottom of all these are
@@ -103,7 +103,7 @@ typically look the following:
 .. _simt:
-.. figure:: ../data/programming_model/understand/simt.svg
+.. figure:: ../data/understand/programming_model/simt.svg
 :alt: Image representing the instruction flow of a SIMT program. Two identical arrows pointing downward with blocks representing the instructions inside and ellipsis between the arrows. The instructions represented in
diff --git a/docs/understand/programming_model_reference.rst b/docs/understand/programming_model_reference.rst
index 600fcad3da..1120728dad 100644
--- a/docs/understand/programming_model_reference.rst
+++ b/docs/understand/programming_model_reference.rst
@@ -34,7 +34,7 @@ The thread hierarchy inherent to how AMD GPUs operate is depicted in
 .. _inherent_thread_hierarchy:
-.. figure:: ../data/programming_model/reference/thread_hierarchy.svg
+.. figure:: ../data/understand/programming_model_reference/thread_hierarchy.svg
 :alt: Diagram depicting nested rectangles of varying color. The outermost one titled "Grid", inside sets of uniform rectangles layered on one another titled "Block". Each "Block" containing sets of uniform rectangles
@@ -93,7 +93,7 @@ The thread hierarchy abstraction of Cooperative Groups manifest as depicted in
 .. _coop_thread_hierarchy:
-.. figure:: ../data/programming_model/reference/thread_hierarchy_coop.svg
+.. figure:: ../data/understand/programming_model_reference/thread_hierarchy_coop.svg
 :alt: Diagram depicting nested rectangles of varying color. The outermost one titled "Grid", inside sets of different sized rectangles layered on one another titled "Block". Each "Block" containing sets of uniform
@@ -134,7 +134,7 @@ how they relate to the various levels of the threading model.
 .. _memory_hierarchy:
-.. figure:: ../data/programming_model/reference/memory_hierarchy.svg
+.. figure:: ../data/understand/programming_model_reference/memory_hierarchy.svg
 :alt: Diagram depicting nested rectangles of varying color. The outermost one titled "Grid", inside on the upper half a rectangle titled "Cluster". Inside it are two identical rectangles titled "Block", inside them are