From 1484d67ff1b5936a54b2a03de1b794244826472d Mon Sep 17 00:00:00 2001
From: Istvan Kiss
Date: Fri, 29 Nov 2024 14:36:18 +0100
Subject: [PATCH] WIP

---
 docs/how-to/hip_runtime_api/asynchronous.rst | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/docs/how-to/hip_runtime_api/asynchronous.rst b/docs/how-to/hip_runtime_api/asynchronous.rst
index b819032501..bbbf821a72 100644
--- a/docs/how-to/hip_runtime_api/asynchronous.rst
+++ b/docs/how-to/hip_runtime_api/asynchronous.rst
@@ -64,16 +64,17 @@ Concurrent kernel execution
 -------------------------------------------------------------------------------
 
 Concurrent execution of multiple kernels on the GPU allows different kernels to
-run simultaneously, leveraging the parallel processing capabilities of the GPU.
-Utilizing multiple streams enables developers to launch kernels concurrently,
-maximizing GPU resource usage. Managing dependencies between kernels is crucial
-for ensuring correct execution order. This can be achieved using
-:cpp:func:`hipStreamWaitEvent`, which allows a kernel to wait for a specific
-event before starting execution. Proper management of concurrent kernel
-execution can lead to significant performance gains, particularly in
-applications with independent tasks that can be parallelized. By maximizing the
-utilization of GPU cores, developers can achieve higher throughput and
-efficiency.
+run simultaneously to maximize GPU resource usage. Managing dependencies between
+kernels is crucial for ensuring correct execution order. This can be achieved
+using :cpp:func:`hipStreamWaitEvent`, which allows a kernel to wait for a
+specific event before starting execution.
+
+Independent kernels can only run concurrently if there are enough registers
+and enough shared memory for all of them. To achieve concurrent kernel
+execution, the developer might have to reduce the block size of the kernels.
+Kernel runtimes can be misleading for concurrent runs, so during optimization
+it is better to check the trace files to see whether one kernel is blocking
+another while they run in parallel.
 
 Overlap of data transfer and kernel execution
 ===============================================================================
@@ -175,10 +176,7 @@ to start before the primary kernel finishes, HIP achieves similar functionality
 using streams and events. By employing :cpp:func:`hipStreamWaitEvent`, it is
 possible to manage the execution order without explicit hardware support. This
 mechanism allows a secondary kernel to launch as soon as the necessary
-conditions are met, even if the primary kernel is still running. Such an
-approach optimizes resource utilization and improves performance by efficiently
-overlapping operations, especially in complex applications with interdependent
-tasks.
+conditions are met, even if the primary kernel is still running.
 
 Example
 -------------------------------------------------------------------------------
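
For reference, a minimal sketch (not part of the patch) of the stream-and-event
ordering described above: an event is recorded after the primary kernel in one
stream, and :cpp:func:`hipStreamWaitEvent` makes a second stream wait for that
event before launching the dependent kernel. The ``producer``/``consumer``
kernel names, sizes, and launch parameters are illustrative assumptions, and
error checking is omitted for brevity.

.. code-block:: cpp

    #include <hip/hip_runtime.h>

    // Hypothetical primary kernel: fills the buffer.
    __global__ void producer(float* data, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] = static_cast<float>(i);
    }

    // Hypothetical dependent kernel: consumes the producer's output.
    __global__ void consumer(float* data, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= 2.0f;
    }

    int main()
    {
        constexpr int n = 1 << 20;
        float* d_data;
        hipMalloc(&d_data, n * sizeof(float));

        hipStream_t stream1, stream2;
        hipStreamCreate(&stream1);
        hipStreamCreate(&stream2);

        hipEvent_t producerDone;
        hipEventCreate(&producerDone);

        // Launch the primary kernel and record an event when it finishes.
        producer<<<(n + 255) / 256, 256, 0, stream1>>>(d_data, n);
        hipEventRecord(producerDone, stream1);

        // The dependent kernel in stream2 waits only for the event, not for a
        // full device synchronization, so unrelated work queued in stream2
        // before this call could still overlap with the producer kernel.
        hipStreamWaitEvent(stream2, producerDone, 0);
        consumer<<<(n + 255) / 256, 256, 0, stream2>>>(d_data, n);

        hipStreamSynchronize(stream2);

        hipEventDestroy(producerDone);
        hipStreamDestroy(stream1);
        hipStreamDestroy(stream2);
        hipFree(d_data);
        return 0;
    }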