expand documentation on backends

KernelTuner · Sep 8, 2023 · ce06057 · ce06057
1 parent 4f0aa8f
commit ce06057

Showing 3 changed files with 77 additions and 1 deletion.
diff --git a/INSTALL.rst b/INSTALL.rst
@@ -26,7 +26,8 @@ Installing Python Packages
 --------------------------
 
 Note that when you are using a native Python installation, the `pip` command used 
-Kernel Tuner and its dependencies require `sudo` rights for system wide installation. 
+to install
+Kernel Tuner and its dependencies requires `sudo` rights for system-wide installation.
 
 Sudo rights are typically not required when using Miniconda or virtual environments.
 You could also use e.g. the `--user` or `--prefix` option of `pip` to install into 
@@ -79,6 +80,15 @@ from an installation that is failing.
 If this fails, I recommend to see the PyCuda installation guide (https://wiki.tiker.net/PyCuda/Installation)
 
 
+Other CUDA Backends
+-------------------
+
+Kernel Tuner can also be used with CuPy (https://cupy.dev/) or Nvidia's CUDA Python bindings (https://nvidia.github.io/cuda-python/). Please see the installation instructions of those projects for how to install the required Python packages.
+
+Please refer to the documentation on `backends <https://kerneltuner.github.io/kernel_tuner/stable/backends.html>`__ for how to use and select these backends.
+
+
+
 OpenCL and PyOpenCL
 -------------------
 

diff --git a/doc/source/backends.rst b/doc/source/backends.rst
@@ -0,0 +1,65 @@
+.. toctree::
+   :maxdepth: 2
+
+
+Backends
+========
+
+Kernel Tuner implements multiple backends for CUDA, one for OpenCL, one for HIP, and a generic 
+Compiler backend.
+
+Selecting a backend is in most cases automatic and is done based on the kernel's programming 
+language, but sometimes you'll want to specifically choose a backend.
+
+
+CUDA Backends
+-------------
+
+PyCUDA is the default CUDA backend in Kernel Tuner. It is comparable in feature completeness to CuPy.
+Because the HIP kernel language is identical to the CUDA kernel language, HIP is included here as well.
+To use HIP on Nvidia GPUs, see https://github.com/jatinx/hip-on-nv.
+
+While the PyCUDA backend expects all inputs and outputs to be NumPy arrays, the CuPy backend also
+supports CuPy arrays as input and output arguments for the kernels. This gives the user more control
+over how memory is handled by Kernel Tuner. In addition, checks during output verification can happen
+entirely on the GPU when using only CuPy arrays.
+
+Texture memory is only supported by the PyCUDA backend, while the CuPy backend is the only one that
+supports C++ signatures for the kernels. With the other backends, it is required that the kernel has
+extern "C" linkage. If it does not, the entire code is wrapped in an extern "C" block, which may cause
+issues if the code also contains C++ code that cannot have extern "C" linkage, including code that may
+be present in header files.
+
+As detailed further in :ref:`templates`, templated kernels are fully supported by the CuPy backend,
+while Kernel Tuner implements only limited support for templated kernels with the PyCUDA and
+CUDA-Python backends.
+
+
+.. csv-table:: Backend feature support
+  :header: Feature, PyCUDA, CuPy, CUDA-Python, HIP
+  :widths: auto
+
+  Compile kernels,        ✓,  ✓,  ✓,  ✓
+  Benchmark kernels,      ✓,  ✓,  ✓,  ✓
+  Observers,              ✓,  ✓,  ✓,  ✓
+  Constant memory,        ✓,  ✓,  ✓,  ✓
+  Dynamic shared memory,  ✓,  ✓,  ✓,  ✓
+  Texture memory,         ✓,  ✗,  ✗,  ✗
+  C++ kernel signature,   ✗,  ✓,  ✗,  ✗
+  Templated kernels,      ✓,  ✓,  ✓,  ✗
+
+
+Another important difference between the different backends is the compiler that is used. The table 
+below lists which Python package is required, how the backend can be selected and which compiler is 
+used to compile the kernels.
+
+
+.. csv-table:: Backend usage and compiler
+  :header: Feature, PyCUDA, CuPy, CUDA-Python, HIP
+  :widths: auto
+
+  Python package,      "pycuda", "cupy", "cuda-python", "pyhip-interface"
+  Selected with lang=, "CUDA", "CUPY", "NVCUDA", "HIP"
+  Compiler used,       "nvcc", "nvrtc", "nvrtc", "hiprtc"
+
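+The ``lang`` values in the table above can be passed to ``tune_kernel`` to choose a backend explicitly.
+The snippet below is a minimal sketch, not a reference implementation: the ``vector_add`` kernel, the
+``CUDA_BACKENDS`` dictionary, the ``tune_vector_add`` helper, the problem size, and the tunable values are
+all made up for illustration. Actually running the tuning step assumes a supported GPU and that the
+Python package belonging to the chosen backend is installed.
+
+.. code-block:: python
+
+    # lang value -> (Python package, compiler), copied from the table above.
+    CUDA_BACKENDS = {
+        "CUDA": ("pycuda", "nvcc"),
+        "CUPY": ("cupy", "nvrtc"),
+        "NVCUDA": ("cuda-python", "nvrtc"),
+        "HIP": ("pyhip-interface", "hiprtc"),
+    }
+
+    # Illustrative kernel; block_size_x is filled in by Kernel Tuner for each configuration.
+    KERNEL_SOURCE = """
+    __global__ void vector_add(float *c, const float *a, const float *b, int n) {
+        int i = blockIdx.x * block_size_x + threadIdx.x;
+        if (i < n) {
+            c[i] = a[i] + b[i];
+        }
+    }
+    """
+
+
+    def tune_vector_add(lang, size=1000000):
+        """Tune the illustrative vector_add kernel with an explicitly selected backend."""
+        if lang not in CUDA_BACKENDS:
+            raise ValueError(f"unknown lang {lang!r}, expected one of {sorted(CUDA_BACKENDS)}")
+
+        # Imported lazily so the module can be loaded on machines without a GPU stack.
+        import numpy as np
+        from kernel_tuner import tune_kernel
+
+        a = np.random.randn(size).astype(np.float32)
+        b = np.random.randn(size).astype(np.float32)
+        c = np.zeros_like(a)
+        n = np.int32(size)
+        tune_params = {"block_size_x": [64, 128, 256, 512]}
+
+        # lang= overrides the automatic backend selection.
+        return tune_kernel("vector_add", KERNEL_SOURCE, size, [c, a, b, n], tune_params, lang=lang)
+
+
+    if __name__ == "__main__":
+        results, env = tune_vector_add("CUPY")
+
+Host arrays are used here because, as noted above, the PyCUDA backend expects NumPy arrays; the same
+call therefore works unchanged when only the ``lang`` value is swapped.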
+
diff --git a/doc/source/contents.rst b/doc/source/contents.rst
@@ -27,6 +27,7 @@ The Kernel Tuner documentation
    :maxdepth: 1
    :caption: Features
 
+   backends
    cache_files
    correctness
    hostcode