diff --git a/docs/how-to/faq.md b/docs/how-to/faq.md index 889a7623d5..ca271b5ba0 100644 --- a/docs/how-to/faq.md +++ b/docs/how-to/faq.md @@ -287,7 +287,7 @@ See {doc}`/how-to/logging` for more information. ## What are the maximum limits of kernel launch parameters? Product of block.x, block.y, and block.z should be less than 1024. -Please note, HIP does not support kernel launch with total work items defined in dimension with size `gridDim x blockDim >= 2^32 `, so `gridDim.x * blockDim.x, gridDim.y * blockDim.y and gridDim.z * blockDim.z ` are always less than 2^32. +Please note, HIP does not support kernel launch with total work items defined in dimension with size `gridDim x blockDim >= 2^32`, so `gridDim.x * blockDim.x, gridDim.y * blockDim.y and gridDim.z * blockDim.z` are always less than 2^32. ## Are __shfl_*_sync functions supported on HIP platform? @@ -316,8 +316,8 @@ If you have compiled the application yourself, make sure you have given the corr If you have a precompiled application/library (like rocblas, tensorflow etc) which gives you such error, there are one of two possibilities. -* The application/library does not ship code object bundles for **all** of your device(s): in this case you need to recompile the application/library yourself with correct `--offload-arch`. -* The application/library does not ship code object bundles for **some** of your device(s), for example you have a system with an APU + GPU and the library does not ship code objects for your APU. For this you can set the environment variable `HIP_VISIBLE_DEVICES` or `CUDA_VISIBLE_DEVICES` on NVdia platform, to only enable GPUs for which code object is available. This will limit the GPUs visible to your application and allow it to run. +* The application/library does not ship code object bundles for __all__ of your device(s): in this case you need to recompile the application/library yourself with correct `--offload-arch`. +* The application/library does not ship code object bundles for __some__ of your device(s), for example you have a system with an APU + GPU and the library does not ship code objects for your APU. For this you can set the environment variable `HIP_VISIBLE_DEVICES` or `CUDA_VISIBLE_DEVICES` on NVdia platform, to only enable GPUs for which code object is available. This will limit the GPUs visible to your application and allow it to run. Note: In previous releases, the error code is hipErrorNoBinaryForGpu with message "Unable to find code object for all current devices". The error code handling behavior is changed. HIP runtime shows the error code hipErrorSharedObjectInitFailed with message "Error: shared object initialization failed" on unsupported GPU. diff --git a/docs/how-to/hip_rtc.md b/docs/how-to/hip_rtc.md index ee9e234fb2..5e86e83ce0 100644 --- a/docs/how-to/hip_rtc.md +++ b/docs/how-to/hip_rtc.md @@ -1,6 +1,6 @@ # Programming for HIP Runtime Compiler (RTC) -HIP lets you compile kernels at runtime with the `hiprtc* ` APIs. +HIP lets you compile kernels at runtime with the `hiprtc*` APIs. Kernels can be stored as a text string and can be passed to HIPRTC APIs alongside options to guide the compilation. NOTE: @@ -12,7 +12,7 @@ NOTE: ## Example To use HIPRTC functionality, HIPRTC header needs to be included first. - `#include ` +`#include ` Kernels can be stored in a string: @@ -29,7 +29,7 @@ R"( )"}; ``` -Now to compile this kernel, it needs to be associated with hiprtcProgram type, which is done by declaring `hiprtcProgram prog; ` and associating the string of kernel with this program: +Now to compile this kernel, it needs to be associated with hiprtcProgram type, which is done by declaring `hiprtcProgram prog;` and associating the string of kernel with this program: ```cpp hiprtcCreateProgram(&prog, // HIPRTC program @@ -51,7 +51,7 @@ hiprtcCompileProgram(prog, // hiprtcProgram options); // Clang Options [Supported Clang Options](clang_options.md) ``` -hiprtcCompileProgram returns a status value which can be converted to string via `hiprtcGetErrorString `. If compilation is successful, hiprtcCompileProgram will return `HIPRTC_SUCCESS `. +hiprtcCompileProgram returns a status value which can be converted to string via `hiprtcGetErrorString`. If compilation is successful, hiprtcCompileProgram will return `HIPRTC_SUCCESS`. If the compilation fails, you can look up the logs via: @@ -77,9 +77,9 @@ hiprtcGetCode(prog, kernel_binary.data()); ``` After loading the binary, hiprtcProgram can be destroyed. - `hiprtcDestroyProgram(&prog); ` +`hiprtcDestroyProgram(&prog);` -The binary present in `kernel_binary ` can now be loaded via `hipModuleLoadData ` API. +The binary present in `kernel_binary` can now be loaded via `hipModuleLoadData` API. ```cpp hipModule_t module; @@ -228,10 +228,10 @@ int main() { HIPRTC provides a few HIPRTC specific flags -* `--gpu-architecture ` : This flag can guide the code object generation for a specific gpu arch. Example: `--gpu-architecture=gfx906:sramecc+:xnack- `, its equivalent to `--offload-arch `. +* `--gpu-architecture` : This flag can guide the code object generation for a specific gpu arch. Example: `--gpu-architecture=gfx906:sramecc+:xnack-`, its equivalent to `--offload-arch`. * This option is compulsory if compilation is done on a system without AMD GPUs supported by HIP runtime. * Otherwise, HIPRTC will load the hip runtime and gather the current device and its architecture info and use it as option. -* `-fgpu-rdc ` : This flag when provided during the hiprtcCompileProgram generates the bitcode (HIPRTC doesn't convert this bitcode into ISA and binary). This bitcode can later be fetched using hiprtcGetBitcode and hiprtcGetBitcodeSize APIs. +* `-fgpu-rdc` : This flag when provided during the hiprtcCompileProgram generates the bitcode (HIPRTC doesn't convert this bitcode into ISA and binary). This bitcode can later be fetched using hiprtcGetBitcode and hiprtcGetBitcodeSize APIs. ### Bitcode @@ -414,9 +414,9 @@ Currently, it returns hardcoded value. This should be implemented to return HIP ## Lowered Names (Mangled Names) -HIPRTC mangles the `__global__ ` function names and names of `__device__ ` and `__constant__ ` variables. If the generated binary is being loaded using the HIP Runtime API, the kernel function or `__device__/__constant__ ` variable must be looked up by name, but this is very hard when the name has been mangled. To overcome this, HIPRTC provides API functions that map `__global__ ` function or `__device__/__constant__ ` variable names in the source to the mangled names present in the generated binary. +HIPRTC mangles the `__global__` function names and names of `__device__` and `__constant__` variables. If the generated binary is being loaded using the HIP Runtime API, the kernel function or `__device__/__constant__` variable must be looked up by name, but this is very hard when the name has been mangled. To overcome this, HIPRTC provides API functions that map `__global__` function or `__device__/__constant__` variable names in the source to the mangled names present in the generated binary. -The two APIs hiprtcAddNameExpression and hiprtcGetLoweredName provide this functionality. First, a 'name expression' string denoting the address for the `__global__ ` function or `__device__/__constant__ ` variable is provided to hiprtcAddNameExpression. Then, the program is compiled with hiprtcCompileProgram. During compilation, HIPRTC will parse the name expression string as a C++ constant expression at the end of the user program. Finally, the function hiprtcGetLoweredName is called with the original name expression and it returns a pointer to the lowered name. The lowered name can be used to refer to the kernel or variable in the HIP Runtime API. +The two APIs hiprtcAddNameExpression and hiprtcGetLoweredName provide this functionality. First, a 'name expression' string denoting the address for the `__global__` function or `__device__/__constant__` variable is provided to hiprtcAddNameExpression. Then, the program is compiled with hiprtcCompileProgram. During compilation, HIPRTC will parse the name expression string as a C++ constant expression at the end of the user program. Finally, the function hiprtcGetLoweredName is called with the original name expression and it returns a pointer to the lowered name. The lowered name can be used to refer to the kernel or variable in the HIP Runtime API. ### Note @@ -426,7 +426,7 @@ The two APIs hiprtcAddNameExpression and hiprtcGetLoweredName provide this funct ### Example -kernel containing various definitions `__global__ ` functions/function templates and `__device__/__constant__ ` variables can be stored in a string. +kernel containing various definitions `__global__` functions/function templates and `__device__/__constant__` variables can be stored in a string. ```cpp static constexpr const char gpu_program[] { @@ -444,7 +444,7 @@ __global__ void f3(int *result) { *result = sizeof(T); } )"}; ``` -hiprtcAddNameExpression is called with various name expressions referring to the address of `__global__ ` functions and `__device__/__constant__ ` variables. +hiprtcAddNameExpression is called with various name expressions referring to the address of `__global__` functions and `__device__/__constant__` variables. ```cpp kernel_name_vec.push_back("&f1"); @@ -508,4 +508,4 @@ HIPRTC follows the below versioning. ## Deprecation notice * Currently HIPRTC APIs are separated from HIP APIs and HIPRTC is available as a separate library libhiprtc.so/libhiprtc.dll. But on Linux, HIPRTC symbols are also present in libhipamd64.so in order to support the existing applications. Gradually, these symbols will be removed from HIP library and applications using HIPRTC will be required to explicitly link to HIPRTC library. However, on Windows hiprtc.dll must be used as the hipamd64.dll doesn't contain the HIPRTC symbols. -* Data types such as uint32_t, uint64_t, int32_t, int64_t defined in std namespace in HIPRTC are deprecated earlier and are being removed from ROCm release 6.1 since these can conflict with the standard C++ data types. These data types are now prefixed with `__hip__ `, e.g. `__hip_uint32_t `. Applications previously using std::uint32_t or similar types can use `__hip_ ` prefixed types to avoid conflicts with standard std namespace or application can have their own definitions for these types. Also, type_traits templates previously defined in std namespace are moved to `__hip_internal ` namespace as implementation details. +* Data types such as uint32_t, uint64_t, int32_t, int64_t defined in std namespace in HIPRTC are deprecated earlier and are being removed from ROCm release 6.1 since these can conflict with the standard C++ data types. These data types are now prefixed with `__hip__`, e.g. `__hip_uint32_t`. Applications previously using std::uint32_t or similar types can use `__hip_` prefixed types to avoid conflicts with standard std namespace or application can have their own definitions for these types. Also, type_traits templates previously defined in std namespace are moved to `__hip_internal` namespace as implementation details.