[TritonGEN] Use OCL builtins for subgroup block read/write #2178
Signed-off-by: Whitney Tsang <[email protected]>
We can emulate wider vectors with several calls and avoid relying on GenISA intrinsics:
%res = triton_gen.simdblockread %ptr : (!llvm.ptr) -> vector<32xi32>
is equivalent to:
%0 = triton_gen.simdblockread %ptr : (!llvm.ptr) -> vector<8xi32>
%ptr1 = llvm.getelementptr inbounds %ptr[16] : (!llvm.ptr) -> !llvm.ptr, vector<8xi32>
%1 = triton_gen.simdblockread %ptr1 : (!llvm.ptr) -> vector<8xi32>
%ptr2 = llvm.getelementptr inbounds %ptr1[16] : (!llvm.ptr) -> !llvm.ptr, vector<8xi32>
%2 = triton_gen.simdblockread %ptr2 : (!llvm.ptr) -> vector<8xi32>
%ptr3 = llvm.getelementptr inbounds %ptr2[16] : (!llvm.ptr) -> !llvm.ptr, vector<8xi32>
%3 = triton_gen.simdblockread %ptr3 : (!llvm.ptr) -> vector<8xi32>
%res = // Vector concatenation %0 %1 %2 %3
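To sanity-check the indexing above, here is a minimal Python sketch that models a sub-group block read and compares one 32-wide read against four 8-wide reads. It assumes a sub-group size of 16 and the usual block-read layout (lane `i`, component `k` reads element `k * sub_group_size + i`); the helper names `block_read` and `block_read_split` are hypothetical and exist only for this illustration.

```python
SUBGROUP_SIZE = 16  # assumption: matches the GEP stride of 16 x vector<8xi32>


def block_read(mem, base, width, sg_size=SUBGROUP_SIZE):
    """Model of a block read: lane i, component k gets mem[base + k*sg_size + i]."""
    return [[mem[base + k * sg_size + lane] for k in range(width)]
            for lane in range(sg_size)]


def block_read_split(mem, base, width, chunk=8, sg_size=SUBGROUP_SIZE):
    """Emulate a wide read with several chunk-wide reads, concatenated per lane."""
    result = [[] for _ in range(sg_size)]
    for start in range(0, width, chunk):
        # Advance by chunk * sg_size elements (128 i32 values here), the same
        # 512 bytes as a GEP index of 16 over vector<8xi32>.
        part = block_read(mem, base + start * sg_size, chunk, sg_size)
        for lane in range(sg_size):
            result[lane].extend(part[lane])
    return result


# Fill memory with its own indices so each value identifies where it came from.
mem = list(range(32 * SUBGROUP_SIZE))
wide = block_read(mem, 0, 32)
split = block_read_split(mem, 0, 32)
assert wide == split
```

Under these assumptions the two results match lane by lane, so the concatenation order `%0 %1 %2 %3` preserves the component order of the wide read.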
Right, I have a local change for that; I would like to do it in a separate PR.
How about its performance?
I would like to test its performance, but the SLM path is not working at the moment, so I cannot get a baseline.
Is there anywhere that documents which OCL built-ins we can currently use in Triton?
Small NIT. LGTM.
We can use any OCL built-in documented in an OpenCL C extension. If we want to use an OpenCL C built-in that is not available in any OpenCL C extension, we need to send a request to the IGC team.
Use intel_sub_group_block_[read|write] defined in https://registry.khronos.org/OpenCL/extensions/intel/cl_intel_subgroups.html, https://registry.khronos.org/OpenCL/extensions/intel/cl_intel_subgroups_char.html, https://registry.khronos.org/OpenCL/extensions/intel/cl_intel_subgroups_short.html, https://registry.khronos.org/OpenCL/extensions/intel/cl_intel_subgroups_long.html, and https://github.com/KhronosGroup/OpenCL-Docs/blob/main/extensions/cl_intel_subgroup_local_block_io.asciidoc.