Decompose large `simdblockread` to smaller `simdblockread`s #2193
Before this PR: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/10805431155
Force-pushed from c58c8b1 to 638de27.
Local data (SLM path): [image not preserved]
Local data (advanced path): [image not preserved]
The SLM path is expected to be about 10 TFLOPS lower than the advanced path.
By default, we do not want to enable first load to SLM.
Please keep me informed of what happens in the advanced path!
There is no plan to enable first load to SLM by default. This is a draft PR, and two of the commits are for testing only: since I am changing the lowering for the SLM path, I enabled it temporarily to make sure the lowering works as expected.
Of course, this is a draft PR and not ready for review.
Force-pushed from 638de27 to 39b31a2.
Force-pushed from 28c7187 to 4bb57ac.
Force-pushed from 9d91486 to ab3a173.
But you requested others to review...
I mainly wanted early feedback from Victor, since the idea came from him, and from Quintin, since the FA results with the SLM path didn't look right. Anyway, I will take extra care to include you on advanced-path changes, or remember not to request any reviewers for a draft PR.
Thanks.
Baseline CI: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/10821106789
Sounds good. Just to clarify, I meant the idea of this PR, i.e., decompose
Signed-off-by: Whitney Tsang <[email protected]>
Force-pushed from ab3a173 to ea9edbd.
Will do a similar change to
@Dewei-Wang-sh, it looks like you requested changes on this PR. Does it look OK to you now?
Done in #2227.
lgtm
Similar to #2193, but for `simdblockwrite`. Decompose `simdblockwrite` with a vector size greater than 8 into multiple `simdblockwrite`s of vector size 8. For example, a `<64xi16>` `simdblockwrite` is not supported by OpenCL C builtins, so it is decomposed into `8 x <8xi16>`. Restrict `TritonGEN::SIMDBlockWriteOp` to only accept vector types that are allowed by OpenCL C builtins.

---------

Signed-off-by: Whitney Tsang <[email protected]>
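To illustrate why the split preserves the memory layout, here is a small Python simulation, not the actual MLIR lowering. It assumes the `cl_intel_subgroups` block layout, where element `i` of lane `l` lives at `base + i * sg_size + l`; under that assumption, chunk `k` of the decomposed write simply starts `k * 8 * sg_size` elements later. The names `block_write`, `decomposed_block_write`, `MAX_VEC`, and the subgroup size of 16 are hypothetical choices for the sketch.

```python
# Minimal model of subgroup block writes (assumed cl_intel_subgroups layout):
# element i of lane l is stored at address base + i * sg_size + l.
MAX_VEC = 8  # hypothetical: widest per-lane vector accepted by the builtins


def block_write(mem, base, lanes):
    """Simulate one simdblockwrite; lanes[l] is the vector held by lane l."""
    sg_size = len(lanes)
    for lane, vec in enumerate(lanes):
        for i, value in enumerate(vec):
            mem[base + i * sg_size + lane] = value


def decomposed_block_write(mem, base, lanes):
    """Split a wide block write into writes of MAX_VEC elements per lane."""
    sg_size = len(lanes)
    width = len(lanes[0])
    if width % MAX_VEC != 0:
        raise ValueError("vector width must be a multiple of MAX_VEC")
    for k in range(width // MAX_VEC):
        chunk = [vec[k * MAX_VEC:(k + 1) * MAX_VEC] for vec in lanes]
        # Chunk k begins after k * MAX_VEC rows of sg_size elements each.
        block_write(mem, base + k * MAX_VEC * sg_size, chunk)


# Usage: a 64-element-per-lane write with 16 lanes, compared to the wide write.
sg_size, width = 16, 64
lanes = [[lane * 1000 + i for i in range(width)] for lane in range(sg_size)]
wide = [None] * (sg_size * width)
split = [None] * (sg_size * width)
block_write(wide, 0, lanes)
decomposed_block_write(split, 0, lanes)
print(wide == split)
```

Because each 8-element chunk covers a contiguous band of `8 * sg_size` elements, the eight smaller writes together touch exactly the same addresses, with the same values, as the single wide write.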
A `<64xi16>` `simdblockread` is not supported by OpenCL C builtins, so decompose it into `8 x <8xi16>`. Restrict `TritonGEN::SIMDBlockReadOp` to only accept vector types that are allowed by OpenCL C builtins.
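The read side can be modelled the same way, together with the kind of type restriction described above. This Python sketch is a simulation under stated assumptions, not the TritonGEN verifier: it assumes the legal per-lane lengths for 16-bit data are 1, 2, 4 and 8 (as with the `intel_sub_group_block_read_us*` builtins) and the same `base + i * sg_size + l` layout. `is_legal_block_read`, `block_read`, `decomposed_block_read`, and `ALLOWED_LENGTHS` are hypothetical names.

```python
# Hypothetical verifier rule: only per-lane vector lengths that an OpenCL C
# block-read builtin accepts are legal (assumed {1, 2, 4, 8} for 16-bit data).
ALLOWED_LENGTHS = {1, 2, 4, 8}


def is_legal_block_read(length):
    """Mirror of the restriction on the (modelled) block-read op."""
    return length in ALLOWED_LENGTHS


def block_read(mem, base, sg_size, length):
    """Simulate one simdblockread: lane l gets mem[base + i * sg_size + l]."""
    if not is_legal_block_read(length):
        raise ValueError(f"vector length {length} has no matching builtin")
    return [[mem[base + i * sg_size + lane] for i in range(length)]
            for lane in range(sg_size)]


def decomposed_block_read(mem, base, sg_size, length, max_vec=8):
    """Lower an illegal wide read into legal reads of max_vec elements."""
    if length % max_vec != 0:
        raise ValueError("length must be a multiple of max_vec")
    lanes = [[] for _ in range(sg_size)]
    for k in range(length // max_vec):
        chunk = block_read(mem, base + k * max_vec * sg_size, sg_size, max_vec)
        for lane in range(sg_size):
            lanes[lane].extend(chunk[lane])  # concatenate chunks per lane
    return lanes


sg_size, length = 16, 64
mem = list(range(sg_size * length))
result = decomposed_block_read(mem, 0, sg_size, length)
# Lane l should see mem[i * sg_size + l] for i = 0..63, as one wide read would.
expected = [[i * sg_size + lane for i in range(length)] for lane in range(sg_size)]
print(result == expected, is_legal_block_read(64))
```

Rejecting the 64-element read in the verifier forces it through the decomposition, so only lengths with a matching builtin ever reach the lowering.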