[XPU][TritonGPUToLLVM] Avoid bank conflicts in sub-group transposes #2769
As far as I remember, SLM has 65 banks on PVC. I think we need to confirm the actual number of SLM banks.
- Store the whole matrix using SIMD block stores for each row, leaving a single garbage item at the end of the row so each row has `sub_group_size + 1` elements
- Load each row with vector loads

By introducing this garbage item at the end of each row, we ensure matrix loading avoids bank conflicts, as the offset between the position loaded by work-item `i` and `i+j` is `N * (sub_group_size + 1)` (assuming `sub_group_size` banks).

Signed-off-by: victor-eds <[email protected]>
Force-pushed from a1c5725 to b6cd04c
PVC has 64 8-byte banks. This PR does help avoid bank conflicts. We could make this more optimal, but I'd go with this for now and fine-tune in a follow-up PR. WDYT?
Any performance impact?
LGTM.
Good impact on the |
@chengjunlu I created #2797
a single garbage item at the end of the row so each row has `sub_group_size + 1` elements

By introducing this garbage item at the end of each row, we ensure matrix loading avoids bank conflicts, as the offset between the position loaded by work-item `i` and `i+j` is `N * (sub_group_size + 1)` (assuming `sub_group_size` banks).
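
To make the padding argument above concrete, here is a minimal Python sketch of a toy bank model. It is not the lowering code from this PR. It assumes that each element occupies one bank word, that the bank of an element is its linear index modulo the number of banks, and that there are `sub_group_size` banks, as the description assumes. The helper name `max_bank_conflict` and the sub-group size of 16 are hypothetical choices for illustration.

```python
from collections import Counter


def max_bank_conflict(sub_group_size, row_stride, num_banks):
    """Worst-case number of work-items that hit the same bank in one step.

    Toy model (assumption): work-item i reads row i of a row-major buffer,
    one element per step, and bank = linear element index % num_banks.
    """
    worst = 0
    for k in range(sub_group_size):  # k-th element of each work-item's row
        banks = Counter(
            (i * row_stride + k) % num_banks for i in range(sub_group_size)
        )
        worst = max(worst, max(banks.values()))
    return worst


SUB_GROUP_SIZE = 16  # hypothetical sub-group size

# Rows of sub_group_size elements: every work-item lands on the same bank.
unpadded = max_bank_conflict(SUB_GROUP_SIZE, SUB_GROUP_SIZE, SUB_GROUP_SIZE)

# Rows of sub_group_size + 1 elements: one garbage item shifts each row by a bank.
padded = max_bank_conflict(SUB_GROUP_SIZE, SUB_GROUP_SIZE + 1, SUB_GROUP_SIZE)

print(unpadded, padded)  # prints "16 1"
```

In this model, the unpadded layout sends all 16 work-items to one bank at every step, while the padded layout spreads them over 16 distinct banks. Real hardware with a different bank count or bank width may behave differently, which is the point raised in the comments above.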
Closes #2751