Commit
[XPU][TritonGPUToLLVM] Avoid bank conflicts in sub-group transposes
- Store the whole matrix using SIMD block stores for each row, leaving a single garbage item at the end of the row so each row has `sub_group_size + 1` elements
- Load each row with vector loads

By introducing this garbage item at the end of each row, we ensure matrix loading avoids bank conflicts: the offset between the positions loaded by work-items `i` and `i+j` is `j * (sub_group_size + 1)`, which is congruent to `j` modulo `sub_group_size` (assuming `sub_group_size` banks), so work-items in the same sub-group access different banks.

Signed-off-by: victor-eds <[email protected]>
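The padding argument above can be checked with a small simulation. This is only a sketch under stated assumptions, not the code from this commit: it assumes a `sub_group_size x sub_group_size` matrix, one element per bank, and a bank index equal to the element address modulo `sub_group_size`. The function name `max_bank_conflict` is hypothetical.

```python
from collections import Counter


def max_bank_conflict(sub_group_size: int, row_stride: int) -> int:
    """Worst-case number of work-items hitting one bank in a single load step.

    Work-item i reads row i, which starts at address i * row_stride.
    Assumption: there are sub_group_size banks, one element wide each.
    """
    num_banks = sub_group_size
    worst = 0
    for k in range(sub_group_size):  # k-th element of each work-item's row
        banks = Counter(
            (i * row_stride + k) % num_banks for i in range(sub_group_size)
        )
        worst = max(worst, max(banks.values()))
    return worst


sub_group_size = 16
# Unpadded rows: every work-item lands on the same bank in each step.
print(max_bank_conflict(sub_group_size, sub_group_size))
# Rows padded with one garbage item: every work-item lands on a distinct bank.
print(max_bank_conflict(sub_group_size, sub_group_size + 1))
```

With a row stride of `sub_group_size`, the bank index reduces to `k` for every work-item, so all accesses in a step collide. With a stride of `sub_group_size + 1`, it reduces to `(i + k) % sub_group_size`, which is distinct for each work-item `i`.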