Improve GEMM performance of shape 4096x8x128x16384 (#2646)
This change adjusts the `grid` order to improve the cache hit rate; it originates from #2600. It applies to batched GEMM only and reaches ~99% of XeTLA performance for `4096x8x128x16384`. ![image](https://github.com/user-attachments/assets/ef7e9750-b3f7-4adc-aa66-5be704383e40)
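
To illustrate why grid order can affect cache hits in a batched GEMM, here is a minimal toy model in plain Python. It is not the code from this commit, and the commit message does not say which ordering was adopted. The function names, the tile counts, and the assumption that a fixed-size window of consecutive work items shares a cache are all hypothetical. The model counts how many distinct A and B operand tiles each window touches under two enumeration orders.

```python
from itertools import product


def batch_fastest(num_batches, tiles_m, tiles_n):
    """Work items as (batch, tile_m, tile_n), batch index varying fastest."""
    return [(b, m, n)
            for m, n, b in product(range(tiles_m), range(tiles_n), range(num_batches))]


def batch_slowest(num_batches, tiles_m, tiles_n):
    """Work items as (batch, tile_m, tile_n), batch index varying slowest."""
    return list(product(range(num_batches), range(tiles_m), range(tiles_n)))


def operand_tiles_loaded(order, window):
    """Sum, over non-overlapping windows of consecutive work items, of the
    distinct A tiles (batch, tile_m) and B tiles (batch, tile_n) touched.
    Assumption: items in one window run together and share a cache."""
    total = 0
    for start in range(0, len(order), window):
        group = order[start:start + window]
        a_tiles = {(b, m) for b, m, _ in group}
        b_tiles = {(b, n) for b, _, n in group}
        total += len(a_tiles) + len(b_tiles)
    return total


if __name__ == "__main__":
    # Hypothetical toy sizes, unrelated to the 4096x8x128x16384 shape.
    shape = dict(num_batches=4, tiles_m=2, tiles_n=4)
    print("batch fastest:", operand_tiles_loaded(batch_fastest(**shape), window=4))
    print("batch slowest:", operand_tiles_loaded(batch_slowest(**shape), window=4))
```

In this toy setting, keeping work items of the same batch adjacent lets a window reuse one A tile across several B tiles, so fewer distinct tiles are touched. Real hardware scheduling and cache behavior are more complex, so treat this only as intuition for the kind of effect a grid reordering targets.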