Make `acc` matrix allocation on each call for XeTLA GEMM benchmarks #3026

anmyachev · 2024-12-17T16:40:42Z

Comparing https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/12382880184/job/34564504020 (main) with https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/12390456716/job/34585505155 (PR), this pull request degrades the XeTLA results by ~3%.

However, this is also a potential opportunity to improve the Triton kernel by allocating the accumulation matrix only once. If that is implemented for Triton, this pull request will need to be reverted for XeTLA.
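The difference between the two allocation strategies can be sketched in pure Python. This is only an illustration, not the XeTLA or Triton benchmark code: `gemm_into`, `run_preallocated`, `run_alloc_per_call` and `time_call` are hypothetical names, and a naive list-based GEMM stands in for the real kernels.

```python
import time


def gemm_into(a, b, acc, m, n, k):
    """Naive row-major GEMM that writes a @ b into a caller-provided buffer."""
    for i in range(m):
        for j in range(n):
            s = 0.0
            for p in range(k):
                s += a[i * k + p] * b[p * n + j]
            acc[i * n + j] = s
    return acc


def run_preallocated(a, b, acc, m, n, k):
    # The accumulator was allocated once, outside the timed region,
    # so the measurement covers only the computation.
    return gemm_into(a, b, acc, m, n, k)


def run_alloc_per_call(a, b, m, n, k):
    # The accumulator is allocated on every call, so the allocation
    # cost is included in the measurement (the behaviour this PR adopts).
    acc = [0.0] * (m * n)
    return gemm_into(a, b, acc, m, n, k)


def time_call(fn, *args, reps=5):
    """Return the best wall-clock time over `reps` runs (hypothetical helper)."""
    best = float("inf")
    for _ in range(reps):
        t0 = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - t0)
    return best


if __name__ == "__main__":
    m = n = k = 32
    a = [1.0] * (m * k)
    b = [2.0] * (k * n)
    shared_acc = [0.0] * (m * n)  # allocated once, reused across calls
    print("preallocated  :", time_call(run_preallocated, a, b, shared_acc, m, n, k))
    print("alloc per call:", time_call(run_alloc_per_call, a, b, m, n, k))
```

In this toy version the allocation is cheap next to the triple loop, so the two timings will be close. The point is only to show which work falls inside the timed region in each variant.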

CI runs:

https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/12378323790 (upstream profiler CI)
~~https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/12379634873~~
~~https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/12381457034~~
https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/12390411560 (upstream profiler). Wall time was used instead of elapsed_time, apparently because the wrong runner was chosen.
https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/12390456716 (legacy profiler)

Signed-off-by: Anatoly Myachev <[email protected]>

anmyachev · 2024-12-18T12:09:23Z

Started manually, because the automatic trigger does not work for some reason: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/12392619525 (passed)

anmyachev and others added 5 commits December 17, 2024 16:39

Check influence of 'acc' matrix allocation on benchmarks results

4aa14d0

Signed-off-by: Anatoly Myachev <[email protected]>

Merge branch 'main' into amyachev/debug-benchmarks

2e4b6ad

Merge branch 'main' into amyachev/debug-benchmarks

07cbbd5

Merge branch 'main' into amyachev/debug-benchmarks

55863d7

cleanup

996a1e9

Signed-off-by: Anatoly Myachev <[email protected]>

anmyachev changed the title ~~Check influence of acc matrix allocation on GEMM benchmarks results~~ Make acc matrix allocation on each call for XeTLA GEMM benchmarks Dec 18, 2024

anmyachev linked an issue Dec 18, 2024 that may be closed by this pull request

Consider make acc matrix allocation on each call for XeTLA GEMM benchmarks #3038

Closed

anmyachev marked this pull request as ready for review December 18, 2024 12:16

anmyachev requested review from whitneywhtsang and ESI-SYD December 18, 2024 12:16

whitneywhtsang approved these changes Dec 18, 2024

whitneywhtsang added a commit that referenced this pull request Dec 18, 2024

Cherry-pick from #3026

d82e3ea

anmyachev merged commit 7eb41bf into main Dec 18, 2024
7 checks passed

anmyachev deleted the amyachev/debug-benchmarks branch December 18, 2024 15:44
