reproduce.zip
I ran pytorch_unet from torchbench with the channels-last memory format and found that some batch-norm-related Triton kernels have very low performance compared to channels-first.
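For context, here is a minimal sketch of how a channels-last batch-norm workload can be compiled and exercised with torch.compile. This is only an assumption about the general setup, not the actual torchbench pytorch_unet harness; the model, shapes, device selection, and timing loop below are placeholders.

```python
import time
import torch

# Placeholder model: a conv + batch-norm block similar to what pytorch_unet contains.
device = "xpu" if torch.xpu.is_available() else "cuda"
model = torch.nn.Sequential(
    torch.nn.Conv2d(64, 64, kernel_size=3, padding=1),
    torch.nn.BatchNorm2d(64),
    torch.nn.ReLU(),
).to(device).to(memory_format=torch.channels_last)

# Channels-last input; drop the memory_format conversions for the channels-first baseline.
x = torch.randn(2, 64, 256, 256, device=device).to(memory_format=torch.channels_last)

compiled = torch.compile(model)       # Inductor emits the Triton kernels
compiled(x).sum().backward()          # warm-up; the backward pass also triggers the bwd_* kernels

def sync():
    (torch.xpu if device == "xpu" else torch.cuda).synchronize()

sync()
start = time.perf_counter()
for _ in range(50):
    compiled(x).sum().backward()
sync()
print(f"avg fwd+bwd: {(time.perf_counter() - start) / 50 * 1e3:.3f} ms")
```

The affected kernels and the numbers reported for them are: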
| Kernel | Time (ms) | Data moved (GB) | Bandwidth (GB/s) |
| --- | --- | --- | --- |
| fwd_13.py | 0.915 | 0.079 | 86.64 |
| fwd_19.py | 0.773 | 0.039 | 50.75 |
| bwd_2.py | 1.688 | 0.236 | 139.87 |
| bwd_19.py | 2.197 | 0.118 | 53.58 |
| bwd_49.py | 1.111 | 0.078 | 70.49 |
| bwd_52.py | 2.951 | 0.157 | 53.18 |
| bwd_55.py | 2.288 | 0.315 | 137.62 |
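The bandwidth column is simply the data moved divided by the kernel time (e.g. 0.039 GB / 0.773 ms ≈ 50 GB/s). As a hedged sketch, a single kernel's effective bandwidth can be re-checked with triton.testing.do_bench; the run_kernel callable and bytes_moved value below are placeholders for whatever launch function and traffic the attached scripts use.

```python
import triton.testing

def effective_bandwidth(run_kernel, bytes_moved):
    """Effective bandwidth in GB/s for a callable that launches one kernel.

    run_kernel and bytes_moved are placeholders: pass the launch function and the
    read+write traffic (in bytes) of the kernel under test, e.g. one of the
    attached fwd_*.py / bwd_*.py kernels.
    """
    ms = triton.testing.do_bench(run_kernel)  # timing in milliseconds
    return bytes_moved / (ms * 1e-3) / 1e9
```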
Environment details
pytorch: 565a7942eee1ddc23067cdbae597443d0f2290a0
triton: 91b14bf
gpu: Device #0: Intel(R) Data Center GPU Max 1100
compiler: Intel(R) oneAPI DPC++/C++ Compiler 2025.0.0 (2025.0.0.20240928)
It would be better to try this on the release 3.2 branch with pytorch 2.6 nightly builds - quite a lot has changed from 3.1.0. If you are able to retry with https://github.com/intel/intel-xpu-backend-for-triton/tree/release/3.2.x I would be interested in seeing if there is any difference.
reproduce_triton_3_2.zip
I generated these kernels using pytorch af190479c8c28b6af56b7092106b190ae221e72b and triton 3.2 b6c6468. I don't see much difference.
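For what it's worth, the snippet below is a quick way to confirm which PyTorch build (git hash) and Triton version are actually loaded when re-running the reproducers; note that the Triton git commit itself is not exposed through the Python API, so it still has to be read from the checkout.

```python
import torch
import triton

# PyTorch exposes the git hash it was built from; Triton only exposes a version string.
print("pytorch:", torch.__version__, "git:", torch.version.git_version)
print("triton :", triton.__version__)
print("xpu available:", torch.xpu.is_available())
```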