Channel last batch norm have bad performance #3001

Open
jianyizh opened this issue Dec 12, 2024 · 4 comments

Comments

@jianyizh
Contributor

Describe the issue

reproduce.zip
I ran pytorch_unet from torchbench in channels-last format and found that some batch-norm-related Triton kernels perform much worse than they do in channels-first format.
fwd_13.py: 0.915ms 0.079GB 86.64GB/s
fwd_19.py: 0.773ms 0.039GB 50.75GB/s
bwd_2.py: 1.688ms 0.236GB 139.87GB/s
bwd_19.py: 2.197ms 0.118GB 53.58GB/s
bwd_49.py: 1.111ms 0.078GB 70.49GB/s
bwd_52.py: 2.951ms 0.157GB 53.18GB/s
bwd_55.py: 2.288ms 0.315GB 137.62GB/s
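
The three figures per kernel appear to be related by bandwidth = data size / time. The sketch below recomputes the GB/s column from the other two. It assumes the GB figure is the amount of data the kernel moves, which the issue does not state. The function names are hypothetical and not taken from the reproducer scripts.

```python
# Figures copied from the results above:
# (kernel, time in ms, data size in GB, reported bandwidth in GB/s)
REPORTED = [
    ("fwd_13.py", 0.915, 0.079, 86.64),
    ("fwd_19.py", 0.773, 0.039, 50.75),
    ("bwd_2.py", 1.688, 0.236, 139.87),
    ("bwd_19.py", 2.197, 0.118, 53.58),
    ("bwd_49.py", 1.111, 0.078, 70.49),
    ("bwd_52.py", 2.951, 0.157, 53.18),
    ("bwd_55.py", 2.288, 0.315, 137.62),
]


def effective_bandwidth_gbps(time_ms, size_gb):
    """Return GB/s for a kernel that moves size_gb in time_ms milliseconds."""
    return size_gb / (time_ms / 1000.0)


def relative_error(computed, reported):
    """Return |computed - reported| as a fraction of the reported value."""
    return abs(computed - reported) / reported


for name, ms, gb, reported in REPORTED:
    bw = effective_bandwidth_gbps(ms, gb)
    # The GB column is rounded to three decimals, so small deviations are expected.
    print(f"{name}: recomputed {bw:.2f} GB/s, reported {reported:.2f} GB/s")
```

The recomputed values differ slightly from the reported ones because the data sizes above are rounded to three decimals.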

Environment details

pytorch: 565a7942eee1ddc23067cdbae597443d0f2290a0
triton: 91b14bf
gpu: Device #0: Intel(R) Data Center GPU Max 1100
compiler: Intel(R) oneAPI DPC++/C++ Compiler 2025.0.0 (2025.0.0.20240928)
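
For context on the two layouts compared in this issue, the sketch below computes element strides for the same logical (N, C, H, W) shape stored channels-first and channels-last. It is plain Python with a hypothetical small shape, not code from the reproducer, and it makes no claim about why the kernels are slow. It only shows that a per-channel walk over H and W, which is the access pattern of a batch norm reduction, is unit-stride in one layout and strided by C in the other.

```python
def contiguous_strides(n, c, h, w):
    """Element strides of an NCHW-contiguous (channels-first) tensor."""
    return (c * h * w, h * w, w, 1)


def channels_last_strides(n, c, h, w):
    """Element strides of the same logical NCHW shape stored as NHWC."""
    return (h * w * c, 1, w * c, c)


def offset(index, strides):
    """Linear element offset of a logical (n, c, h, w) index."""
    return sum(i * s for i, s in zip(index, strides))


# Hypothetical small shape chosen for illustration only.
shape = (2, 3, 4, 4)
first = contiguous_strides(*shape)
last = channels_last_strides(*shape)

# Offsets of the first four elements along W for channel 1 of sample 0.
row_first = [offset((0, 1, 0, w), first) for w in range(4)]
row_last = [offset((0, 1, 0, w), last) for w in range(4)]

print("channels-first strides:", first, "offsets:", row_first)
print("channels-last strides: ", last, "offsets:", row_last)
```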

@alexbaden
Contributor

It would be better to try this on the release 3.2 branch with pytorch 2.6 nightly builds - quite a lot has changed from 3.1.0. If you are able to retry with https://github.com/intel/intel-xpu-backend-for-triton/tree/release/3.2.x I would be interested in seeing if there is any difference.

@alexbaden alexbaden self-assigned this Dec 16, 2024
@vlad-penkin vlad-penkin added this to the 4.6 [Performance] E2E milestone Dec 16, 2024
@jianyizh
Contributor Author

reproduce_triton_3_2.zip
I generated these kernels using pytorch af190479c8c28b6af56b7092106b190ae221e72b and triton 3.2 (b6c6468). I don't see much difference.

@alexbaden
Contributor

Thanks, I was able to reproduce your numbers locally and make some initial observations. I will provide another update after the winter holiday.

@jianyizh
Contributor Author

@riverliuintel
