
[Benchmarks][Upstream PyTorch 2.5] Triton and XeTLA softmax performance degrades in comparison with torch 2.1 / ipex 2.1 test proxies #2106

Closed
ESI-SYD opened this issue Sep 4, 2024 · 4 comments · Fixed by #2199, #2278, #2298 or #2300

@ESI-SYD
Contributor

ESI-SYD commented Sep 4, 2024

  1. The Triton/XeTLA ratio stays the same for all benchmarks except attention, where it changes because the absolute XeTLA attention numbers degraded.
  2. Both the Triton and XeTLA softmax cases degraded, so the Triton/XeTLA ratio did not change.

Details: #1905 (comment)

@vlad-penkin vlad-penkin changed the title [Benchmarks] Degrade when deprecate ipex in benchmarks [Benchmarks][Upstream PyTorch 2.5] Triton and XeTLA softmax performance degrades in comparison with torch 2.1 / ipex 2.1 test proxies Sep 5, 2024
@vlad-penkin
Contributor

vlad-penkin commented Sep 5, 2024

@ESI-SYD what is the root cause of this issue? Can you pinpoint it to a particular torch operation?

@anmyachev to proceed further with analysis/triage, please create a minimal reproducer for the Triton kernel path.

@ESI-SYD
Contributor Author

ESI-SYD commented Sep 6, 2024

@ESI-SYD what is the root cause of this issue? Can you pinpoint it to a particular torch operation?

After applying the draft, there are two main differences in how the benchmark time is measured:

  1. No synchronous submission. https://github.com/intel/intel-xpu-backend-for-triton/blob/llvm-target/python/triton/testing.py#L214

  2. The time is taken from the timestamps between two barriers, which is not accurate. See the earlier detailed explanation by chengjun.
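To make the second point concrete, here is a minimal sketch of host-side timing between two barriers. It is not the actual code in `triton/testing.py`; `fn` stands for a kernel launch and `sync` for a device-synchronize call, and both are hypothetical placeholders. Because both timestamps are taken on the host, everything the host does between the barriers (launch overhead, argument setup, allocations inside `fn`) ends up in the result.

```python
import time
from typing import Callable


def time_between_barriers(
    fn: Callable[[], None],
    sync: Callable[[], None],
    n_iter: int = 100,
) -> float:
    """Return mean milliseconds per call, measured between two host-side barriers.

    Hypothetical sketch: `sync` stands in for a device-synchronize call. The
    host timestamps bracket all n_iter launches, so host work done inside `fn`
    (argument setup, allocations, launch overhead) is counted as kernel time.
    """
    if n_iter <= 0:
        raise ValueError("n_iter must be positive")
    sync()  # first barrier: drain any previously queued work
    start = time.perf_counter()
    for _ in range(n_iter):
        fn()  # launches are submitted without waiting for each one to finish
    sync()  # second barrier: wait until all launches have completed
    return (time.perf_counter() - start) * 1e3 / n_iter


if __name__ == "__main__":
    # CPU stand-in for a "kernel": a fixed amount of host-side busy work.
    def fake_launch() -> None:
        sum(range(1_000))

    print(f"{time_between_barriers(fake_launch, lambda: None):.6f} ms per call")
```

A device-side timestamp (for example from a profiler) would exclude the host work that this approach folds into every sample.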

@anmyachev
Contributor

#2149 (comment)

@vlad-penkin vlad-penkin linked a pull request Sep 11, 2024 that will close this issue
@vlad-penkin vlad-penkin reopened this Sep 11, 2024
@anmyachev
Contributor

anmyachev commented Sep 16, 2024

  1. The Triton/XeTLA ratio stays the same for all benchmarks except attention, where it changes because the absolute XeTLA attention numbers degraded.

The degradation of the absolute numbers has now been fixed. The geometric mean difference is ~2% (between #1 and #2), which I believe is within the margin of error.
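For reference, a geometric-mean comparison like the ~2% figure above is typically computed over per-case time ratios. This is a minimal sketch under that assumption; the function names and the sample timings are hypothetical, not data from this issue.

```python
import math
from typing import Sequence


def geomean(values: Sequence[float]) -> float:
    """Geometric mean of positive values, computed in log space to avoid overflow."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geomean needs a non-empty sequence of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def geomean_ratio(before: Sequence[float], after: Sequence[float]) -> float:
    """Geometric mean of per-case ratios after[i] / before[i] (e.g. kernel times)."""
    if len(before) != len(after):
        raise ValueError("both runs must cover the same benchmark cases")
    return geomean([a / b for b, a in zip(before, after)])


# Hypothetical timings (ms) for three cases, each one 2% slower in the second run.
print(f"{geomean_ratio([0.10, 0.50, 2.00], [0.102, 0.51, 2.04]):.4f}")  # prints 1.0200
```

Using the geometric mean keeps one very large or very small case from dominating the summary, which matters when benchmark sizes span several orders of magnitude.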

  2. Both the Triton and XeTLA softmax cases degraded, so the Triton/XeTLA ratio did not change.

The new approach to measuring performance is less precise: it is more influenced by the operations performed in the benchmarked functions before and after the kernel launch. This influence is strongest when the kernel execution time is very small. For example, in the first fused_softmax benchmark configurations (N=256) the kernel takes no more than a hundredth of a millisecond, whereas for the last configuration (N=32768) the time is the same with both methods.
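The effect described above can be illustrated with a simple additive model, offered as an illustration rather than a measurement: if the barrier-based timing adds a roughly constant host overhead to each sample, the relative error shrinks as the kernel gets longer. The 0.005 ms overhead and the 1 ms large-kernel time are hypothetical numbers; only the ~0.01 ms small-kernel time comes from the comment.

```python
def measured_ms(kernel_ms: float, overhead_ms: float) -> float:
    """Additive model: the host-side measurement sees kernel time plus a fixed overhead."""
    return kernel_ms + overhead_ms


def relative_error(kernel_ms: float, overhead_ms: float) -> float:
    """Fraction by which the measured time overstates the true kernel time."""
    if kernel_ms <= 0:
        raise ValueError("kernel_ms must be positive")
    return (measured_ms(kernel_ms, overhead_ms) - kernel_ms) / kernel_ms


OVERHEAD_MS = 0.005  # hypothetical constant host overhead per sample
for label, kernel_ms in [("small kernel (~N=256)", 0.01), ("large kernel (hypothetical)", 1.0)]:
    print(f"{label}: {relative_error(kernel_ms, OVERHEAD_MS):.1%} overstatement")
```

With these assumed numbers the small case is overstated by about 50% and the large case by about 0.5%, which matches the qualitative picture in the comment: the method is usable for large sizes but not for small ones.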

To sum up, for large dimensions the new benchmarking method is suitable and shows no degradation with upstream PyTorch. For small dimensions, however, it is not reliable, and we have to wait for a working kineto + Intel GPU PTI solution.
