Reland [TUTORIAL] persistent softmax kernel (#1495)
Reland 01c3e98, a263360, a5b32a8 and 8ffdec1. These commits introduce tuning for NVIDIA GPUs.

Modify for better tuning for XPU devices:

- Launch a number of programs to maximize occupancy in a single wave if that's higher than the number of rows and the minimum number of rows each program will process is 2
- Launch `n_rows` programs otherwise
- Tune `num_warps` depending on `BLOCK_SIZE` aiming for 4 elements per work-item.
- Drop `num_stages` argument as we don't use that for now

Code calculating occupancy based on https://oneapi-src.github.io/oneAPI-samples/Tools/GPU-Occupancy-Calculator/

Closes #1099

---------

Signed-off-by: Victor Perez <[email protected]>
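The launch heuristic described in the commit message can be sketched in plain Python as below. This is only one possible reading of the message, not the code from the commit: the helper names, the 32-lane sub-group width, the cap of 32 warps, and the `programs_per_wave` input (which the real change derives from an occupancy calculation) are all assumptions made for illustration.

```python
def next_power_of_2(n: int) -> int:
    """Smallest power of two >= n (n <= 1 maps to 1)."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def pick_num_warps(block_size: int,
                   threads_per_warp: int = 32,   # assumed sub-group width
                   elems_per_item: int = 4,      # target from the commit message
                   max_warps: int = 32) -> int:  # assumed upper bound
    """Choose num_warps so each work-item handles about 4 elements."""
    warps = block_size // (threads_per_warp * elems_per_item)
    return max(1, min(max_warps, warps))


def pick_num_programs(n_rows: int,
                      programs_per_wave: int,
                      min_rows_per_program: int = 2) -> int:
    """Launch one full wave of persistent programs only when every
    program would get at least `min_rows_per_program` rows; otherwise
    fall back to one program per row."""
    if n_rows >= programs_per_wave * min_rows_per_program:
        return programs_per_wave
    return n_rows


def rows_for_program(pid: int, n_rows: int, num_programs: int) -> list:
    """Rows a persistent program visits with a strided loop."""
    return list(range(pid, n_rows, num_programs))


if __name__ == "__main__":
    n_rows, n_cols = 1823, 781
    block_size = next_power_of_2(n_cols)
    num_warps = pick_num_warps(block_size)
    # 512 is a hypothetical single-wave capacity, not a measured value.
    num_programs = pick_num_programs(n_rows, programs_per_wave=512)
    print(block_size, num_warps, num_programs)
```

With a hypothetical single-wave capacity of 512 programs, a 1823-row input gets 512 persistent programs that each stride over several rows, while a 600-row input falls back to 600 programs, one per row.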
1 parent db107db, commit 7358f79
Showing 1 changed file with 86 additions and 41 deletions.