[Feature Request] Implement optimizations from OneSweep #5

Open
natevm opened this issue Nov 3, 2023 · 2 comments

natevm commented Nov 3, 2023

Not sure this is the right place for feature requests, but I’m curious if anyone here has considered moving to the “OneSweep” sort used in CUB.

Link to arxiv paper below:
https://arxiv.org/abs/2206.01784

In theory, this would be much faster than the 4-bit binned histogram approach used by FFX. OneSweep sorts 8 bits at a time, so 32-bit keys need four binning passes at 2n global memory operations each (one read, one write), whereas the 4-bit radix sort needs eight passes at 3n operations each (read, read, write).
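
As a rough sanity check on that arithmetic, here is a minimal Python sketch that just multiplies passes by per-pass operations. The function name `global_memory_ops` and the `one_time_ops_per_key` parameter are made up for illustration; the per-pass figures (2n and 3n) are the ones quoted above, and any one-time setup pass is left at zero unless you choose to model it. It counts operations only and says nothing about real bandwidth or occupancy.

```python
def global_memory_ops(key_bits, bits_per_pass, ops_per_key_per_pass,
                      one_time_ops_per_key=0):
    """Return (passes, global memory operations per key) for a radix sort.

    one_time_ops_per_key is a hypothetical knob for any setup pass that
    runs once rather than once per digit; it defaults to 0.
    """
    # Ceiling division: number of digit passes needed to cover the key.
    passes = -(-key_bits // bits_per_pass)
    return passes, passes * ops_per_key_per_pass + one_time_ops_per_key


# 8-bit digits, 2n operations per pass (figures quoted for OneSweep above).
onesweep_passes, onesweep_ops = global_memory_ops(32, 8, 2)

# 4-bit digits, 3n operations per pass (figures quoted for the FFX approach above).
radix4_passes, radix4_ops = global_memory_ops(32, 4, 3)

print(f"8-bit, 2n/pass: {onesweep_passes} passes, {onesweep_ops}n operations")
print(f"4-bit, 3n/pass: {radix4_passes} passes, {radix4_ops}n operations")
print(f"ratio: {radix4_ops / onesweep_ops:.1f}x")
```

Under these assumptions the totals come out to 8n versus 24n operations for 32-bit keys, i.e. about a 3x reduction in global memory traffic on paper.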

The method requires a forward progress guarantee, but RDNA appears to support this now (at least if I understand the RDNA white paper correctly).

@jlacroixAMD

Hi,

Thank you for your feedback. I've made a note of this as something to explore for future developments within the SDK. Bear in mind that I wouldn't expect we'll be able to look into this for a while, but it is interesting nonetheless. Thank you kindly for the suggestion.

natevm commented Nov 6, 2023

No problem.

Fwiw, there appears to be an open source implementation here, though I haven’t tested it yet. Might be a good reference: https://github.com/b0nes164/ShaderOneSweep
