[Feature Request] Implement optimizations from OneSweep #5

natevm · 2023-11-03T17:38:22Z

Not sure this is the right place for feature requests, but I’m curious if anyone here has considered moving to the “OneSweep” sort used in CUB.

Link to arxiv paper below:
https://arxiv.org/abs/2206.01784

In theory, this would be much faster than the four-way binned histogram approach used by FFX. OneSweep sorts 8 bits at a time, so 32-bit keys take four binning iterations with roughly 2n global memory operations each (one read and one write per key), rather than the 3n operations (read/read/write) per iteration that the four-way radix needs over 8 iterations.
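To make the comparison concrete, here is a back-of-the-envelope sketch of the traffic estimate above. It only multiplies out the figures quoted in the post (4 passes at 2n versus 8 passes at 3n). The function name and the key count are made up for illustration, and the model deliberately ignores any extra histogram or setup passes as well as caching effects, so treat it as a rough estimate rather than a measurement.

```python
def global_memory_ops(n_keys, passes, ops_per_key_per_pass):
    """Rough count of global memory reads/writes for a multi-pass radix sort.

    Hypothetical helper: it multiplies the per-pass cost quoted in the post
    by the number of passes and ignores setup passes and cache effects.
    """
    return passes * ops_per_key_per_pass * n_keys


n = 1_000_000  # arbitrary example size

# OneSweep as described in the post: 4 passes, one read + one write per key.
onesweep_ops = global_memory_ops(n, passes=4, ops_per_key_per_pass=2)

# Four-way radix as described in the post: 8 passes, read/read/write per key.
four_way_ops = global_memory_ops(n, passes=8, ops_per_key_per_pass=3)

ratio = four_way_ops / onesweep_ops
print(onesweep_ops, four_way_ops, ratio)
```

Under these assumptions the current approach moves about three times as much data through global memory, which is where the expected speedup would come from if the sort is bandwidth bound.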

The method requires a forward progress guarantee, but RDNA supports this now (at least if I understand the RDNA white paper correctly).
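For readers wondering why forward progress matters here: single-pass approaches of this kind have each tile wait on a value published by an earlier tile. The sketch below is a simplified CPU analogy of a chained scan using Python threads, not the decoupled look-back scheme from the paper and not GPU code. All names are hypothetical. It works because the OS scheduler eventually runs every thread; on hardware without that guarantee, a waiting workgroup could starve the very workgroup it depends on.

```python
import threading


def chained_scan(tile_sums):
    """Inclusive prefix sum where each tile waits on its predecessor.

    Simplified analogy only: one thread per tile, one Event per tile.
    """
    n = len(tile_sums)
    inclusive = [None] * n
    ready = [threading.Event() for _ in range(n)]

    def worker(i):
        prev = 0
        if i > 0:
            # Blocks until tile i-1 has published its inclusive prefix.
            # This wait is only safe if tile i-1 is guaranteed to run.
            ready[i - 1].wait()
            prev = inclusive[i - 1]
        inclusive[i] = prev + tile_sums[i]
        ready[i].set()  # publish the result for tile i+1

    # Start tiles in reverse order so later tiles really do have to wait.
    threads = [threading.Thread(target=worker, args=(i,))
               for i in reversed(range(n))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return inclusive


result = chained_scan([3, 1, 4, 1, 5])
print(result)
```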

jlacroixAMD · 2023-11-06T05:35:59Z

Hi,

Thank you for your feedback. I've made a note of this as something to explore for future developments within the SDK. Bear in mind that I wouldn't expect we'll be able to look into this for a while, but it is interesting nonetheless. Thank you kindly for the suggestion.

natevm · 2023-11-06T07:25:23Z

No problem.

Fwiw, there appears to be an open source implementation here, though I haven’t tested it yet. Might be a good reference: https://github.com/b0nes164/ShaderOneSweep
