Kernels read keys and values out of bounds #4

natevm · 2023-10-17T18:46:38Z

Hello,

I've recently discovered that if the size of the key/value buffers used in the sort is not a multiple of FFX_PARALLELSORT_THREADGROUP_SIZE * 4, the kernels read past the end of the buffers and undefined behavior can occur.

See these lines below:

FidelityFX-ParallelSort/ffx-parallelsort/FFX_ParallelSort.h

Lines 133 to 137 in 0c53994

    
    uint srcKeys[FFX_PARALLELSORT_ELEMENTS_PER_THREAD];
    srcKeys[0] = SrcBuffer[DataIndex];
    srcKeys[1] = SrcBuffer[DataIndex + FFX_PARALLELSORT_THREADGROUP_SIZE];
    srcKeys[2] = SrcBuffer[DataIndex + (FFX_PARALLELSORT_THREADGROUP_SIZE * 2)];
    srcKeys[3] = SrcBuffer[DataIndex + (FFX_PARALLELSORT_THREADGROUP_SIZE * 3)];

SrcBuffer is read here without any bounds check, which causes GPU instability when the reads fall out of bounds.

Also an issue here:

FidelityFX-ParallelSort/ffx-parallelsort/FFX_ParallelSort.h

Lines 348 to 360 in 0c53994

    
    // Pre-load the key values in order to hide some of the read latency
    uint srcKeys[FFX_PARALLELSORT_ELEMENTS_PER_THREAD];
    srcKeys[0] = SrcBuffer[DataIndex];
    srcKeys[1] = SrcBuffer[DataIndex + FFX_PARALLELSORT_THREADGROUP_SIZE];
    srcKeys[2] = SrcBuffer[DataIndex + (FFX_PARALLELSORT_THREADGROUP_SIZE * 2)];
    srcKeys[3] = SrcBuffer[DataIndex + (FFX_PARALLELSORT_THREADGROUP_SIZE * 3)];
    #ifdef kRS_ValueCopy
    uint srcValues[FFX_PARALLELSORT_ELEMENTS_PER_THREAD];
    srcValues[0] = SrcPayload[DataIndex];
    srcValues[1] = SrcPayload[DataIndex + FFX_PARALLELSORT_THREADGROUP_SIZE];
    srcValues[2] = SrcPayload[DataIndex + (FFX_PARALLELSORT_THREADGROUP_SIZE * 2)];
    srcValues[3] = SrcPayload[DataIndex + (FFX_PARALLELSORT_THREADGROUP_SIZE * 3)];

Later on, the number of keys is checked, but by that point it's too late:

FidelityFX-ParallelSort/ffx-parallelsort/FFX_ParallelSort.h

Lines 369 to 371 in 0c53994

    
    uint localKey = (DataIndex < CBuffer.NumKeys ? srcKeys[i] : 0xffffffff);
    #ifdef kRS_ValueCopy
    uint localValue = (DataIndex < CBuffer.NumKeys ? srcValues[i] : 0);

I suspect the fix would be to just check the number of keys before pre-loading the key/value pairs.
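To illustrate the idea, here is a minimal sketch of a guarded pre-load, written as a CPU-side C++ model rather than actual shader code. The constant values (128 and 4), the helper name `preloadKeysGuarded`, and the use of `std::vector` as a stand-in for `SrcBuffer` are all assumptions for illustration; the real definitions live in FFX_ParallelSort.h. In the shader itself the equivalent change would presumably be to compare each index against `CBuffer.NumKeys` before reading.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Assumed values; the real constants are defined in FFX_ParallelSort.h.
constexpr uint32_t kThreadgroupSize   = 128; // stands in for FFX_PARALLELSORT_THREADGROUP_SIZE
constexpr uint32_t kElementsPerThread = 4;   // stands in for FFX_PARALLELSORT_ELEMENTS_PER_THREAD

// Hypothetical CPU model of the pre-load. Every read is guarded by numKeys,
// and slots past the end receive the 0xffffffff sentinel that the existing
// code substitutes later. Assumes numKeys <= srcBuffer.size().
std::array<uint32_t, kElementsPerThread> preloadKeysGuarded(
    const std::vector<uint32_t>& srcBuffer, uint32_t dataIndex, uint32_t numKeys)
{
    std::array<uint32_t, kElementsPerThread> srcKeys{};
    for (uint32_t i = 0; i < kElementsPerThread; ++i) {
        const uint32_t index = dataIndex + i * kThreadgroupSize;
        // Only touch the buffer when the index is known to be valid.
        srcKeys[i] = (index < numKeys) ? srcBuffer[index] : 0xffffffffu;
    }
    return srcKeys;
}
```

The payload loads under kRS_ValueCopy could be guarded the same way, with 0 as the fallback value. The trade-off is one extra comparison per load in the inner loop.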

Reproducing is simple enough: run the sort on fewer than FFX_PARALLELSORT_THREADGROUP_SIZE * 4 keys, with GPU-assisted validation enabled so that out-of-bounds descriptor reads are reported.

jlacroixAMD · 2023-10-17T23:01:40Z

Thank you for reporting this. I'll file a ticket internally and we'll get it fixed in the next release.

jlacroixAMD · 2023-10-19T02:42:28Z

I just realized this was reported against the old Parallel Sort sample. A fix will be pushed with the next version of the FidelityFX SDK, which is how we now ship most updates to our older features: https://github.com/GPUOpen-LibrariesAndSDKs/FidelityFX-SDK

Also, in order to keep the GPU code as fast as possible, the fix will likely be done as a check on the NumKeys value at CPU time with an error code returned in the data setup stage.
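One way such a CPU-side check could look is sketched below. This is only a guess at the shape of the fix, not the SDK's actual API: the enum `SortSetupResult`, the functions `paddedKeyCount` and `validateSortSetup`, and the constant values are all hypothetical. The sketch assumes the check verifies that the buffers have room for the key count rounded up to a whole block of FFX_PARALLELSORT_THREADGROUP_SIZE * 4 elements, so the unguarded shader loads stay inside the allocation.

```cpp
#include <cstdint>

// Assumed values; the real constants are defined in FFX_ParallelSort.h.
constexpr uint32_t kThreadgroupSize   = 128;
constexpr uint32_t kElementsPerThread = 4;
constexpr uint64_t kBlockSize = uint64_t{kThreadgroupSize} * kElementsPerThread;

// Hypothetical error codes for the data setup stage.
enum class SortSetupResult { Ok, ErrorBufferTooSmall };

// Smallest multiple of kBlockSize that is >= numKeys.
// Computed in 64 bits so the rounding cannot overflow.
constexpr uint64_t paddedKeyCount(uint32_t numKeys)
{
    return ((uint64_t{numKeys} + kBlockSize - 1) / kBlockSize) * kBlockSize;
}

// Rejects setups where the shader's block-sized reads would run past the
// end of the key/payload buffers. Capacity is measured in 32-bit elements.
SortSetupResult validateSortSetup(uint32_t numKeys, uint64_t bufferCapacityInKeys)
{
    if (bufferCapacityInKeys < paddedKeyCount(numKeys)) {
        return SortSetupResult::ErrorBufferTooSmall;
    }
    return SortSetupResult::Ok;
}
```

A caller that cannot change its key count could instead allocate `paddedKeyCount(numKeys)` elements up front, which keeps the GPU code free of extra branches.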

natevm · 2023-10-19T03:03:31Z

Fwiw this sort implementation has been very helpful.

Small feature request: it would be great to also have a dedicated prefix sum/scan and a parallel device-wide selection/compaction. Both are used internally by the radix sorter, but would also be helpful as standalone primitives.

jlacroixAMD · 2023-10-19T03:26:26Z

I'll add it to the list of planned improvements to existing samples. Cheers!
