
convert dwa encoder to use algorithm quantize #1948

Merged

Conversation


@kdt3rd kdt3rd commented Dec 31, 2024

Convert the quantization from a massive table to a bit-fiddling algorithm, which is more code but significantly faster (in some cases, almost 2x). This also eliminates the giant lookup tables from the binary, which reduces the compiled size of the core library by half (measured on x86 with avx2 / f16c enabled).

This will help with the dwa performance problems noted in #1915, where the compressor got much slower.
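For readers unfamiliar with the idea, here is a minimal sketch of table-free, "bit-fiddling" quantization of half floats. The function names and logic are illustrative assumptions, not the PR's actual code (the real algorithm is more involved and reproduces the old table's output exactly); the core idea is the same: find a nearby value with many trailing zero mantissa bits, which compresses better, using bit math instead of a 1.3MB lookup table.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Convert a half bit pattern to float (simple scalar fallback;
 * ignores infinities/NaNs for brevity). */
static float half_to_float (uint16_t h)
{
    uint32_t sign = (uint32_t) (h >> 15) & 1;
    uint32_t exp  = (h >> 10) & 0x1f;
    uint32_t man  = h & 0x3ff;
    float    f;

    if (exp == 0)
        f = ldexpf ((float) man, -24); /* subnormal */
    else
        f = ldexpf ((float) (man | 0x400), (int) exp - 25);
    return sign ? -f : f;
}

/* Zero trailing mantissa bits while the absolute error stays within
 * the tolerance; returns the quantized half bit pattern. More trailing
 * zeros means the value RLE/entropy-codes better downstream. */
static uint16_t quantize_half (uint16_t h, float tol)
{
    float    orig = half_to_float (h);
    uint16_t best = h;

    for (int bits = 1; bits <= 10; ++bits)
    {
        uint16_t cand = h & (uint16_t) ~((1u << bits) - 1);
        if (fabsf (half_to_float (cand) - orig) <= tol)
            best = cand;
        else
            break;
    }
    return best;
}
```

The table-based approach precomputes `quantize_half` for every input, trading memory traffic (and cache pressure) for arithmetic; the PR inverts that trade, which pays off as memory speed lags further behind instruction throughput.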


Signed-off-by: Kimball Thurston <[email protected]>
also propagate cross platform function fixes to test suite

Signed-off-by: Kimball Thurston <[email protected]>
@@ -103,12 +103,68 @@ static exr_result_t LossyDctDecoder_execute (
// value in the buffer - with the index into zig zag
// order data. If we return 0, we have DC only data.
//
static int LossyDctDecoder_unRleAc (
//
// This is assuminging that halfZigBlock is zero'ed
Contributor
"assuminging"

@@ -215,7 +271,7 @@ LossyDctDecoder_base_construct (
d->_toLinear = toLinear;
d->_width = width;
d->_height = height;
if (d->_toLinear == NULL) d->_toLinear = dwaCompressorNoOp;
//if (d->_toLinear == NULL) d->_toLinear = dwaCompressorNoOp;
Contributor

is this commenting-out intentional?

Contributor Author

ah, yeah, I missed fully removing that. I also removed the no-op table in favor of an outer-loop conditional that dispatches to do nothing, instead of always looking values up through an identity table.


meshula commented Dec 31, 2024

This also eliminates the giant lookup tables from the binary, which reduces the compiled size of the core library by half on x86, with avx2 / f16c enabled anyway.

I think, based on the code, that the giant lookup tables are eliminated on non-Intel platforms as well; just checking, because this comment could be read as saying they still exist on non-Intel. I'm asking because some detail might be hidden in platform switches that aren't in the diff.

We still have the giant table for unit testing. Do we need any touch on the cmake script to govern the creation of the table according to whether testing is enabled?

@meshula meshula left a comment (Contributor)

I'm approving on the assumption that the elimination of the tables is cross platform, not intel-only.

Happy New Year!

Contributor Author

kdt3rd commented Dec 31, 2024

Happy New Year! Yes, sorry, bad wording: this is entirely a cross-platform / architecture change, just removing 1.3MB of binary tables. I mentioned avx2 / f16c only to indicate the configuration that any performance numbers I might quote come from (my older core i9-9900k). I have not yet been able to test on newer / alternate hardware, but expect a good boost on all systems, increasingly so as memory speed diverges further from instruction execution speed. This could be viewed as a step towards enabling a gpu version of this, or an easier implementation in an embedded system, as examples, not that I have explicit plans to do any of that.

This is not fully branchless, unfortunately, as I could not yet work out the math to reason about the error tolerance being full 32-bit float precision while quantizing a 16-bit float. This code replicates the table's results precisely for the allowed range of quantization levels (0-100) for DWAA/B. However, if we were to change the error tolerance to be a half float value, we could greatly simplify the code (faster still) and, I believe, have it be fully deterministic and mostly branchless (aside from the odd check for inf / nan).

But I didn't want to change the output values just yet...
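The "odd check for inf / nan" mentioned above comes down to one cheap bit test on the half bit pattern, since a half is infinite or NaN exactly when its exponent field is all ones. This is an illustrative sketch of that general property, not code from the PR:

```c
#include <assert.h>
#include <stdint.h>

/* A half is inf or nan when all 5 exponent bits (0x7c00 after masking
 * off the sign) are set: mantissa == 0 means inf, nonzero means nan. */
static int half_is_inf_or_nan (uint16_t h)
{
    return (h & 0x7c00) == 0x7c00;
}
```

Because it is a single mask-and-compare, this check costs far less than the table lookups it sits alongside, which is why it is the only branch worth keeping in an otherwise branchless path.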

Contributor Author

kdt3rd commented Dec 31, 2024

We still have the giant table for unit testing. Do we need any touch on the cmake script to govern the creation of the table according to whether testing is enabled?

AFAIK, we long ago removed the build rules that create these files, leaving only the code which generated them as reference, and I have not changed that...

Signed-off-by: Kimball Thurston <[email protected]>
@kdt3rd kdt3rd merged commit 22cbb79 into AcademySoftwareFoundation:main Jan 1, 2025
36 checks passed
@kdt3rd kdt3rd deleted the convert_dwa_table_to_code branch January 1, 2025 02:41