-
Notifications
You must be signed in to change notification settings - Fork 169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Move shuffle block decompression and decoding to native code and add LZ4 & Snappy support #1192
base: main
Are you sure you want to change the base?
Conversation
spark/src/test/scala/org/apache/comet/exec/CometExecSuite.scala
Outdated
Show resolved
Hide resolved
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1192 +/- ##
=============================================
+ Coverage 34.06% 56.86% +22.79%
- Complexity 925 933 +8
=============================================
Files 115 112 -3
Lines 43569 11004 -32565
Branches 9528 2122 -7406
=============================================
- Hits 14843 6257 -8586
+ Misses 25777 3637 -22140
+ Partials 2949 1110 -1839 ☔ View full report in Codecov by Sentry. |
577880a
to
8ce9bb5
Compare
The status is that this is working, but I am running into some executor OOMs when trying to run complete benchmarks. I will pick this up again after the holidays. |
@Dandandan you may be interested in the benchmark results |
b.iter(|| { | ||
buffer.clear(); | ||
let mut cursor = Cursor::new(&mut buffer); | ||
write_ipc_compressed(&batch, &mut cursor, &CompressionCodec::Zstd(1), &ipc_time) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think zstd should have faster negative levels as well (-4 or -5 might come close), would be interesting to see how it compares. Not sure if it is available in the rust bindings.
Added Snappy.
|
Hmm, the difference between ZSTD and LZ4 looks not significant? Does the benchmark of TPC-H shown in the description include native shuffle block decompression and decoding? Do we have the number of difference between current Comet (ZSTD) and this patch (ZSTD + native decompression and decoding)? |
Sure, here are fresh benchmarks, with a single run for each measurement.
I had hoped for a bigger win, but 340 compared to 364 is still a 7% improvement overall. |
I enabled the unsafe version of lz4_flex, and the time reduced from 340s to 336s. |
Which issue does this PR close?
Part of #1123
Closes #1178
Rationale for this change
We currently perform encoding + compression in native code and decoding + decompression in JVM code. There are some downsides to this approach:
What changes are included in this PR?
ZSTD
LZ4
Microbenchmarks
TPC-H
edit: Results update on 1/1/2025 after making compression configurable for columnar shuffle as well as native shuffle.
How are these changes tested?
Existing tests