-
Notifications
You must be signed in to change notification settings - Fork 166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[do not review] experimental support for lz4 compression (not working) #1181
base: main
Are you sure you want to change the base?
Conversation
} | ||
CompressionCodec::Zstd(level) => { | ||
let encoder = zstd::Encoder::new(output, *level)?; | ||
let mut arrow_writer = StreamWriter::try_new(encoder, &batch.schema())?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not really familiar with the code, but shouldn't StreamWriter
and encoder be created only once per stream instead of per batch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is something I have been thinking about as well. We have the cost of writing the schema for each batch currently, and the schema is guaranteed to be the same for each batch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To add some more context here, we buffer rows per partition until we reach the desired batch size and then need to serialize that batch to bytes that can be read as one block by CometBlockStoreShuffleReader
.
Here are my findings from hacking on this today. LZ4 provides two compression formats: Spark uses the Java library https://github.com/lz4/lz4-java and specifically uses
edit: Apache Commons provides |
Which issue does this PR close?
Closes #.
Rationale for this change
What changes are included in this PR?
How are these changes tested?