feat: Make native shuffle compression configurable and respect spark.shuffle.compress #1185

Merged: andygrove merged 10 commits into apache:main on Dec 20, 2024

andygrove (Member) commented on Dec 19, 2024

Which issue does this PR close?

Part of #1123

Rationale for this change

  • Respect spark.shuffle.compress so that shuffle compression can be disabled (this is mainly to help with performance profiling)
  • Make compression level configurable
  • Make compression codec configurable in preparation for adding support for lz4
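The interaction between these three settings can be sketched as follows. This is a minimal illustration, not the code from this PR: the `ShuffleCodec` enum, the `resolve_codec` function, and the assumption that zstd is the only codec accepted at this point are all hypothetical names and choices made for the example. Only `spark.shuffle.compress` comes from the description above.

```rust
/// Hypothetical codec selection for illustration; not Comet's actual API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ShuffleCodec {
    /// Write shuffle blocks uncompressed.
    None,
    /// Compress with zstd at the given level (assumed to be the only codec so far).
    Zstd(i32),
}

/// Resolve the effective codec from the three settings in the list above.
///
/// `shuffle_compress` mirrors `spark.shuffle.compress`; `codec_name` and
/// `level` stand in for the new codec and level settings (names assumed).
pub fn resolve_codec(
    shuffle_compress: bool,
    codec_name: &str,
    level: i32,
) -> Result<ShuffleCodec, String> {
    if !shuffle_compress {
        // Disabling shuffle compression overrides the codec and level settings.
        return Ok(ShuffleCodec::None);
    }
    match codec_name.to_ascii_lowercase().as_str() {
        "zstd" => Ok(ShuffleCodec::Zstd(level)),
        // Other codecs (for example lz4) are rejected in this sketch.
        other => Err(format!("unsupported shuffle compression codec: {}", other)),
    }
}
```

With this shape, `resolve_codec(false, "zstd", 1)` yields `ShuffleCodec::None`, which is the case that makes profiling without compression possible, while an unrecognized codec name surfaces as an error instead of silently falling back.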

Compression Enabled

[Screenshot: 2024-12-19_09-10]

Compression Disabled

[Screenshot: 2024-12-19_09-09]

cargo bench

Note that this benchmark covers only the encoding and compression code running in memory. There is no disk I/O.

[Screenshot: 2024-12-19_09-41]

What changes are included in this PR?

Add compression configuration to protobuf

How are these changes tested?

@andygrove andygrove marked this pull request as ready for review December 19, 2024 16:14

@kazuyukitanimura kazuyukitanimura left a comment

Some minor comments

docs/source/user-guide/tuning.md (resolved)
native/core/src/execution/shuffle/row.rs (resolved)
native/proto/src/proto/operator.proto (outdated, resolved)
@andygrove andygrove changed the title from "feat: Make shuffle compression configurable and respect spark.shuffle.compress" to "feat: Make native shuffle compression configurable and respect spark.shuffle.compress" Dec 19, 2024
@andygrove andygrove merged commit ea6d205 into apache:main Dec 20, 2024
74 checks passed
@andygrove andygrove deleted the configurable-compression branch December 20, 2024 18:11
dharanad pushed a commit to dharanad/datafusion-comet that referenced this pull request Jan 1, 2025
….shuffle.compress` (apache#1185)

* Make shuffle compression codec and level configurable

* remove lz4 references

* docs

* update comment

* clippy

* fix benches

* clippy

* clippy

* disable test for miri

* remove lz4 reference from proto