-
Notifications
You must be signed in to change notification settings - Fork 166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add dictionary binary to shuffle writer #111
Conversation
bc34bb5
to
7d942b5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally lgtm, except some minor comments, which are optional.
DataType::Int8 => string_dict_builder_inner_helper!( | ||
Int16Type, | ||
DataType::Int8 => byte_dict_builder_inner_helper!( | ||
Int8Type, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bug fixes? It looks like we may lack some unit test coverage in the rust side.
Maybe we should add more unit tests in the rust side in the follow-up PRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, it is a typo. More test coverage is good. As there are too many combinations, we probably miss some in current test coverage.
@@ -292,6 +292,12 @@ pub fn create_hashes<'a>( | |||
DataType::LargeUtf8 => { | |||
hash_array!(LargeStringArray, col, hashes_buffer); | |||
} | |||
DataType::Binary => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to confirm, this is for equal/hash contract for Spark side's sort operations?
I noticed some error when testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember spark_compatible_murmur3_hash
is implemented following Spark shuffle's hash function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Merged. Thanks. |
Which issue does this PR close?
Closes #114.
Rationale for this change
Native shuffle writer can write dictionary of string but dictionary of binary is not supported. We should add it.
What changes are included in this PR?
Adding the support of dictionary of binary to native shuffle writer.
How are these changes tested?