Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Implement bloom_filter_agg #987
feat: Implement bloom_filter_agg #987
Changes from 25 commits
e662814
20f6e67
1ec31a2
3965dc4
62e656c
33ef47d
2040c76
599a8f9
a2a8cf3
22aedd9
bf22902
4b7000c
88adc75
a21e0e3
5c5d0f9
cd107e3
4f06098
79f6468
57fe742
ec64e4c
7a81f35
013513e
3347923
5c82f24
c39ff1d
1ed99e3
d41a9d2
6d13890
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It only supports Int64? Spark BloomFilterAggregate supports Byte, Short, Int, Long and String. If Comet BloomFilterAggregate only support Int64 for now. We need to fallback to Spark for other cases in
QueryPlanSerde
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I was going off of their docs which say it only supports
Long
.In their implementation, however, if looks like they can cast the fixed width types directly to
Long
https://github.com/apache/spark/blob/b078c0d6e2adf7eb0ee7d4742a6c52864440226e/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/BloomFilterAggregate.scala#L238
and for strings their bloom filter implementation has a
putBinary
method that we don't currently support. The casts should be easy. I'll look at whatputBinary
on our bloom filter implementation will take.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see what happened. 3.4 only supports
Long
, which was the Spark source I was working off of. 3.5 added support for other types.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I modified it to only generate a native BloomFilterAgg if the child has LongType. I'll open an issue to support more types in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One way to avoid the copy, which may be too ugly , would be to store bloom filter data as an Option<>
So instead of
Something like
And then you could basically use
Option::take
to take the value and leave aNone
in its place