-
Notifications
You must be signed in to change notification settings - Fork 166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add HashJoin support for BuildRight #437
Conversation
152df93
to
383ff05
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #437 +/- ##
============================================
+ Coverage 34.17% 34.30% +0.13%
- Complexity 850 855 +5
============================================
Files 116 116
Lines 38547 38653 +106
Branches 8523 8545 +22
============================================
+ Hits 13172 13259 +87
- Misses 22609 22647 +38
+ Partials 2766 2747 -19 ☔ View full report in Codecov by Sentry. |
join match { | ||
case b: BroadcastHashJoinExec if b.isNullAwareAntiJoin => | ||
// DataFusion HashJoin LeftAnti has bugs on null keys. | ||
withInfo(join, "DataFusion doesn't support null-aware anti join") | ||
return None | ||
case _ => // no-op |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ee5952e
to
9e25a6a
Compare
This requires apache/datafusion#10702 to be in DataFusion new release. |
dev/diffs/3.4.2.diff
Outdated
@@ -352,7 +352,7 @@ index 7dec558f8df..840dda15033 100644 | |||
assert(exchanges.size == 2) | |||
} | |||
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/DynamicPartitionPruningSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/DynamicPartitionPruningSuite.scala | |||
index f33432ddb6f..060f874ea72 100644 | |||
index f33432ddb6f..9cf7a9dd4e3 100644 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we need the same change for 3.4.3.diff
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
d41a68e
to
d494b4f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but I am seeing a performance regression with this PR. I am guessing it is related to the DataFusion upgrade.
Hmm, it's weird. I've updated the plan stability results. However, |
I finally produced the query plan same as CI by following same commands of the CI pipeline. |
Merged. Thanks @kazuyukitanimura @andygrove |
* feat: Add HashJoin support for BuildRight * Enable test * Update plan stability * More * Update plan stability * Refine * Fix * Update diffs to fix Spark tests * Update diff * Update Spark 3.4.3 diff * Use BuildSide enum * Update diffs * Update plan stability for Spark 4.0 * Update q5a plan
Which issue does this PR close?
Closes #390.
Rationale for this change
What changes are included in this PR?
How are these changes tested?