-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Wrap TableScan
with Filter
in Join Unparsing
#13496
base: main
Are you sure you want to change the base?
Conversation
TableScan
with Filter in Join UnparsingTableScan
with Filter
in Join Unparsing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jonathanc-n it's very nice 👍. This PR improves the push-down filter unparsing by putting the filter in the where clause.
I have verified this way can trigger the push-down optimization for different join in DataFusion.
-----------------inner join-------------------
###### predicate in filter ######
SQL: select o_orderkey from orders inner join customer on o_custkey = c_custkey where c_name = 'Customer#000000001'
Projection: orders.o_orderkey
Inner Join: orders.o_custkey = customer.c_custkey
TableScan: orders projection=[o_orderkey, o_custkey]
Projection: customer.c_custkey
Filter: customer.c_name = Utf8View("Customer#000000001")
TableScan: customer projection=[c_custkey, c_name], partial_filters=[customer.c_name = Utf8View("Customer#000000001")]
-----------------left join-------------------
###### predicate in filter ######
SQL: select o_orderkey from orders left join customer on o_custkey = c_custkey where c_name = 'Customer#000000001'
Projection: orders.o_orderkey
Inner Join: orders.o_custkey = customer.c_custkey
TableScan: orders projection=[o_orderkey, o_custkey]
Projection: customer.c_custkey
Filter: customer.c_name = Utf8View("Customer#000000001")
TableScan: customer projection=[c_custkey, c_name], partial_filters=[customer.c_name = Utf8View("Customer#000000001")]
-----------------right join-------------------
###### predicate in filter ######
SQL: select o_orderkey from orders right join customer on o_custkey = c_custkey where c_name = 'Customer#000000001'
Projection: orders.o_orderkey
Right Join: orders.o_custkey = customer.c_custkey
TableScan: orders projection=[o_orderkey, o_custkey]
Projection: customer.c_custkey
Filter: customer.c_name = Utf8View("Customer#000000001")
TableScan: customer projection=[c_custkey, c_name], partial_filters=[customer.c_name = Utf8View("Customer#000000001")]
-----------------full join-------------------
###### predicate in filter ######
SQL: select o_orderkey from orders full join customer on o_custkey = c_custkey where c_name = 'Customer#000000001'
Projection: orders.o_orderkey
Right Join: orders.o_custkey = customer.c_custkey
TableScan: orders projection=[o_orderkey, o_custkey]
Projection: customer.c_custkey
Filter: customer.c_name = Utf8View("Customer#000000001")
TableScan: customer projection=[c_custkey, c_name], partial_filters=[customer.c_name = Utf8View("Customer#000000001")]
@sgrebnov may want to take a look.
@@ -1153,7 +1153,7 @@ fn test_join_with_table_scan_filters() -> Result<()> { | |||
|
|||
let sql = plan_to_sql(&join_plan_multiple_filters)?; | |||
|
|||
let expected_sql = r#"SELECT * FROM left_table AS "left" JOIN right_table ON "left".id = right_table.id AND (("left".id > 5) AND (("left"."name" LIKE 'some_name' AND (right_table."name" = 'before_join_filter_val')) AND (age > 10))) WHERE ("left"."name" = 'after_join_filter_val')"#; | |||
let expected_sql = r#"SELECT * FROM left_table AS "left" JOIN right_table ON "left".id = right_table.id AND ("left".id > 5) WHERE ("left"."name" = 'after_join_filter_val') AND "left"."name" LIKE 'some_name' AND ((right_table."name" = 'before_join_filter_val') AND (right_table.age > 10))"#; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm under impression that this change will result in incorrect behavior we were trying to fix here:
#13132.
Filtering after join is not the same as filtering during/before join (I would expect filters should be in subquery during join/ not after). Let me double check this please
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filtering after join is (semantically) the same as filtering before/during joins only for INNER JOIN
/ JOIN
You are correct that for LEFT/RIGHT/FULL OUTER JOIN
and SEMI/ANTI JOIN
the rules are more subtle
@jonathanc-n, @goldmedal - thank you, I've reviewed this change and it seems it brings back the following issue (there is additional context of why filtering added this way produces incorrect result) #13132 . I really like the change but can we improve this to see if we can wrap TableScan with Filter as a subquery when it is required Example query. Original query / LogicalPlan / Result
| | Projection: customer.c_custkey, count(orders.o_orderkey) |
| | Aggregate: groupBy=[[customer.c_custkey]], aggr=[[count(orders.o_orderkey)]] |
| | Left Join: Filter: customer.c_custkey = orders.o_custkey |
| | TableScan: customer projection=[c_custkey] |
| | TableScan: orders projection=[o_orderkey, o_custkey], full_filters=[orders.o_comment NOT LIKE Utf8("%special%requests%")] Result:
Existing unparser (main)
Result:
Proposed change
Result
If intent for filter to be moved it must be wrapped as subquery in this case:
Result
|
Which issue does this PR close?
Closes #13156 .
Rationale for this change
What changes are included in this PR?
Pushes down filter to tablescan instead of having it apply on the join in unparsing.
Are these changes tested?
Changed the previous test
Are there any user-facing changes?