-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[bug]: Fix wrong order by removal from plan #13497
[bug]: Fix wrong order by removal from plan #13497
Conversation
// When expression contains branch even if all children are constant | ||
// final result may not be constant | ||
let is_branched = expr.as_any().is::<CaseExpr>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite get it. from constantness perspective CaseExpr is nothing special
in the issue example we have
CASE
WHEN name = 'name1' THEN 0.0
WHEN name = 'name2' THEN 0.5
END AS a
the name = 'name1'
is one of the child of CaseExpr and is should not be considered constant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After you comment, I thought about this. It turns out that my initial approach is indeed wrong. Datafusion
has 2 kinds of constant expressions. One is across_partitions
, other is per partition
, indicate with across_partitions=false
flag. In following plan
02)--ProjectionExec: expr=[CASE WHEN name@0 = name1 THEN 0 WHEN name@0 = name2 THEN 0.5 END as a]
03)----UnionExec
04)------ProjectionExec: expr=[name1 as name]
05)--------PlaceholderRowExec
06)------ProjectionExec: expr=[name2 as name]
07)--------PlaceholderRowExec
union has 2 children, each with constant name
columns (since each source has single row, each column is constant). Union
transforms these constant children as constant, but with across_partitions=false
(which is correct). Each partition still receives constant value from their perspective (By the way Union
doesn't merge any partition ). This is also correct for ProjectionExec
. At ProjectionExec
column name
is constant, where across_partitions=false
. However, in the current code, it is propagated as across_partitions=true
. This behaviour is the root cause of the issue. I changed my implementation to fix this issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the explanation!
&self, | ||
expr: &Arc<dyn PhysicalExpr>, | ||
) -> bool { | ||
// As an example, assume that we know columns `a` and `b` are constant. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
love this doc!
so the result is true
if expression contains only constants?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly, when constantness is correct across partitions.
UNION ALL | ||
SELECT 'name2' | ||
) | ||
ORDER BY a DESC; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please also add a test case with query that would produce incorrect results before the fix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I have added a new test in this commit. Prior to this fix, we would get nondeterministic results.
Add a new test case
daa799f
to
fa9284a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have approved that this test covers the fix -- CoalescePartitionsExec
appears before. I have also run
EXPLAIN SELECT
CASE
WHEN name = 'name1' THEN 0.0
WHEN name = 'name2' THEN 0.5
END AS a
FROM (
SELECT name FROM (VALUES ('name1'), ('name1')) AS table1(name)
UNION ALL
SELECT name FROM (VALUES ('name2'), ('name2')) AS table2(name)
)
ORDER BY a DESC;
and
EXPLAIN SELECT
CASE
WHEN name = 'name1' THEN 0.0
WHEN name = 'name2' THEN 0.5
WHEN name = 'name3' THEN 1.0
WHEN name = 'name4' THEN 1.5
END AS a
FROM (
SELECT name FROM (VALUES ('name1'), ('name2')) AS table1(name)
UNION ALL
SELECT name FROM (VALUES ('name3'), ('name4')) AS table2(name)
)
ORDER BY a DESC;
and result of those are valid. One thing that just could be noted VALUES ('name1'), ('name2')
and VALUES ('name1'), ('name1')
are both regarded as not constant and not ordered, while single elements of SELECT 'smth'
considered as constant
I think, in the case of |
Which issue does this PR close?
Closes 13483.
Rationale for this change
In the 13483, final plan doesn't contain
SortExec
where it should. I have traced the reason and recognized that during constantness evaluation, when all children of an expression is contant expression is treated as constant. This is a valid assumption forBinaryExpr
, orCastExpr
. However, this shouldn't be the case forCaseExpr
. This PR fixes this problem.What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?