Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug]: Fix wrong order by removal from plan #13497

Merged
merged 7 commits into from
Nov 24, 2024

Conversation

akurmustafa
Copy link
Contributor

Which issue does this PR close?

Closes 13483.

Rationale for this change

In the 13483, final plan doesn't contain SortExec where it should. I have traced the reason and recognized that during constantness evaluation, when all children of an expression is contant expression is treated as constant. This is a valid assumption for BinaryExpr, or CastExpr. However, this shouldn't be the case for CaseExpr. This PR fixes this problem.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt) labels Nov 20, 2024
Comment on lines 1211 to 1213
// When expression contains branch even if all children are constant
// final result may not be constant
let is_branched = expr.as_any().is::<CaseExpr>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite get it. from constantness perspective CaseExpr is nothing special
in the issue example we have

  CASE 
        WHEN name = 'name1' THEN 0.0
        WHEN name = 'name2' THEN 0.5
    END AS a

the name = 'name1' is one of the child of CaseExpr and is should not be considered constant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After you comment, I thought about this. It turns out that my initial approach is indeed wrong. Datafusion has 2 kinds of constant expressions. One is across_partitions, other is per partition, indicate with across_partitions=false flag. In following plan

02)--ProjectionExec: expr=[CASE WHEN name@0 = name1 THEN 0 WHEN name@0 = name2 THEN 0.5 END as a]
03)----UnionExec
04)------ProjectionExec: expr=[name1 as name]
05)--------PlaceholderRowExec
06)------ProjectionExec: expr=[name2 as name]
07)--------PlaceholderRowExec

union has 2 children, each with constant name columns (since each source has single row, each column is constant). Union transforms these constant children as constant, but with across_partitions=false (which is correct). Each partition still receives constant value from their perspective (By the way Union doesn't merge any partition ). This is also correct for ProjectionExec. At ProjectionExec column name is constant, where across_partitions=false. However, in the current code, it is propagated as across_partitions=true. This behaviour is the root cause of the issue. I changed my implementation to fix this issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the explanation!

@alamb alamb mentioned this pull request Nov 20, 2024
&self,
expr: &Arc<dyn PhysicalExpr>,
) -> bool {
// As an example, assume that we know columns `a` and `b` are constant.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

love this doc!
so the result is true if expression contains only constants?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, when constantness is correct across partitions.

UNION ALL
SELECT 'name2'
)
ORDER BY a DESC;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please also add a test case with query that would produce incorrect results before the fix?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I have added a new test in this commit. Prior to this fix, we would get nondeterministic results.

Add a new test case
Copy link
Contributor

@berkaysynnada berkaysynnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have approved that this test covers the fix -- CoalescePartitionsExec appears before. I have also run

EXPLAIN SELECT
    CASE
        WHEN name = 'name1' THEN 0.0
        WHEN name = 'name2' THEN 0.5
    END AS a
FROM (
    SELECT name FROM (VALUES ('name1'), ('name1')) AS table1(name)
    UNION ALL
    SELECT name FROM (VALUES ('name2'), ('name2')) AS table2(name)
)
ORDER BY a DESC;

and

EXPLAIN SELECT
    CASE
        WHEN name = 'name1' THEN 0.0
        WHEN name = 'name2' THEN 0.5
        WHEN name = 'name3' THEN 1.0
        WHEN name = 'name4' THEN 1.5
    END AS a
FROM (
    SELECT name FROM (VALUES ('name1'), ('name2')) AS table1(name)
    UNION ALL
    SELECT name FROM (VALUES ('name3'), ('name4')) AS table2(name)
)
ORDER BY a DESC;

and result of those are valid. One thing that just could be noted VALUES ('name1'), ('name2') and VALUES ('name1'), ('name1') are both regarded as not constant and not ordered, while single elements of SELECT 'smth' considered as constant

@berkaysynnada berkaysynnada merged commit 5c78912 into apache:main Nov 24, 2024
25 checks passed
@akurmustafa
Copy link
Contributor Author

and result of those are valid. One thing that just could be noted VALUES ('name1'), ('name2') and VALUES ('name1'), ('name1') are both regarded as not constant and not ordered, while single elements of SELECT 'smth' considered as constant

I think, in the case of SELECT 'smth', 'smth' is constant across partitions. Hence, after CoalesceBatchesExec, we will still end up with constant columns. not sure though

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Order by is ignored
4 participants