Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use IfExpr to check when input to log2 is <=0 and return null #506

Merged
merged 8 commits into from
Jul 15, 2024

Conversation

PedroMDuarte
Copy link
Contributor

@PedroMDuarte PedroMDuarte commented Jun 3, 2024

Which issue does this PR close?

Closes #485

Rationale for this change

Compatibility with how Spark handles logarithms of values <=0.

What changes are included in this PR?

Use IfExpr to check when input to log2 is <=0 and return null. This is done to match Spark's behavior, which in turn is implemented to match Hive's behavior.

How are these changes tested?

The existing test for ln, log2 and log10 was modified so that it includes negative numbers as part of the inputs being tested.

Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good so far. Thanks @PedroMDuarte

@@ -1397,7 +1397,18 @@ impl PhysicalPlanner {
args.is_empty(),
));

Ok(scalar_expr)
match fun_name.as_str() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, please don't put this special handling in create_scalar_function_expr. Could you move it to create_comet_physical_fun where special handling of scalar functions should be located?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for bringing that up!

Just to clarify, if we were to do this as part of create_comet_physical_fun then it would be all reimplemented at a lower level and not able to reuse the Datafusion Log2 implementation via an IfExpr. Is that right?

Happy to go either way. @andygrove's original hint mentioned using IfExpr, but I couldn't quite get that working inside of create_comet_physical_fun because it doesn't have access to the arg exprs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the delay in responding here. I will start reviewing this on Monday.

@kazuyukitanimura
Copy link
Contributor

I would rather do this in QueryPlanSerde, E.g.

        case Log2(child) =>
          val childExpr = exprToProtoInternal(nullIfNegative(child), inputs)

def nullIfNegative(expression: Expression): {
    val zero = Literal.default(expression.dataType)
    If(LessThanOrEqual(expression, zero), Literal.create(null, expression.dataType), expression)
}

See nullIfWhenPrimitive for a similar case.

@PedroMDuarte
Copy link
Contributor Author

I would rather do this in QueryPlanSerde, E.g.

        case Log2(child) =>
          val childExpr = exprToProtoInternal(nullIfNegative(child), inputs)

def nullIfNegative(expression: Expression): {
    val zero = Literal.default(expression.dataType)
    If(LessThanOrEqual(expression, zero), Literal.create(null, expression.dataType), expression)
}

See nullIfWhenPrimitive for a similar case.

thanks for this suggestion!

Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM pending CI. I confirmed that the change to the unit test does demonstrate the incorrect behavior. Thanks @PedroMDuarte

@andygrove
Copy link
Member

Could you also update this section in expressions.md

| Log        | log(0) will produce `-Infinity` unlike Spark which returns `null`   |
| Log2       | log2(0) will produce `-Infinity` unlike Spark which returns `null`  |
| Log10      | log10(0) will produce `-Infinity` unlike Spark which returns `null` |

@kazuyukitanimura
Copy link
Contributor

kazuyukitanimura commented Jul 15, 2024

hmm, ci is failing... looks CI passed

@PedroMDuarte
Copy link
Contributor Author

thanks for the guidance here, really appreciate you all taking the time to show me how to contribute this fix.

@kazuyukitanimura kazuyukitanimura merged commit 6f9b56a into apache:main Jul 15, 2024
74 checks passed
@kazuyukitanimura
Copy link
Contributor

Merged, thanks @PedroMDuarte @andygrove

@kazuyukitanimura
Copy link
Contributor

And thanks @viirya

@PedroMDuarte PedroMDuarte deleted the pedromduarte/log2 branch July 15, 2024 23:32
himadripal pushed a commit to himadripal/datafusion-comet that referenced this pull request Sep 7, 2024
…#506)

## Which issue does this PR close?

Closes apache#485 

## Rationale for this change

Compatibility with how Spark handles logarithms of values <=0. 

## What changes are included in this PR?

Use IfExpr to check when input to log2 is <=0 and return null.  This is done to match Spark's behavior, which in turn is implemented to match Hive's behavior.

## How are these changes tested?

The existing test for `ln`, `log2` and `log10` was modified so that it includes negative numbers as part of the inputs being tested.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bug: log2 produces different values than Spark in some cases
5 participants