Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Support Ansi mode in abs function #500

Merged
merged 28 commits into from
Jun 11, 2024

Conversation

planga82
Copy link
Contributor

@planga82 planga82 commented Jun 1, 2024

Which issue does this PR close?

Closes #464 .

Rationale for this change

This PR adds support for ansi mode in the abs function. This is done by adding a wrapper to the abs datafusion function to add the different behavior between Spark and Datafusion. The main differences are the overflow behavior in legacy mode and the message exception in ansi mode.

What changes are included in this PR?

In addition to introducing the wrapper, some minor refactoring is done to move some code to a more general location.
In Spark, the abs function does not support try execution mode, only ansi or legacy.

How are these changes tested?

The new tests test the correct execution in the event of an overflow.

@@ -1474,15 +1481,14 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde with CometExprShim
None
}

case Abs(child, _) =>
case Abs(child, failOnErr) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we are already using failOnErr, can we simply use this boolean instead of evalmode struct? what are you thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thanks for your review! The intention why it is done this way is that in the Rust code the ansi mode is always treated the same way whether the expression supports the three ansi modes or only two. If not, we have to do a different treatment for both cases.

@@ -850,6 +850,34 @@ class CometExpressionSuite extends CometTestBase with AdaptiveSparkPlanHelper {
}
}

test("abs Overflow ansi mode") {
val data: Seq[(Int, Int)] = Seq((Int.MaxValue, Int.MinValue))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have tests for all numerical values?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 34.08%. Comparing base (7ba5693) to head (dc3f2a8).
Report is 17 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main     #500      +/-   ##
============================================
+ Coverage     34.05%   34.08%   +0.02%     
+ Complexity      859      812      -47     
============================================
  Files           116      105      -11     
  Lines         38679    38516     -163     
  Branches       8567     8555      -12     
============================================
- Hits          13173    13127      -46     
+ Misses        22745    22644     -101     
+ Partials       2761     2745      -16     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

match self.inner_abs_func.invoke(args) {
Ok(result) => Ok(result),
Err(DataFusionError::ArrowError(ArrowError::ComputeError(msg), trace))
if msg.contains("overflow") =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if Arrow/DataFusion threw a specific overflow error so that we didn't have to look for a string within the error message, but I guess that isn't available.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am going to file a feature request in DataFusion. I will post the link here later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! I understand that we have to wait for the next version of arror-rs to be released and integrate it here to be able to make the changes, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can continue with this and do a follow up once the arrow-rs change is available. Or you can wait; the arrow-rs community is very responsive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably better to do it in a follow-up PR, I think. Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would help to log an issue to keep track

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently my PR in Arrow will not be available for 3 months because it is an API change 😞

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we upgrade to the version of the arrow with the improved overflow eror reporing then the tests in this PR will fail (because we are looking for ComputeError but instead will get ArithmeticOverflow) so I don't think we need to file an issue

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@parthchandra fyi ☝️


fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue, DataFusionError> {
match self.inner_abs_func.invoke(args) {
Ok(result) => Ok(result),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically there is no need to match on an Ok result because there is already the catch all other handling.

Suggested change
Ok(result) => Ok(result),

Comment on lines 29 to 33
fn arithmetic_overflow_error(from_type: &str) -> CometError {
CometError::ArithmeticOverflow {
from_type: from_type.to_string(),
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to duplicate the same function from negative.rs. Perhaps that one could be moved so that it can be reused here.

Comment on lines 497 to 505
let eval_mode = match spark_expression::EvalMode::try_from(expr.eval_mode)? {
spark_expression::EvalMode::Legacy => EvalMode::Legacy,
spark_expression::EvalMode::Ansi => EvalMode::Ansi,
spark_expression::EvalMode::Try => {
return Err(ExecutionError::GeneralError(
"Invalid EvalMode: \"TRY\"".to_string(),
))
}
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that we have this code block duplicated for abs and cast now. Perhaps this could be extracted into a function. Another option would be to implement TryFrom or TryInto for this conversion.

Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @planga82. This is looking good. I left some feedback on reducing code duplication that would be good to fix before merging.

@planga82
Copy link
Contributor Author

planga82 commented Jun 5, 2024

I have done a refactor with all the comments, thanks for the revision!

@planga82
Copy link
Contributor Author

planga82 commented Jun 6, 2024

The problems in the tests seem to be because the branch was not aligned with main. By aligning it, the tests have passed in my repo.

spark_expression::EvalMode::Try => EvalMode::Try,
spark_expression::EvalMode::Ansi => EvalMode::Ansi,
};
let eval_mode = EvalMode::try_from(expr.eval_mode)?;
Copy link
Member

@andygrove andygrove Jun 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be more idiomatic to use try_into here.

Suggested change
let eval_mode = EvalMode::try_from(expr.eval_mode)?;
let eval_mode = expr.eval_mode.try_into()?;

@@ -499,7 +497,12 @@ impl PhysicalPlanner {
let child = self.create_expr(expr.child.as_ref().unwrap(), input_schema.clone())?;
let return_type = child.data_type(&input_schema)?;
let args = vec![child];
let scalar_def = ScalarFunctionDefinition::UDF(math::abs());
let eval_mode = EvalMode::try_from(expr.eval_mode)?;
Copy link
Member

@andygrove andygrove Jun 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be more idiomatic to use try_into here.

Suggested change
let eval_mode = EvalMode::try_from(expr.eval_mode)?;
let eval_mode = expr.eval_mode.try_into()?;

Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again @planga82. I left one more nit, but will be happy to merge after that unless @parthchandra has more feedback

@planga82
Copy link
Contributor Author

planga82 commented Jun 8, 2024

Thanks!! I am a rust beginner, I appreciate any comments!

@andygrove andygrove merged commit e07f24c into apache:main Jun 11, 2024
43 checks passed
himadripal pushed a commit to himadripal/datafusion-comet that referenced this pull request Sep 7, 2024
* change proto msg

* QueryPlanSerde with eval mode

* Move eval mode

* Add abs in planner

* CometAbsFunc wrapper

* Add error management

* Add tests

* Add license

* spotless apply

* format

* Fix clippy

* error msg for all spark versions

* Fix benches

* Use enum to ansi mode

* Fix format

* Add more tests

* Format

* Refactor

* refactor

* fix merge

* fix merge
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bug: ABS should only overflow in ANSI mode
5 participants