feat: Implement ANSI support for UnaryMinus #471
Conversation
This is looking good @vaibhawvipul. I think all that is needed now are unit tests, perhaps in CometExpressionSuite, to show that this is all working as intended. I also left some minor feedback.
@andygrove We now have a test case which shows that we have parity.
LGTM pending CI
I have made all the requested changes; it looks ready for CI to me. Please let me know. cc @kazuyukitanimura @parthchandra
LGTM
Sorry, I have already asked for many things, but I have a few more comments.
No problem, it greatly improved my contribution. I learned a lot and I am happy. This exercise will ensure that my next PR won't need this much back-and-forth :)
@kazuyukitanimura this is ready for review.
This PR has gone through enough iterations, so I intend to accept it. My comments below can be addressed in follow-ups.
withTable("t_interval") { | ||
spark.sql("CREATE TABLE t_interval(a STRING) USING PARQUET") | ||
spark.sql("INSERT INTO t_interval VALUES ('INTERVAL 10000000000 YEAR')") | ||
withAnsiMode(enabled = true) { | ||
spark | ||
.sql("SELECT CAST(a AS INTERVAL) AS a FROM t_interval") | ||
.createOrReplaceTempView("t_interval_casted") | ||
checkOverflow("SELECT a, -a FROM t_interval_casted", "interval") | ||
} | ||
} |
It looks like this does not hit native code, as CAST(a AS INTERVAL) is not supported yet. So it will fall back to Spark, and checkOverflow is comparing Spark results on both sides. Perhaps there is no good way of creating an interval right now...
withTable("t") { | ||
sql("create table t(a int) using parquet") | ||
sql("insert into t values (-2147483648)") | ||
withAnsiMode(enabled = true) { | ||
checkOverflow("select a, -a from t", "integer") | ||
} | ||
} | ||
|
withTable("t_float") {
  sql("create table t_float(a float) using parquet")
  sql("insert into t_float values (3.4128235E38)")
  withAnsiMode(enabled = true) {
    checkOverflow("select a, -a from t_float", "float")
  }
}
BTW this will not test the scalar case unless we do something like:

withSQLConf(
  "spark.sql.optimizer.excludedRules" ->
    "org.apache.spark.sql.catalyst.optimizer.ConstantFolding") {
  checkOverflow("select a, -(a) from t_float", "float")
}

I think the current test creates a single-item array. That said, unless that option is set, it is unlikely to hit the scalar scenario. It is ideal to test scalar cases because user jobs may use that option, but it can be a follow-up fix.
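For reference, here is a hedged sketch (illustrative, not Comet's actual implementation) of why the scalar path is distinct: a DataFusion physical expression's evaluate() may produce either a ColumnarValue::Scalar or a ColumnarValue::Array, and the ANSI overflow guard has to cover both variants.

use datafusion::physical_plan::ColumnarValue;
use datafusion::scalar::ScalarValue;

// Illustrative ANSI negation over both ColumnarValue variants.
fn neg_ansi(input: ColumnarValue) -> Result<ColumnarValue, String> {
    match input {
        // Literal arguments arrive here, but only if Spark's
        // ConstantFolding rule has not already folded them away.
        ColumnarValue::Scalar(ScalarValue::Int32(Some(v))) => {
            let negated = v
                .checked_neg()
                .ok_or_else(|| "[ARITHMETIC_OVERFLOW] integer overflow.".to_string())?;
            Ok(ColumnarValue::Scalar(ScalarValue::Int32(Some(negated))))
        }
        // Array path (the case the existing test exercises) elided here.
        other => Ok(other),
    }
}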
arrow::datatypes::IntervalUnit::DayTime => check_overflow!(
    array,
    arrow::array::IntervalDayTimeArray,
    i64::MIN,
    "interval"
),
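For context, a minimal standalone sketch of the guard such a check_overflow! arm could expand to; the function name and error text are illustrative, and it assumes an arrow-rs version where IntervalDayTime is represented as a packed i64 (as the i64::MIN comparison above implies):

use arrow::array::{Array, IntervalDayTimeArray};

// Scan for the sentinel MIN value, which has no two's-complement
// negation, and fail before any negation kernel runs.
fn guard_interval_neg(array: &IntervalDayTimeArray) -> Result<(), String> {
    for i in 0..array.len() {
        if array.is_valid(i) && array.value(i) == i64::MIN {
            return Err("[ARITHMETIC_OVERFLOW] interval overflow.".to_string());
        }
    }
    Ok(())
}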
I was expecting to see that testing this fails, because DataFusion neg_wrapping breaks the i64 into two i32s. Then I realized there is no good way of testing this right now, as mentioned above.
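To make the concern concrete, here is a minimal standalone sketch (not Comet or DataFusion code) showing that wrap-negating the packed i64 differs from negating the day and millisecond fields separately; it assumes the layout where days occupy the high 32 bits and milliseconds the low 32 bits:

// Pack/unpack an IntervalDayTime-style (days, millis) pair into one i64.
fn pack(days: i32, millis: i32) -> i64 {
    ((days as i64) << 32) | (millis as u32 as i64)
}

fn unpack(v: i64) -> (i32, i32) {
    ((v >> 32) as i32, v as i32)
}

fn main() {
    let v = pack(1, 1); // one day, one millisecond
    // Wrap-negating the whole i64 borrows across the field boundary...
    assert_eq!(unpack(v.wrapping_neg()), (-2, -1));
    // ...while field-wise negation gives the intended result.
    assert_eq!(unpack(pack(-1, -1)), (-1, -1));
}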
Thanks again @vaibhawvipul, and thank you for the reviews @kazuyukitanimura and @parthchandra.
Hi @vaibhawvipul, I don't know why, but it seems that a test included in this PR is failing in the main branch in Spark 4.0 tests. Any thoughts on this? Thanks in advance!

- unary negative integer overflow test *** FAILED *** (739 milliseconds)
Hmm... I think there was another PR merged after mine that is breaking the tests.
Oops, looks like this is a merge conflict; I will fix it. Sorry for the inconvenience.
@kazuyukitanimura @vaibhawvipul This is already fixed in main. I fixed it as part of #505.
Ah, thank you @andygrove. cc @planga82
* checking for invalid inputs for unary minus
* adding eval mode to expressions and proto message
* extending evaluate function for negative expression
* remove print statements
* fix format errors
* removing units
* fix clippy errors
* expect instead of unwrap, map_err instead of match, and removing Float16
* adding test case for unary negative integer overflow
* added a function to make the code more readable
* adding comet sql ansi config
* using withTempDir and checkSparkAnswerAndOperator
* adding macros to improve code readability
* using withParquetTable
* adding scalar tests
* adding more test cases and bug fix
* using failonerror and removing eval_mode
* bug fix
* removing checks for float64 and monthdaynano
* removing checks of float and monthday nano
* adding checks while evaluating bounds
* IntervalDayTime splitting i64 and then checking
* Adding interval test
* fix ci errors
Which issue does this PR close?
Closes #465.
Rationale for this change
Improves compatibility with Apache Spark.
What changes are included in this PR?
ANSI support for UnaryMinus, by adding input checks when ANSI mode is enabled.
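As a rough illustration of the kind of check involved (a sketch with a hypothetical function name, not the PR's actual code):

// Hedged sketch: under ANSI mode, negating i32::MIN must error, since
// its negation is not representable in two's complement.
fn ansi_neg_i32(v: i32) -> Result<i32, String> {
    v.checked_neg()
        .ok_or_else(|| "[ARITHMETIC_OVERFLOW] integer overflow.".to_string())
}

Without ANSI mode, the negation simply wraps (i32::MIN.wrapping_neg() == i32::MIN), matching Spark's legacy behavior.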
How are these changes tested?
New unit tests were added; negating an overflowing value under ANSI mode returns the expected error:
Caused by: org.apache.spark.SparkArithmeticException: [ARITHMETIC_OVERFLOW] integer overflow. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.