Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug fix: Fix fuzz testcase for cast string to integer #450

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -104,10 +104,9 @@ object CometCast {
timeZoneId: Option[String],
evalMode: String): SupportLevel = {
toType match {
case DataTypes.BooleanType =>
case DataTypes.BooleanType | DataTypes.IntegerType =>
Compatible()
case DataTypes.ByteType | DataTypes.ShortType | DataTypes.IntegerType |
DataTypes.LongType =>
case DataTypes.ByteType | DataTypes.ShortType | DataTypes.LongType =>
Incompatible(Some("Not all invalid inputs are detected"))
case DataTypes.BinaryType =>
Compatible()
Expand Down
11 changes: 8 additions & 3 deletions spark/src/test/scala/org/apache/comet/CometCastSuite.scala
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ import java.io.File

import scala.util.Random

import org.apache.spark.sql.{CometTestBase, DataFrame, SaveMode}
import org.apache.spark.sql.{functions, CometTestBase, DataFrame, SaveMode}
import org.apache.spark.sql.catalyst.expressions.Cast
import org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanHelper
import org.apache.spark.sql.functions.col
Expand Down Expand Up @@ -533,11 +533,16 @@ class CometCastSuite extends CometTestBase with AdaptiveSparkPlanHelper {
castTest(gen.generateStrings(dataSize, numericPattern, 5).toDF("a"), DataTypes.ShortType)
}

ignore("cast StringType to IntegerType") {
test("cast StringType to IntegerType") {
// test with hand-picked values
castTest(castStringToIntegralInputs.toDF("a"), DataTypes.IntegerType)
// fuzz test
castTest(gen.generateStrings(dataSize, numericPattern, 8).toDF("a"), DataTypes.IntegerType)
castTest(
gen
.generateStrings(dataSize, numericPattern, 8)
.toDF("a")
.withColumn("a", functions.trim($"a")),
DataTypes.IntegerType)
Comment on lines +540 to +545
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to see us fix the compatibility issue rather than skip testing strings that have leading and/or trailing whitespace. I have been looking into this and was going to make some comments here but it turned out to be a bit more involved that I thought it would, so will create a PR soon with my proposed fix.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main issue is that we are trimming the input before processing, so error messages use the trimmed input instead of the original input. Once I resolved that I saw that we had extra processing for leading and trailing whitespace that is never used (ported from Spark) so I ended up removing that for some performance wins.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is my PR: #453

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

}

ignore("cast StringType to LongType") {
Expand Down