-
Notifications
You must be signed in to change notification settings - Fork 169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bug fix: Fix fuzz testcase for cast string to integer #450
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM pending CI
The title should start with fix:
or test:
, I think
castTest( | ||
gen | ||
.generateStrings(dataSize, numericPattern, 8) | ||
.toDF("a") | ||
.withColumn("a", functions.trim($"a")), | ||
DataTypes.IntegerType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to see us fix the compatibility issue rather than skip testing strings that have leading and/or trailing whitespace. I have been looking into this and was going to make some comments here but it turned out to be a bit more involved that I thought it would, so will create a PR soon with my proposed fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main issue is that we are trimming the input before processing, so error messages use the trimmed input instead of the original input. Once I resolved that I saw that we had extra processing for leading and trailing whitespace that is never used (ported from Spark) so I ended up removing that for some performance wins.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is my PR: #453
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Which issue does this PR close?
Closes #431 .
Rationale for this change
Removing leading whitespaces: In some inputs, Spark's error messages contain leading whitespaces while Comet's do not, causing the tests to fail. By removing the leading whitespaces from the input, this issue is resolved.
What changes are included in this PR?
removing the leading whitespaces from input df.
How are these changes tested?
Fuzz tests pass.