-
Notifications
You must be signed in to change notification settings - Fork 169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Fix integer overflow in date_parser #529
Conversation
The date_parser was introduced in apache#383 and is mostly a direct port of code in Spark. Since the code uses the JVM it has defined integer overflow as wrapping. The proposed fixed is to use std::num::Wrapping to get the same wrapping behavior in rust. The overflown value will still be disgarded in a later check that uses `current_segment_digits` so allowing the overflow does not lead to correctness issues. This resolves one of the overflows discussed in apache#481
// Overflows i32 | ||
Some("-4607172990231812908"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
@@ -19,6 +19,7 @@ use std::{ | |||
any::Any, | |||
fmt::{Debug, Display, Formatter}, | |||
hash::{Hash, Hasher}, | |||
num::Wrapping, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't aware of Wrapping
. TIL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wonder if this is also faster since there is no overflow check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we usually use some methods like wrapping_add
, etc.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #529 +/- ##
=============================================
+ Coverage 34.23% 54.68% +20.45%
+ Complexity 803 795 -8
=============================================
Files 105 102 -3
Lines 38483 9688 -28795
Branches 8560 1845 -6715
=============================================
- Hits 13173 5298 -7875
+ Misses 22555 3433 -19122
+ Partials 2755 957 -1798 ☔ View full report in Codecov by Sentry. |
@@ -19,6 +19,7 @@ use std::{ | |||
any::Any, | |||
fmt::{Debug, Display, Formatter}, | |||
hash::{Hash, Hasher}, | |||
num::Wrapping, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wonder if this is also faster since there is no overflow check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @eejbyfeldt.
Thanks for the review @parthchandra
The date_parser was introduced in apache#383 and is mostly a direct port of code in Spark. Since the code uses the JVM it has defined integer overflow as wrapping. The proposed fixed is to use std::num::Wrapping to get the same wrapping behavior in rust. The overflown value will still be disgarded in a later check that uses `current_segment_digits` so allowing the overflow does not lead to correctness issues. This resolves one of the overflows discussed in apache#481
Which issue does this PR close?
This resolves one of the overflows discussed in #481
Rationale for this change
The date_parser was introduced in #383 and is mostly a direct port of code in Spark. In Spark the code uses Scala which has defined integer overflow as two's complement wrapping.
What changes are included in this PR?
The proposed fixed is to use
std::num::Wrapping
to get the same wrapping behavior in rust. The overflown value will still be rejected in a later check that usescurrent_segment_digits
so allowing the overflow does not lead to correctness issues.How are these changes tested?
New and existing unit tests.