BUG: fix parsing of ODF time values with comments #55324
Force-pushed from 38ef2df to 2bb23e6
pandas/io/excel/_odfreader.py
Outdated
raise ValueError(f"Failed to parse ODF time value: {value}")
h, m, s = parts.group(1, 2, 3)
# ignore date part from some representations as both pd.Timestamp
# and datetime.time restrict hour values to 0..23
ODS supports timedelta ([hh]:mm:ss). times_1904.ods is broken, see #55045. Maybe fix the file and do something like this?
if h > 23:
    return pd.Timedelta(...)
else:
    return cast(Scalar, datetime.time(...))
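The branching suggested above could be sketched roughly as follows. This is an illustration only, not the PR's code: the name parse_odf_time and the regular expression are hypothetical, the input is assumed to look like PT50H15M00S (optionally with a leading minus sign and fractional seconds), and the standard-library datetime.timedelta stands in for pd.Timedelta so the snippet runs without pandas.

```python
import datetime
import re
from typing import Union

# Hypothetical pattern for values such as "PT50H15M00S" or "-PT01H30M00.5S".
_TIME_RE = re.compile(r"^(-)?PT(\d+)H(\d+)M(\d+)(?:\.(\d+))?S$")


def parse_odf_time(value: str) -> Union[datetime.time, datetime.timedelta]:
    """Return a time of day when the value fits in 0..23 hours, else a duration."""
    parts = _TIME_RE.match(value)
    if parts is None:
        raise ValueError(f"Failed to parse ODF time value: {value}")
    sign, h, m, s, frac = parts.groups()
    hours, minutes, seconds = int(h), int(m), int(s)
    # Pad or truncate the fractional part to microsecond precision.
    micro = int((frac or "").ljust(6, "0")[:6])
    if sign or hours > 23:
        # Negative values and values of 24 hours or more cannot be a datetime.time.
        delta = datetime.timedelta(
            hours=hours, minutes=minutes, seconds=seconds, microseconds=micro
        )
        return -delta if sign else delta
    return datetime.time(hours, minutes, seconds, micro)
```

With this sketch, parse_odf_time("PT10H45M12S") gives a datetime.time, while parse_odf_time("PT50H15M00S") gives a 50 h 15 min timedelta. Returning two different types from one reader is exactly the design question raised here, so whether it is acceptable for callers would still need discussion.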
Right, timedelta seems to make more sense than timestamp, at least for ODF time-value, because durations can be larger than 24h and they can be negative, see duration.ods:
[screenshot of duration.ods]
It's my first PR here, so I thought I'd better keep it tight and clean, hoping we could leave timedelta for a follow-up discussion and a new PR. The unit tests would certainly need adjustments (they specifically require datetime.time timestamps), and I don't know how other spreadsheet formats correspond.
Am I correct in saying that the current behavior of pandas will read the top value in the screenshot above as having 50 hours, whereas with this change it will now be 50 - 48 = 2 hours?
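For reference, the arithmetic in this question amounts to taking the hour count modulo 24. A minimal sketch of that reading, with a hypothetical helper name (whether the PR actually behaves this way is the open question, not something this snippet establishes):

```python
from typing import Tuple


def drop_date_part(hours: int) -> Tuple[int, int]:
    """Split an hour count into (whole days, hour of day)."""
    return divmod(hours, 24)


days, hour_of_day = drop_date_part(50)
print(days, hour_of_day)  # 2 2, i.e. 50 - 2 * 24 = 2 hours remain
```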
@rhshadrach Which commit/version of pandas yields 50 hours for you?
I thought you'd get an error in _odfreader.py:217 similar to:
>>> pd.Timestamp('50:15:00')
Traceback (most recent call last):
  File "parsing.pyx", line 681, in pandas._libs.tslibs.parsing.dateutil_parse
ValueError: hour must be in 0..23
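The same 0..23 restriction can be reproduced with the standard library alone, without pandas. A small sketch, where fits_in_time is a hypothetical helper and not part of the reader:

```python
import datetime


def fits_in_time(hours: int, minutes: int = 0, seconds: int = 0) -> bool:
    """Report whether the values form a valid datetime.time."""
    try:
        datetime.time(hours, minutes, seconds)
    except ValueError:
        # datetime.time rejects hours outside 0..23.
        return False
    return True


print(fits_in_time(23, 59, 59))  # True
print(fits_in_time(50, 15))      # False

# A duration type has no such limit.
duration = datetime.timedelta(hours=50, minutes=15)
print(duration)  # 2 days, 2:15:00
```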
> Which commit/version of pandas yields 50 hours for you?
I haven't run anything.
> I thought you'd get an error in _odfreader.py:217 similar to:
I think you're saying that both main and this PR will raise on duration.ods, is that right?
Nope, I only said that the current implementation does not support time values equal to or larger than 24 hours in ODF files.
Could you please run the files you'd like to check and share any results that raise your concerns?
> I only said that the current implementation
Does "current implementation" mean main or this PR?
Force-pushed from 2bb23e6 to 7106b2b
Force-pushed from 7106b2b to 6ebe03a
Force-pushed from 6b896e3 to a493bb7
On a493bb7, I've refactored the helper function into
Force-pushed from 8a32292 to 7c22815
Co-authored-by: Matthew Roeschke
Force-pushed from 7c22815 to ea23a2e
@rhshadrach @mroeschke @dimastbk Is there anything else I should do on this PR?
Looks OK to me
Just so you're aware, by force pushing you make it so that reviewers can no longer use the "Show changes since your last review" feature. Not a big deal at all here because the diff is so small.
I have some questions - see the above review comments.
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.
Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.
@rhshadrach @mroeschke @dimastbk Thank you for reviewing this PR and for all your comments. I'm sorry it didn't work out:
time-value cells test_1900.ods and test_1904.ods fixtures, so that io/excel/test_readers.py:test_reader_seconds() would be failing without this fix. Also fixed missing microseconds there (see p.1 in BUG (test): bad file for testing ODFReader #55045).
doc/source/whatsnew/v2.2.0.rst file.
Related to: #55045 (test files updates)