Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: fix parsing of ODF time values with comments #55324

Closed
wants to merge 8 commits into from

Conversation

mzuzek
Copy link

@mzuzek mzuzek commented Sep 29, 2023

ODF office:time-value format is specified by p.3.2.6 of XML Schema (Part 2) as referenced in p.19.389 of OpenDocument Schema Part 3

Related to: #55045 (test files updates)

@mzuzek mzuzek requested a review from rhshadrach as a code owner September 29, 2023 13:31
@mzuzek mzuzek force-pushed the fix-odf-time-comments branch from 38ef2df to 2bb23e6 Compare September 29, 2023 14:31
pandas/io/excel/_odfreader.py Outdated Show resolved Hide resolved
pandas/io/excel/_odfreader.py Outdated Show resolved Hide resolved
raise ValueError(f"Failed to parse ODF time value: {value}")
h, m, s = parts.group(1, 2, 3)
# ignore date part from some representations as both pd.Timestamp
# and datetime.time restrict hour values to 0..23
Copy link
Contributor

@dimastbk dimastbk Oct 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ODS supports timedelta ([hh]:mm:ss). times_1904.ods is broken, see #55045. Maybe fix the file and do something like?

if h > 23:
    return pd.Timedelta(...)
else:
    return cast(Scalar, datetime.time(...))

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, timedelta seem to make more sense than timestamp - at least for ODF time-value, because durations can be larger than 24h and they can be negative, see duration.ods:
duration

It's my first PR here, so I thought I'd better keep it tight and clean - hoping we could leave timedelta for a follow-up discussion and new PR. For sure unit tests would need adjustments (they specifically require datetime.time timestamps) and I don't know how other spreadsheet formats correspond.

Copy link
Member

@rhshadrach rhshadrach Oct 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I correct in saying that the current behavior of pandas will read the top value in the screenshot above as having 50 hours, where as this change will now be 50 - 48 = 2 hours?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rhshadrach Which commit/version of pandas yields 50 hours for you ?
I thought you'd get an error in _odfreader.py:217 similar to:

>>> pd.Timestamp('50:15:00') 
Traceback (most recent call last):
  File "parsing.pyx", line 681, in pandas._libs.tslibs.parsing.dateutil_parse
ValueError: hour must be in 0..23

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which commit/version of pandas yields 50 hours for you ?

I haven't run anything.

I thought you'd get an error in _odfreader.py:217 similar to:

I think you're saying that both main and this PR will raise on duration.ods, is that right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, I only said that the current implementation does not support time value equal or larger than 24 hours in ODF files.

Could you please run the files you'd like to check and share any results that raise your concerns ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only said that the current implementation

Does current implementation mean main or this PR?

@mroeschke mroeschke added the IO Excel read_excel, to_excel label Oct 2, 2023
@mzuzek mzuzek force-pushed the fix-odf-time-comments branch from 2bb23e6 to 7106b2b Compare October 3, 2023 21:36
@mzuzek mzuzek force-pushed the fix-odf-time-comments branch from 7106b2b to 6ebe03a Compare October 4, 2023 09:48
@mzuzek mzuzek force-pushed the fix-odf-time-comments branch from 6b896e3 to a493bb7 Compare October 4, 2023 11:44
@mzuzek
Copy link
Author

mzuzek commented Oct 4, 2023

On a493bb7, I've refactored the helper function into _get_cell_time_value() - similar to existing _get_cell_string_value() for strings.
Pardon me not grasping the ODFReader class layout in the first place...

@mzuzek mzuzek force-pushed the fix-odf-time-comments branch from 8a32292 to 7c22815 Compare October 4, 2023 15:23
@mzuzek mzuzek force-pushed the fix-odf-time-comments branch from 7c22815 to ea23a2e Compare October 6, 2023 07:27
@mzuzek
Copy link
Author

mzuzek commented Oct 6, 2023

@rhshadrach @mroeschke @dimastbk

Would there be anything else I should do on this PR ?

Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks OK to me

@rhshadrach
Copy link
Member

mzuzek force-pushed the fix-odf-time-comments branch from 7c22815 to ea23a2e
yesterday

Just so you're aware, by force pushing you make it so that reviewers can no longer use the "Show changes since your last review" feature. Not a big deal at all here because the diff is so small.

Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some questions - see the above review comments.

Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Nov 17, 2023
@mroeschke
Copy link
Member

Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.

@mroeschke mroeschke closed this Nov 27, 2023
@mzuzek
Copy link
Author

mzuzek commented Nov 28, 2023

@rhshadrach @mroeschke @dimastbk

Thank you for reviewing this PR and for all your comments.

I'm sorry it didn't work out:
I understand that after all review issues and requests have been properly addressed - as confirmed here -
this PR was blocked by discussions which haven't introduced any new problems or change requests (last one here).
I'll assume it's just the way things work here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO Excel read_excel, to_excel Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants