Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Allow passing read_only, data_only and keep_links arguments to openpyxl using engine_kwargs #55807

Merged
merged 4 commits into from
Nov 9, 2023

Conversation

dweindl
Copy link
Contributor

@dweindl dweindl commented Nov 3, 2023

Previously it was not possible to override the default values for openpyxl.reader.excel.load_workbook's read_only, data_only and keep_links arguments via pandas.read_excel's engine_kwargs (see #55027).

Now these options can be changed via engine_kwargs.

@dweindl dweindl requested a review from rhshadrach as a code owner November 3, 2023 07:12
@mroeschke mroeschke added the IO Excel read_excel, to_excel label Nov 3, 2023
@mroeschke mroeschke added this to the 2.2 milestone Nov 3, 2023
# GH 55027
# test that `read_only` and `data_only` can be passed to
# `openpyxl.reader.excel.load_workbook` via `engine_kwargs`
from pandas.io.excel._openpyxl import OpenpyxlReader
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you import this at the top of the file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Done.

…s to openpyxl using `engine_kwargs`

Previously it was not possible to override the default values for
`openpyxl.reader.excel.load_workbook`'s `read_only`, `data_only` and `keep_links`
arguments (see pandas-dev#55027).

Now these options can be changed via `engine_kwargs`.

Closes pandas-dev#55027
Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a solid improvement over the current implementation, but it is still odd in relation to the other readers. I believe openpyxl is the only reader where we pass non-default values to the load function of the engine. Still, the values we pass seem like they would be the preferred values in most cases to me. Perhaps it's sufficient to document this oddity?

pandas/tests/io/excel/test_openpyxl.py Show resolved Hide resolved
@dweindl
Copy link
Contributor Author

dweindl commented Nov 4, 2023

I think this is a solid improvement over the current implementation, but it is still odd in relation to the other readers. I believe openpyxl is the only reader where we pass non-default values to the load function of the engine. Still, the values we pass seem like they would be the preferred values in most cases to me. Perhaps it's sufficient to document this oddity?

Thanks for your feedback. I don't find it particularly odd that pandas passes non-default arguments to openpyxl. I think these tools just have somewhat different use cases in mind. Openpyxl seems to focus on modifying excel files, so it makes sense that by default they open files with read_only=False, data_only=False, keep_links=True, where the latter two are required to not break formulas and links when the modified file is saved. For pandas.io.excel._openpyxl.OpenpyxlReader.load_workbook it is only about reading, so choosing the supposedly faster read_only=True seems plausible (although read_only=False seems slightly more robust, e.g. #39001 but also other cases where the results differ). Also external links are irrelevant, and most users would probably expect to get the results of any formulas (data_only=True) rather than the formulas themselves (although with the proposed changes they can get them if they wanted to).

Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm; @mroeschke - you good here?

@mroeschke mroeschke merged commit 0d786bf into pandas-dev:main Nov 9, 2023
34 checks passed
@mroeschke
Copy link
Member

Thanks @dweindl

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO Excel read_excel, to_excel
Projects
None yet
3 participants