
BUG: add pyarrow autogenerated prefix #55115

Merged: 18 commits merged into pandas-dev:main on Sep 27, 2023


@hedeershowk (Contributor) commented Sep 13, 2023:

Any suggestions on where to add a test for this bug (and fix)?

Edit: I took a shot and put it in tests/io/parser/test_header.py.

@mroeschke added the IO CSV (read_csv, to_csv) and Arrow (pyarrow functionality) labels Sep 18, 2023
StringIO(data),
header=None,
usecols=[0, 1],
dtype="object",
Member:

Can you test with string[pyarrow] here?

Contributor Author:

Sure thing. It works fine as long as I make the comparison DataFrame string[pyarrow] too, of course:

        StringIO(data),
        header=None,
        usecols=[0, 1],
        dtype="string[pyarrow]",
        dtype_backend="pyarrow",
        engine="pyarrow",
    )
    expected = DataFrame([["a", "i"], ["b", "j"]], dtype="string[pyarrow]")
    tm.assert_frame_equal(result, expected)

I'll update the PR with the change.

hedeershowk and others added 3 commits September 19, 2023 22:36
… `DataFrame` to ignore passed arguments) (pandas-dev#55089)

* fixes pandas-dev#55009

* update documentation

* write documentation

* add test

* change formatting

* cite DataFrame directly in docs

Co-authored-by: Matthew Roeschke <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Matthew Roeschke <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@mroeschke (Member):

Please review and address the failed pre-commit failure

@hedeershowk (Contributor Author):

> Please review and address the failed pre-commit failure

I wasn't seeing any issue in my local pre-commit run for some reason.

@mroeschke added this to the 2.2 milestone Sep 27, 2023
@mroeschke merged commit 824a273 into pandas-dev:main Sep 27, 2023
33 checks passed
@mroeschke (Member):

Thanks @hedeershowk

Successfully merging this pull request may close these issues.

BUG: usecols in pandas.read_csv has incorrect behavior when using pyarrow engine
3 participants