BUG: Ensure "string[pyarrow]" type is preserved when calling extractall #55534
doc/source/whatsnew/v2.1.2.rst
@@ -30,6 +30,7 @@ Bug fixes
- Fixed bug in :meth:`Index.insert` raising when inserting ``None`` into :class:`Index` with ``dtype="string[pyarrow_numpy]"`` (:issue:`55365`)
- Fixed bug in :meth:`Series.all` and :meth:`Series.any` not treating missing values correctly for ``dtype="string[pyarrow_numpy]"`` (:issue:`55367`)
- Fixed bug in :meth:`Series.rank` for ``string[pyarrow_numpy]`` dtype (:issue:`55362`)
- Fixed bug in :meth:`Series.str.extractall` for ``string[pyarrow]`` dtype being converted to object (:issue:`53846`)
Suggested change. Replace
- Fixed bug in :meth:`Series.str.extractall` for ``string[pyarrow]`` dtype being converted to object (:issue:`53846`)
with
- Fixed bug in :meth:`Series.str.extractall` for :class:`ArrowDtype` being converted to object (:issue:`53846`)
Thanks, could you add a test for this?
Thank you @mroeschke, added.
pandas/tests/strings/test_extract.py
@@ -2,6 +2,7 @@
import re

import numpy as np
import pyarrow as pa
You'll need to import this in test_extractall_preserves_dtype
@mroeschke thanks, moved the import. I also added the requirement to the pypy and numpydev actions YAML files, but some tests are still failing, so before making any additional changes I thought I'd ask what else might need to change so CI can run. It looks like the meta.yaml file needs it added as a requirement as well?
Instead of adding pyarrow to the dependency files, you can structure the test like this:

    import pytest

    def test_whatever():
        pa = pytest.importorskip("pyarrow")
        series = ...

That way the test is skipped if pyarrow is not installed; otherwise, you have pyarrow accessible as `pa`.
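Following that suggestion, a complete test might look like the sketch below. The name `test_extractall_keeps_arrow_dtype` is hypothetical and the body is an illustration modelled on the case discussed in this PR, not the test that was actually merged.

```python
import pandas as pd
import pytest


def test_extractall_keeps_arrow_dtype():
    # importorskip skips this test when pyarrow is missing; otherwise it
    # returns the imported module, bound here to `pa`.
    pa = pytest.importorskip("pyarrow")

    ser = pd.Series(["abc", "ab"], dtype=pd.ArrowDtype(pa.string()))
    result = ser.str.extractall("(ab)")

    # The extracted column should keep the ArrowDtype rather than object.
    assert result[0].dtype == pd.ArrowDtype(pa.string())
```

Because pyarrow is imported inside the test body rather than at module level, collecting the test module no longer requires pyarrow, so the dependency files can stay unchanged.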
Ah, fair enough 👍 Thank you. There's still one spec failing, but it's also failing on main; as far as I can tell, it's due to a NumPy deprecation warning.
pandas/tests/strings/test_extract.py
@pytest.mark.parametrize(
    "data, expected_dtype",
    [
        (Series(["abc", "ab"], dtype=ArrowDtype(pa.string())), "string[pyarrow]"),
You can just test this case, since I think the other cases are already tested.
Thanks @ABizzinotto!
…e is preserved when calling extractall) (#55597) Backport PR #55534: BUG: Ensure "string[pyarrow]" type is preserved when calling extractall Co-authored-by: Amanda Bizzinotto <[email protected]>
When calling `extractall` on an Arrow string Series, the operation itself works, but the returned DataFrame column has dtype `object` instead of preserving the Arrow string type. This happens because the resulting DataFrame was constructed with the original array's dtype only if that dtype was an instance of `StringDtype`. This PR expands the conditional to also preserve the dtype for `ArrowDtype`.
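The dtype rule described above can be sketched in isolation as follows. `result_dtype_for` and `build_result` are hypothetical names for illustration only; this is not pandas' internal code, just the same rule: keep `StringDtype` and `ArrowDtype`, fall back to `object` for everything else.

```python
import pandas as pd


def result_dtype_for(arr: pd.Series):
    # Hypothetical helper: keep the input dtype for pandas string dtypes
    # (StringDtype) and pyarrow-backed dtypes (ArrowDtype); anything else
    # falls back to object, which is what ArrowDtype input got before the fix.
    if isinstance(arr.dtype, (pd.StringDtype, pd.ArrowDtype)):
        return arr.dtype
    return object


def build_result(arr: pd.Series, rows: list, columns: list) -> pd.DataFrame:
    # Hypothetical helper: build the output frame from already-extracted
    # match rows, using the dtype chosen by result_dtype_for.
    return pd.DataFrame(rows, columns=columns, dtype=result_dtype_for(arr))


string_ser = pd.Series(["abc", "ab"], dtype="string")
object_ser = pd.Series(["abc", "ab"], dtype=object)

print(build_result(string_ser, [["ab"], ["ab"]], [0]).dtypes)
print(build_result(object_ser, [["ab"], ["ab"]], [0]).dtypes)
```

Before the fix, the `isinstance` check effectively covered only `StringDtype`, so an `ArrowDtype` input fell through to the `object` branch.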