-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG (string dtype): fix inplace mutation with copy=False in ensure_string_array #59756
BUG (string dtype): fix inplace mutation with copy=False in ensure_string_array #59756
Conversation
result = result.copy() | ||
elif not copy and result is arr: | ||
already_copied = False | ||
if result is arr or np.may_share_memory(arr, result): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any particular reason for may_share_memory instead of shares_memory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because that is what was recommended by the numpy developer: #54654 (comment)
My understanding is that we have here either always a full copy or either exactly the same memory, so just checking the bounds should be sufficient, and there is less risk to this being a costly check (although I don't really know how this works under the hood, so not sure if there is any risk in practice)
base_copy.astype(any_string_dtype) | ||
tm.assert_series_equal(base, base_copy) | ||
|
||
|
||
@td.skip_if_no("pyarrow") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this decorator now redundant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, indeed, thanks
@@ -120,14 +120,23 @@ def test_astype_string_copy_on_pickle_roundrip(): | |||
tm.assert_series_equal(base, base_copy) | |||
|
|||
|
|||
def test_astype_string_copy_on_pickle_roundrip(any_string_dtype): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i cant find it now, but i thought one of the various string-dtype-like fixtures included str
, which would let us de-duplicate this test with the previous one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The string_dtype
fixture has that, but that fixture doesn't have all the other variants. After 3.0, the astype(str)
will be equivalent to astype(pd.StringDtype(na_value=np.nan))
and then the above test will be superfluous, but at the moment astype(str)
still does something specfic (do convert to string but end up in object dtype)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will add a comment
small questions, otherwise LGTM |
Thanks @jorisvandenbossche |
…ring_array (pandas-dev#59756) (#2) * BUG (string dtype): fix inplace mutation with copy=False in ensure_string_array * update Co-authored-by: Joris Van den Bossche <[email protected]>
…ring_array (pandas-dev#59756) * BUG (string dtype): fix inplace mutation with copy=False in ensure_string_array * update
…ring_array (pandas-dev#59756) * BUG (string dtype): fix inplace mutation with copy=False in ensure_string_array * update
…ring_array (#59756) * BUG (string dtype): fix inplace mutation with copy=False in ensure_string_array * update
Further follow-up on #54654, #54687, #57212, because we still had the issue when
copy=False
was passed (which happens when casting to any of the string dtypes).The added tests failed without the actual code change.
(smaller change extracted from #59685)
xref #54792