Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CoW warning mode: warn in case of chained assignment #55522

Merged
merged 15 commits into from
Nov 28, 2023

Conversation

jorisvandenbossche
Copy link
Member

@jorisvandenbossche jorisvandenbossche commented Oct 14, 2023

Follow-up PR on #55428 (only last commit is relevant here, will be reviewable once parent PR is merged), xref #56019

Adding warnings for the case of chained assignment, i.e. cases for which we will issue the ChainedAssignmentError warning in the future with CoW enabled. But since this is the main breaking change of CoW, important to have a clear warning as well in the warn mode, and it is enabled in the default mode as well.

xref #56019

@jorisvandenbossche jorisvandenbossche changed the title Cow warning chained CoW warning mode - warn in case of chained assignment Oct 20, 2023
@jorisvandenbossche jorisvandenbossche changed the title CoW warning mode - warn in case of chained assignment CoW warning mode: warn in case of chained assignment Nov 7, 2023
@phofl
Copy link
Member

phofl commented Nov 25, 2023

As for the other PR, I would warn in the default mode as well. This will need a similar system as for the other chained PRs that checks the cacher (that said, I found a bug in this mode that I am fixing now)

Comment on lines 4208 to 4217
# elif not PYPY and not using_copy_on_write():
elif not PYPY and warn_copy_on_write():
if sys.getrefcount(self) <= 3: # and (
# warn_copy_on_write()
# or (
# not warn_copy_on_write()
# and self._mgr.blocks[0].refs.has_reference()
# )
# ):
warnings.warn("ChainedAssignmentError", FutureWarning, stacklevel=2)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ẁe don't seem to have tests for where this would warn in the default mode (if you set a column here, like df[slice]["col"] = .. that already nevers update the original df, and you already get a (somewhat incorrect?, as you are not actually "setting on a copy", you are just not setting inplace) SettingWithCopyWarning.

But to fabricate something that we should warn about in the default mode as well:

In [1]: df = DataFrame({"a": [1, 2, 3, 4, 5], "b": 1})

In [2]: df[0:3][0:2] = 10
<ipython-input-2-0c4324ecc90c>:1: SettingWithCopyWarning: 
...

This updates the original df.

So will have to update the above for that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started on this locally, but given the many other open PRs that will give conficts, maybe I can leave finishing DataFrame.__setitem__ proper warnings for default mode for a follow-up PR.

@jorisvandenbossche jorisvandenbossche marked this pull request as ready for review November 28, 2023 08:11
@jorisvandenbossche jorisvandenbossche added this to the 2.2 milestone Nov 28, 2023
@@ -312,8 +312,9 @@ def test_constructor_dtype_nocast_view_2d_array(
df = DataFrame([[1, 2], [3, 4]], dtype="int64")
if not using_array_manager and not using_copy_on_write:
should_be_view = DataFrame(df.values, dtype=df[0].dtype)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should this warn? Not a blocking comment

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking because currently it is a view and changes the parent dataframe?
But the input here is a numpy array, so not sure if we know that / if we have a way to warn users about the change in behaviour (essentially copy=False becomes copy=True if the input is a 2D ndarray)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No there is no way of knowing where the NumPy array came from, we have no clue that It was the backing array of a DataFrame before that

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume that's something we will simply have to document in the list of breaking changes.

@phofl phofl merged commit 7012d6a into pandas-dev:main Nov 28, 2023
42 checks passed
@phofl
Copy link
Member

phofl commented Nov 28, 2023

thx @jorisvandenbossche

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants