Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CoW: add warning mode for cases that will change behaviour #55428

Merged
merged 11 commits into from
Oct 30, 2023

Conversation

jorisvandenbossche
Copy link
Member

@jorisvandenbossche jorisvandenbossche commented Oct 6, 2023

xref #56019, #48998

An initial PR working towards one of the remaining TODO items:

Add a warning mode that gives deprecation warnings for all cases where the current behaviour would change (initially also behind an option)

The current PR adds an option pd.options.mode.copy_on_write = "warn", and then everything behaves as the current default (without CoW enabled) but will raise a warning in case the behaviour will change if CoW would be enabled.

As a first step, this PR adds the option, and adds the warning in one specific code path that can trigger CoW, i.e. updating a Series (Manager.setitem_inplace). So this covers a typical case like:

s = df["col"]
s[0] = ..

no longer updating df.

In addition, I also turned off SettingWithCopyWarning when the CoW warning mode is enabled (because in that case, those warnings are no longer correct for the future behaviour, and it seemed better to not raise both kinds of warnings at the same time)

@jorisvandenbossche
Copy link
Member Author

There are still a whole bunch of failures (with the warning mode enabled, for which I still need to add a CI build), but quite some of them are due to inplace operators (eg s += 1) unnecessarily triggering CoW, which is something I will try to tackle as a precursor.

Copy link
Member

@phofl phofl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

first part of the review (wifi connection on the train isn't stable)

pandas/core/frame.py Outdated Show resolved Hide resolved
pandas/core/internals/managers.py Outdated Show resolved Hide resolved
if cow_context == "chained-assignment":
warnings.warn(
"ChainedAssignmentError: behaviour will change in pandas 3.0 "
"with Copy-on-Write ...",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need a doc page that we can link to that explains this more in-depth

warnings.warn(
_chained_assignment_msg, ChainedAssignmentError, stacklevel=2
)
elif warn_copy_on_write() and sys.getrefcount(self) <= 3:
cow_context = "chained-assignment"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to warn here instead of deferring. This might now warn, if the first op in chained assignment created a copy, e.g.

pd.options.mode.copy_on_write = "warn"

df = pd.DataFrame({"a": [1, 2, 3,], "b": 1})

indexer = df.a > 1
df = df.a
df[indexer][0] = 5

I think you could make a case for not warning here since the current behaviour is a no-op as well, but

  • the DataFrame case warns
  • I would prefer a warning since cow will also warn

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the reason that I didn't yet warn here and deferred it to Manager.setitem_inplace, is because I wanted the warning to be more specific, so we can distinguish the case of:

df = ..
ser = df["col"]
ser[0] = 0

-> here we want to warn that you are setting into a series that is a view of another, and that this change will no longer propagate (in this case no longer update the dataframe)

df = ...
df["col"][0] = 0

-> here we want to warn that this simply won't work anymore in the future, as this is chained assignment (and the case of it that currently does work and is used widely I think, so I want to have a distinct warning for this one)

Now, while writing this I am not sure anymore why this alone warrants deferring it. I have to check in the tests for the case that made me move this to the lower level (I originally was just warning here, as you suggest, but then changed it)

warn = SettingWithCopyWarning if single_block else None
# TODO
warn = (
SettingWithCopyWarning
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused, shouldn't this be silent when CoW is enabled as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and that using_copy_on_Write case is already handled in the if block above, so in this else I still need to handle the warn_copy_on_write case.

I could actually do a elif warn_copy_on_write separate block, that might be easier to read and which is also the pattern I started to use in many other places.

pandas/tests/copy_view/test_indexing.py Outdated Show resolved Hide resolved
pandas/tests/copy_view/test_indexing.py Outdated Show resolved Hide resolved
pandas/tests/indexing/multiindex/test_setitem.py Outdated Show resolved Hide resolved
pandas/tests/indexing/test_chaining_and_caching.py Outdated Show resolved Hide resolved
@jorisvandenbossche
Copy link
Member Author

@phofl I removed the last commit about the chained assignments again, and moved that to a separate PR (#55522). So this PR then just focuses on adding the infrastructure, and the warnings for (non-chained) series setitem (the things going through setitem_inplace). Then once this is merged, I can work more easily in parallel on different areas.

@jorisvandenbossche jorisvandenbossche added this to the 2.2 milestone Oct 14, 2023
@jorisvandenbossche jorisvandenbossche marked this pull request as ready for review October 14, 2023 20:43
Comment on lines +102 to +113
COW_WARNING_SETITEM_MSG = """\
Setting a value on a view: behaviour will change in pandas 3.0.
Currently, the mutation will also have effect on the object that shares data
with this object. For example, when setting a value in a Series that was
extracted from a column of a DataFrame, that DataFrame will also be updated:

ser = df["col"]
ser[0] = 0 <--- in pandas 2, this also updates `df`

In pandas 3.0 (with Copy-on-Write), updating one Series/DataFrame will never
modify another, and thus in the example above, `df` will not be changed.
"""
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question is: how elaborate do we want to make this warning message?

One the one hand, I think one can argue that it can be short and point to a doc page where it is explained in more detail (which I actually should add anyway).

On the other hand, I also want it to be understandable for the many people that won't read the docs about this in detail.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would anticipate users also wanting to remove/address this warning, so in light of that, I would opt to move the explanation to a doc page if we can include potential actions users can take to remove this warning (e.g. set the copy-on-write mode to True)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it OK to leave this as is, for now, and do a separate PR to write the documentation and update the warnings (add links to the docs, potentially shorten it)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah sure thing 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with the doc page

Comment on lines 909 to 911
if warn_copy_on_write:
with tm.assert_produces_warning(FutureWarning):
s[0] = 0
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How important do we find it that we ensure the exact message? (given that this is only temporary, maybe not super important?)

(although I suppose it is not difficult to just add match="Setting a value on a view" to all cases I am adding in this PR.

I don't know how useful it will be to add a custom tm.assert_produces_cow_warning, but it would make the tm.assert_produces_warning(FutureWarning, match="Setting a value on a view"): a bit shorter.

But when adding warnings for more cases, the exact warning message might also differ.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's okay being lax with matching the warning message during testing (but noting a comment that it's CoW related if it's not obvious from the test)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but noting a comment that it's CoW related if it's not obvious from the test

That might be a reason to actually make something like a tm.assert_produces_cow_warning, then it's also clear without adding a comment everywhere, and will be easier to remove all of those.

@jorisvandenbossche
Copy link
Member Author

jorisvandenbossche commented Oct 27, 2023

I added a tm.assert_cow_warning helper function to use for all the cases where specifically checking for this warning.

@phofl @mroeschke if possible, I would like to get this merged shortly if possible, as there is a lot more work to do here, but that will be easier done in parallel once this base PR is in.

@phofl
Copy link
Member

phofl commented Oct 27, 2023

I’ll check later today or tomorrow

@phofl phofl merged commit 192a4e3 into pandas-dev:main Oct 30, 2023
33 of 34 checks passed
@phofl
Copy link
Member

phofl commented Oct 30, 2023

thx @jorisvandenbossche

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants