Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CoW warning mode: warn in case of chained assignment #55522

Merged
merged 15 commits into from
Nov 28, 2023
Merged
Show file tree
Hide file tree
Changes from 12 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion .pre-commit-config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -240,8 +240,9 @@ repos:
# pytest raises without context
|\s\ pytest.raises

# TODO
# pytest.warns (use tm.assert_produces_warning instead)
|pytest\.warns
# |pytest\.warns

# os.remove
|os\.remove
Expand Down
2 changes: 1 addition & 1 deletion doc/source/user_guide/basics.rst
Original file line number Diff line number Diff line change
Expand Up @@ -269,7 +269,7 @@ using ``fillna`` if you wish).
.. ipython:: python

df2 = df.copy()
df2["three"]["a"] = 1.0
df2.loc["a", "three"] = 1.0
df
df2
df + df2
Expand Down
1 change: 1 addition & 0 deletions doc/source/user_guide/copy_on_write.rst
Original file line number Diff line number Diff line change
Expand Up @@ -138,6 +138,7 @@ Chained assignment references a technique where an object is updated through
two subsequent indexing operations, e.g.

.. ipython:: python
:okwarning:

with pd.option_context("mode.copy_on_write", False):
df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
Expand Down
3 changes: 1 addition & 2 deletions doc/source/whatsnew/v0.13.1.rst
Original file line number Diff line number Diff line change
Expand Up @@ -29,11 +29,10 @@ Highlights include:

This would previously segfault:

.. ipython:: python
.. code-block:: python

df = pd.DataFrame({"A": np.array(["foo", "bar", "bah", "foo", "bar"])})
df["A"].iloc[0] = np.nan
df

The recommended way to do this type of assignment is:

Expand Down
28 changes: 22 additions & 6 deletions pandas/_testing/contexts.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,8 @@
)
import uuid

from pandas._config import using_copy_on_write

from pandas.compat import PYPY
from pandas.errors import ChainedAssignmentError

Expand Down Expand Up @@ -193,9 +195,14 @@ def use_numexpr(use, min_elements=None) -> Generator[None, None, None]:
set_option("compute.use_numexpr", olduse)


def raises_chained_assignment_error(extra_warnings=(), extra_match=()):
def raises_chained_assignment_error(warn=True, extra_warnings=(), extra_match=()):
from pandas._testing import assert_produces_warning

if not warn:
from contextlib import nullcontext

return nullcontext()

if PYPY and not extra_warnings:
from contextlib import nullcontext

Expand All @@ -206,12 +213,21 @@ def raises_chained_assignment_error(extra_warnings=(), extra_match=()):
match="|".join(extra_match),
)
else:
match = (
"A value is trying to be set on a copy of a DataFrame or Series "
"through chained assignment"
)
warning: Warning
if using_copy_on_write():
warning = ChainedAssignmentError
match = (
"A value is trying to be set on a copy of a DataFrame or Series "
"through chained assignment"
)
else:
warning = FutureWarning # type: ignore[assignment]
# TODO update match
match = "ChainedAssignmentError"
if extra_warnings:
warning = (warning, *extra_warnings) # type: ignore[assignment]
return assert_produces_warning(
(ChainedAssignmentError, *extra_warnings),
warning,
match="|".join((match, *extra_match)),
)

Expand Down
10 changes: 10 additions & 0 deletions pandas/core/frame.py
Original file line number Diff line number Diff line change
Expand Up @@ -4205,6 +4205,16 @@ def __setitem__(self, key, value) -> None:
warnings.warn(
_chained_assignment_msg, ChainedAssignmentError, stacklevel=2
)
# elif not PYPY and not using_copy_on_write():
elif not PYPY and warn_copy_on_write():
if sys.getrefcount(self) <= 3: # and (
# warn_copy_on_write()
# or (
# not warn_copy_on_write()
# and self._mgr.blocks[0].refs.has_reference()
# )
# ):
warnings.warn("ChainedAssignmentError", FutureWarning, stacklevel=2)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ẁe don't seem to have tests for where this would warn in the default mode (if you set a column here, like df[slice]["col"] = .. that already nevers update the original df, and you already get a (somewhat incorrect?, as you are not actually "setting on a copy", you are just not setting inplace) SettingWithCopyWarning.

But to fabricate something that we should warn about in the default mode as well:

In [1]: df = DataFrame({"a": [1, 2, 3, 4, 5], "b": 1})

In [2]: df[0:3][0:2] = 10
<ipython-input-2-0c4324ecc90c>:1: SettingWithCopyWarning: 
...

This updates the original df.

So will have to update the above for that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started on this locally, but given the many other open PRs that will give conficts, maybe I can leave finishing DataFrame.__setitem__ proper warnings for default mode for a follow-up PR.


key = com.apply_if_callable(key, self)

Expand Down
14 changes: 13 additions & 1 deletion pandas/core/indexing.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,10 @@

import numpy as np

from pandas._config import using_copy_on_write
from pandas._config import (
using_copy_on_write,
warn_copy_on_write,
)

from pandas._libs.indexing import NDFrameIndexerBase
from pandas._libs.lib import item_from_zerodim
Expand All @@ -25,6 +28,7 @@
InvalidIndexError,
LossySetitemError,
_chained_assignment_msg,
_check_cacher,
)
from pandas.util._decorators import doc
from pandas.util._exceptions import find_stack_level
Expand Down Expand Up @@ -881,6 +885,14 @@ def __setitem__(self, key, value) -> None:
warnings.warn(
_chained_assignment_msg, ChainedAssignmentError, stacklevel=2
)
elif not PYPY and not using_copy_on_write():
ctr = sys.getrefcount(self.obj)
ref_count = 2
if not warn_copy_on_write() and _check_cacher(self.obj):
# see https://github.com/pandas-dev/pandas/pull/56060#discussion_r1399245221
ref_count += 1
if ctr <= ref_count:
warnings.warn("ChainedAssignmentError", FutureWarning, stacklevel=2)

check_dict_or_set_indexers(key)
if isinstance(key, tuple):
Expand Down
6 changes: 3 additions & 3 deletions pandas/core/internals/managers.py
Original file line number Diff line number Diff line change
Expand Up @@ -387,7 +387,7 @@ def apply(
# Alias so we can share code with ArrayManager
apply_with_block = apply

def setitem(self, indexer, value) -> Self:
def setitem(self, indexer, value, warn: bool = True) -> Self:
"""
Set values with indexer.

Expand All @@ -396,7 +396,7 @@ def setitem(self, indexer, value) -> Self:
if isinstance(indexer, np.ndarray) and indexer.ndim > self.ndim:
raise ValueError(f"Cannot set values with ndim > {self.ndim}")

if warn_copy_on_write() and not self._has_no_reference(0):
if warn and warn_copy_on_write() and not self._has_no_reference(0):
warnings.warn(
COW_WARNING_GENERAL_MSG,
FutureWarning,
Expand Down Expand Up @@ -2059,7 +2059,7 @@ def setitem_inplace(self, indexer, value, warn: bool = True) -> None:
if using_cow:
self.blocks = (self._block.copy(),)
self._cache.clear()
elif warn and warn_cow:
elif warn_cow and warn:
warnings.warn(
COW_WARNING_SETITEM_MSG,
FutureWarning,
Expand Down
52 changes: 38 additions & 14 deletions pandas/core/series.py
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,10 @@

import numpy as np

from pandas._config import using_copy_on_write
from pandas._config import (
using_copy_on_write,
warn_copy_on_write,
)
from pandas._config.config import _get_option

from pandas._libs import (
Expand Down Expand Up @@ -1209,11 +1212,32 @@ def _get_value(self, label, takeable: bool = False):
return self.iloc[loc]

def __setitem__(self, key, value) -> None:
warn = True
if not PYPY and using_copy_on_write():
if sys.getrefcount(self) <= 3:
warnings.warn(
_chained_assignment_msg, ChainedAssignmentError, stacklevel=2
)
elif not PYPY and not using_copy_on_write():
ctr = sys.getrefcount(self)
ref_count = 3
if not warn_copy_on_write() and _check_cacher(self):
# see https://github.com/pandas-dev/pandas/pull/56060#discussion_r1399245221
ref_count += 1
if ctr <= ref_count and (
warn_copy_on_write()
or (
not warn_copy_on_write()
and self._mgr.blocks[0].refs.has_reference() # type: ignore[union-attr]
)
):
warn = False
warnings.warn(
"ChainedAssignmentError: behaviour will change in pandas 3.0 "
"with Copy-on-Write ...",
FutureWarning,
stacklevel=2,
)

check_dict_or_set_indexers(key)
key = com.apply_if_callable(key, self)
Expand All @@ -1224,10 +1248,10 @@ def __setitem__(self, key, value) -> None:

if isinstance(key, slice):
indexer = self.index._convert_slice_indexer(key, kind="getitem")
return self._set_values(indexer, value)
return self._set_values(indexer, value, warn=warn)

try:
self._set_with_engine(key, value)
self._set_with_engine(key, value, warn=warn)
except KeyError:
# We have a scalar (or for MultiIndex or object-dtype, scalar-like)
# key that is not present in self.index.
Expand Down Expand Up @@ -1293,18 +1317,18 @@ def __setitem__(self, key, value) -> None:
return

else:
self._set_with(key, value)
self._set_with(key, value, warn=warn)

if cacher_needs_updating:
self._maybe_update_cacher(inplace=True)

def _set_with_engine(self, key, value) -> None:
def _set_with_engine(self, key, value, warn: bool = True) -> None:
loc = self.index.get_loc(key)

# this is equivalent to self._values[key] = value
self._mgr.setitem_inplace(loc, value)
self._mgr.setitem_inplace(loc, value, warn=warn)

def _set_with(self, key, value) -> None:
def _set_with(self, key, value, warn: bool = True) -> None:
# We got here via exception-handling off of InvalidIndexError, so
# key should always be listlike at this point.
assert not isinstance(key, tuple)
Expand All @@ -1315,7 +1339,7 @@ def _set_with(self, key, value) -> None:

if not self.index._should_fallback_to_positional:
# Regardless of the key type, we're treating it as labels
self._set_labels(key, value)
self._set_labels(key, value, warn=warn)

else:
# Note: key_type == "boolean" should not occur because that
Expand All @@ -1332,23 +1356,23 @@ def _set_with(self, key, value) -> None:
FutureWarning,
stacklevel=find_stack_level(),
)
self._set_values(key, value)
self._set_values(key, value, warn=warn)
else:
self._set_labels(key, value)
self._set_labels(key, value, warn=warn)

def _set_labels(self, key, value) -> None:
def _set_labels(self, key, value, warn: bool = True) -> None:
key = com.asarray_tuplesafe(key)
indexer: np.ndarray = self.index.get_indexer(key)
mask = indexer == -1
if mask.any():
raise KeyError(f"{key[mask]} not in index")
self._set_values(indexer, value)
self._set_values(indexer, value, warn=warn)

def _set_values(self, key, value) -> None:
def _set_values(self, key, value, warn: bool = True) -> None:
if isinstance(key, (Index, Series)):
key = key._values

self._mgr = self._mgr.setitem(indexer=key, value=value)
self._mgr = self._mgr.setitem(indexer=key, value=value, warn=warn)
self._maybe_update_cacher()

def _set_value(self, label, value, takeable: bool = False) -> None:
Expand Down
21 changes: 21 additions & 0 deletions pandas/tests/copy_view/test_chained_assignment_deprecation.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,8 @@
import numpy as np
import pytest

from pandas.errors import ChainedAssignmentError

from pandas import DataFrame
import pandas._testing as tm

Expand Down Expand Up @@ -61,3 +64,21 @@ def test_methods_iloc_getitem_item_cache(func, args, using_copy_on_write):
else:
with tm.assert_cow_warning(match="A value"):
df["a"].fillna(0, inplace=True)


# TODO(CoW-warn) expand the cases
@pytest.mark.parametrize(
"indexer", [0, [0, 1], slice(0, 2), np.array([True, False, True])]
)
def test_series_setitem(indexer, using_copy_on_write):
# ensure we only get a single warning for those typical cases of chained
# assignment
df = DataFrame({"a": [1, 2, 3], "b": 1})
with pytest.warns() as record:
df["a"][indexer] = 0
assert len(record) == 1
if using_copy_on_write:
assert record[0].category == ChainedAssignmentError
else:
assert record[0].category == FutureWarning
assert "ChainedAssignmentError" in record[0].message.args[0]
10 changes: 3 additions & 7 deletions pandas/tests/frame/indexing/test_setitem.py
Original file line number Diff line number Diff line change
Expand Up @@ -1302,17 +1302,13 @@ def test_setitem_column_update_inplace(
df = DataFrame({col: np.zeros(len(labels)) for col in labels}, index=labels)
values = df._mgr.blocks[0].values

with tm.raises_chained_assignment_error():
for label in df.columns:
df[label][label] = 1
if not using_copy_on_write:
with tm.assert_cow_warning(warn_copy_on_write):
for label in df.columns:
df[label][label] = 1

# diagonal values all updated
assert np.all(values[np.arange(10), np.arange(10)] == 1)
else:
with tm.raises_chained_assignment_error():
for label in df.columns:
df[label][label] = 1
# original dataframe not updated
assert np.all(values[np.arange(10), np.arange(10)] == 0)

Expand Down
3 changes: 1 addition & 2 deletions pandas/tests/frame/indexing/test_xs.py
Original file line number Diff line number Diff line change
Expand Up @@ -143,8 +143,7 @@ def test_xs_view(
dm.xs(2)[:] = 20
assert not (dm.xs(2) == 20).any()
else:
# TODO(CoW-warn) should this raise a specific warning about being chained?
with tm.assert_cow_warning(warn_copy_on_write):
with tm.raises_chained_assignment_error():
dm.xs(2)[:] = 20
assert (dm.xs(2) == 20).all()

Expand Down
8 changes: 1 addition & 7 deletions pandas/tests/frame/test_block_internals.py
Original file line number Diff line number Diff line change
Expand Up @@ -348,13 +348,7 @@ def test_stale_cached_series_bug_473(self, using_copy_on_write, warn_copy_on_wri
)
repr(Y)
Y["e"] = Y["e"].astype("object")
if using_copy_on_write:
with tm.raises_chained_assignment_error():
Y["g"]["c"] = np.nan
elif warn_copy_on_write:
with tm.assert_cow_warning():
Y["g"]["c"] = np.nan
else:
with tm.raises_chained_assignment_error():
Y["g"]["c"] = np.nan
repr(Y)
Y.sum()
Expand Down
7 changes: 4 additions & 3 deletions pandas/tests/frame/test_constructors.py
Original file line number Diff line number Diff line change
Expand Up @@ -303,7 +303,7 @@ def test_constructor_dtype_nocast_view_dataframe(
assert df.values[0, 0] == 1
else:
with tm.assert_cow_warning(warn_copy_on_write):
should_be_view[0][0] = 99
should_be_view.iloc[0, 0] = 99
assert df.values[0, 0] == 99

def test_constructor_dtype_nocast_view_2d_array(
Expand All @@ -312,8 +312,9 @@ def test_constructor_dtype_nocast_view_2d_array(
df = DataFrame([[1, 2], [3, 4]], dtype="int64")
if not using_array_manager and not using_copy_on_write:
should_be_view = DataFrame(df.values, dtype=df[0].dtype)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should this warn? Not a blocking comment

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking because currently it is a view and changes the parent dataframe?
But the input here is a numpy array, so not sure if we know that / if we have a way to warn users about the change in behaviour (essentially copy=False becomes copy=True if the input is a 2D ndarray)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No there is no way of knowing where the NumPy array came from, we have no clue that It was the backing array of a DataFrame before that

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume that's something we will simply have to document in the list of breaking changes.

with tm.assert_cow_warning(warn_copy_on_write):
should_be_view[0][0] = 97
# TODO(CoW-warn) this should warn
# with tm.assert_cow_warning(warn_copy_on_write):
should_be_view.iloc[0, 0] = 97
assert df.values[0, 0] == 97
else:
# INFO(ArrayManager) DataFrame(ndarray) doesn't necessarily preserve
Expand Down
Loading
Loading