-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Add limit_area to EA ffill/bfill #56616
Conversation
doc/source/whatsnew/v2.2.0.rst
Outdated
@@ -313,7 +313,7 @@ Other enhancements | |||
- :meth:`DataFrame.apply` now allows the usage of numba (via ``engine="numba"``) to JIT compile the passed function, allowing for potential speedups (:issue:`54666`) | |||
- :meth:`ExtensionArray._explode` interface method added to allow extension type implementations of the ``explode`` method (:issue:`54833`) | |||
- :meth:`ExtensionArray.duplicated` added to allow extension type implementations of the ``duplicated`` method (:issue:`55255`) | |||
- :meth:`Series.ffill`, :meth:`Series.bfill`, :meth:`DataFrame.ffill`, and :meth:`DataFrame.bfill` have gained the argument ``limit_area`` (:issue:`56492`) | |||
- :meth:`Series.ffill`, :meth:`Series.bfill`, :meth:`DataFrame.ffill`, and :meth:`DataFrame.bfill` have gained the argument ``limit_area``; 3rd part :class:`.ExtensionArray` authors need to add this argument to the method ``_pad_or_backfill`` (:issue:`56492`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3rd party instead of 3rd part?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would put this into the ExtensionArray section as well
@@ -1012,6 +1018,12 @@ def _pad_or_backfill( | |||
DeprecationWarning, | |||
stacklevel=find_stack_level(), | |||
) | |||
if limit_area is not None: | |||
raise NotImplementedError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only get here if 3rd party authors didn't implement this themselves, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right - this is only hit when EA author overrides .fillna
but not ._pad_or_backfill
. Currently pandas .ffill
will call the EA's .fillna
in such a case, which can only be correctly done when limit_area
is None.
Friendly ping @phofl @jbrockmendel |
@@ -205,7 +210,21 @@ def _pad_or_backfill( | |||
if copy: | |||
npvalues = npvalues.copy() | |||
new_mask = new_mask.copy() | |||
elif limit_area is not None: | |||
mask = mask.copy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-blocking, but why do we need this copy here?
thx @rhshadrach |
…fill) (#56720) Backport PR #56616: BUG: Add limit_area to EA ffill/bfill Co-authored-by: Richard Shadrach <[email protected]>
Opened #56729 to track the deprecation |
limit_area
passed. #41813 (Replace xxxx with the GitHub issue number)doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.Follow up to #56531.
This is slightly complicated by the
EA.fillna
deprecation.EA.fillna
is implemented, we raise wheneverlimit_area
is not NoneEA._pad_or_backfill
is implemented and does not have alimit_area
arg, we raise whenlimit_area
is not NoneMy plan is to introduce a deprecation in 3.1 whenever
_pad_or_backfill
gets called (regardless of whetherlimit_area
isNone
) informing EA authors they need to add this argument. Assuming this is a good way to do, I'll open a tracking issue for this.For dtypes where the corresponding
mask
is not used after filling (e.g.NDArrayBackedExtensionArray
), this takes a shortcut by modifying mask prior to filling. Compared to implementinglimit_area
after filling, this saves 1 or 2 copies (depending on the EA) and extra computation, at the cost of makingmask
no longer necessarily represent NA values.