DEPR: pandas.core for groupby #55429
Haven't looked closely, but big picture this seems like something we ought to do eventually and might as well bite the bullet.
I think my latest commits may have messed up the diff, and that we can get it down to under 1000 lines when done properly. I'm hoping I can automate a bit more and will be force-pushing quite a bit here. Will ping when ready.
Force-pushed from 2f6ebcd to ad2846c.
Looks like my previous comment was wrong: I think because we need to add new content to the old location, GitHub's diff isn't registering this as a move. Bummer.
Kind of orthogonal to this, but I want to write it down somewhere: the naming in core.groupby is weird to me, because generic.py there is for the non-base classes, whereas core.generic is for the base class.
Just noting that this is how NumPy is deprecating their core usage: https://github.com/numpy/numpy/blob/main/numpy/core/_utils.py
@mroeschke: happy to make any changes if desired. Because this diff is so large, here is the one I have (in
One other open question in my mind: do we do all of
Force-pushed from aaa1061 to deeb7e0.
@jbrockmendel @mroeschke I think this is ready for review. Most of the diff comes from the If we don't do
Force-pushed from deeb7e0 to b09c53b.
Friendly ping @jbrockmendel @mroeschke @phofl
from typing import Any

from pandas._core import groupby as groupby_
from pandas.core.common import _depr_core
In the future, will this raise its own DeprecationWarning if this is imported from pandas.core?
Once we deprecate pandas.core.common, I think we'd move this helper to pandas._core.common. So yes, but we won't be importing it from pandas.core.common ourselves.
pandas/tests/groupby/test_api.py
- `reduction_kernels`
- `transformation_kernels`
- `groupby_other_methods`
see the comments in pandas/core/groupby/base.py for guidance on
see the comments in pandas/_core.groupby/base.py for guidance on
Suggested change:
see the comments in pandas/_core.groupby/base.py for guidance on
see the comments in pandas/_core/groupby/base.py for guidance on
Looks like there's one below (and possibly elsewhere)?
Thanks - my script for automated changes was doing

old = f'core.{submodule}'
new = f'_core.{submodule}'

whereas it should have used an escaped dot (\.) in the regex. Will fix.
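The bug is easy to reproduce with Python's re module: an unescaped "." matches any character, including "/", so a file path gets rewritten as if it were a dotted module name. A minimal sketch, assuming the script applied these patterns with re.sub (the script's actual replacement call is not shown in this thread):

```python
import re

submodule = "groupby"
line = "see the comments in pandas/core/groupby/base.py for guidance on"

# Unescaped ".": matches any character, so "core/groupby" matches as well.
buggy = re.sub(f"core.{submodule}", f"_core.{submodule}", line)

# Escaped "\.": matches only a literal dot, so the file path is left alone.
fixed = re.sub(rf"core\.{submodule}", f"_core.{submodule}", line)

print(buggy)  # see the comments in pandas/_core.groupby/base.py for guidance on
print(fixed)  # see the comments in pandas/core/groupby/base.py for guidance on
```

re.escape(f"core.{submodule}") produces the same escaped pattern without writing the backslash by hand. Note that the escaped version leaves "/"-separated paths untouched, so paths like the one in this docstring would need their own pattern.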
This is now fixed.
from pandas.core.common import _depr_core


def __getattr__(attr_name: str) -> Any:
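For context on the pattern: a module-level __getattr__ (PEP 562, Python 3.7+) is only called when normal attribute lookup on the module fails, which is what lets the old pandas.core.groupby module forward names to pandas._core.groupby. The function body is cut off above, so this is a self-contained sketch under assumptions: the two modules are hypothetical stand-ins built with types.ModuleType, and the warning is emitted inline rather than through _depr_core, whose signature is not shown here.

```python
import types
import warnings
from typing import Any

# Hypothetical stand-ins: new_location plays pandas._core.groupby and
# old_location plays the deprecated pandas.core.groupby shim module.
new_location = types.ModuleType("pkg._core.groupby")
new_location.DataFrameGroupBy = type("DataFrameGroupBy", (), {})
old_location = types.ModuleType("pkg.core.groupby")


def _shim_getattr(attr_name: str) -> Any:
    # Called only for names not found in old_location's own namespace.
    try:
        attr = getattr(new_location, attr_name)
    except AttributeError:
        raise AttributeError(
            f"module {old_location.__name__!r} has no attribute {attr_name!r}"
        ) from None
    # Warn only after a successful lookup, pointing at the caller's frame.
    warnings.warn(
        f"{old_location.__name__} is deprecated; "
        f"use {new_location.__name__}.{attr_name} instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return attr


# In a real shim file this is simply `def __getattr__(...)` at module level;
# storing it in the module's namespace has the same effect.
old_location.__getattr__ = _shim_getattr

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    cls = old_location.DataFrameGroupBy

print(cls is new_location.DataFrameGroupBy)  # True
print(caught[0].category.__name__)           # DeprecationWarning
```

Because the lookup is forwarded before warning, a missing name still raises AttributeError (so hasattr() keeps working) and does not emit a spurious deprecation warning.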
If dir(pandas.core.groupby) is called, would we get the same result before and after this change?
No - I'm getting:
['Any', '__builtins__', '__cached__', '__doc__', '__file__', '__getattr__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '_depr_core', 'annotations', 'groupby_']
I don't immediately see a manageable way to keep the dir results from changing; will think on this.
Looking at the NumPy deprecation (link in OP), I think they have the same issue.
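PEP 562 also defines a module-level __dir__, which dir() calls when the module provides one. Whether that is manageable for pandas is the open question above; as one possible approach (not something this PR does), the shim could report the public names of the new location instead of its own helper imports. A minimal sketch with hypothetical stand-in modules:

```python
import types

# Hypothetical stand-ins for the relocated module and the deprecated shim.
new_location = types.ModuleType("pkg._core.groupby")
new_location.DataFrameGroupBy = type("DataFrameGroupBy", (), {})
new_location.SeriesGroupBy = type("SeriesGroupBy", (), {})

old_location = types.ModuleType("pkg.core.groupby")
# Helper names the shim defines for itself, mirroring the dir() output above.
old_location.groupby_ = new_location
old_location._depr_core = lambda: None


def _shim_dir() -> list[str]:
    # Report the new location's public names rather than the shim's helpers.
    return sorted(name for name in vars(new_location) if not name.startswith("_"))


# In a real shim file this is simply `def __dir__()` at module level.
old_location.__dir__ = _shim_dir

print(dir(old_location))  # ['DataFrameGroupBy', 'SeriesGroupBy']
```

A real version would probably also keep the module's own dunder attributes and match the pre-move listing exactly; this only illustrates the hook.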
Force-pushed from b09c53b to 2136bb2.
@mroeschke - friendly ping. The docs build only failed because of the SciPy URL issue. At this point, I'd consider holding off on this until 2.2 is out. I would like to start on this early in that release cycle though, so it's good to get any further issues ironed out. My preference would be to do just one submodule and wait a week or two with it in main before moving on, then do the rest of core all in one go. But I'd be okay with doing the entire thing piecemeal in a series of PRs if others prefer.
No opinion here, strong default to defer to the person doing the actual work: you. IIRC dask has a place where they do something like
that might be affected by this (I think I ran into this when trying to make the groupby import lazy).
@@ -85,7 +85,7 @@ Plotting
GroupBy/resample/rolling
^^^^^^^^^^^^^^^^^^^^^^^^

- Fixed regression in :meth:`pands.core.groupby.DataFrameGroupBy.quantile` raising when multiple quantiles are given (:issue:`27526`)
- Fixed regression in :meth:`pands._core.groupby.DataFrameGroupBy.quantile` raising when multiple quantiles are given (:issue:`27526`)
Could we just trim the module path? Also, typo: pands -> pandas.
@@ -0,0 +1,304 @@
from __future__ import annotations
Sometimes git understands when a file is just renamed; I'm not sure how that happens. Can we make it happen here?
I think not, if we want to do this all in one shot. I believe the issue is that we are renaming but then also recreating the files with different content, and git doesn't recognize this. If we were to split this up into two PRs, (1) renaming core -> _core and then (2) adding back the core files for backwards compatibility / deprecation, then I think git would recognize it.
Breaking it up into these two steps would certainly make review easier, but we'd need to make sure we carry it all out before the release.
Yes, that's correct, but it doesn't affect only dask; it affects our Sphinx docs as well (they will show private paths). Richard and I chatted about this a while ago, and IIRC having the classes accessible in the typing API module would solve that (we can move dask's imports over). @rhshadrach is this a correct recollection of our chat?
I'd want to get @jorisvandenbossche's OK on this before moving forward.
During the dev call today, @jorisvandenbossche suggested just doing the move without the deprecation first. It also seems to me it'd be better to do this as two PRs: one moving
Part of #27522.
Just a PoC for right now to see what the CI says. Various parts of this are stolen from numpy/numpy#24634.
Not sure if we want to do this all at once or submodule by submodule. I'm thinking doing it one submodule at a time is easier for review and we can try just e.g. groupby first and see how that goes.
All commits prior to "Automated changes" only need to be done the first time. The "Automated changes" commit is the result of the script below. A few manual changes need to be made afterwards.
WARNING: The script below will completely clean out your git repo by running git reset --hard and git clean -xfd.
Automated changes
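The script itself is not reproduced here. As a rough, hypothetical sketch of the kind of replacement step it performs (the function name rewrite_imports and the submodule list are assumptions, and only dotted import paths are handled), one compiled pattern with escaped dots and word boundaries can rewrite pandas.core.<submodule> to pandas._core.<submodule> for several submodules at once:

```python
import re
from typing import Iterable


def rewrite_imports(source: str, submodules: Iterable[str]) -> str:
    """Rewrite dotted pandas.core.<submodule> references to pandas._core.<submodule>."""
    names = tuple(submodules)
    if not names:
        # An empty alternation would match every "pandas.core." prefix.
        return source
    alternatives = "|".join(re.escape(name) for name in names)
    # Escaped dots match only a literal "."; the trailing \b stops "groupby"
    # from matching a longer name such as "groupby_extra".
    pattern = re.compile(rf"\bpandas\.core\.({alternatives})\b")
    return pattern.sub(r"pandas._core.\1", source)


src = "from pandas.core.groupby.generic import DataFrameGroupBy\nimport pandas.core.frame\n"
print(rewrite_imports(src, ["groupby"]))
```

Running it twice is harmless, since "pandas._core" no longer matches "pandas\.core", and "/"-separated file paths such as pandas/core/groupby/base.py are left for a separate, explicitly escaped pattern.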