DEPR: pandas.core for groupby #55429

rhshadrach · 2023-10-06T21:29:50Z

closes #xxxx (Replace xxxx with the GitHub issue number)
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Part of #27522.

Just a PoC for right now to see what the CI says. Various parts of this are stolen from numpy/numpy#24634.

Not sure if we want to do this all at once or submodule by submodule. I'm thinking doing it one submodule at a time is easier for review and we can try just e.g. groupby first and see how that goes.

All commits prior to "Automated changes" only need to be done the first time. The "Automated changes" commit is the result of the script below. A few manual changes need to be made after.

WARNING: The script below will completely clean out your git repo by doing a git reset --hard and git clean -xfd

Automated changes

import os
import sys
import subprocess
from pathlib import Path

def run_in_shell(cmd):
    print(cmd)
    response = subprocess.run(cmd, shell=True, capture_output=True)
    stdout_str = response.stdout.decode("utf-8").strip()
    if stdout_str != '':
        print(stdout_str)
    stderr_str = response.stderr.decode("utf-8").strip()
    if stderr_str != '':
        raise ValueError(stderr_str)

base_path = Path("/home/richard/dev/pandas/pandas")
submodule = "groupby"

run_in_shell(f"cd {base_path} && git reset --hard HEAD")
run_in_shell(f"cd {base_path} && git clean -xfd")
os.makedirs(base_path / "_core" / submodule, exist_ok=True)
run_in_shell(f"cd {base_path} && git mv core/{submodule}/* _core/{submodule}/")

old = f'core\.{submodule}'
new = f'_core\.{submodule}'
run_in_shell(f"cd {base_path} && grep -rl '{old}' . --exclude-dir=core/{submodule} | xargs sed -i 's/{old}/{new}/g'")
run_in_shell(f"cd {base_path}/../ci && grep -rl '{old}' . | xargs sed -i 's/{old}/{new}/g'")
run_in_shell(f"cd {base_path}/../doc && grep -rl '{old}' . | xargs sed -i 's/{old}/{new}/g'")
run_in_shell(f"cd {base_path}/../ && sed -i 's/{old}/{new}/g' pyproject.toml")

old = f'from pandas\.core import {submodule}'
new = f'from pandas\._core import {submodule}'
run_in_shell(f"cd {base_path} && grep -rl '{old}' . --exclude-dir=core/{submodule} | xargs sed -i 's/{old}/{new}/g'")

old = f'pandas\/core\/{submodule}'
new = f'pandas\/_core\/{submodule}'
run_in_shell(f"cd {base_path}/../ && sed -i 's/{old}/{new}/g' pyright_reportGeneralTypeIssues.json")

for filename in (base_path / "_core" / submodule).glob("**/*"):
    if not str(filename).endswith(".py"):
        continue
    name = filename.parts[-1][:-len(".py")]
    output_path = str(filename).replace("/_core/", "/core/")

    if name == "__init__":
        contents = f"""from __future__ import annotations
        
from typing import Any

from pandas._core import {submodule} as {submodule}_
from pandas.core.common import _depr_core


def __getattr__(attr_name: str) -> Any:
    attr = getattr({submodule}_, attr_name)
    _depr_core()
    return attr
"""

    else:
        contents = f"""from __future__ import annotations
from pandas._core.{submodule} import {name}
from pandas.core.common import _depr_core

_depr_core()

_globals = globals()

for item in {name}.__dir__():
    _globals[item] = getattr({name}, item)
"""

    with open(output_path, mode="w") as f:
        f.write(contents)

run_in_shell(f"cd {base_path} && git add core/{submodule}/*")

jbrockmendel · 2023-10-07T16:49:44Z

Haven't looked closely, but big picture this seems like something we ought to do eventually and might as well bite the bullet.

rhshadrach · 2023-10-07T17:25:16Z

Haven't looked closely, but big picture this seems like something we ought to do eventually and might as well bite the bullet.

I think my latest commits may have messed up the diff, and that we can get it down to < 1000 lines when done properly. I'm hoping I can automate a bit more and will be force-pushing quite a bit here. Will ping when ready.

rhshadrach · 2023-10-15T16:08:02Z

Looks like my previous comment was wrong - I think because we need to add new content to the old location, GitHub's diff isn't registering this as a move. Bummer.

jbrockmendel · 2023-10-16T19:18:45Z

kind of orthogonal to this but i want to write it down somewhere: the naming in core.groupby is weird to me bc generic.py is for the non-base classes whereas core.generic is for the base class

mroeschke · 2023-10-17T17:33:48Z

Just noting that this is how NumPy is deprecating their core usage: https://github.com/numpy/numpy/blob/main/numpy/core/_utils.py

rhshadrach · 2023-10-17T19:59:58Z

@mroeschke: happy to make any changes if desired. Because this diff is so large, here is the one I have (in pandas.core.common._depr_core):

warnings.warn(
        "pandas.core is deprecated and has been renamed to "
        "pandas._core. Accessing `_core` directly is discouraged as "
        "members can change without warning.  You should use a public module "
        "instead that exports the attribute in question. If you still would "
        "like to access an attribute from it, please use pandas._core.",
        DeprecationWarning,
        stacklevel=3,
    )

rhshadrach · 2023-10-17T20:00:48Z

One other open question in my mind: do we do all of core at once, or go (roughly) submodule-by-submodule over time (but all within 2.2)?

rhshadrach · 2023-10-18T00:23:25Z

@jbrockmendel @mroeschke I think this is ready for review. Most of the diff comes from the Automated changes commit, produced by the script in the OP.

If we don't do import pandas.core.groupby in pandas/__init__.py, then we wouldn't be able to do e.g. from pandas.core.groupby.groupby import GroupBy. I'm still trying to figure out a better way for this, but maybe it's okay?

rhshadrach · 2023-11-02T00:28:14Z

Friendly ping @jbrockmendel @mroeschke @phofl

mroeschke · 2023-11-03T23:47:24Z

pandas/core/groupby/__init__.py

+from typing import Any
+
+from pandas._core import groupby as groupby_
+from pandas.core.common import _depr_core


In the future, will this raise it's own DeprecationWarning if this is imported from pandas.core?

Once we deprecate pandas.core.common, I think we'd move this method to pandas._core.common. So yes - but we won't be importing it from pandas.core.common ourselves.

mroeschke · 2023-11-03T23:49:40Z

pandas/tests/groupby/test_api.py

 - `reduction_kernels`
 - `transformation_kernels`
 - `groupby_other_methods`
-see the comments in pandas/core/groupby/base.py for guidance on
+see the comments in pandas/_core.groupby/base.py for guidance on


Suggested change

see the comments in pandas/_core.groupby/base.py for guidance on

see the comments in pandas/_core/groupby/base.py for guidance on

Looks like there's one below (and possibly elsewhere)?

Thanks - my script for automated changes was doing

old = f'core.{submodule}' new = f'_core.{submodule}'

whereas it should have been \. in the regex. Will fix.

This is now fixed.

mroeschke · 2023-11-03T23:53:34Z

pandas/core/groupby/__init__.py

+from pandas.core.common import _depr_core
+
+
+def __getattr__(attr_name: str) -> Any:


If dir(pandas.core.groupby) is called, would we get the same result before/after this change?

No - I'm getting:

['Any', '__builtins__', '__cached__', '__doc__', '__file__', '__getattr__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '_depr_core', 'annotations', 'groupby_']

I don't immediately see a manageable way to have the dir results not change, will think on this.

Looking at the NumPy deprecation (link in OP), I think they have the same issue.

rhshadrach · 2023-11-28T22:05:00Z

@mroeschke - friendly ping. Docs build only failed because of the SciPy URL issue.

At this point, I'd think about holding off on this until 2.2 is out. I would like to start on this early in that release cycle though, so good to get any further issues ironed out.

My preference would be to do just one submodule and wait a week or two with it in main before moving on, then do the rest of core all in one go. But I'd be okay with doing the entire thing piece-meal in a series of PRs if others prefer.

jbrockmendel · 2023-12-01T17:02:39Z

My preference would be to do just one submodule and wait a week or two with it in main before moving on, then do the rest of core all in one go. But I'd be okay with doing the entire thing piece-meal in a series of PRs if others prefer.

No opinion here, strong default to defer to the person doing the actual work: you.

IIRC dask has a place where they do something like

import pandas as pd

@inherit_docstring_or_something(pd.core.groupby.DataFrameGroupBy)
class DataFrameGroupBy:

that might be affected by this (I think i ran into this when trying to make groupby import lazy)

jbrockmendel · 2023-12-01T17:03:43Z

doc/source/whatsnew/v0.25.1.rst

@@ -85,7 +85,7 @@ Plotting
 GroupBy/resample/rolling
 ^^^^^^^^^^^^^^^^^^^^^^^^

- Fixed regression in :meth:`pands.core.groupby.DataFrameGroupBy.quantile` raising when multiple quantiles are given (:issue:`27526`)
+- Fixed regression in :meth:`pands._core.groupby.DataFrameGroupBy.quantile` raising when multiple quantiles are given (:issue:`27526`)


could just trim the modules? also typo pands->pandas

Thanks - I handled most of these in #55555 and #55626, but I think this one escaped my grepping because of the typo. Will do another precursor.

jbrockmendel · 2023-12-01T17:04:46Z

pandas/_core/groupby/indexing.py

@@ -0,0 +1,304 @@
+from __future__ import annotations


sometimes git understands when a file is just renamed. not sure how that happens. can we make it happen here?

I think no if we want to do this all in one shot. I believe the issue is that we are renaming but then also recreating the files with different content, and git doesn't recognize this. If we were to split this up into two PRs: (1) rename core -> _core and then (2) add back the core files for backwards compatibility / deprecation, then I think git would recognize it.

Breaking it up into these two steps would certainly make review easier, but we'd need to make sure we carry it all out before the release.

phofl · 2023-12-01T17:05:03Z

IIRC dask has a place where they do something like

Yes that's correct, but that doesn't affect only dask, it affects our Sphinx docs as well (they will show private paths). Richard and I chatted about this a while ago and IIRC having the classes accessible in the typing api module would solve that (we can move dasks imports over)

@rhshadrach is this a correct recollection of our chat?

jbrockmendel · 2023-12-01T17:06:41Z

id want to get @jorisvandenbossche's OK on this before moving forward

rhshadrach · 2023-12-14T02:04:14Z

During the dev call today, @jorisvandenbossche suggested just doing the move without the deprecation first. It also seems to me it'd be better to do this as two PRs: one moving core to _core followed by a second that adds the stub files. I will be putting PRs for this approach up in place of this one.

rhshadrach added API Design Deprecate Functionality to remove in pandas Groupby labels Oct 6, 2023

rhshadrach force-pushed the depr_core_groupby branch from 2f6ebcd to ad2846c Compare October 15, 2023 16:05

rhshadrach mentioned this pull request Oct 16, 2023

REF: groupby internal names #55551

Open

rhshadrach force-pushed the depr_core_groupby branch 2 times, most recently from aaa1061 to deeb7e0 Compare October 17, 2023 22:12

rhshadrach marked this pull request as ready for review October 18, 2023 00:23

rhshadrach requested review from Dr-Irv and mroeschke as code owners October 18, 2023 00:23

rhshadrach requested a review from phofl October 21, 2023 16:54

rhshadrach force-pushed the depr_core_groupby branch from deeb7e0 to b09c53b Compare November 1, 2023 22:17

mroeschke reviewed Nov 3, 2023

View reviewed changes

rhshadrach added 3 commits November 15, 2023 20:27

DEPR: pandas.core for groupby

c2d9dab

Automated changes

61becb7

Manual changes

2136bb2

rhshadrach force-pushed the depr_core_groupby branch from b09c53b to 2136bb2 Compare November 16, 2023 01:35

jbrockmendel reviewed Dec 1, 2023

View reviewed changes

rhshadrach closed this Dec 14, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DEPR: pandas.core for groupby #55429

DEPR: pandas.core for groupby #55429

rhshadrach commented Oct 6, 2023 •

edited

Loading

jbrockmendel commented Oct 7, 2023

rhshadrach commented Oct 7, 2023

rhshadrach commented Oct 15, 2023

jbrockmendel commented Oct 16, 2023

mroeschke commented Oct 17, 2023

rhshadrach commented Oct 17, 2023

rhshadrach commented Oct 17, 2023

rhshadrach commented Oct 18, 2023 •

edited

Loading

rhshadrach commented Nov 2, 2023

mroeschke Nov 3, 2023

rhshadrach Nov 14, 2023 •

edited

Loading

mroeschke Nov 3, 2023

rhshadrach Nov 14, 2023

rhshadrach Nov 16, 2023

mroeschke Nov 3, 2023

rhshadrach Nov 14, 2023 •

edited

Loading

rhshadrach Nov 14, 2023

rhshadrach commented Nov 28, 2023 •

edited

Loading

jbrockmendel commented Dec 1, 2023

jbrockmendel Dec 1, 2023

rhshadrach Dec 1, 2023

jbrockmendel Dec 1, 2023

rhshadrach Dec 1, 2023

phofl commented Dec 1, 2023

jbrockmendel commented Dec 1, 2023

rhshadrach commented Dec 14, 2023 •

edited

Loading

	see the comments in pandas/_core.groupby/base.py for guidance on
	see the comments in pandas/_core/groupby/base.py for guidance on

		from pandas.core.common import _depr_core


		def __getattr__(attr_name: str) -> Any:

DEPR: pandas.core for groupby #55429

DEPR: pandas.core for groupby #55429

Conversation

rhshadrach commented Oct 6, 2023 • edited Loading

jbrockmendel commented Oct 7, 2023

rhshadrach commented Oct 7, 2023

rhshadrach commented Oct 15, 2023

jbrockmendel commented Oct 16, 2023

mroeschke commented Oct 17, 2023

rhshadrach commented Oct 17, 2023

rhshadrach commented Oct 17, 2023

rhshadrach commented Oct 18, 2023 • edited Loading

rhshadrach commented Nov 2, 2023

Choose a reason for hiding this comment

rhshadrach Nov 14, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rhshadrach Nov 14, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rhshadrach commented Nov 28, 2023 • edited Loading

jbrockmendel commented Dec 1, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

phofl commented Dec 1, 2023

jbrockmendel commented Dec 1, 2023

rhshadrach commented Dec 14, 2023 • edited Loading

rhshadrach commented Oct 6, 2023 •

edited

Loading

rhshadrach commented Oct 18, 2023 •

edited

Loading

rhshadrach Nov 14, 2023 •

edited

Loading

rhshadrach Nov 14, 2023 •

edited

Loading

rhshadrach commented Nov 28, 2023 •

edited

Loading

rhshadrach commented Dec 14, 2023 •

edited

Loading