Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REGR: fix return class in _constructor_from_mgr for simple subclasses #55764

Conversation

ivirshup
Copy link
Contributor

@ivirshup ivirshup commented Oct 30, 2023

This is a draft for fixing #55763. I'm not familiar with this part of the code, but this seems to fix the issue locally.

This is addressing changes brought in by #54922. Instead of checking the class of ._constructor in _constructor_from_mgr, this checks the class being called.

We were relying on most operations on our subclass returning a pandas dataframe. AFAICT, the previous PR changed the behavior so that most operations would return the class of the instance even if _constructor was not set.

Once I see that this doesn't break tests on CI, added tests will be like:

class SimpleSubClass(DataFrame):
    """A subclass of DataFrame that does not define a constructor."""
    pass

class TestSubclassWithoutConstructor():

    def test_copy(self):
        expected = DataFrame({"a": [1, 2, 3]})
        result = SimpleSubClass(expected).copy()

        tm.assert_frame_equal(result, expected)

    def test_groupby(self):
        df = SimpleSubClass(DataFrame({"a": [1, 2, 3]}))

        for _, v in df.groupby("a"):
            assert isinstance(v, DataFrame)

added to test_subclass.py.

@ivirshup ivirshup marked this pull request as ready for review October 31, 2023 15:16
@jbrockmendel
Copy link
Member

cc @jorisvandenbossche im fine with the change here bc it improves perf the base classes. no strong opinion on the actual behavior.

@jorisvandenbossche
Copy link
Member

it improves perf the base classes

Do you mean that the check type(self) is DataFrame is a bit faster than self._constructor is DataFrame? (it's indeed a few nanoseconds faster)

@jorisvandenbossche jorisvandenbossche added Regression Functionality that used to work in a prior pandas version Subclassing Subclassing pandas objects labels Nov 1, 2023
@jorisvandenbossche jorisvandenbossche added this to the 2.1.3 milestone Nov 1, 2023
@ivirshup
Copy link
Contributor Author

ivirshup commented Nov 8, 2023

I've addressed the comment, added a test for subclasses of Series, and added a release note.

This does throw a depreciation warning:

import pandas as pd

class SimpleDataFrameSubClass(pd.DataFrame):
    """A subclass of DataFrame that does not define a constructor."""

s = SimpleDataFrameSubClass(pd.DataFrame({"a": [1, 2, 3]}))
s.copy()
<ipython-input-2-6e1d2ee6d3ad>:1: DeprecationWarning: Passing a BlockManager to DataFrame is deprecated and will raise in a future version. Use public APIs instead.
  s.copy()

But the previous fix seems to have done this as well.

It would make sense if this didn't warn, since the user code in my example above can't obviously do anything about this warning besides silence it. It may be out of scope for this regression though, since this PR does not introduce this warning.

@jorisvandenbossche
Copy link
Member

Yes, we still have to figure out how to deal with this deprecation warning in subclasses. But, that deprecation warning is only for pandas 2.2, and so won't yet be visible in a pandas 2.1.3.

And I think my other comment might solve this warning for the simple case where the subclass doesn't override the constructor.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming the tests pass, this looks good to me!

@jorisvandenbossche jorisvandenbossche changed the title Revert logic for _constructor selection in _construct_from_mgr REGR: fix return class in _constructor_from_mgr for simple subclasses Nov 9, 2023
@jorisvandenbossche jorisvandenbossche merged commit 95561fc into pandas-dev:main Nov 9, 2023
33 of 34 checks passed
@jorisvandenbossche
Copy link
Member

Thanks @ivirshup !

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Nov 9, 2023
jorisvandenbossche pushed a commit that referenced this pull request Nov 9, 2023
…uctor_from_mgr for simple subclasses) (#55892)

Backport PR #55764: REGR: fix return class in _constructor_from_mgr for simple subclasses

Co-authored-by: Isaac Virshup <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Regression Functionality that used to work in a prior pandas version Subclassing Subclassing pandas objects
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: pandas 2.1.2 changes how copy works
3 participants