REGR: groupby.idxmin/idxmax wrong result on extreme values #57046

Merged 8 commits, Jan 26, 2024

Conversation

@rhshadrach rhshadrach commented Jan 24, 2024

While working on this, I noticed that the skipna argument was also being ignored.

The only way I see to solve this is to add a `seen` Boolean array in place of the sentinel values. cc @WillAyd
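To make the idea concrete, here is a minimal pure-Python sketch. It is not the Cython kernel from this PR: the function names, the `sentinel` parameter, and the use of -1 as a "no result" marker are all assumptions made for illustration. It contrasts a sentinel-initialised loop, which can never select a row whose value equals the sentinel, with a loop that tracks a `seen` flag per group and honours `skipna`.

```python
import math


def group_idxmax_sentinel(values, labels, ngroups, sentinel):
    """Hypothetical sentinel-based loop: 'best' starts at a sentinel value."""
    out = [-1] * ngroups           # -1 is an assumed "no row found" marker
    best = [sentinel] * ngroups
    for i, (val, lab) in enumerate(zip(values, labels)):
        # A value equal to the sentinel never compares greater, so a group
        # made up only of such extreme values is left without a result.
        if val > best[lab]:
            best[lab] = val
            out[lab] = i
    return out


def group_idxmax_seen(values, labels, ngroups, skipna=True):
    """Hypothetical loop that tracks whether each group has seen a value."""
    out = [-1] * ngroups
    best = [None] * ngroups        # contents are ignored until seen[lab] is True
    seen = [False] * ngroups
    blocked = [False] * ngroups    # group hit a NaN while skipna=False
    for i, (val, lab) in enumerate(zip(values, labels)):
        if lab < 0 or blocked[lab]:
            continue
        if isinstance(val, float) and math.isnan(val):
            if not skipna:
                out[lab] = -1
                blocked[lab] = True
            continue
        # The first valid value is always accepted, whatever its magnitude.
        if not seen[lab] or val > best[lab]:
            seen[lab] = True
            best[lab] = val
            out[lab] = i
    return out
```

With this sketch, a group containing only the smallest representable value still gets a position from the `seen` variant, whereas the sentinel variant leaves it at -1 whenever the sentinel happens to equal that value.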

@rhshadrach added the Bug, Groupby, Regression (functionality that used to work in a prior pandas version), and Reduction Operations (sum, mean, min, max, etc.) labels Jan 24, 2024
@rhshadrach added this to the 2.2.1 milestone Jan 24, 2024
@rhshadrach requested a review from WillAyd as a code owner January 24, 2024 04:29


@pytest.mark.parametrize("how", ["idxmin", "idxmax"])
@pytest.mark.parametrize("dtype", ["float32", "float64"])
Member

float_numpy_dtype fixture please

Comment on lines 317 to 318
print(df)
print(result)
Member

Suggested change (remove these leftover debug prints):
print(df)
print(result)

@@ -424,6 +424,7 @@ def _call_cython_op(
mask=mask,
result_mask=result_mask,
is_datetimelike=is_datetimelike,
**kwargs,
Member

Why is kwargs needed here?

Member Author
@rhshadrach rhshadrach Jan 24, 2024

To pass skipna to group_idxmin_idxmax

Member

Ah OK, and I guess not all of these accept skipna (IIRC there was a related issue about skipna, but I forget whether it was for groupby ops or not)

Member Author

Yea - I'm planning on turning #56939 into a tracking issue on adding skipna consistently throughout groupby. Can include removing these kwargs as part of that.
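As a rough illustration of the exchange above, the sketch below uses hypothetical names (`call_kernel`, `KERNELS`, and both kernels are stand-ins, not the real `_call_cython_op` or pandas kernels). A dispatcher that forwards `**kwargs` lets `skipna` reach the kernel that understands it, while a kernel without that parameter rejects it, which is why the caller should only pass `skipna` for operations that accept it.

```python
def group_sum(values, labels, ngroups):
    """Hypothetical kernel that has no skipna parameter."""
    out = [0] * ngroups
    for val, lab in zip(values, labels):
        out[lab] += val
    return out


def group_idxmax(values, labels, ngroups, skipna=True):
    """Hypothetical kernel that does accept skipna."""
    out = [-1] * ngroups           # -1 is an assumed "no result" marker
    best = [None] * ngroups        # None means no valid value seen yet
    closed = [False] * ngroups     # group hit a NaN while skipna=False
    for i, (val, lab) in enumerate(zip(values, labels)):
        if closed[lab]:
            continue
        if val != val:             # only NaN is unequal to itself
            if not skipna:
                out[lab] = -1
                closed[lab] = True
            continue
        if best[lab] is None or val > best[lab]:
            best[lab] = val
            out[lab] = i
    return out


KERNELS = {"sum": group_sum, "idxmax": group_idxmax}


def call_kernel(how, values, labels, ngroups, **kwargs):
    # Extra keyword arguments are forwarded untouched, so the caller is
    # responsible for passing only what the chosen kernel accepts.
    return KERNELS[how](values, labels, ngroups, **kwargs)
```

In this sketch, calling `call_kernel("sum", ..., skipna=False)` raises a `TypeError` because `group_sum` has no such parameter. Adding `skipna` consistently to every kernel would make the generic `**kwargs` pass-through unnecessary.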

@rhshadrach rhshadrach requested a review from mroeschke January 26, 2024 03:50
@mroeschke mroeschke merged commit b5a963c into pandas-dev:main Jan 26, 2024
50 checks passed
@mroeschke
Member

Thanks @rhshadrach

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Jan 26, 2024
mroeschke pushed a commit that referenced this pull request Jan 26, 2024
… result on extreme values) (#57086)

Backport PR #57046: REGR: groupby.idxmin/idxmax wrong result on extreme values

Co-authored-by: Richard Shadrach <[email protected]>
@rhshadrach rhshadrach deleted the reg_groupby_idxmin branch January 26, 2024 19:28
pmhatre1 pushed a commit to pmhatre1/pandas-pmhatre1 that referenced this pull request May 7, 2024
…v#57046)

* WIP

* REGR: groupby.idxmin/idxmax wrong result on extreme values

* fixes

* NumPy 2.0 can't come soon enough

* fixup

* fixup
Development

Successfully merging this pull request may close these issues.

BUG: unexpected results for idxmax groupby aggregation on uint column
3 participants