REGR: groupby.idxmin/idxmax wrong result on extreme values #57046

rhshadrach · 2024-01-24T04:29:20Z

closes BUG: unexpected results for idxmax groupby aggregation on uint column #57040
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

While working on this, I noticed that the skipna argument was also being ignored.

The only way I see to solve this is to add a seen Boolean array instead of relying on sentinel values. cc @WillAyd
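To illustrate the point about sentinels versus a seen array, here is a minimal pure-Python sketch. It is not the Cython code touched by this PR: the function names, the uint64-style sentinel of 0, and the use of -1 as a "no position found" marker are all assumptions made for the example. The idea is that a sentinel-initialized running maximum cannot tell "no value seen yet" apart from "the value seen equals the sentinel", whereas a per-group seen flag can.

```python
UINT64_MIN = 0  # assumed sentinel: the smallest value an unsigned integer can hold


def group_idxmax_sentinel(values, labels, ngroups):
    """Hypothetical sentinel-based kernel: running max starts at the sentinel."""
    best = [UINT64_MIN] * ngroups
    out = [-1] * ngroups  # -1 marks "no position found" in this sketch
    for i, (lab, val) in enumerate(zip(labels, values)):
        # A value equal to the sentinel never passes the strict comparison,
        # so a group made only of such values is never assigned a position.
        if val > best[lab]:
            best[lab] = val
            out[lab] = i
    return out


def group_idxmax_seen(values, labels, ngroups):
    """Hypothetical seen-array kernel: track per group whether any value was seen."""
    seen = [False] * ngroups
    best = [0] * ngroups  # initial content is irrelevant until seen[lab] is True
    out = [-1] * ngroups
    for i, (lab, val) in enumerate(zip(labels, values)):
        # The first value of a group is always accepted, whatever its magnitude.
        if not seen[lab] or val > best[lab]:
            seen[lab] = True
            best[lab] = val
            out[lab] = i
    return out


values = [0, 0, 5]  # group 0 holds only the extreme value 0
labels = [0, 0, 1]
sentinel_result = group_idxmax_sentinel(values, labels, 2)  # [-1, 2]
seen_result = group_idxmax_seen(values, labels, 2)          # [0, 2]
```

In this sketch the sentinel version reports no position for group 0, while the seen version returns the first occurrence of the maximum.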

mroeschke · 2024-01-24T21:07:20Z

pandas/tests/groupby/test_reductions.py

+
+
+@pytest.mark.parametrize("how", ["idxmin", "idxmax"])
+@pytest.mark.parametrize("dtype", ["float32", "float64"])


float_numpy_dtype fixture please

mroeschke · 2024-01-24T21:07:29Z

pandas/tests/groupby/test_reductions.py

+    print(df)
+    print(result)


Suggested change (remove both lines):

-    print(df)
-    print(result)

mroeschke · 2024-01-24T21:08:30Z

pandas/core/groupby/ops.py

@@ -424,6 +424,7 @@ def _call_cython_op(
                    mask=mask,
                    result_mask=result_mask,
                    is_datetimelike=is_datetimelike,
+                    **kwargs,


Why is kwargs needed here?

To pass skipna to group_idxmin_idxmax

Ah OK, and I guess not all of these accept skipna (IIRC there was a related issue about skipna, but I forgot whether it was for groupby ops or not)

Yea - I'm planning on turning #56939 into a tracking issue on adding skipna consistently throughout groupby. Can include removing these kwargs as part of that.
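For readers following the kwargs thread, the sketch below shows the general pattern under discussion: a dispatcher forwards extra keyword arguments so that skipna reaches the one kernel that accepts it, while other kernels are called without it. Everything here is hypothetical (the names call_op, group_idxmax, and group_sum, the -1 marker, and the choice that skipna=False yields -1 for a group containing NaN); it does not reproduce the pandas internals.

```python
import math


def group_idxmax(values, labels, ngroups, skipna=True):
    """Hypothetical kernel that accepts skipna."""
    seen = [False] * ngroups
    has_nan = [False] * ngroups
    best = [0.0] * ngroups
    out = [-1] * ngroups  # -1 marks "no valid position" in this sketch
    for i, (lab, val) in enumerate(zip(labels, values)):
        if math.isnan(val):
            has_nan[lab] = True
            continue
        if not seen[lab] or val > best[lab]:
            seen[lab] = True
            best[lab] = val
            out[lab] = i
    if not skipna:
        # Assumed semantics for the sketch: any NaN invalidates the group.
        out = [-1 if has_nan[g] else out[g] for g in range(ngroups)]
    return out


def group_sum(values, labels, ngroups):
    """Hypothetical kernel that does not accept skipna."""
    out = [0.0] * ngroups
    for lab, val in zip(labels, values):
        out[lab] += val
    return out


def call_op(how, values, labels, ngroups, **kwargs):
    """Forward extra keyword arguments (such as skipna) to the chosen kernel."""
    kernels = {"idxmax": group_idxmax, "sum": group_sum}
    return kernels[how](values, labels, ngroups, **kwargs)


values = [1.0, math.nan, 3.0, 2.0]
labels = [0, 0, 1, 1]
idx_skip = call_op("idxmax", values, labels, 2)                  # [0, 2]
idx_noskip = call_op("idxmax", values, labels, 2, skipna=False)  # [-1, 2]
sums = call_op("sum", values, labels, 2)                         # [nan, 5.0]
```

Passing skipna to call_op("sum", ...) in this sketch would raise a TypeError, which is the kind of inconsistency a tracking issue on adding skipna throughout groupby would address.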


mroeschke · 2024-01-26T17:19:24Z

Thanks @rhshadrach


… result on extreme values) (#57086) Backport PR #57046: REGR: groupby.idxmin/idxmax wrong result on extreme values Co-authored-by: Richard Shadrach

…v#57046) * WIP * REGR: groupby.idxmin/idxmax wrong result on extreme values * fixes * NumPy 2.0 can't come soon enough * fixup * fixup

rhshadrach added 2 commits January 23, 2024 17:36

WIP

7063fce

REGR: groupby.idxmin/idxmax wrong result on extreme values

866b581

rhshadrach added the Bug, Groupby, Regression (functionality that used to work in a prior pandas version), and Reduction Operations (sum, mean, min, max, etc.) labels Jan 24, 2024

rhshadrach added this to the 2.2.1 milestone Jan 24, 2024

rhshadrach requested a review from WillAyd as a code owner January 24, 2024 04:29

mroeschke reviewed Jan 24, 2024


rhshadrach added 4 commits January 24, 2024 16:28

Merge branch 'reg_groupby_idxmin' of https://github.com/rhshadrach/pa…

351bf69

…ndas into reg_groupby_idxmin

fixes

2dfe3a4

NumPy 2.0 can't come soon enough

9c54b8e

Merge remote-tracking branch 'upstream/main' into reg_groupby_idxmin

b424a5b

# Conflicts: # doc/source/whatsnew/v2.2.1.rst

WillAyd approved these changes Jan 25, 2024


rhshadrach added 2 commits January 25, 2024 16:18

fixup

d2dda0a

fixup

cf22b81

rhshadrach requested a review from mroeschke January 26, 2024 03:50

mroeschke approved these changes Jan 26, 2024


mroeschke merged commit b5a963c into pandas-dev:main Jan 26, 2024
50 checks passed

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Jan 26, 2024

Backport PR pandas-dev#57046: REGR: groupby.idxmin/idxmax wrong resul…

0d7afea

…t on extreme values

meeseeksmachine mentioned this pull request Jan 26, 2024

Backport PR #57046 on branch 2.2.x (REGR: groupby.idxmin/idxmax wrong result on extreme values) #57086

Merged

rhshadrach deleted the reg_groupby_idxmin branch January 26, 2024 19:28

rhshadrach mentioned this pull request Jan 31, 2024

BUG: pandas.core.groupby.SeriesGroupBy.idxmax (and idxmin) - unexpected behavior with groups containing single NaN value #57176

Closed

3 tasks
