Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: groupby.idxmax/idxmin consistently raise on unobserved categorical #55268
BUG: groupby.idxmax/idxmin consistently raise on unobserved categorical #55268
Changes from 8 commits
1325855
e2dd88a
5daaaed
c730ddb
2bcdd81
a97cd21
d62297c
34f4116
2598821
b2a36eb
a1954d6
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it not possible to stick with the same engine pattern that is in place for this? At first glance I'm wondering what makes these different from
min
/`max`` that requires branching hereThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think this is the same issue as i asked about somewhere else: _idxmax_idxmin accepts ignore_unobserved while idxmin/idxmax do not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main difference in behavior is that min/max do not raise on unobserved categories, while idxmin and idxmax do.
Example - agg
However, the fact that we don't do something special for min/max means that transform unnecessarily coerces to float:
Example - transform
I consider this a bug in min/max with transform.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've opened #55326 to track
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah OK thanks - that is helpful context. So do you see the solution for min/max being the same as what you have for idxmin/idxmax here?
I think the broader issue is that we've wanted over time to move away from branching for function specializations within groupby. If that still holds true then I wonder what prevents us from sticking with the existing kwargs interface to solve both this PR and eventually solve min/max's issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to explore different solutions for min/max and other aggregations, but I don't know what that could be at this time.
I don't understand what you're suggesting with kwargs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I thought the engine / engine_kwargs were specialized arguments for each function implementation, but I see now those are meant for numba.
The numba functions are UDFs right? I'm assuming from the branch here that we would never want to pass numba arguments to
_idxmax_idxmin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct - we never get here when using numba.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We definitely don't need *args here right? Seems like something could get discarded compared to the other branch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack - thanks! Fixed and test added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick i think missing space between "how" and colon
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also usually but not always we have single-quotes inside these (no idea why and i genuinely dont care, extra since this is private)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks - fixed in #54234
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would this be simpler if we just said changed "arg" to "idx" in the cython method with a comment as to why we are using an apparently-wrong message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This arises from a call to NumPy's argmin in nanops.nanargmin:
In #54234, this remains only for the
axis=1
case, and so once that deprecation is enforced, this code will be removed entirely.