API: groupby aggregation with apply does not drop groupby-column #22542

h-vetinari · 2018-08-30T06:25:41Z

The docs for groupby say (http://pandas.pydata.org/pandas-docs/stable/groupby.html):

Note:
Aggregation functions will not return the groups that you are aggregating over if they are named columns, when as_index=True, the default. The grouped columns will be the indices of the returned object.
Passing as_index=False will return the groups that you are aggregating over, if they are named columns.
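To make the quoted note concrete, here is a minimal sketch of the two `as_index` settings with a builtin aggregation. The frame `toy` and its values are hypothetical and are not the data used in this issue.

```python
import pandas as pd

# Hypothetical toy frame for illustration only (not the data from this issue).
toy = pd.DataFrame({"id": [10, 10, 11], "x": [1, 2, 3]})

# as_index=True is the default: the group key becomes the index of the result.
with_index = toy.groupby("id").sum()

# as_index=False: the group key is kept as a regular column.
without_index = toy.groupby("id", as_index=False).sum()

print(with_index.columns.tolist())     # key is not among the columns
print(without_index.columns.tolist())  # key appears as a column
```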

The section implies that this applies to the builtins and to aggregate, but I very often operate on the groups themselves with complicated functions, so apply is my bread and butter (and this is part of a larger issue: groupby.apply behaves inconsistently in several ways).

import numpy as np
import pandas as pd

N = 10
df = pd.DataFrame(index=range(N), columns=['id', 'x', 'y', 'z'])
df.loc[:, ['x', 'y', 'z']] = np.arange(N*3).reshape(N, 3)
df.id = np.random.randint(0, int(N/3), (N,)) + 10
df
#    id   x   y   z
# 0  12   0   1   2
# 1  12   3   4   5
# 2  11   6   7   8
# 3  10   9  10  11
# 4  12  12  13  14
# 5  12  15  16  17
# 6  12  18  19  20
# 7  11  21  22  23
# 8  10  24  25  26
# 9  10  27  28  29

For something like sum, the groupby-column gets dropped, as described:

df.groupby('id').sum()
#      x   y   z
# id            
# 10  60  63  66
# 11  27  29  31
# 12  48  53  58

But when the same function is used in apply, the result differs: the groupby column is not removed, and the dtype changes as well:

df.groupby('id', as_index=True).apply(lambda gr: gr.sum())
#       id     x     y     z
# id                        
# 10  30.0  60.0  63.0  66.0
# 11  22.0  27.0  29.0  31.0
# 12  60.0  48.0  53.0  58.0
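One possible way to sidestep the difference, sketched here under assumptions, is to select the value columns explicitly before calling apply, so the function never sees the grouping column. The names `frame`, `ids` and `value_cols` are hypothetical; the ids are copied from the sample output above so the result is reproducible, and the frame is built with integer dtype rather than from an empty frame.

```python
import numpy as np
import pandas as pd

# Ids copied from the sample output in this issue; fixed so the run is reproducible.
ids = [12, 12, 11, 10, 12, 12, 12, 11, 10, 10]
frame = pd.DataFrame(np.arange(30).reshape(10, 3), columns=["x", "y", "z"])
frame.insert(0, "id", ids)

value_cols = ["x", "y", "z"]  # hypothetical name: every column except the key

# Builtin aggregation: the key moves to the index and is not a column.
builtin = frame.groupby("id").sum()

# apply restricted to the value columns: the lambda only receives x, y, z,
# so "id" cannot be summed into the result.
applied = frame.groupby("id")[value_cols].apply(lambda gr: gr.sum())

print(applied)
```

This only avoids the symptom for the columns selected; it does not change how apply treats the grouping column when no selection is made.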

Ideally, I'd like to make the behaviour of groupby.apply more consistent in a number of cases, and this is one of them.


WillAyd · 2018-08-30T14:43:30Z

Related to #20420 - we generally have a few inconsistencies in apply that need to be cleaned up

h-vetinari · 2018-08-30T19:28:16Z

@WillAyd

> Related to #20420 - we generally have a few inconsistencies in apply that need to be cleaned up

Started collecting some of them in #22545

jreback · 2018-08-30T22:06:33Z

rather than opening new issues, pls look at the existing open ones

h-vetinari · 2018-08-30T22:32:34Z

@jreback

> rather than opening new issues, pls look at the existing open ones

I did (https://github.com/pandas-dev/pandas/issues?page=2&q=is%3Aissue+is%3Aopen+apply+label%3AGroupby&utf8=%E2%9C%93), but did not find much - guess I did not go back far enough in time - sorry.

Going over them a second time, I did overlook #13217 and #15290, and #18103 is possibly somewhat related. I don't think there's anything as comprehensive as what I'm trying to summarize in #22545, but #13056 is a start.

simonjayhawkins · 2020-04-24T15:58:22Z

closing as duplicate of #13217. ping me to reopen if I'm missing something.

h-vetinari · 2020-04-24T16:16:25Z

Fine with me.

h-vetinari changed the title ~~API: groupby custom aggregation behaves differently than builtins~~ API: groupby aggregation with apply does not drop groupby-column Aug 30, 2018

WillAyd added the Groupby label Aug 30, 2018

h-vetinari mentioned this issue Aug 30, 2018

API/DOC: clean up DataFrame.groupby.apply #22545

Open

h-vetinari mentioned this issue Nov 18, 2018

Towards "pandas 1.0" #10000

Closed

h-vetinari mentioned this issue Jan 28, 2019

RLS: 0.25.0 #24950

Closed

WillAyd mentioned this issue Sep 20, 2019

BUG: Groupby selection context not being properly reset #28541

Closed


simonjayhawkins closed this as completed Apr 24, 2020

simonjayhawkins added the Duplicate Report label and removed the Groupby label Apr 24, 2020
