BUG: groupby, as_index=False still returning group variable as index #13217

Closed
nickeubank opened this issue May 18, 2016 · 13 comments · Fixed by #41431
Labels: Bug, good first issue, Groupby, Needs Tests

@nickeubank
Contributor

Code Sample, a copy-pastable example if possible

import pandas as pd

a = pd.DataFrame({'a':[1,1,2,2], 'b':[1,1,2,2], 'c':[1,1,1,1]})
a.groupby(['a','b'], as_index=False).apply(lambda x: 1)

Out[4]: 
a  b
1  1    1
2  2    1
dtype: int64

Expected Output

Out[4]:
0    1
1    1
dtype: int64

This is what you get when grouping by a single column:

a.groupby(['a'], as_index=False).apply(lambda x: 1)
Out[8]: 
0    1
1    1
dtype: int64

output of pd.show_versions()

pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Darwin
OS-release: 15.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.1
setuptools: 20.3
Cython: 0.23.4
numpy: 1.11.0
scipy: 0.17.0
statsmodels: None
xarray: None
IPython: 4.1.2
sphinx: 1.3.5
patsy: 0.4.0
dateutil: 2.5.1
pytz: 2016.2
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5.1
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.6.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
pandas_datareader: None

@jreback
Contributor

jreback commented May 18, 2016

I guess. This is a very odd thing to do.

@jreback
Contributor

jreback commented May 18, 2016

To be honest, we should just remove as_index entirely. It's a simple .reset_index() if someone wants it.
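
For a plain aggregation, the equivalence mentioned here can be sketched as follows. This is a minimal illustration, assuming a reduction such as sum() rather than the apply() call from the report. The data is taken from the original example; the names `df`, `via_flag`, and `via_reset` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 2], 'b': [1, 1, 2, 2], 'c': [1, 1, 1, 1]})

# With as_index=False the group keys come back as ordinary columns.
via_flag = df.groupby(['a', 'b'], as_index=False).sum()

# With the default as_index=True the keys form a MultiIndex,
# which reset_index() then moves back into columns.
via_reset = df.groupby(['a', 'b']).sum().reset_index()

# Compare the two results.
print(via_flag.equals(via_reset))
```

Whether the same equivalence holds for apply() is exactly what this issue is about, so the sketch deliberately does not cover that case.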

@jreback jreback added this to the Next Major Release milestone May 18, 2016
@pfrcks
Contributor

pfrcks commented May 18, 2016

@jreback I want to look into this.
Can you clarify what you mean by a simple reset_index()?
Do we apply reset_index when someone passes the 'as_index' param?

@jreback
Contributor

jreback commented May 18, 2016

No, that's a different (API) issue.

This can be solved by stepping through the code and seeing where it doesn't properly handle the as_index flag.

@nickeubank
Contributor Author

Don't have a strong preference on how it behaves (or if we keep as_index), just want to get rid of inconsistent behavior.

@pfrcks
Contributor

pfrcks commented May 18, 2016

@jreback After looking through the code, core/groupby.py seems to be responsible in some way.
In that file I came across the _index_with_as_index function, which is not called anywhere.
The function is supposed to 'Take boolean mask of index to be returned from apply, if as_index=True', but since it is not called from anywhere, I don't understand what its purpose is.

@jreback
Contributor

jreback commented May 18, 2016

@pfrcks Write the test and step through. Identify where you think you need to make a change, then test it.
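
One possible starting point for such a test is sketched below. It only reports where the group keys end up; it does not assert an outcome for apply(), because that outcome is the behaviour under discussion and may differ between pandas versions. The helper `key_location` is hypothetical and not part of pandas.

```python
import warnings

import pandas as pd


def key_location(result, keys):
    """Report whether the group keys are in the index, the columns, or neither."""
    if set(keys) <= set(result.index.names):
        return "index"
    # A Series has no .columns attribute, so fall back to an empty list.
    if set(keys) <= set(getattr(result, "columns", [])):
        return "columns"
    return "neither"


df = pd.DataFrame({'a': [1, 1, 2, 2], 'b': [1, 1, 2, 2], 'c': [1, 1, 1, 1]})

with warnings.catch_warnings():
    # Some pandas releases warn when apply() operates on the grouping columns.
    warnings.simplefilter("ignore")
    applied = df.groupby(['a', 'b'], as_index=False).apply(lambda x: 1)

# The location depends on the installed pandas version, so print it instead of asserting.
print(key_location(applied, ['a', 'b']))
```

Once the intended behaviour is agreed on, the print call could be replaced by an assertion and the check moved into the groupby test suite.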

@jrbrodie77

jrbrodie77 commented Dec 15, 2017

I ran into this same bug today in version 0.21:

import numpy as np
import pandas as pd

a = pd.DataFrame([np.zeros(3), np.ones(3), 2*np.ones(3)], columns="A B C".split())
a.groupby(['A', 'B'], as_index=False).apply(np.mean)

If my groupby uses a pair of column names, as_index is ignored.

If I get a chance in the next couple of weeks, I may try to find and fix it.

@simonjayhawkins simonjayhawkins changed the title BUG: as_index issues with groupby BUG: as_index=False issues with groupby Apr 24, 2020
@simonjayhawkins simonjayhawkins changed the title BUG: as_index=False issues with groupby BUG: groupby, as_index=False still returning group variable as index Apr 24, 2020
@rhshadrach
Member

rhshadrach commented Nov 7, 2020

In this case, the function is an aggregator; with as_index=False the grouping labels should appear in the result as columns. Right now on master:

a = pd.DataFrame({'a':[1, 1, 2, 2], 'b':[1,1,2,2], 'c':[1,1,1,1]})
print(a.groupby(['a','b'], as_index=False).apply(lambda x: 1))
print(a.groupby(['a','b'], as_index=True).apply(lambda x: 1))

gives:

   a  b  NaN
0  1  1    1
1  2  2    1

a  b
1  1    1
2  2    1
dtype: int64

which looks right to me.

@rhshadrach rhshadrach added the Needs Tests label Nov 7, 2020
@jreback jreback modified the milestones: Contributions Welcome, 1.3 May 12, 2021
@vroomzel

Still doesn't work in version '1.3.4'

@rhshadrach
Member

@vroomzel - can you post the input / output you're seeing, as well as the result of pd.show_versions()

@openSourcerer9000

We're still suffering here in v1.5
[screenshot]

@rhshadrach
Member

@openSourcerer9000 - when sharing reproducible examples, please do so in plain text rather than screenshots. Plain text is more convenient for maintainers.

Though perhaps not well documented, I believe as_index is not intended to have an impact when iterating over a groupby object. Can you open a new issue?
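
The point about iteration can be checked with a small sketch like the one below. It assumes the data from the original report and only compares the two iterations against each other; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 2], 'b': [1, 1, 2, 2], 'c': [1, 1, 1, 1]})

# Iterating a groupby object yields (key, sub-frame) pairs.
groups_true = list(df.groupby(['a', 'b'], as_index=True))
groups_false = list(df.groupby(['a', 'b'], as_index=False))

# Compare the keys and the sub-frames produced by the two settings.
same_keys = [key for key, _ in groups_true] == [key for key, _ in groups_false]
same_frames = all(
    left.equals(right)
    for (_, left), (_, right) in zip(groups_true, groups_false)
)
print(same_keys, same_frames)
```

If both values print as True, as_index made no difference to iteration on that pandas version, which would match the behaviour described above.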
