mean with skipna either True or False on groupby gives error #32696

Closed
navankurverma opened this issue Mar 14, 2020 · 4 comments
Labels: Duplicate Report (duplicate issue or pull request), Groupby, Reduction Operations (sum, mean, min, max, etc.)

Comments

@navankurverma

Problem description

Calling .mean(skipna=True) or .mean(skipna=False) on a groupby object raises an error:

UnsupportedFunctionCall: numpy operations are not valid with groupby. Use .groupby(...).mean() instead

skipna is a crucial parameter for analysis. The default behaviour is skipna=True, so omitting the argument gives the expected output, but passing it explicitly, as either True or False, raises the error above.

Code Sample:

import pandas as pd

df = pd.DataFrame({'elements': [144, 214, 166, 166, 145, 144, 214],
                   'points': [1, 2, 3, None, 1, 1, 1]})

df.groupby('elements').mean()              # this works
df.groupby('elements').mean(skipna=True)   # this raises the error
df.groupby('elements').mean(skipna=False)  # this raises the error

Expected Output

with mean(skipna = True):

elements  points
144       1.0
145       1.0
166       3.0
214       1.5

with mean(skipna = False):

elements  points
144       1.0
145       1.0
166       NaN
214       1.5

A workaround I tried, which gives the expected output:

def custom_mean(s):
    # agg passes each group's 'points' values as a Series
    return s.mean(skipna=False)

df.groupby('elements').agg({'points': custom_mean})
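
The custom_mean workaround above can also be written inline. This is a minimal sketch rather than an official recipe: it assumes the same df as the code sample, and the name `result` is only illustrative. It relies on Series.mean, which does accept skipna, being called once per group.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'elements': [144, 214, 166, 166, 145, 144, 214],
                   'points': [1, 2, 3, None, 1, 1, 1]})

# Each group's 'points' values arrive as a Series, so Series.mean
# (which accepts skipna) can be called on them directly.
result = df.groupby('elements')['points'].agg(lambda s: s.mean(skipna=False))
print(result)
```

Because the lambda runs in Python once per group, this is likely to be slower than the built-in groupby mean on large data.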

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 69 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.0.0.post20200309
Cython : 0.29.15
pytest : 5.3.5
hypothesis : 5.5.4
sphinx : 2.4.0
blosc : None
feather : None
xlsxwriter : 1.2.7
lxml.etree : 4.5.0
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.13.0
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.3.2
fastparquet : None
gcsfs : None
lxml.etree : 4.5.0
matplotlib : 3.1.3
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.3.5
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.13
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.7
numba : 0.48.0

Quite similar to issue #19806

@d-01

d-01 commented May 28, 2020

Quick workaround

One possible alternative to df.groupby('elements').mean(skipna = False):

>>> df.replace(np.nan, np.inf).groupby('elements').mean().replace(np.inf, np.nan)

          points
elements        
144          1.0
145          1.0
166          NaN
214          1.5

Warning: any pre-existing np.inf values are lost, because a group whose mean is np.inf is also converted to NaN by the final replace.
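
One way to avoid losing np.inf is to leave the data untouched and mask the result instead. The following is a sketch under assumptions, not a method from this thread: the data is the original sample with one value swapped for np.inf purely to illustrate the point, and the names `grouped`, `complete` and `result` are hypothetical.

```python
import numpy as np
import pandas as pd

# Original sample, with one 'points' value changed to np.inf for illustration.
df = pd.DataFrame({'elements': [144, 214, 166, 166, 145, 144, 214],
                   'points': [1, 2, 3, None, 1, np.inf, 1]})

grouped = df.groupby('elements')['points']

# count() counts non-missing values; size() counts all rows in a group.
# A group is "complete" when the two agree, i.e. it contains no NaN.
complete = grouped.count() == grouped.size()

# Keep the default (NaN-skipping) mean only for complete groups;
# every other group becomes NaN, mimicking skipna=False.
result = grouped.mean().where(complete)
print(result)
```

Group 144 keeps its infinite mean, while group 166, which contains a missing value, comes out as NaN.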

More solutions here: https://stackoverflow.com/q/54106112/

More examples related to this problem

groupby() + max() + skipna=False

With skipna=False there is no error; the argument is silently ignored:

>>> pd.Series([11, np.nan, 22, np.nan], index=iter('AABB')) \
        .groupby(level=0).max(skipna=False)

A    11.0
B    22.0
dtype: float64

Expected output:

A    NaN
B    NaN
dtype: float64
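
The expected output above can be obtained by calling Series.max on each group, in the same spirit as the custom_mean workaround. This is a minimal sketch, assuming the same Series as in the example; it uses list('AABB') instead of iter('AABB') only for readability, and `result` is an illustrative name.

```python
import numpy as np
import pandas as pd

s = pd.Series([11, np.nan, 22, np.nan], index=list('AABB'))

# Series.max honours skipna, so a group containing NaN yields NaN.
result = s.groupby(level=0).agg(lambda x: x.max(skipna=False))
print(result)
```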

groupby() + median() + skipna

skipna works as expected:

>>> pd.Series([11, np.nan, 22, np.nan], index=iter('AABB')) \
        .groupby(level=0).median(skipna=False)

A    NaN
B    NaN
dtype: float64

>>> pd.Series([11, np.nan, 22, np.nan], index=iter('AABB')) \
        .groupby(level=0).median(skipna=True)

A    11.0
B    22.0
dtype: float64

@jbrockmendel added the Groupby and Reduction Operations (sum, mean, min, max, etc.) labels Oct 12, 2020
@jorisvandenbossche
Member

The skipna keyword is indeed not yet implemented for groupby reductions. An enhancement request to add it is covered in #15675, so I am going to close this issue as a duplicate.
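
Until the keyword is available in the installed version, code that has to run on more than one pandas release can try the keyword first and fall back to a per-group call. This is only a sketch: `group_mean` is a hypothetical helper, not a pandas function, and it assumes that an unsupported skipna surfaces either as a TypeError or as UnsupportedFunctionCall (a ValueError subclass), as reported in this issue. This thread does not establish which pandas releases, if any, accept the keyword.

```python
import numpy as np
import pandas as pd


def group_mean(df, by, column, skipna=True):
    """Hypothetical helper: grouped mean of one column, honouring skipna."""
    grouped = df.groupby(by)[column]
    try:
        # Succeeds only if the installed pandas accepts skipna here.
        return grouped.mean(skipna=skipna)
    except (TypeError, ValueError):
        # Assumed failure modes: unexpected-keyword TypeError, or
        # UnsupportedFunctionCall (a ValueError subclass).
        return grouped.agg(lambda s: s.mean(skipna=skipna))


df = pd.DataFrame({'elements': [144, 214, 166, 166, 145, 144, 214],
                   'points': [1, 2, 3, None, 1, 1, 1]})

strict = group_mean(df, 'elements', 'points', skipna=False)
lenient = group_mean(df, 'elements', 'points', skipna=True)
print(strict)
print(lenient)
```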

@jorisvandenbossche added the Duplicate Report (duplicate issue or pull request) label Nov 20, 2020
@jorisvandenbossche added this to the No action milestone Nov 20, 2020
@jorisvandenbossche
Member

Duplicate of #15675

@jorisvandenbossche marked this as a duplicate of #15675 Nov 20, 2020
@evelynLorca

I get an error with groupby.mean(skipna=True); it says that the groupby mean does not have an argument 'skipna'.
