mean with skipna either True or False on groupby gives error #32696

navankurverma · 2020-03-14T02:40:17Z

Problem description

Using .mean(skipna=True) or .mean(skipna=False) on a groupby object raises an error:

UnsupportedFunctionCall: numpy operations are not valid with groupby. Use .groupby(...).mean() instead

skipna is a crucial parameter for analysis. It defaults to True, so omitting it produces the expected output, but passing it explicitly, as either True or False, raises an error.
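For comparison, on a plain (ungrouped) Series the keyword behaves as documented. A minimal illustration, using the two `points` values of the 166 group from the sample data (one real value and one missing value):

```python
import pandas as pd

# The 'points' values of the 166 group: 3 and a missing value.
values = pd.Series([3, None])

print(values.mean())              # skipna defaults to True: 3.0
print(values.mean(skipna=True))   # 3.0
print(values.mean(skipna=False))  # nan, because one value is missing
```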

Code Sample:

import pandas as pd
df = pd.DataFrame({'elements': [144, 214, 166, 166, 145, 144, 214],
                   'points': [1, 2, 3, None, 1, 1, 1]})

df.groupby('elements').mean()              # this works
df.groupby('elements').mean(skipna=True)   # raises UnsupportedFunctionCall
df.groupby('elements').mean(skipna=False)  # raises UnsupportedFunctionCall

Expected Output

with mean(skipna = True):

elements  points
144       1.0
145       1.0
166       3.0
214       1.5

with mean(skipna = False):

elements  points
144       1.0
145       1.0
166       NaN
214       1.5

Workaround I tried, which gives the expected output:

def custom_mean(s):
    # s is the 'points' Series of a single group
    return s.mean(skipna=False)

df.groupby('elements').agg({'points': custom_mean})
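A vectorized alternative to the per-group Python function is to compute the ordinary (NaN-skipping) means and then blank out every group that had at least one missing value, detected by comparing `count()` (non-missing values) with `size()` (all rows). This is a minimal sketch, assuming all non-key columns are numeric; `groupby_mean_no_skipna` is a hypothetical helper, not a pandas API.

```python
import pandas as pd


def groupby_mean_no_skipna(df: pd.DataFrame, key: str) -> pd.DataFrame:
    """Hypothetical helper mimicking groupby(key).mean(skipna=False).

    Assumes every non-key column is numeric.
    """
    grouped = df.groupby(key)
    means = grouped.mean()          # regular means, NaNs skipped
    non_missing = grouped.count()   # non-NaN values per group and column
    sizes = grouped.size()          # total rows per group
    # A group/column pair is "complete" if it had no missing values.
    complete = non_missing.eq(sizes, axis=0)
    # Keep the mean only for complete pairs; everything else becomes NaN.
    return means.where(complete)


df = pd.DataFrame({'elements': [144, 214, 166, 166, 145, 144, 214],
                   'points': [1, 2, 3, None, 1, 1, 1]})
result = groupby_mean_no_skipna(df, 'elements')
print(result)
```

With the issue's data this yields 1.0, 1.0, NaN and 1.5 for groups 144, 145, 166 and 214, matching the expected `skipna=False` output.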

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 69 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.0.0.post20200309
Cython : 0.29.15
pytest : 5.3.5
hypothesis : 5.5.4
sphinx : 2.4.0
blosc : None
feather : None
xlsxwriter : 1.2.7
lxml.etree : 4.5.0
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.13.0
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.3.2
fastparquet : None
gcsfs : None
lxml.etree : 4.5.0
matplotlib : 3.1.3
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.3.5
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.13
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.7
numba : 0.48.0

Quite similar to issue #19806


d-01 · 2020-05-28T20:32:09Z

Quick workaround

One possible alternative to df.groupby('elements').mean(skipna = False):

>>> import numpy as np
>>> df.replace(np.nan, np.inf).groupby('elements').mean().replace(np.inf, np.nan)

          points
elements        
144          1.0
145          1.0
166          NaN
214          1.5

Warning: pre-existing np.inf values will be lost, because any group whose mean is infinite is also reported as NaN.
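To make that caveat concrete, here is a small check on made-up data (the frame `df2` and its column names are illustrative assumptions): group 'a' contains a genuine infinity but no missing values, group 'b' contains a missing value. The reference result applies `Series.mean(skipna=False)` per group, as in the issue's own workaround.

```python
import numpy as np
import pandas as pd

# Illustrative data: 'a' has a real inf and no NaN, 'b' has a NaN.
df2 = pd.DataFrame({'g': ['a', 'a', 'b', 'b'],
                    'x': [1.0, np.inf, 2.0, np.nan]})

# The replace-based workaround.
via_replace = (df2.replace(np.nan, np.inf)
                  .groupby('g').mean()
                  .replace(np.inf, np.nan))

# Reference: Series.mean(skipna=False) applied to each group.
reference = df2.groupby('g').agg({'x': lambda s: s.mean(skipna=False)})

print(via_replace)  # 'a' comes out as NaN although it has no missing values
print(reference)    # 'a' stays inf, 'b' is NaN
```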

More solutions here: https://stackoverflow.com/q/54106112/

More examples related to this problem

`groupby()` + `max()` + `skipna=False`

With skipna=False there is no error; the argument is silently ignored:

>>> pd.Series([11, np.nan, 22, np.nan], index=iter('AABB')) \
        .groupby(level=0).max(skipna=False)

A    11.0
B    22.0
dtype: float64

Expected output:

A    NaN
B    NaN
dtype: float64
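Until the keyword is supported, the expected output can be obtained by calling `Series.max(skipna=False)` on each group explicitly, the same pattern as the `custom_mean` workaround in the original report. A minimal sketch (it uses `list('AABB')` for the index, which builds the same labels as `iter('AABB')`):

```python
import numpy as np
import pandas as pd

s = pd.Series([11, np.nan, 22, np.nan], index=list('AABB'))

# Apply Series.max(skipna=False) per group instead of passing skipna
# to the groupby reduction itself.
result = s.groupby(level=0).agg(lambda x: x.max(skipna=False))
print(result)
```

Both groups contain a NaN, so both results are NaN. Being a per-group Python call, this is slower than the built-in reduction on large data.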

`groupby()` + `median()` + `skipna`

skipna works as expected:

>>> pd.Series([11, np.nan, 22, np.nan], index=iter('AABB')) \
        .groupby(level=0).median(skipna=False)

A    NaN
B    NaN
dtype: float64

>>> pd.Series([11, np.nan, 22, np.nan], index=iter('AABB')) \
        .groupby(level=0).median(skipna=True)

A    11.0
B    22.0
dtype: float64

jorisvandenbossche · 2020-11-20T13:07:33Z

The skipna keyword is indeed not yet implemented for groupby reductions. An enhancement request to add this is covered in #15675, so going to close this issue as a duplicate.

jorisvandenbossche · 2020-11-20T13:07:47Z

Duplicate of #15675

evelynLorca · 2023-11-29T17:03:09Z

I get an error with groupby.mean(skipna=True): it says that the group mean does not have an argument 'skipna'.

jbrockmendel added the Groupby and Reduction Operations labels Oct 12, 2020

jorisvandenbossche added the Duplicate Report label Nov 20, 2020

jorisvandenbossche added this to the No action milestone Nov 20, 2020

jorisvandenbossche marked this as a duplicate of #15675 Nov 20, 2020

jorisvandenbossche closed this as completed Nov 20, 2020

malte-fu mentioned this issue Feb 27, 2024

BUG: .rolling-method is not working properly when called with a timedelta #57549

