Sum of pd.DataFrame.groupby.sum containing NaN should return NaN ? #15674
Comments
if you really really want this behavior of
or
The passthru
xref #15675
consider the following
Should a database entry be missing for some aggregated index value, the final figure should be returned as Either way, I agree with your assessment that it is inconsistent with current implementation as
@flipdazed pandas propagates NaN values on purpose. Generally on aggregations you want to skip them. If you don't there are many options (e.g. look at
you really think so? sounds like you don't handle missing data at all.
Jeff, there are certainly cases imaginable where you don't want to ignore missing values. And therefore we have that as a keyword.
sure, but in the vast, vast majority of cases you want to skipna. It's very uncommon in fact to assume ALL data is valid; that is my point.
It would be nice to have a keyword and get those NAs back in this case: when I resample after a groupby I need an aggregating function. It would be nice not to get zeros when I increase the resolution, so I can use .fillna(method='ffill').
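The resampling case in the comment above can be sketched as follows. This is only an illustration under stated assumptions: the dates and values are made up, the groupby step is left out for brevity, and `min_count` is a parameter from pandas releases later than the 0.19.1 reported in this issue, so it would not have been available to the commenter at the time.

```python
import pandas as pd

# Hypothetical two-point series with a one-day gap between observations.
idx = pd.to_datetime(["2017-01-01", "2017-01-03"])
s = pd.Series([1.0, 2.0], index=idx)

# Upsampling to daily bins: the empty bin (2017-01-02) sums to 0.0 by default.
zeros = s.resample("D").sum()

# With min_count=1 a bin needs at least one valid value, so the empty bin is NaN.
gaps = s.resample("D").sum(min_count=1)

# The NaN can then be forward-filled, which is what the comment asks for.
filled = gaps.ffill()

print(zeros.tolist())
print(gaps.tolist())
print(filled.tolist())
```

Note that `min_count=1` only yields NaN for bins with no valid values at all; it does not make a single NaN inside a populated bin propagate.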
Code Sample, a copy-pastable example if possible
Problem description
When a grouped dataframe contains a value of `np.NaN`, the output is not aligned with `numpy.sum` or `pandas.Series.sum`, where the result is `NaN`, as is given by the `skipna=False` flag for `pd.Series.sum` and also `pd.DataFrame.sum`.

However, this behavior is not reflected in the `pandas.DataFrame.groupby` object and cannot be forced by applying the `np.sum` method directly.

See this StackOverflow post for a workaround.
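A minimal sketch of the discrepancy described above, since the original code sample did not survive on this page. The frame, the column names `key` and `val`, and the per-group `apply` workaround are assumptions chosen for illustration; they are not the reporter's original example, and the workaround is not necessarily the one in the linked StackOverflow post.

```python
import numpy as np
import pandas as pd

# Hypothetical data: group "a" contains a missing value, group "b" does not.
df = pd.DataFrame({
    "key": ["a", "a", "b", "b"],
    "val": [1.0, np.nan, 3.0, 4.0],
})

# On a plain Series, skipna=False lets the NaN propagate into the result.
series_sum = df["val"].sum(skipna=False)

# The grouped sum skips NaN inside each group, so group "a" sums to 1.0.
grouped_sum = df.groupby("key")["val"].sum()

# One possible workaround: call Series.sum(skipna=False) on each group.
grouped_nan = df.groupby("key")["val"].apply(lambda s: s.sum(skipna=False))

print(series_sum)
print(grouped_sum)
print(grouped_nan)
```

The `apply` route is slower than the built-in grouped sum because it calls a Python function once per group, which may matter for frames with many groups.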
Expected Output
and
Output of `pd.show_versions()`
pandas: 0.19.1
nose: 1.3.7
pip: 9.0.1
setuptools: 32.3.1
Cython: 0.25.2
numpy: 1.12.0
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: 2.4.0
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.4
lxml: 3.7.0
bs4: 4.5.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.1.4
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.43.0
pandas_datareader: None