BUG: groupby.sum removed columns in case sequential calls of several groupby.sum
#44132
Closed
3 tasks done
Labels
Bug
Closing Candidate
May be closeable, needs more eyeballs
Groupby
Reduction Operations
sum, mean, min, max, etc.
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the master branch of pandas.
Reproducible Example
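A minimal sketch of the scenario, reconstructed from the script output in this report rather than taken from the reporter's own snippet. The frame contents are inferred from the "Original dataframe" output; the split point (first two rows, then the rest), the explicit `object` dtype for `C`, and the omission of the date column `D` are assumptions made to keep the sketch short. The report says `C` is missing from the two-stage result on pandas 1.3.3; other versions may behave differently, so the sketch only lists whichever columns differ.

```python
import pandas as pd

# Frame contents inferred from the "Original dataframe" output in the report.
# Column D (dates) is left out of this sketch (assumption, for brevity).
df = pd.DataFrame(
    {
        "A": [0.0, 1.0, 2.0, 3.0, 4.0],
        "B": [0.0, 1.0, 0.0, 1.0, 0.0],
        "C": pd.Series(["foo1", "foo2", "foo3", "foo4", "foo5"], dtype=object),
    }
)

# Assumed split point: the first two rows, then the remaining three.
parts = [df.iloc[:2], df.iloc[2:]]

# Stage 1: reduce each part on its own, then stack the partial results.
partials = pd.concat([part.groupby("B").sum() for part in parts])

# Stage 2: reduce the stacked partial results again (B is now the index).
two_stage = partials.groupby(level="B").sum()

# Reference: a single groupby.sum over the whole frame.
one_shot = df.groupby("B").sum()

# Columns present in the reference but absent from the two-stage result.
missing = one_shot.columns.difference(two_stage.columns)

print("Concatenated dataframe from partial results:", partials, sep="\n")
print("Concatenated df groupby.sum:", two_stage, sep="\n")
print("Original df groupby.sum:", one_shot, sep="\n")
print("Columns missing after the second sum:", list(missing))
```

Comparing `partials.dtypes` with `df.dtypes` is a quick way to see whether the first reduction changed any column types.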
Issue Description
Script output
Concatenated dataframe from partial results:
       A         C
B
0.0  0.0      foo1
1.0  1.0      foo2
0.0  6.0  foo3foo5
1.0  3.0      foo4
Original dataframe:
       A     C           D
B
0.0  0.0  foo1  2009-01-01
1.0  1.0  foo2  2009-01-02
0.0  2.0  foo3  2009-01-05
1.0  3.0  foo4  2009-01-06
0.0  4.0  foo5  2009-01-07
Concatenated df groupby.sum:
       A   # Column `C` is missed
B
0.0  6.0
1.0  4.0
Original df groupby.sum:
       A             C
B
0.0  6.0  foo1foo3foo5
1.0  4.0      foo2foo4
`groupby.sum` drops columns when a DataFrame is split into several parts, `groupby.sum` is applied to each part, and `groupby.sum` is then applied again to the concatenation of the processed parts.

One observation: `groupby.sum` changes the `dtypes` of the partitions. If we align the `dtypes` of the partitions with `df.dtypes`, we get the expected behavior.

Expected Behavior
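One way to check the dtype observation from the issue description is sketched here. This is an illustration, not the reporter's code: `align_dtypes` is a hypothetical helper, the frame is inferred from the script output (without the date column `D`), and the split point is an assumption.

```python
import pandas as pd

# Frame contents inferred from the report's output; column D is omitted (assumption).
df = pd.DataFrame(
    {
        "A": [0.0, 1.0, 2.0, 3.0, 4.0],
        "B": [0.0, 1.0, 0.0, 1.0, 0.0],
        "C": pd.Series(["foo1", "foo2", "foo3", "foo4", "foo5"], dtype=object),
    }
)


def align_dtypes(partial: pd.DataFrame, reference: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: cast each column of `partial` to its dtype in `reference`."""
    target = {
        col: reference[col].dtype
        for col in partial.columns
        if col in reference.columns
    }
    return partial.astype(target)


# Assumed split point: the first two rows, then the remaining three.
parts = [df.iloc[:2], df.iloc[2:]]

# Reduce each part, then restore the original column dtypes before concatenating.
partials = [align_dtypes(part.groupby("B").sum(), df) for part in parts]

# Second reduction over the dtype-aligned partial results (B is the index here).
combined = pd.concat(partials).groupby(level="B").sum()

print(combined)
print(combined.dtypes)
```

Casting right after the first reduction keeps every partial result on the same dtypes as the source frame, which is the condition the description associates with the expected output.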
Expected results:
Installed Versions
INSTALLED VERSIONS
commit : 73c6825
python : 3.8.10.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-65-generic
Version : #73-Ubuntu SMP Mon Jan 18 17:25:17 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US
LOCALE : en_US.ISO8859-1
pandas : 1.3.3
numpy : 1.21.0
pytz : 2021.1
dateutil : 2.8.1
pip : 21.1.3
setuptools : 52.0.0.post20210125
Cython : None
pytest : 6.2.4
hypothesis : None
sphinx : 4.1.2
blosc : None
feather : 0.4.1
xlsxwriter : None
lxml.etree : 4.6.3
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : 7.25.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : 2021.06.1
fastparquet : None
gcsfs : None
matplotlib : 3.2.2
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.7
pandas_gbq : 0.15.0
pyarrow : 4.0.1
pyxlsb : None
s3fs : 2021.06.1
scipy : 1.7.0
sqlalchemy : 1.4.20
tables : 3.6.1
tabulate : None
xarray : 0.18.2
xlrd : 2.0.1
xlwt : None
numba : None