Decimal fields dropped in group by with more than one column #22275
Comments
Tracing through, the 'a' column appears to be dropped by this section of
Specifically this line: pandas/pandas/core/groupby/groupby.py Line 4006 in 0409521
This suggests to me that there's a mismatch with the logic that sets |
There is no native support for Decimal types in NumPy, and therefore they are also generally weakly supported in pandas. In theory a community-contributed Extension would help improve that. In the meantime, though, I do agree that it's strange that there is a difference between the Series and DataFrame GroupBy mechanisms. Investigation and PRs to align that behavior are certainly welcome. |
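For anyone who wants to compare the two paths side by side, here is a minimal sketch. The frame and its column names (id, a, b) are made up for illustration and are not the reporter's original sample. Whether column a survives the DataFrame path depends on the pandas version in use, so no output is shown.

```python
from decimal import Decimal

import pandas as pd

# Hypothetical data: the column names "id", "a", "b" are illustrative only.
df = pd.DataFrame(
    {
        "id": [1, 1, 2, 2],
        "a": [Decimal("1.10"), Decimal("2.20"), Decimal("3.30"), Decimal("4.40")],
        "b": [1, 2, 3, 4],
    }
)

# Decimal values are stored as generic Python objects, not as a numeric dtype.
print(df.dtypes)

grouped = df.groupby("id")

# SeriesGroupBy path: select a single column, then aggregate.
series_sum = grouped["a"].sum()

# DataFrameGroupBy path: select several columns, then aggregate.
frame_sum = grouped[["a", "b"]].sum()

print(series_sum)
print(frame_sum)

# Requested columns that are missing from the DataFrame result, if any.
print("dropped:", sorted({"a", "b"} - set(frame_sum.columns)))
```

If the last line prints a non-empty list, the installed version shows the inconsistency described in this issue.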
My above comment appears to be correct, pandas/pandas/core/groupby/groupby.py Line 1466 in 0409521
Which can be seen by using a non-numeric_only group function such as min:
produces
Not knowing this API very closely, I'm surprised that an exception isn't raised earlier in the process if one of the fields is non-numeric (and would be happy if it did so rather than silently dropping data). |
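Until the behavior is aligned, a caller could turn the silent drop into a loud failure on their side. The sketch below is one way to do that. The helper require_columns is hypothetical (it is not part of pandas), and the sample frame is made up for illustration.

```python
from decimal import Decimal

import pandas as pd


def require_columns(result, requested):
    """Return `result` unchanged, or raise if a requested column is missing.

    Hypothetical helper for illustration; not a pandas API.
    """
    missing = [col for col in requested if col not in result.columns]
    if missing:
        raise ValueError(f"columns dropped during aggregation: {missing}")
    return result


# Hypothetical data, not the sample from the original report.
df = pd.DataFrame(
    {
        "id": [1, 1, 2, 2],
        "a": [Decimal("1.10"), Decimal("2.20"), Decimal("3.30"), Decimal("4.40")],
        "b": [1, 2, 3, 4],
    }
)

wanted = ["a", "b"]
result = df.groupby("id")[wanted].sum()

try:
    require_columns(result, wanted)
    print("all requested columns are present")
except ValueError as exc:
    # Reached only on pandas versions that drop the Decimal column.
    print("aggregation lost data:", exc)
```

This only detects the problem after the fact. It does not change how pandas decides which columns count as numeric.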
Another tactic would be to remove the I have tested this and it restores consistency between the two methods. Happy to PR if we can be confident this won't break other logic. |
If you have something that works and doesn't break other tests locally feel free to submit as a PR |
Unfortunately it causes a series of test failures, so the solution isn't quite so simple. |
This bug is similar to #21664 and I think the failing tests may be expecting |
As a side note, we actually have this within our test suite, albeit in a fledgling state: pandas/pandas/tests/extension/decimal/array.py Lines 37 to 118 in 0370740
This doesn't quite get the job done though, as there's currently no way to dynamically alter Will open a separate issue for the above though, as it's only tangentially related to this issue, and using |
I think this will just work after #22762 (if constructed as a |
Any fix for this? Just ran into this same problem.
INSTALLED VERSIONS
commit : None
pandas : 0.25.1 |
This was resolved by #46560 |
Code Sample, a copy-pastable example if possible
Output:
Problem description
The aggregation over column a is dropped when another field is accessed from the groupby object, but works when requested through agg (I'm not sure if 'sum' is exactly equivalent to .sum() in the above).
Expected Output
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.15.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
pandas: 0.23.4
pytest: None
pip: 18.0
setuptools: 39.1.0
Cython: None
numpy: 1.14.3
scipy: 1.1.0
pyarrow: 0.8.0
xarray: None
IPython: 5.7.0
sphinx: None
patsy: 0.5.0
dateutil: 2.7.2
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: 1.1.0
xlwt: None
xlsxwriter: 1.0.4
lxml: None
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: 2.7.4 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None