BUG: Automatic data alignment fails in the transform method of GroupBy objects
#41550
Comments
Hi, thanks for your report. I am not sure whether transform should align the output. Did you find any reference to this in the docs? For example, what would you expect if your function changed the index to a RangeIndex?
Good point - I don't know what I would expect. But is that an intended use of True, I have found no explicit reference that
transform expects a reduction operation - when it's not, I don't know if there are any expectations
Don't you mean
transform now aligns to the input object and produces the expected output. However, unlike the expected output in the OP, the output is a float because of the division. I think this should be expected. That said, I think the fact that transform aligns should be added to the docs.
take
Of course, the floating point output is reasonable for the particular example in the initial post. For me,
What does this need on the docs side? :) I'm happy to make the change
@TheMellyBee - looks like this is already in the docstring of transform and the User Guide:
I don't think there is anything further to do. Closing.
I have checked that this issue has not already been reported. (... or at least I couldn't find an issue like this)
I have confirmed this bug exists on the latest version of pandas. (which is 1.2.4 at the moment)
(optional) I have confirmed this bug exists on the master branch of pandas.
Example data and intended output
Let's create some synthetic data to illustrate the issue:
Suppose we want to add some money to the poorest households. We have 600$ per district to distribute to households and we decide that this money will be divided equally among all households in a district that have income 1000$ or below. So, our expected result is:
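The rule described above can be sketched without groupby().transform() and a custom function. The frame below is hypothetical: the column names district and income and all values are assumptions for illustration, not the reporter's original data.

```python
import pandas as pd

# Hypothetical data, not the reporter's original frame.
df = pd.DataFrame({
    "district": ["A", "A", "A", "B", "B", "B"],
    "income": [800, 1500, 1000, 2000, 900, 3000],
})

# Households with an income of 1000 or below qualify for support.
poor = df["income"] <= 1000

# Number of qualifying households in each row's district.
n_poor = poor.astype(int).groupby(df["district"]).transform("sum")

# Each qualifying household receives 600 / n_poor; the others receive nothing.
bonus = (600 / n_poor).where(poor, 0.0)
expected = df["income"] + bonus
print(expected)
```

Because every step here is an index-aligned Series operation, the row order of df is preserved by construction.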
Code sample and actual output
We will try to achieve the described goal by using groupby followed by transform:

After executing the above code, df looks like this:

which doesn't match the expected output given above.
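A minimal sketch of the kind of call described here. Only the function name add_income_to_poor_households comes from the report; its body, the column names and the data are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical data, not the reporter's original frame.
df = pd.DataFrame({
    "district": ["A", "A", "A", "B", "B", "B"],
    "income": [800, 1500, 1000, 2000, 900, 3000],
})

def add_income_to_poor_households(households):
    # Hypothetical body: split the group, top up the poor part, glue it back.
    poor = households[households <= 1000]
    rich = households[households > 1000]
    if len(poor) > 0:
        poor = poor + 600 / len(poor)
    # Concatenation puts all poor households first, so the index order changes.
    return pd.concat([poor, rich])

result = df.groupby("district")["income"].transform(add_income_to_poor_households)
print(result)
```

The result always carries the index of df, but which value ends up on which row depends on the pandas version: the report describes 1.2.4 placing values in the order the function returned them, while the discussion in this thread says transform now aligns to the input object.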
Problem description
The output obtained with the sample code doesn't match the expected output. It's unclear to me which of the assumptions specified in the documentation of SeriesGroupBy.transform are violated by the above code. I think that none, so I figure it's a bug.

To gain some insight into the roots of the problem, note that the series returned by add_income_to_poor_households are ordered differently than the original ones. Please execute this on the original df (before changing it by using transform):

Now households looks like this:

But after calling add_income_to_poor_households the order of the index changes:

So possibly the transform method fails to align the output of the inner function add_income_to_poor_households to the original indices of the data frame subject to grouping.
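The hypothesis can be checked on a single group, and the function can be made independent of whether transform aligns by reindexing its result to the group's own index. This is a sketch under assumptions: the function body and the data are hypothetical, and only the function name comes from the report.

```python
import pandas as pd

def add_income_to_poor_households(households):
    # Hypothetical body: poor households are topped up and returned first.
    poor = households[households <= 1000]
    rich = households[households > 1000]
    if len(poor) > 0:
        poor = poor + 600 / len(poor)
    return pd.concat([poor, rich])

# Hypothetical single-district group, standing in for one group seen by transform.
households = pd.Series([800, 1500, 1000], index=[0, 1, 2], name="income")
out = add_income_to_poor_households(households)
print(list(out.index))  # the index order differs from the input

# Reindexing restores the input order of the group.
aligned = out.reindex(households.index)

# The same realignment applied inside transform, on hypothetical data.
df = pd.DataFrame({
    "district": ["A", "A", "A", "B", "B", "B"],
    "income": [800, 1500, 1000, 2000, 900, 3000],
})
fixed = df.groupby("district")["income"].transform(
    lambda g: add_income_to_poor_households(g).reindex(g.index)
)
print(fixed)
```

With the explicit reindex, the function returns its values in the group's original order, so the outcome no longer depends on how transform handles the returned index.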
Output of pd.show_versions()
INSTALLED VERSIONS
commit : 2cb9652
python : 3.8.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-73-generic
Version : #82-Ubuntu SMP Wed Apr 14 17:39:42 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : pl_PL.UTF-8
LOCALE : pl_PL.UTF-8
pandas : 1.2.4
numpy : 1.20.2
pytz : 2021.1
dateutil : 2.8.1
pip : 21.0.1
setuptools : 54.2.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 7.22.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : None
numba : None