BUG: DataFrame.apply with empty dataframe applies the function #49200
Comments
take
The way I see it, the current API sets the expectation that we can identify a reduction implicitly. To do that, we need information about the return type of the function. Currently we can only get it at run time, and this implies that the function argument to I can see how this could be overcome with some effort; open to discussion. A simple workaround would be to pass
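To make the reduction-inference point concrete, here is a simplified, hypothetical model of how `apply` could decide between a reduction and a row-wise transform when there are no rows: call the function once on a placeholder row and inspect what comes back. The helper name `infers_reduction` and the all-NaN placeholder are assumptions for illustration (the placeholder mirrors the 1.5 behavior described in this issue); this is a sketch, not the pandas internals.

```python
import numpy as np
import pandas as pd


def infers_reduction(func, columns):
    """Hypothetical model of return-type inference on an empty frame.

    With no rows there is nothing to inspect, so the only way to learn what
    `func` returns is to call it on a placeholder row. Here the placeholder
    is an all-NaN float Series indexed by the column labels.
    """
    placeholder = pd.Series(np.nan, index=columns, dtype="float64")
    try:
        result = func(placeholder)
    except Exception:
        # The probe call failed, so nothing can be inferred; treat the
        # function as a non-reduction (i.e. fall back to returning the frame).
        return False
    # A non-Series return value suggests a reduction; a Series suggests a transform.
    return not isinstance(result, pd.Series)


cols = ["a", "b"]
print(infers_reduction(lambda row: row.sum(), cols))    # True: returns a scalar
print(infers_reduction(lambda row: row * 2, cols))      # False: returns a Series
print(infers_reduction(lambda row: row.iloc[5], cols))  # False: probe raises IndexError
```

The point the sketch illustrates is that the function has to run at least once before the shape of the result is known, which is why an empty frame can still trigger a call.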
I agree that the new behavior is worse, but I am not sure the silent "do nothing" behavior is great either. Technically there is no row to apply
Raising seems like good behavior!
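A minimal sketch of the "raise" option, written as a caller-side wrapper rather than a change to pandas itself. `strict_apply` is a hypothetical helper; it raises when the axis the function would be applied along has no labels (no rows for `axis=1`, no columns for `axis=0`) and otherwise defers to `DataFrame.apply`.

```python
import pandas as pd


def strict_apply(df, func, axis=0, **kwargs):
    """Hypothetical wrapper: refuse to apply `func` when there is nothing to apply it to.

    With axis=1 the function is called once per row; with axis=0 once per
    column. If that count is zero, raise instead of probing `func`.
    """
    per_row = axis in (1, "columns")
    n_targets = df.shape[0] if per_row else df.shape[1]
    if n_targets == 0:
        label = "rows" if per_row else "columns"
        raise ValueError(f"cannot apply function: no {label} to apply it to")
    return df.apply(func, axis=axis, **kwargs)


empty = pd.DataFrame(columns=["a", "b"])
try:
    strict_apply(empty, lambda row: row["a"] + row["b"], axis=1)
except ValueError as exc:
    print(exc)  # cannot apply function: no rows to apply it to
```

Raising makes the empty case explicit at the call site, at the cost of forcing every caller to handle it.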
To tackle something like this, I think we need to come to an agreement on how pandas operates on empty objects across all the incarnations of apply (Series, DataFrame[axis=0/1], GroupBy, Resample, Window). Otherwise it feels like we are playing whack-a-mole, with behavior changing on various objects but not working toward a consistent API. In short, I think this should be closed in favor of #47959 (which needs someone to champion).
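As a starting point for that kind of cross-API comparison, a small probe can count how often each variant invokes the user function on empty input. `count_calls` is a hypothetical helper, and only the Series and DataFrame variants are covered here. The counts depend on the installed pandas version, so no expected output is given.

```python
import pandas as pd


def count_calls(apply_call):
    """Run `apply_call(func)` and return how many times `func` was invoked."""
    calls = 0

    def func(x):
        nonlocal calls
        calls += 1
        return 0  # a scalar, so apply may treat it as a reduction

    apply_call(func)
    return calls


empty_frame = pd.DataFrame(columns=["a", "b"])
empty_series = pd.Series([], dtype="float64")

survey = {
    "Series.apply": count_calls(lambda f: empty_series.apply(f)),
    "DataFrame.apply(axis=0)": count_calls(lambda f: empty_frame.apply(f, axis=0)),
    "DataFrame.apply(axis=1)": count_calls(lambda f: empty_frame.apply(f, axis=1)),
}
for name, n in survey.items():
    print(f"{name}: func called {n} time(s)")
```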
Ah, I didn't know about #47959. Agreed, closing in favor of that issue.
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
Prior to version 1.5.0, applying a function to an empty frame appeared not to apply the function. After some experiments with pandas 1.4.4, I discovered that the function is actually applied with an empty Series as input, and that accessing the Series values raised an exception, which was caught and silenced. From a user's perspective, it seemed as if the function wasn't applied at all.
In 1.5.0 and 1.5.1, a function applied to an empty DataFrame is called with a Series of NaN as input.
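The difference between the two releases can be illustrated without `DataFrame.apply` at all, by calling a row function directly on the two kinds of placeholder described above: an empty Series (1.4.x) and an all-NaN Series indexed by the columns (1.5.x). The function `row_total` and the column names are hypothetical, chosen only for this sketch.

```python
import numpy as np
import pandas as pd


def row_total(row):
    # A typical row-wise function that reads values by column label.
    return row["a"] + row["b"]


# 1.4.x-style placeholder: an empty Series. The label lookup raises KeyError,
# which apply caught and silenced, so the call looked like it never happened.
empty_row = pd.Series([], dtype="float64")
try:
    row_total(empty_row)
except KeyError as exc:
    print("empty placeholder raised KeyError:", exc)

# 1.5.x-style placeholder: NaN values indexed by the column labels.
# The label lookup now succeeds, so the function body runs to completion.
nan_row = pd.Series(np.nan, index=["a", "b"], dtype="float64")
print("NaN placeholder returned:", row_total(nan_row))  # nan
```

Any side effect in the function body (logging, writing to a database, appending to a list) is skipped in the first case and executed in the second, which is what makes the change visible to users.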
I'm not sure whether this is actually a bug, but it's a recent change in behavior. I don't think this new behavior is what any user would expect; to me, the intuitive behavior would be that the function isn't executed at all.
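The intuitive behavior described above can be approximated on the caller's side with a guard that skips `DataFrame.apply` for empty frames. This is a minimal sketch; `apply_skip_empty` is a hypothetical helper, not a pandas API.

```python
import pandas as pd


def apply_skip_empty(df, func, axis=0, **kwargs):
    """Hypothetical helper: call DataFrame.apply only when the frame has data.

    For an empty frame, return an empty copy instead of letting apply probe
    `func` with a placeholder row, so `func` is never executed.
    """
    if df.empty:
        return df.copy()
    return df.apply(func, axis=axis, **kwargs)


calls = []


def record(row):
    # Side effect that makes each invocation observable.
    calls.append(row)
    return row.sum()


empty = pd.DataFrame(columns=["a", "b"])
apply_skip_empty(empty, record, axis=1)
print(len(calls))  # 0: record was never called
```

The trade-off is that the empty case always returns a DataFrame, even when `func` is a reduction that would otherwise produce a Series, so callers that depend on the result type still need to handle that case.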
Expected Behavior
Installed Versions
INSTALLED VERSIONS
commit : 91111fd
python : 3.10.4.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-194-generic
Version : #205-Ubuntu SMP Fri Sep 16 19:49:27 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.5.1
numpy : 1.22.1
pytz : 2022.4
dateutil : 2.8.2
setuptools : 64.0.3
pip : 22.2.2
Cython : None
pytest : 7.1.2
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 3.0.3
lxml.etree : 4.9.1
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.5.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli : 1.0.9
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.6.0
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.8.1
snappy : None
sqlalchemy : None
tables : None
tabulate : 0.8.10
xarray : None
xlrd : None
xlwt : 1.3.0
zstandard : None
tzdata : None