API: numeric_only consistency #46560
Comments
@rhshadrach if you can update #30228 to account for these (there are a couple but not all)
@jreback - added 46072 and 46852; those were the only two missing.
@rhshadrach Thanks for all your work here! We're working on supporting this for Dask, and we just wanted to confirm if for the intermediate-term, i.e., from 1.5.0 to when We were looking at this changelog entry (also where we found this issue) and weren't sure about the defaults in 1.5.0. :)
@pavithraes - the defaults in 1.5.0 should be backwards compatible with those in the rest of the 1.x releases. As remarked in the changelog you linked to, the defaults are currently inconsistent - some True, some False, some None. The only difference in 1.5.0 should be that users get a deprecation message if (a) the defaults are used and (b) the result would change based on the new default (False) in 2.0. Any ops that gained a numeric_only argument as part of 1.5.0 will default to False. I hope this is helpful - if I can provide any specific information please let me know. Are there particular ops (e.g. sum, mean, std) on particular pandas objects (Series, DataFrame, SeriesGroupBy, DataFrameGroupBy, Resampler, Rolling) that you are interested in? Also, if there are any concerns or issues noticed, please let me know!
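For downstream code that wants to be independent of the changing defaults, one option is to pass numeric_only explicitly. The snippet below is only a minimal sketch of that idea. The frame and its column names are made up for illustration, and it relies only on DataFrame.sum and DataFrameGroupBy.mean accepting numeric_only.

```python
import pandas as pd

# Hypothetical frame mixing a numeric column with object (string) columns.
df = pd.DataFrame(
    {
        "key": ["a", "a", "b"],
        "x": [1, 2, 3],
        "label": ["p", "q", "r"],
    }
)

# Passing numeric_only explicitly means the result does not depend on
# the per-op default, so no default-related deprecation path is involved.
totals = df.sum(numeric_only=True)

# Same idea for a groupby op: the non-numeric "label" column is dropped.
grouped = df.groupby("key").mean(numeric_only=True)

print(totals)   # only "x" remains, with value 6
print(grouped)  # only "x" remains: a -> 1.5, b -> 3.0
```

With the argument spelled out, the same call should give the same answer on 1.x and 2.x for the ops shown here.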
@rhshadrach Got it, thank you for clarifying! Just one more quick question:
We noticed
Thanks @pavithraes, indeed my comment above was incorrect. Each op has a default even if it doesn't have the argument; this is just its behavior when no argument is provided. So to maintain backwards compatibility, we need to have the default agree with the behavior in the earlier 1.x releases. The table below compares all ops across DataFrame, DataFrameGroupBy, DatetimeIndexResampler, Rolling, Expanding and ExponentialMovingWindow. The results shown include only rows where "has_arg" or "default" differs between main and 1.4.x branches. It all looks correct except for two DatetimeIndexResampler ops. I plan to put up a PR to fix these.
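For anyone wanting to check "has_arg" and the signature default on their own installed version, something like the following might work. This is a rough sketch and not the script used to produce the table; the helper name numeric_only_info is hypothetical, and it only inspects signatures, so it cannot see the effective behavior of an op that lacks the argument.

```python
import inspect

import pandas as pd


def numeric_only_info(cls, op_name):
    """Return (has_arg, default) for the numeric_only parameter of cls.op_name.

    Hypothetical helper: it reads the method signature only, so for ops
    without the argument it reports (False, None) and says nothing about
    what the op actually does with non-numeric columns.
    """
    params = inspect.signature(getattr(cls, op_name)).parameters
    if "numeric_only" not in params:
        return False, None
    return True, params["numeric_only"].default


# Collect the information for a few common DataFrame reductions.
report = {op: numeric_only_info(pd.DataFrame, op) for op in ("sum", "mean", "std")}

for op, (has_arg, default) in report.items():
    # The default printed here depends on the installed pandas version.
    print(f"DataFrame.{op}: has_arg={has_arg}, default={default!r}")
```

Running this on two environments (for example a 1.4.x install and main) and diffing the output would give a comparison in the same spirit as the table.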
The table below details the interaction between The table was generated by running ops like
The columns in the table below are as follows.
numeric_only and axis=1 table
The rows in the above table where issues are identified are:
I believe all rows above correspond to methods that gained the numeric_only argument as part of 1.5.0. The one exception to this is cc @mroeschke
Good to see the most common ops were implemented correctly!
#46560 (comment) previously reported rolling/expanding ops not correctly implementing
Tracking issue for API consistency between all the different ops that have a numeric_only argument. This includes the dtype of the return when object types are present. I plan to report on the current state after the issues listed below are closed and prior to the 1.5 release.