-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change handling of copy=None defaults for Pandas 2 #28523
Conversation
R: @tvalentyn |
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control |
Codecov Report
@@ Coverage Diff @@
## master #28523 +/- ##
==========================================
- Coverage 72.24% 72.23% -0.01%
==========================================
Files 684 684
Lines 100952 100982 +30
==========================================
+ Hits 72929 72948 +19
- Misses 26447 26458 +11
Partials 1576 1576
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 14 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
Are some tests failing on Pandas2 if we don't set it I think I understand the idea of preserving the behavior but also concerned since this will be a difference in defaults b/w Beam and Pandas. Also, we could call this out in https://beam.apache.org/documentation/dsls/dataframes/differences-from-pandas/ . |
So the default in 2 changes to None, and None isnt valid so things break.
I can make it so that populate defaults uses the copy_on_write for that
setting like Pandas uses instead of just true? That is more in line as the
same API.
It'll still be in populate_defaults though.
…On Tue, Sep 19, 2023, 6:56 PM tvalentyn ***@***.***> wrote:
Are some tests failing on Pandas2 if we don't set it copy=True?
I think I understand the idea of preserving the behavior but also
concerned since this will be a difference in defaults b/w Beam and Pandas.
Also, we could call this out in
https://beam.apache.org/documentation/dsls/dataframes/differences-from-pandas/
.
—
Reply to this email directly, view it on GitHub
<#28523 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABG4SHYCRPVXBGYR7OODLM3X3IPJRANCNFSM6AAAAAA46AGAKE>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
Looking at it again, we don't support copy=False anywhere in Beam. Because it requires memory sharing semantics we can't support. I don't know if that's worth calling out now because it is no different than ever before. If you ever specify copy=False currently it will either raise an error or ignore it and copy anyway depending on the function. Therefore the current PR behavior of setting the default to True makes sense because False never will. It is just before we relied on the default in pandas being True where it is now None. |
SGTM, thanks, is this ready to merge or you plan any other changes? |
Ready to merge |
In Pandas 1, the
copy
arg always had a default ofTrue
, while in Pandas 2 the new copy-on-write mechanism means thatcopy
defaults toNone
, which indicates "use the global copy_on_write setting". For Beam, copy-on-write should always be true, so just fill in the None defaults with True.Umbrella issue: #27221