
Change handling of copy=None defaults for Pandas 2 #28523

Merged
merged 2 commits on Sep 20, 2023


caneff
Contributor

@caneff caneff commented Sep 19, 2023

In Pandas 1, the copy argument of the affected methods had a default of True, while in Pandas 2 the new copy-on-write mechanism means that copy defaults to None, which indicates "use the global copy_on_write setting". For Beam, copy should always effectively be True, so this change fills in the None defaults with True.

Umbrella issue: #27221
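To illustrate the idea in the description, here is a minimal sketch of filling a None copy default with True before delegating to a pandas-style function. It is not the code from frame_base.py: the decorator `fill_copy_default` and the stand-in function `astype_like` are hypothetical names, and the sketch assumes the wrapped function exposes `copy` in its Python signature.

```python
import functools
import inspect


def fill_copy_default(func):
    """Hypothetical decorator: treat copy=None (the Pandas 2 default) as copy=True.

    Explicitly passed values such as copy=False are forwarded unchanged;
    whether they are honored is up to the wrapped function.
    """
    sig = inspect.signature(func)
    if "copy" not in sig.parameters:
        return func  # No copy argument, nothing to fill in.

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()  # Materialize copy=None if the caller omitted it.
        if bound.arguments["copy"] is None:
            bound.arguments["copy"] = True
        return func(*bound.args, **bound.kwargs)

    return wrapper


# Hypothetical stand-in for a Pandas 2 method whose signature has copy=None.
def astype_like(dtype, copy=None):
    return copy


astype = fill_copy_default(astype_like)
print(astype("int64"))              # True  (default filled in)
print(astype("int64", copy=False))  # False (explicit value passed through)
```

Only an omitted or explicit `copy=None` is rewritten; an explicit `copy=False` is passed through, so the wrapped function still decides whether to reject or ignore it.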

@caneff
Contributor Author

caneff commented Sep 19, 2023

R: @tvalentyn

@github-actions
Contributor

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

@codecov

codecov bot commented Sep 19, 2023

Codecov Report

Merging #28523 (ffa45c9) into master (08a9767) will decrease coverage by 0.01%.
Report is 41 commits behind head on master.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master   #28523      +/-   ##
==========================================
- Coverage   72.24%   72.23%   -0.01%     
==========================================
  Files         684      684              
  Lines      100952   100982      +30     
==========================================
+ Hits        72929    72948      +19     
- Misses      26447    26458      +11     
  Partials     1576     1576              
Flag Coverage Δ
python 82.82% <100.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.

Files Changed Coverage Δ
sdks/python/apache_beam/dataframe/frame_base.py 89.52% <100.00%> (+0.06%) ⬆️

... and 14 files with indirect coverage changes


@tvalentyn
Contributor

Are some tests failing on Pandas 2 if we don't set copy=True?

I think I understand the idea of preserving the behavior, but I'm also concerned, since this will be a difference in defaults between Beam and Pandas.

Also, we could call this out in https://beam.apache.org/documentation/dsls/dataframes/differences-from-pandas/ .

@caneff
Contributor Author

caneff commented Sep 19, 2023 via email

@caneff
Contributor Author

caneff commented Sep 20, 2023

Looking at it again, we don't support copy=False anywhere in Beam, because it requires memory-sharing semantics that we can't support. I don't know if that's worth calling out now, because it is no different than before: if you specify copy=False today, it will either raise an error or be ignored (and the data copied anyway), depending on the function.

Therefore the behavior in this PR of setting the default to True makes sense, because False is never honored anyway. It is just that before, we relied on the Pandas default being True, whereas it is now None.
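The policy described above (no memory sharing, so copy=False is either rejected or ignored, and a None default now resolves to True) can be summarized in a small sketch. `resolve_copy` and its `reject_false` flag are hypothetical names for illustration; which Beam functions raise and which ignore the argument is not specified in this thread.

```python
def resolve_copy(copy, reject_false=False):
    """Hypothetical helper: return the copy value Beam effectively uses.

    Beam cannot share memory between a deferred result and its input, so the
    effective value is always True. An explicit copy=False is either rejected
    or silently ignored, depending on the reject_false flag.
    """
    if copy is False and reject_false:
        raise NotImplementedError(
            "copy=False requires memory sharing, which is not supported")
    # None (the Pandas 2 default), True, and an ignored False all mean "copy".
    return True


print(resolve_copy(None))   # True: the Pandas 2 default is filled in
print(resolve_copy(False))  # True: copy=False is ignored, data is copied anyway
try:
    resolve_copy(False, reject_false=True)
except NotImplementedError as e:
    print(e)  # copy=False is rejected outright
```

Modeling both outcomes with one flag mirrors the point above: whichever way a function handles copy=False, the result is never a shared-memory view, so defaulting to True changes nothing observable.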

@tvalentyn
Contributor

SGTM, thanks. Is this ready to merge, or do you plan any other changes?

@caneff
Contributor Author

caneff commented Sep 20, 2023

Ready to merge

@caneff caneff requested a review from tvalentyn September 20, 2023 17:40
@tvalentyn tvalentyn merged commit 0b131c9 into apache:master Sep 20, 2023
77 checks passed
@caneff caneff deleted the copy_defaults branch September 21, 2023 17:25