Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Add case_when method #56059

Merged
merged 43 commits into from
Jan 9, 2024
Merged

ENH: Add case_when method #56059

merged 43 commits into from
Jan 9, 2024

Conversation

samukweku
Copy link
Contributor

Continues the work started by @ELHoussineT(#50343)

  • uses pd.Series.mask under the hood
  • function implementation, as well as a Series method

@samukweku
Copy link
Contributor Author

pre-commit.ci autofix

@samukweku
Copy link
Contributor Author

@rhshadrach had issues with the previous pr #55390 and created this one. If there is a way to reconnect to the old one, kindly advise me on the approach. Thanks

Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the implementation is looking pretty good - but tests can be expanded on. Would it be okay if I pushed some tests to your branch?

pandas/core/case_when.py Outdated Show resolved Hide resolved
pandas/core/case_when.py Outdated Show resolved Hide resolved
pandas/core/case_when.py Outdated Show resolved Hide resolved
pandas/tests/test_case_when.py Outdated Show resolved Hide resolved
@samukweku samukweku requested a review from rhshadrach November 23, 2023 03:21
@samukweku
Copy link
Contributor Author

@rhshadrach made changes based on your feedback. Looking forward to your reviews

@rhshadrach
Copy link
Member

Thanks for the changes. I think this is looking good - but want to add more tests when I get a chance. In the meantime, cc @pandas-dev/pandas-core for any thoughts/feedback here.

pandas/core/case_when.py Outdated Show resolved Hide resolved
pandas/core/case_when.py Outdated Show resolved Hide resolved
Copy link
Contributor

@Dr-Irv Dr-Irv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just added a comment with respect to typing, and also another check to validate arguments.

@jbrockmendel
Copy link
Member

Thoughts:

  1. @samukweku will you have bandwidth/willingness to be responsible for maintenance of this going forward?
  2. id prefer either a top-level function or a method rather than both
  3. in the same direction as Irv: strong-as-possible typing will make me minimally cranky

@samukweku
Copy link
Contributor Author

@jbrockmendel in response to your comments:

  1. yes i am willing to be responsible for the maintenance going forward.
  2. i'd lean more in favour of a method - it would certainly make the code simpler, since it would be a method on a Series. if the team agrees on this, that would be great. The reason behind providing a function was to provide more flexibility - the conversation gives an idea of where the function could be helpful, but generally a method would be my preference.
  3. made some changes based on @Dr-Irv feedback. Thanks @Dr-Irv

pandas/core/case_when.py Outdated Show resolved Hide resolved
@samukweku samukweku requested a review from Dr-Irv November 30, 2023 22:20
pandas/core/case_when.py Outdated Show resolved Hide resolved
@samukweku samukweku requested review from mroeschke and Dr-Irv December 1, 2023 06:41
Copy link
Contributor

@Dr-Irv Dr-Irv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with the typing changes and changes to the API, but I will let other maintainers take care of the rest of this one.

@phofl
Copy link
Member

phofl commented Dec 2, 2023

Strong preference for method

updated_replacements.append(replacement)
replacements = updated_replacements
if (default is not None) and isinstance(default, ABCSeries):
default = default.astype(common_dtypes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @MarcoGorelli

This might upcast, thoughts related to PDEP6?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is related to PDEP6, we are creating a new Series and not doing something like setitem.

cgobat and others added 7 commits December 28, 2023 11:19
…ails (#56589)

Add McKinney (2010) reference, DOI, team website, & more keywords
* REF: fix can_use_libjoin check

* DOC: docstring for can_use_libjoin

* Make can_use_libjoin checks more-correct

* avoid allocating mapping in monotonic cases

* fix categorical memory usage tests

* catch decimal.InvalidOperation

---------

Co-authored-by: Luke Manley <[email protected]>
* TYP: fix some annotations

* pyupgrade
* BUG: Series.to_numpy raising for arrow floats to numpy floats

* Fixup
@samukweku samukweku requested a review from mroeschke December 28, 2023 00:37
@jbrockmendel
Copy link
Member

How does this handle attrs/_metadata/flags when given mismatched args with mismatched attrs/_metadat/flags?

@samukweku
Copy link
Contributor Author

How does this handle attrs/_metadata/flags when given mismatched args with mismatched attrs/_metadat/flags?

@jbrockmendel i am unfamiliar with your question. Kindly share an example to help me understand.

@phofl
Copy link
Member

phofl commented Dec 28, 2023

@samukweku your latest rebase threw something off here, the diff is too large. Could you take a look?

@phofl
Copy link
Member

phofl commented Dec 28, 2023

@jbrockmendel

The handling off metadata is dispatched to mask in this case, so we should be good

@samukweku
Copy link
Contributor Author

I think the implementation is looking pretty good - but tests can be expanded on. Would it be okay if I pushed some tests to your branch?

@rhshadrach yes pls feel free to add more tests

@phofl
Copy link
Member

phofl commented Jan 9, 2024

merging this, we can always add tests after if necessary

@phofl phofl merged commit e3a55a4 into pandas-dev:main Jan 9, 2024
50 checks passed
@phofl
Copy link
Member

phofl commented Jan 9, 2024

thx for staying with us @samukweku

Copy link

lumberbot-app bot commented Jan 9, 2024

Something went wrong ... Please have a look at my logs.

phofl pushed a commit to phofl/pandas that referenced this pull request Jan 9, 2024
phofl added a commit that referenced this pull request Jan 9, 2024
ENH: Add case_when method (#56059)

(cherry picked from commit e3a55a4)

Co-authored-by: Samuel Oranyeli <[email protected]>
pmhatre1 pushed a commit to pmhatre1/pandas-pmhatre1 that referenced this pull request May 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Series Series data structure
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: Dedicated method for creating conditional columns
9 participants