-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Allow dictionaries to be passed to pandas.Series.str.replace #51748
Comments
take |
But at the expense of expanding the API. Is using a dict less performant than using regex? |
Hey @rhshadrach, It looks like using a dict indeed is a bit less performant than using regex for my example at least when I beef it up to 10M rows (shown in the screenshot below, I didn't test @rmhowe425's PR though I imagine it's similar). That said, while my original example can be solved with a regex since I'm mapping all "keys" to the same value, I've also had times where that wasn't the case and then I'd be forced to chain the regex (or loop over the items). It's not a hill I would I die on by any means but I could imagine others would run into this and benefit from it. But if it turns out I'm more alone in these use-cases than I thought then no worries 🙂. |
It seems to me that for loop should live in user code rather than within pandas. To me, reasons why it should live in pandas would be a non-straight forward implementation, performance improvements, or API consistency. I don't think any of those qualify here. |
Yeah that's reasonable. Although I would personally think this qualifies as API consistency since most similar functions in fact do accept dictionaries in this way (such as |
cc @mroeschke @phofl @jorisvandenbossche for any thoughts |
@rhshadrach Do we think we can close this issue? |
@rmhowe425 - I'm persuaded by this and am now leaning toward being +1. It does seem convenient from a user standpoint to be able to supply a dictionary. Would like to get some others thoughts. @mroeschke @MarcoGorelli @twoertwein |
This is kind of like the regular replace, so slight +1 on my side as well |
+/-0: I like the idea because it feels similar to |
to be honest I think looping over a dict is clear enough, and the timing difference in #51748 (comment) not really significant enough but, no strong opinion, I wouldn't block this if others wanted it |
@rhshadrach Do we think another PR can be opened for this issue? |
For many I think applying principle of least surprise here would lead one to conclude that the Python arguments are all accepted by pandas, but not necessarily the other way around. pandas is a specialized library, and so in general I think users won't be surprised that where there is overlap in methods names that pandas has more functionality. |
@rmhowe425 - I think a PR for this would be great. |
Feature Type
Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas
Problem Description
I often want to replace various strings in a Series with an empty string and it would be nice to be able to do so in one call passing dictionary rather than chaining multiple
.str.replace
calls.Alternatively I can use the regex flag and use some OR logic there but I think it would be much cleaner and more consistent with other functions to be able to pass a dictionary in my opinion.
How it currently works
How I'd like for it work
Curious folks' thoughts, thanks y'all!!
Feature Description
A simple way to solve the problem would be inspect if a dictionary is passed to
str.replace
, and if so iterate over the items of the dictionary and simply call replace on each key/value pair.If this is an acceptable way to do it from the maintainers' perspectives then I'd happily submit such a PR.
Alternative Solutions
As mentioned above, both the following indeed currently work:
Additional Context
No response
The text was updated successfully, but these errors were encountered: