-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Make performance warnings opt-in and more noisy #55385
Comments
Nice idea. Would we have separate options for each type of warning? |
What do you mean by each type here? I'm proposing to only doing this for PerformanceWarning. |
I think this applicable to other sorts of warning too, to be honest. In my own application I have tests that fail on warnings and do not want to pass warnings through to users for backend code. So I end up implementing it both ways, for example: if version.parse(pd.__version__) >= version.parse("2.1.0"):
# applymap issues a deprecation warning with version <2.1.0
# TODO (low): clean this up when setting a minimum pandas version at 2.1.0
df[df["dtype"] == "bool"] = df[df["dtype"] == "bool"].map(_map_true_false)
else:
df[df["dtype"] == "bool"] = df[df["dtype"] == "bool"].applymap(_map_true_false)
return df Probably there are better ways to handle this but sometimes the most obvious patches are more efficient to implement (timewise). |
I think I'd be against enabling users to disable FutureWarnings via an option. That sounds like a footgun. |
Consider the 3 examples that you list above. You could have a separate option for turning on/off the |
It sounds like you have a use case in mind that is at odds with what I'm proposing here, but I'm not sure what that use case is. To be sure, what I'm proposing is that users don't typically run with PerformanceWarning enabled - both in the development of new code and in repeated runs of existing. Rather, a user has a piece of code they want to make more performant, they enable the PerformanceWarnings to see where pandas makes suggestions, modify the code, and then disable PerformanceWarnings as they run it in the future. I'm curious to learn more about the use case you have in mind where (I think) every different warning has an option to control it. |
Let's say I only want to find where I can remove |
I think users could accomplish this by grepping the output. But for both maintainers and users in the scenario I proposed, it seems like a headache to have a plethora of different options. |
That makes sense. |
To have an idea about the current warnings we might be talking about, a list of PerformanceWarnings I found with a quick search:
I think I am +1 on the proposal. In any case it would be nice to have an easier way to turn them off. |
I would also add all pandas methods that have a pure Python implementation that is known to be slow. Even if there is no alternative in pandas, there might be an alternative with an optimized implementation in another library. Getting this warning will at least give a hint where my code could be slow. Here are other suggestions. I do not claim they must lead to a PerformanceWarning, but that it is practical performance issues in my code that I spent time finding and fixing. Having a suggestion (with a PerformanceWarning) would have saved me a lot of time.
|
@arnaudlegout These examples aren't black and white, the array calls will be slower depending on what dtypes you have, for example the last one would trigger a copy if your dataframe has multiple dtypes and you convert to a numpy array |
@phofl right. I did not intend to say they are clear perf warning candidates. When you implement a method, you know you have certain code paths that are suboptimal, in that case, it could be nice to add a PerformanceWarning to make clear to the user you are in a slow code path execution. As another perf warning candidate, there is a groupby on a non-sorted index vs. a sorted index. |
@arnaudlegout if you have a specific example that shows this, I think it would be worth opening an issue for this. Because to me it seems the purpose of |
I opened issue #55972 |
Just throwing it out there, detecting testing time, e.g.:
Would be a wonderful time to turn on these warnings. Second, the regex capturing groups warning counts because it compiles a regex pattern every call to IMHO it could be completely removed. The issue is here: #56798 and my PR to delete it didn't get merged but it's an easy fix. |
At times, I think our performance warnings can be too noisy. For example, if I'm doing an ad hoc analysis on a small data set, I don't care about performance. Other times, I purposefully have a MultiIndex with a particular order because it's important to the analysis I'm doing, and working with the data produces lexsort warnings.
One idea is to make performance warnings opt-in (via an option) and more noisy. The idea here would be that a user has written a piece of code they're happy with, and then they can enable performance warnings to see if pandas thinks they're being ill-performant. Doing this, we can start emitting more warnings to try to nudge users to more performant options.
Some cases where we could warn:
.copy()
with Copy-on-Write enabled (ref: PDEP-7: Consistent copy/view semantics in pandas with Copy-on-Write #51463 (comment))cc @Dr-Irv @phofl @jbrockmendel @jorisvandenbossche
The text was updated successfully, but these errors were encountered: