Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PDEP-6 (setitem casting): Clarify how do decide when a setitem would upcast (and thus error in the future) or not #55935

Open
jorisvandenbossche opened this issue Nov 13, 2023 · 1 comment
Labels
Indexing Related to indexing on series/frames, not to indexes themselves

Comments

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Nov 13, 2023

Breaking this off from discussion on setting a float scalar into a float32 Series (#55679 (comment)), because it's a more general issue that should be clarified.

PDEP 6 bans upcasting in setitem-like operations, and thus the simple first rule is that in such operation, the dtype never changes. But the second aspect is less clear: when do we actually coerce the value-being-set to the target dtype, or when do we decide that an upcast would be needed (and thus would raise an error in the future).

In hindsight, I think the PDEP should have been more explicit on this. Currently, the text says (https://pandas.pydata.org/pdeps/0006-ban-upcasting.html):

  • If a setitem-like operation would previously have changed a Series' dtype, it would now raise.

It essentially ties this decision to the current behaviour, but 1) the current behaviour is not always correct (or can be upcasting too liberally for a world where we would ban upcasting in setitem), and 2) it's also very strange to explain in the future when something will error or not (assume someone is using pandas 3.0 and wonders if a certain setitem operation will raise or not, the answer would be: "well, check if pandas 2.1 upcasted, then it will now raise", which is of course not a good answer for future users).

(I know it is of course a very logical way to phrase the impact of the change for current users)

So can we define a more general rule when the value is cast to the target dtype? (not depending on current behaviour details)

cc @MarcoGorelli

@jorisvandenbossche
Copy link
Member Author

Personally, I think this is related to "safe" casting, i.e. #45588. That issue is about constructors and astype, but I think similar rules could be used for setitem: if the RHS value can be cast "safely" to the target dtype of the object we are setting into, then we do this cast. And otherwise we would have to upcast and thus warn / raise an error in the future.

Essentially when doing ser[0] = val, you can think of trying to do a constructor(val, dtype=ser.dtype) / val.astype(ser.dtype) with some level of casting safety, to coerce val to something that can be set.

For a large part we are already doing this (for example, casting a large int to int8 is not safe, and we currently also upcast for that + warn that we will error in the future. Or casting a round float to integer can be safe, so when setting that into an integer Series we don't warn, while casting a non-round float is not safe and setting does upcast + raises a warning). But also not in all cases (for example, casting a string representation of floats to float can generally be considered as "safe", but we do upcast when setting a string into a float series (ser[0] = "1.0")).

But of course when tying this to "safe" casting, we first have to decide for all the different cases and dtypes what we exactly consider as safe, because that is currently also not defined .. (except for being discussed in #45588)

@jorisvandenbossche jorisvandenbossche added the Indexing Related to indexing on series/frames, not to indexes themselves label Nov 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
Development

No branches or pull requests

1 participant