PDEP-6 (setitem casting): Clarify how to decide when a setitem would upcast (and thus error in the future) or not #55935
Breaking this off from discussion on setting a float scalar into a float32 Series (#55679 (comment)), because it's a more general issue that should be clarified.
PDEP 6 bans upcasting in setitem-like operations, and thus the simple first rule is that in such an operation, the dtype never changes. But the second aspect is less clear: when do we actually coerce the value being set to the target dtype, and when do we decide that an upcast would be needed (and thus raise an error in the future)?
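To make the two cases concrete, here is a minimal sketch that probes what a given pandas version does for a single scalar. The helper name `try_setitem` is made up for illustration and is not a pandas API. It assumes that a banned upcast surfaces either as a changed dtype (with a `FutureWarning`, as in pandas 2.1) or as a `TypeError`/`ValueError` in versions that enforce the PDEP.

```python
import warnings

import pandas as pd


def try_setitem(ser, value):
    """Set `value` at position 0 on a copy and report the outcome.

    Hypothetical helper for illustration only, not part of pandas.
    """
    ser = ser.copy()
    with warnings.catch_warnings():
        # pandas 2.1+ emits a FutureWarning when a setitem upcasts.
        warnings.simplefilter("ignore", FutureWarning)
        try:
            ser.iloc[0] = value
        except (TypeError, ValueError) as exc:
            # Assumed behaviour once the upcast is banned.
            return f"raised {type(exc).__name__}"
    return f"dtype {ser.dtype}"


ints = pd.Series([1, 2, 3], dtype="int64")

# 2.0 can be represented exactly as an integer, so it is coerced.
print(try_setitem(ints, 2.0))

# 2.5 cannot be held by int64: upcast today, error in the future.
print(try_setitem(ints, 2.5))

# The unclear case from this issue: a Python float into float32.
print(try_setitem(pd.Series([1.0, 2.0], dtype="float32"), 0.1))
```

The first two calls are the uncontroversial ends of the spectrum. The last one is deliberately left without an expected result, since whether it counts as a coercion or as an upcast is exactly what needs to be defined.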
In hindsight, I think the PDEP should have been more explicit on this. Currently, the text says (https://pandas.pydata.org/pdeps/0006-ban-upcasting.html):
It essentially ties this decision to the current behaviour, but 1) the current behaviour is not always correct (or can upcast too liberally for a world where we ban upcasting in setitem), and 2) it will be very strange to explain in the future when something errors or not. Assume someone is using pandas 3.0 and wonders whether a certain setitem operation will raise: the answer would be "well, check if pandas 2.1 upcasted, then it will now raise", which is of course not a good answer for future users.
(I know it is of course a very logical way to phrase the impact of the change for current users)
So can we define a more general rule for when the value is cast to the target dtype (one that does not depend on the details of the current behaviour)?
cc @MarcoGorelli