DEPR: Deprecate dtype inference on pandas objects #56244
There was some weird additional inference going on in cat, which was uncovered by the deprecation. It is fixed now, but it doesn't really need a whatsnew since we merged another fix today as well.
cc @mroeschke good to merge here?
pandas/core/series.py
Outdated
"constructor will keep the original dtype in the future. "
"Call ``infer_objects on the result",
FutureWarning,
stacklevel=2,
Does find_stack_level return the incorrect result here?
This is nice for internal debugging, but yeah I can use find_stack_level as well
updated
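For context on the exchange above, here is a minimal sketch of what a fixed `stacklevel=2` does, using only the standard library. The `construct` and `user_code` functions are hypothetical stand-ins, not pandas code. With `stacklevel=2` the warning is attributed to the direct caller of the function that warns, which only works when the call depth is known; as far as I understand it, pandas' internal `find_stack_level` helper exists to compute that depth dynamically instead.

```python
import warnings


def construct(data):
    # Hypothetical library function (not a pandas API).
    # stacklevel=2 attributes the warning to whoever called construct().
    warnings.warn(
        "Dtype inference on a pandas object is deprecated.",
        FutureWarning,
        stacklevel=2,
    )
    return list(data)


def user_code():
    # The warning's reported location is this call site, not the
    # warnings.warn() line inside construct().
    return construct([1, 2, 3])


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = user_code()

record = caught[0]
print(record.category.__name__, record.lineno)
```

If `construct` were reached through several internal layers, a hard-coded `stacklevel=2` would point at library internals rather than user code, which is the case a dynamically computed level is meant to handle.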
Co-authored-by: Matthew Roeschke <[email protected]>
pandas/core/frame.py
Outdated
"Dtype inference on a pandas object "
"(Series, Index, ExtensionArray) is deprecated. The Index "
"constructor will keep the original dtype in the future. "
"Call `infer_objects` on the result",
"to get the old behavior"
added
if original_dtype is None and is_pandas_object and data_dtype == np.object_:
if self.dtypes.iloc[0] != data_dtype:
warnings.warn(
"Dtype inference on a pandas object "
"in the DataFrame constructor" somewhere in this sentence? I think the "The Index constructor" below is a copy/paste leftover
yeah sorry, adjusted
Should this affect pd.array?
Yes, but it already preserves the dtype, so we should be good there.
@@ -460,6 +460,7 @@ Other Deprecations
- Deprecated behavior of :meth:`Index.insert` with an object-dtype index silently performing type inference on the result, explicitly call ``result.infer_objects(copy=False)`` for the old behavior instead (:issue:`51363`)
- Deprecated casting non-datetimelike values (mainly strings) in :meth:`Series.isin` and :meth:`Index.isin` with ``datetime64``, ``timedelta64``, and :class:`PeriodDtype` dtypes (:issue:`53111`)
- Deprecated downcasting behavior in :meth:`Series.where`, :meth:`DataFrame.where`, :meth:`Series.mask`, :meth:`DataFrame.mask`, :meth:`Series.clip`, :meth:`DataFrame.clip`; in a future version these will not infer object-dtype columns to non-object dtype, or all-round floats to integer dtype. Call ``result.infer_objects(copy=False)`` on the result for object inference, or explicitly cast floats to ints. To opt in to the future version, use ``pd.set_option("future.no_silent_downcasting", True)`` (:issue:`53656`)
- Deprecated dtype inference in :class:`Index`, :class:`Series` and :class:`DataFrame` constructors when giving a pandas input, call ``.infer_objects`` on the input to keep the current behavior (:issue:`56012`)
on the input -> on the result?
either way is totally fine
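To make the whatsnew entry concrete, here is a small sketch of the two explicit paths it implies: pass `dtype=object` to keep the object dtype without relying on inference, and call `infer_objects` on the result when inference is wanted. It deliberately avoids the deprecated implicit path, so it should behave the same regardless of whether the installed pandas version still infers; that version-independence is my assumption about the example, not a claim from the PR.

```python
import pandas as pd

# Object-dtype Index whose elements are Timestamps.
idx = pd.date_range("2000-01-01", periods=3).astype(object)

# Explicit dtype=object: the constructor keeps object dtype, no inference.
ser = pd.Series(idx, dtype=object)
print(ser.dtype)  # object

# Opt in to inference explicitly on the result.
inferred = ser.infer_objects()
print(inferred.dtype)  # a datetime64 dtype (exact unit may vary by version)
```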
@@ -20,7 +20,7 @@ def test_between(self):
tm.assert_series_equal(result, expected)

def test_between_datetime_object_dtype(self):
ser = Series(bdate_range("1/1/2000", periods=20).astype(object))
ser = Series(bdate_range("1/1/2000", periods=20).astype(object), dtype=object)
The .astype(object) here is redundant
ah yes, changed
)
with tm.assert_produces_warning(FutureWarning, match="Dtype inference"):
ser = Series(
period_range("20130101", periods=5, freq="D", name="xxx").astype(object)
can you construct the index outside the context so it is obvious where the warning comes from
moved
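The pattern requested in this review comment can be sketched with the standard library alone. `build_index` and `construct_series` below are hypothetical stand-ins for `period_range(...).astype(object)` and the `Series` constructor. The input is built before entering the warning-capturing context, so any warning recorded inside it can only come from the constructor call.

```python
import warnings


def build_index():
    # Hypothetical stand-in for period_range(...).astype(object).
    return ["2013-01-01", "2013-01-02", "2013-01-03"]


def construct_series(values):
    # Hypothetical stand-in for the constructor that emits the deprecation.
    warnings.warn(
        "Dtype inference on a pandas object is deprecated",
        FutureWarning,
        stacklevel=2,
    )
    return list(values)


# Build the input outside the context ...
index = build_index()

# ... so that only the constructor call runs while warnings are captured.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ser = construct_series(index)

messages = [
    str(w.message) for w in caught if issubclass(w.category, FutureWarning)
]
print(messages)
```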
This is very weird to begin with and becomes a real PITA when we infer strings as arrow-backed strings.