DEPR: Deprecate dtype inference on pandas objects #56244

Merged: 15 commits, merged Dec 21, 2023

phofl (Member) commented Nov 29, 2023

Doing dtype inference on inputs that are already pandas objects is very weird to begin with, and it becomes a real PITA once we infer strings as Arrow-backed strings.

phofl added the Deprecate (Functionality to remove in pandas) label Nov 29, 2023
phofl added this to the 2.2 milestone Nov 29, 2023
phofl requested a review from mroeschke November 29, 2023 22:26
phofl (Member, Author) commented Dec 9, 2023

The deprecation uncovered some weird additional inference going on in `cat`. That is fixed now, but it doesn't really need a whatsnew entry, since we merged another fix today as well.

phofl (Member, Author) commented Dec 15, 2023

cc @mroeschke

Good to merge here?

pandas/core/frame.py (outdated, resolved)
pandas/core/series.py (outdated, resolved)
"constructor will keep the original dtype in the future. "
"Call `infer_objects` on the result",
FutureWarning,
stacklevel=2,
Member:

Does `find_stack_level` return the incorrect result here?

Member Author:

This is nice for internal debugging, but yeah, I can use `find_stack_level` as well.

Member Author:

Updated.
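For context on `stacklevel=2` versus `find_stack_level`: a fixed `stacklevel=2` attributes the warning to the immediate caller of the function that warns, which is still pandas code when the warning is raised a few calls deep. pandas' internal `find_stack_level` instead walks up the call stack to the first frame outside the pandas package (tests excepted). The sketch below imitates that idea in a single file; `find_user_stacklevel`, `_LIBRARY_FUNCS`, `_warn_inference`, `_construct` and `make_series` are hypothetical stand-ins, and "library" frames are recognized by function name rather than by file path.

```python
import inspect
import warnings

# Hypothetical stand-ins for library internals: frames running these
# functions are treated as "inside the library".
_LIBRARY_FUNCS = {"_warn_inference", "_construct", "make_series"}


def find_user_stacklevel() -> int:
    """Return a stacklevel that skips every frame inside the library."""
    # stacklevel=1 is the function that calls warnings.warn, i.e. our caller.
    frame = inspect.currentframe().f_back
    level = 1
    try:
        while frame is not None and frame.f_code.co_name in _LIBRARY_FUNCS:
            frame = frame.f_back
            level += 1
    finally:
        del frame  # do not keep frame objects alive longer than needed
    return level


def _warn_inference() -> int:
    level = find_user_stacklevel()
    warnings.warn(
        "Dtype inference on a pandas object is deprecated.",
        FutureWarning,
        stacklevel=level,
    )
    return level


def _construct() -> int:
    return _warn_inference()


def make_series() -> int:
    # Public entry point: the warning should point at whoever calls this.
    return _construct()
```

Called from user code, `make_series()` computes a stacklevel of 4, so the warning is reported at the user's line; a hard-coded `stacklevel=2` would instead point at `_construct`, an internal frame.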

phofl and others added 4 commits December 18, 2023 20:30
"Dtype inference on a pandas object "
"(Series, Index, ExtensionArray) is deprecated. The Index "
"constructor will keep the original dtype in the future. "
"Call `infer_objects` on the result",
Member:

Maybe end the message with "to get the old behavior"?

Member Author:

Added.
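The message tells users to call `infer_objects` on the result to get the old behavior. A minimal sketch of both options with the `Index` constructor, assuming pandas 2.0 or later (where `Index.infer_objects` exists): passing `dtype=object` keeps the original dtype without relying on inference, and calling `infer_objects` on the result opts back into inference explicitly.

```python
import pandas as pd

# A pandas input whose values are Python ints stored with object dtype.
obj_ser = pd.Series([1, 2, 3], dtype=object)

# Future-proof: request object dtype explicitly, so nothing is inferred.
kept = pd.Index(obj_ser, dtype=object)

# Old behavior, opted into explicitly: infer a better dtype on the result.
restored = pd.Index(obj_ser, dtype=object).infer_objects()

print(kept.dtype, restored.dtype)
```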

if original_dtype is None and is_pandas_object and data_dtype == np.object_:
    if self.dtypes.iloc[0] != data_dtype:
        warnings.warn(
            "Dtype inference on a pandas object "
Member:

Could "in the DataFrame constructor" go somewhere in this sentence? I think "The Index constructor" below is a copy/paste leftover.

Member Author:

Yeah, sorry, adjusted.
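In the `DataFrame` branch above, the warning only fires when no dtype was passed, the input is a pandas object with object dtype, and the first column of the constructed frame ended up with a different dtype. A minimal sketch of the two ways to avoid depending on that inference (the column name `ts` and the date values are illustrative):

```python
import pandas as pd

# Timestamps stored with object dtype inside a pandas object.
obj_ser = pd.Series(pd.date_range("2000-01-01", periods=3), dtype=object, name="ts")

# Keep the input dtype: with dtype=object the deprecated inference path is skipped.
kept = pd.DataFrame(obj_ser, dtype=object)

# Ask for inference explicitly on the constructed frame instead.
inferred = pd.DataFrame(obj_ser, dtype=object).infer_objects()

print(kept.dtypes["ts"], inferred.dtypes["ts"])
```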

jbrockmendel (Member):

Should this affect `pd.array`?

phofl (Member, Author) commented Dec 21, 2023:

> Should this affect `pd.array`?

Yes, but it already preserves the dtype, so we should be good there.
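As background for the `pd.array` question: when `dtype` is not given, `pd.array` takes the dtype from a Series, Index or ExtensionArray input and only infers for other inputs such as lists. A small sketch contrasting the two cases:

```python
import numpy as np
import pandas as pd

# From a plain list, pd.array infers a dtype (the nullable Int64 here).
from_list = pd.array([1, 2, 3])

# From a pandas object, pd.array keeps that object's dtype (object here).
from_series = pd.array(pd.Series([1, 2, 3], dtype=object))

print(from_list.dtype, from_series.dtype)
```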

@@ -460,6 +460,7 @@ Other Deprecations
- Deprecated behavior of :meth:`Index.insert` with an object-dtype index silently performing type inference on the result, explicitly call ``result.infer_objects(copy=False)`` for the old behavior instead (:issue:`51363`)
- Deprecated casting non-datetimelike values (mainly strings) in :meth:`Series.isin` and :meth:`Index.isin` with ``datetime64``, ``timedelta64``, and :class:`PeriodDtype` dtypes (:issue:`53111`)
- Deprecated downcasting behavior in :meth:`Series.where`, :meth:`DataFrame.where`, :meth:`Series.mask`, :meth:`DataFrame.mask`, :meth:`Series.clip`, :meth:`DataFrame.clip`; in a future version these will not infer object-dtype columns to non-object dtype, or all-round floats to integer dtype. Call ``result.infer_objects(copy=False)`` on the result for object inference, or explicitly cast floats to ints. To opt in to the future version, use ``pd.set_option("future.no_silent_downcasting", True)`` (:issue:`53656`)
- Deprecated dtype inference in :class:`Index`, :class:`Series` and :class:`DataFrame` constructors when giving a pandas input, call ``.infer_objects`` on the input to keep the current behavior (:issue:`56012`)
Member:

"on the input" -> "on the result"?

Member Author (phofl, Dec 21, 2023):

Either way is totally fine.
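On the wording question: for the `Series` constructor, calling `infer_objects` on the input or on the result should end up with the same dtype in simple cases like this one. A minimal sketch, assuming pandas 2.0 or later for `Index.infer_objects`:

```python
import pandas as pd

obj_index = pd.Index([1, 2, 3], dtype=object)

# Option 1: infer on the input, then construct (the input is no longer object dtype).
on_input = pd.Series(obj_index.infer_objects())

# Option 2: keep object dtype while constructing, then infer on the result.
on_result = pd.Series(obj_index, dtype=object).infer_objects()

print(on_input.dtype, on_result.dtype)
```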

@@ -20,7 +20,7 @@ def test_between(self):
         tm.assert_series_equal(result, expected)

     def test_between_datetime_object_dtype(self):
-        ser = Series(bdate_range("1/1/2000", periods=20).astype(object))
+        ser = Series(bdate_range("1/1/2000", periods=20).astype(object), dtype=object)
Member:

The `.astype(object)` here is redundant.

Member Author:

Ah yes, changed.
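Following the review, `.astype(object)` is unnecessary once `dtype=object` is passed: the constructor boxes the datetimes itself, and since the input is not object dtype, the deprecated inference path is not involved. A sketch of the simplified construction, with a `between` check whose bounds are chosen only for illustration:

```python
import pandas as pd

# dtype=object alone stores each business day as a Timestamp in an object Series.
ser = pd.Series(pd.bdate_range("1/1/2000", periods=20), dtype=object)

# between is inclusive on both ends by default, so positions 2 through 7 match.
mask = ser.between(ser[2], ser[7])

print(ser.dtype, int(mask.sum()))
```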

)
with tm.assert_produces_warning(FutureWarning, match="Dtype inference"):
    ser = Series(
        period_range("20130101", periods=5, freq="D", name="xxx").astype(object)
Member:

Can you construct the index outside the context, so it is obvious where the warning comes from?

Member Author:

Moved.
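The reviewer's point is that only the constructor call should sit inside the warning check. A version-agnostic sketch of that pattern, using the standard `warnings` module instead of pandas' private `tm.assert_produces_warning`: the object-dtype index is built first, and only the `Series` call runs while warnings are recorded. On pandas 2.2 this call is expected to emit the "Dtype inference" `FutureWarning` and infer a period dtype; on later versions that enforce the deprecation it may emit nothing and keep object dtype, so the sketch accepts either outcome.

```python
import warnings

import pandas as pd

# Build the object-dtype input first, outside the recording block.
idx = pd.period_range("20130101", periods=5, freq="D", name="xxx").astype(object)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ser = pd.Series(idx)  # the only call that can add to `caught`

# Keep just the warnings related to this deprecation.
inference_warnings = [w for w in caught if "Dtype inference" in str(w.message)]

print(ser.dtype, len(inference_warnings))
```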

phofl merged commit 8864319 into pandas-dev:main Dec 21, 2023
45 checks passed
phofl deleted the dep_inference_pandas_obejct branch December 21, 2023 22:45
cbpygit pushed a commit to cbpygit/pandas that referenced this pull request Jan 2, 2024
Labels: Deprecate (Functionality to remove in pandas)

Successfully merging this pull request may close these issues:

DEPR: Series and Index shouldn't do inference on pandas objects

3 participants