Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: passing mixed offsets with utc=False into to_datetime #54014

Conversation

natmokval
Copy link
Contributor

@natmokval natmokval commented Jul 5, 2023

xref #50887

parsing datetimes with mixed time zones will raise a warning if utc=False. We advice users to use utc=True to silence this warning.

@natmokval natmokval marked this pull request as ready for review July 8, 2023 20:06
@natmokval natmokval requested a review from WillAyd as a code owner July 8, 2023 20:06
@natmokval
Copy link
Contributor Author

@MarcoGorelli, could you please take a look at this pr?

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for updating! I haven't finished reviewing all the tests, but here's some comments in the meantime

doc/source/whatsnew/v0.24.0.rst Outdated Show resolved Hide resolved
doc/source/whatsnew/v2.1.0.rst Outdated Show resolved Hide resolved
pandas/core/tools/datetimes.py Outdated Show resolved Hide resolved
pandas/core/tools/datetimes.py Outdated Show resolved Hide resolved
pandas/tests/tools/test_to_datetime.py Outdated Show resolved Hide resolved
@MarcoGorelli MarcoGorelli added this to the 2.1 milestone Jul 10, 2023
@MarcoGorelli MarcoGorelli added Datetime Datetime data dtype Deprecate Functionality to remove in pandas labels Jul 10, 2023
pandas/tests/tools/test_to_datetime.py Outdated Show resolved Hide resolved
pandas/tests/tools/test_to_datetime.py Show resolved Hide resolved
pandas/tests/tools/test_to_datetime.py Outdated Show resolved Hide resolved
pandas/tests/tools/test_to_datetime.py Outdated Show resolved Hide resolved
Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really close. I just have a minor comment and then am ready to approve

pandas/tests/tools/test_to_datetime.py Outdated Show resolved Hide resolved
Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, looks good to me on green

Going to leave this open a bit in case others have comments, but I think we can probably get this in for 2.1

".*parsing datetimes with mixed time zones will raise a warning",
category=FutureWarning,
)
result = tools.to_datetime(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's a couple more usages of to_datetime in this function, I think we should ideally silence them too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, I addedcatch_warnings()in two other places where we call to_datetime in this function.
I get failures in ci and I’m not sure if they are related to my changes. Locally all tests pass.

e.g the tests below failed in ci, but passed locally:

pandas/tests/arrays/floating/test_construction.py
pandas/tests/arrays/integer/test_repr.py
pandas/tests/arrays/floating/test_repr.py

we don't even call this function (converter()) while running these tests.

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thanks @natmokval

This is going to be a fun one to enforce, it will simplify so much

Leaving open a bit in case others have objections

Comment on lines 332 to 333
To create a ``Series`` with mixed offsets and ``object`` dtype, please use ``apply``
and ``datetime.datetime.strptime``.
Copy link
Member

@MarcoGorelli MarcoGorelli Jul 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add this sentence to the warning message itself, and here show an example?
like

In [2]: pd.Series(data).apply(lambda x: dt.datetime.strptime(x, '%Y-%m-%d %H:%M:%S%z'))
Out[2]:
0    2020-01-01 00:00:00+06:00
1    2020-01-01 00:00:00+01:00
dtype: object

(formtted appropriately)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, I corrected the warning message and added the example how users can get the old behavior.

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for updating, looks like this addresses the request!

over to Matt

Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (though I am still lukewarm on the change)

@MarcoGorelli
Copy link
Member

Sure, thanks - I don't want to force this onto the project, I really want us all to be happy with what we're doing

Suggestion: we get this into 2.1, and then if there's complaints or issues we hadn't foreseen (or use cases for which the workaround suggested isn't enough), then we revert for 2.2 / 3.0? Getting it in for 2.1 would give more time to hear people's potential complaints. Happy to admit I've been wrong about this, and to revert course, if that happens

@mroeschke mroeschke merged commit 0e8c730 into pandas-dev:main Jul 27, 2023
@mroeschke
Copy link
Member

Great work @natmokval

I do like that this will not catch users by surprise with object dtype from to_datetime. Yeah let's see if we get feedback from users in 2.1

@natmokval
Copy link
Contributor Author

Great work @natmokval

I do like that this will not catch users by surprise with object dtype from to_datetime. Yeah let's see if we get feedback from users in 2.1

thank you. I hope users will find this change reasonable.

@natmokval
Copy link
Contributor Author

Thank you @MarcoGorelli for helping me with this PR.

Comment on lines +673 to +674
"In a future version of pandas, parsing datetimes with mixed time "
"zones will raise a warning unless `utc=True`. "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The message here says that this will raise a warning in the future. But is that indeed the intent, or should that be "error" instead?
(my understanding of the discussion was that it would error in the future)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're right, thanks - @natmokval fancy addressing this in a separate PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, I replaced "warning" with "error" in warning message and made a new PR

@@ -348,8 +349,20 @@ def _return_parsed_timezone_results(
dta = dta.tz_localize("utc")
else:
dta = dta.tz_convert("utc")
else:
if not dta.isna().all():
non_na_timezones.add(zone)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this might break if we ever had a tzaware datetime object with a dateutil/pytz tzinfo in the input array bc those tzinfos are not hashable

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks fine:

In [2]: import pytz

In [3]: result = pd.to_datetime(
   ...:     [
   ...:         "2000-01-03 12:34:56.123456+01:00",
   ...:         datetime(2020, 1, 1, tzinfo=pytz.timezone('Asia/Kathmandu'))
   ...:     ],
   ...:     exact=False,
   ...: )
<ipython-input-3-36f1f20a96cd>:1: FutureWarning: In a future version of pandas, parsing datetimes with mixed time zones will raise an error unless `utc=True`. Please specify `utc=True` to opt in to the new behaviour and silence this warning. To create a `Series` with mixed offsets and `object` dtype, please use `apply` and `datetime.datetime.strptime`
  result = pd.to_datetime(

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

woops, its dateutil tzs that aren't hashable. though your example still seems to work with one, so never mind

@phofl
Copy link
Member

phofl commented Nov 26, 2023

Why are we filtering warnings internally here? My understanding is that this will raise in the future, so we should give users the time to address things

cc @natmokval @MarcoGorelli

@MarcoGorelli
Copy link
Member

it's in a place where the user can't do anything about it, and where in the future it just won't get converted (the input will be preserved) as we'll no longer end up with an object Series of Timestamps

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Datetime Datetime data dtype Deprecate Functionality to remove in pandas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants