DEPR: passing mixed offsets with utc=False into to_datetime #54014

Merged
Changes from 11 commits
Commits
30 commits
bcbe2ac
add raising FutureWarning in _return_parsed_timezone_results and in a…
natmokval Jul 5, 2023
ba99e93
correct the definition of _return_parsed_timezone_results, added a te…
natmokval Jul 5, 2023
441e17f
fix tests in pandas/tests/extension/test_arrow.py
natmokval Jul 6, 2023
5e568bd
correct the definition of _return_parsed_timezone_results
natmokval Jul 7, 2023
2aa5c10
fix an exanple in docs: Parsing a CSV with mixed timezones
natmokval Jul 7, 2023
8197005
correct def _array_to_datetime_object, add a test for mixed format, f…
natmokval Jul 7, 2023
ca4b214
correct example in whatsnew/v0.24.0.rst and fix pylint failures
natmokval Jul 7, 2023
a0e970a
correct str for message in FutureWarning in the test with format mixed
natmokval Jul 7, 2023
156ea8a
fix an error in an example in whatsnew/v0.24.0.rst
natmokval Jul 7, 2023
643d3d6
correct examples and the description of the param utc in docstring o…
natmokval Jul 8, 2023
61d4deb
update whatsnew/v2.1.0.rst
natmokval Jul 9, 2023
f2bbedf
correct docstring for to_datetime, example in whatsnew/v0.24.0.rst, r…
natmokval Jul 10, 2023
5c4904a
refactor tests for to_datetime
natmokval Jul 13, 2023
f5e5e1e
refactor test for to_datetime
natmokval Jul 14, 2023
88f3845
add example to whatsnew/v2.1.0.rst
natmokval Jul 17, 2023
3d74972
correct the example
natmokval Jul 17, 2023
88ed6c1
correct the example in whatsnew/v2.1.0.rst
natmokval Jul 18, 2023
180ccce
Merge branch 'main' into DEPR-to_datetime-mixed-offsets-with-utc=False
natmokval Jul 19, 2023
6ecd997
correct def _array_to_datetime_object and fix test for read_json
natmokval Jul 20, 2023
dc7c54d
add catch_warnings to filter the warning in test_read_datetime
natmokval Jul 24, 2023
b5bbd2b
correct msg in catch_warnings to filter the warning in test_read_date…
natmokval Jul 24, 2023
1220130
Merge branch 'main' into DEPR-to_datetime-mixed-offsets-with-utc=False
natmokval Jul 25, 2023
0549e6d
Merge branch 'main' into DEPR-to_datetime-mixed-offsets-with-utc=False
natmokval Jul 25, 2023
b01c3ef
catch the warning in test_from_csv_with_mixed_offsets
natmokval Jul 25, 2023
04ef036
reword whatsnew
MarcoGorelli Jul 26, 2023
c42f143
add catch_warnings to converter
natmokval Jul 26, 2023
5e5adc6
Merge branch 'main' into DEPR-to_datetime-mixed-offsets-with-utc=False
natmokval Jul 26, 2023
3aa091f
Merge branch 'DEPR-to_datetime-mixed-offsets-with-utc=False' of https…
natmokval Jul 26, 2023
acee8de
describe how to maintain the old behavior
natmokval Jul 27, 2023
b7a2207
add an example how to get the old behavior and : correct the warning…
natmokval Jul 27, 2023
10 changes: 2 additions & 8 deletions doc/source/user_guide/io.rst
@@ -935,6 +935,8 @@ Parsing a CSV with mixed timezones
pandas cannot natively represent a column or index with mixed timezones. If your CSV
file contains columns with a mixture of timezones, the default result will be
an object-dtype column with strings, even with ``parse_dates``.
To parse the mixed-timezone values as a datetime column, read in as ``object`` dtype and
then call :func:`to_datetime` with ``utc=True``.


.. ipython:: python
@@ -943,14 +945,6 @@ an object-dtype column with strings, even with ``parse_dates``.
a
2000-01-01T00:00:00+05:00
2000-01-01T00:00:00+06:00"""
df = pd.read_csv(StringIO(content), parse_dates=["a"])
df["a"]

To parse the mixed-timezone values as a datetime column, read in as ``object`` dtype and
then call :func:`to_datetime` with ``utc=True``.

.. ipython:: python

df = pd.read_csv(StringIO(content))
df["a"] = pd.to_datetime(df["a"], utc=True)
df["a"]
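The io.rst change recommends reading the column as strings and then calling `to_datetime` with `utc=True`. A stdlib-only sketch (no pandas needed; the variable names are illustrative) of why that works: the two values carry different fixed offsets, so no single offset fits the column, but converting each value to UTC leaves one shared timezone.

```python
import csv
from datetime import datetime, timezone
from io import StringIO

# Same CSV content as in the io.rst example.
content = """\
a
2000-01-01T00:00:00+05:00
2000-01-01T00:00:00+06:00"""

# Read column "a" as plain strings, the analogue of an object-dtype column.
raw = [row["a"] for row in csv.DictReader(StringIO(content))]

# Parse each string; every value keeps its own fixed UTC offset.
parsed = [datetime.fromisoformat(s) for s in raw]
distinct_offsets = {dt.utcoffset() for dt in parsed}

# Converting to UTC leaves a single timezone for the whole column,
# which is the idea behind passing utc=True to to_datetime.
in_utc = [dt.astimezone(timezone.utc) for dt in parsed]

print(len(distinct_offsets))               # 2
print([dt.isoformat() for dt in in_utc])   # ['1999-12-31T19:00:00+00:00', '1999-12-31T18:00:00+00:00']
```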
21 changes: 14 additions & 7 deletions doc/source/whatsnew/v0.24.0.rst
@@ -632,13 +632,19 @@ Parsing datetime strings with the same UTC offset will preserve the UTC offset i
Parsing datetime strings with different UTC offsets will now create an Index of
``datetime.datetime`` objects with different UTC offsets

.. ipython:: python
.. code-block:: ipython

In [59]: idx = pd.to_datetime(["2015-11-18 15:30:00+05:30",
"2015-11-18 16:30:00+06:30"])

In[60]: idx
Out[60]: Index([2015-11-18 15:30:00+05:30, 2015-11-18 16:30:00+06:30], dtype='object')

In[61]: idx[0]
Out[61]: Timestamp('2015-11-18 15:30:00+0530', tz='UTC+05:30')

idx = pd.to_datetime(["2015-11-18 15:30:00+05:30",
"2015-11-18 16:30:00+06:30"])
idx
idx[0]
idx[1]
In[62]: idx[1]
Out[62]: Timestamp('2015-11-18 16:30:00+0630', tz='UTC+06:30')

Passing ``utc=True`` will mimic the previous behavior but will correctly indicate
that the dates have been converted to UTC
@@ -680,7 +686,8 @@ Parsing mixed-timezones with :func:`read_csv`
a
2000-01-01T00:00:00+05:00
2000-01-01T00:00:00+06:00"""
df = pd.read_csv(io.StringIO(content), parse_dates=['a'])
df = pd.read_csv(StringIO(content))
df.a = pd.to_datetime(df['a'], utc=True)
MarcoGorelli marked this conversation as resolved.
df.a

As can be seen, the ``dtype`` is object; each value in the column is a string.
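For the first v0.24.0 hunk: a stdlib-only sketch of the same two strings, using `datetime.fromisoformat` in place of pandas parsing. The values keep different offsets (as in `Out[61]` and `Out[62]`) yet denote the same instant; converting to UTC, which is what `utc=True` requests, keeps the instants but drops the original offsets.

```python
from datetime import datetime, timedelta, timezone

strings = ["2015-11-18 15:30:00+05:30", "2015-11-18 16:30:00+06:30"]

# fromisoformat accepts a space as the date/time separator.
parsed = [datetime.fromisoformat(s) for s in strings]

# Each value keeps its own fixed offset, like the object-dtype Index.
offsets = [dt.utcoffset() for dt in parsed]

# Aware datetimes compare by absolute instant, so these two are equal.
same_instant = parsed[0] == parsed[1]

# Converting to UTC makes the shared instant explicit (both become 10:00 UTC).
as_utc = [dt.astimezone(timezone.utc) for dt in parsed]
```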
9 changes: 8 additions & 1 deletion doc/source/whatsnew/v1.1.0.rst
@@ -208,7 +208,14 @@ For example:
tz_strs = ["2010-01-01 12:00:00 +0100", "2010-01-01 12:00:00 -0100",
"2010-01-01 12:00:00 +0300", "2010-01-01 12:00:00 +0400"]
pd.to_datetime(tz_strs, format='%Y-%m-%d %H:%M:%S %z', utc=True)
pd.to_datetime(tz_strs, format='%Y-%m-%d %H:%M:%S %z')

.. code-block:: ipython

In[37]: pd.to_datetime(tz_strs, format='%Y-%m-%d %H:%M:%S %z')
Out[37]:
Index([2010-01-01 12:00:00+01:00, 2010-01-01 12:00:00-01:00,
2010-01-01 12:00:00+03:00, 2010-01-01 12:00:00+04:00],
dtype='object')

.. _whatsnew_110.grouper_resample_origin:
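The FutureWarning text shown in a review thread further down suggests `apply` with `datetime.datetime.strptime` for anyone who wants the old object-dtype result with mixed offsets. A minimal stdlib sketch of that idea on the v1.1.0 `tz_strs`, assuming the same `'%Y-%m-%d %H:%M:%S %z'` format (`parse` is an illustrative name):

```python
from datetime import datetime

tz_strs = ["2010-01-01 12:00:00 +0100", "2010-01-01 12:00:00 -0100",
           "2010-01-01 12:00:00 +0300", "2010-01-01 12:00:00 +0400"]
fmt = "%Y-%m-%d %H:%M:%S %z"

def parse(s):
    # %z turns "+0100" into a fixed-offset datetime.timezone.
    return datetime.strptime(s, fmt)

# Each element keeps its own offset, matching the object-dtype Index above.
values = [parse(s) for s in tz_strs]
offset_hours = [dt.utcoffset().total_seconds() / 3600 for dt in values]
```

With pandas installed, passing the same `parse` function to `Series.apply` (for example `pd.Series(tz_strs).apply(parse)`) should give the object-dtype result the warning describes; that call is not exercised here.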

1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.1.0.rst
@@ -304,6 +304,7 @@ Deprecations
- Deprecated literal string/bytes input to :func:`read_html`. Wrap literal string/bytes input in ``io.StringIO`` / ``io.BytesIO`` instead. (:issue:`53767`)
- Deprecated option "mode.use_inf_as_na", convert inf entries to ``NaN`` before instead (:issue:`51684`)
- Deprecated parameter ``obj`` in :meth:`GroupBy.get_group` (:issue:`53545`)
- Deprecated parsing datetimes with mixed time zones unless user pass ``utc=True`` to :func:`to_datetime`, in a future version this will raise a warning and will advise to use ``utc=True`` (:issue:`50887`)
MarcoGorelli marked this conversation as resolved.
- Deprecated positional indexing on :class:`Series` with :meth:`Series.__getitem__` and :meth:`Series.__setitem__`, in a future version ``ser[item]`` will *always* interpret ``item`` as a label, not a position (:issue:`50617`)
- Deprecated strings ``T``, ``t``, ``L`` and ``l`` denoting units in :func:`to_timedelta` (:issue:`52536`)
- Deprecated the "method" and "limit" keywords on :meth:`Series.fillna`, :meth:`DataFrame.fillna`, :meth:`SeriesGroupBy.fillna`, :meth:`DataFrameGroupBy.fillna`, and :meth:`Resampler.fillna`, use ``obj.bfill()`` or ``obj.ffill()`` instead (:issue:`53394`)
11 changes: 11 additions & 0 deletions pandas/_libs/tslib.pyx
@@ -620,6 +620,7 @@ cdef _array_to_datetime_object(
# 1) NaT or NaT-like values
# 2) datetime strings, which we return as datetime.datetime
# 3) special strings - "now" & "today"
unique_timezones = set()
for i in range(n):
# Analogous to: val = values[i]
val = <object>(<PyObject**>cnp.PyArray_MultiIter_DATA(mi, 1))[0]
@@ -649,6 +650,7 @@ cdef _array_to_datetime_object(
tzinfo=tsobj.tzinfo,
fold=tsobj.fold,
)
unique_timezones.add(tsobj.tzinfo)

except (ValueError, OverflowError) as ex:
ex.args = (f"{ex}, at position {i}", )
@@ -666,6 +668,15 @@ cdef _array_to_datetime_object(

cnp.PyArray_MultiIter_NEXT(mi)

if len(unique_timezones) > 1:
jbrockmendel marked this conversation as resolved.
warnings.warn(
"In a future version of pandas, parsing datetimes with mixed time "
"zones will raise a warning unless `utc=True`. "
Comment on lines +673 to +674
Member

The message here says that this will raise a warning in the future. But is that indeed the intent, or should that be "error" instead?
(my understanding of the discussion was that it would error in the future)

Member

you're right, thanks - @natmokval fancy addressing this in a separate PR?

Contributor Author

Sure, I replaced "warning" with "error" in the warning message and made a new PR.

"Please specify `utc=True` to opt in to the new behaviour "
"and silence this warning.",
FutureWarning,
stacklevel=find_stack_level(),
)
return oresult_nd, None
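The Cython change follows a simple pattern: remember each parsed value's `tzinfo` in a set inside the loop, then emit one `FutureWarning` after the loop if the set holds more than one entry. A pure-Python sketch of that pattern, assuming ISO 8601 input and a hypothetical `parse_objects` helper (not a pandas function); the NA markers and warning text are simplified stand-ins.

```python
import warnings
from datetime import datetime

# Hypothetical set of NA-like markers; pandas' real handling differs.
NA_LIKE = {None, "", "NaT", "nat"}

def parse_objects(values):
    """Parse, remember each tzinfo, and warn once if more than one was seen.

    NA-like entries become None and are not counted. Naive values
    contribute tzinfo=None, so naive + aware also counts as mixed here.
    """
    result = []
    unique_timezones = set()
    for i, val in enumerate(values):
        if val in NA_LIKE:
            result.append(None)
            continue
        try:
            dt = datetime.fromisoformat(val)
        except ValueError as ex:
            # Report the failing position, as the Cython loop does.
            raise ValueError(f"{ex}, at position {i}") from ex
        unique_timezones.add(dt.tzinfo)
        result.append(dt)

    if len(unique_timezones) > 1:
        warnings.warn(
            "mixed time zones found; pass utc=True to convert to UTC",
            FutureWarning,
            stacklevel=2,
        )
    return result
```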


43 changes: 36 additions & 7 deletions pandas/core/tools/datetimes.py
@@ -337,6 +337,8 @@ def _return_parsed_timezone_results(
tz_result : Index-like of parsed dates with timezone
"""
tz_results = np.empty(len(result), dtype=object)
unique(timezones)
MarcoGorelli marked this conversation as resolved.
non_na_timezones = set()
for zone in unique(timezones):
mask = timezones == zone
dta = DatetimeArray(result[mask]).tz_localize(zone)
@@ -345,8 +347,18 @@ def _return_parsed_timezone_results(
dta = dta.tz_localize("utc")
else:
dta = dta.tz_convert("utc")
else:
if not dta.isna().all():
non_na_timezones.add(zone)
Member

I think this might break if we ever had a tz-aware datetime object with a dateutil/pytz tzinfo in the input array, because those tzinfos are not hashable.

Member

Looks fine:

In [2]: import pytz

In [3]: result = pd.to_datetime(
   ...:     [
   ...:         "2000-01-03 12:34:56.123456+01:00",
   ...:         datetime(2020, 1, 1, tzinfo=pytz.timezone('Asia/Kathmandu'))
   ...:     ],
   ...:     exact=False,
   ...: )
<ipython-input-3-36f1f20a96cd>:1: FutureWarning: In a future version of pandas, parsing datetimes with mixed time zones will raise an error unless `utc=True`. Please specify `utc=True` to opt in to the new behaviour and silence this warning. To create a `Series` with mixed offsets and `object` dtype, please use `apply` and `datetime.datetime.strptime`
  result = pd.to_datetime(

Member

Whoops, it's dateutil tzs that aren't hashable. Though your example still seems to work with one, so never mind.
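The concern in this thread is that adding tzinfo objects to a set requires them to be hashable. In Python, a class that defines `__eq__` without `__hash__` gets `__hash__ = None`, so its instances cannot go into a set. The sketch below uses a hypothetical `EqOnlyTZ` class to show that failure mode and one possible fallback (deduplicating by equality when hashing fails). It is not what the PR does, and it makes no claim about any particular third-party tzinfo.

```python
from datetime import timedelta, timezone, tzinfo

class EqOnlyTZ(tzinfo):
    """Hypothetical fixed-offset tzinfo that defines __eq__ but not __hash__.

    Because __eq__ is defined without __hash__, Python sets __hash__ to None,
    so instances of this class are unhashable.
    """
    def __init__(self, hours):
        self._offset = timedelta(hours=hours)

    def utcoffset(self, dt):
        return self._offset

    def dst(self, dt):
        return timedelta(0)

    def tzname(self, dt):
        return f"UTC{self._offset}"

    def __eq__(self, other):
        return isinstance(other, EqOnlyTZ) and other._offset == self._offset

def count_distinct_zones(tzinfos):
    """Count distinct tzinfos, falling back to equality checks if unhashable."""
    try:
        return len(set(tzinfos))
    except TypeError:
        distinct = []
        for tz in tzinfos:
            if tz not in distinct:  # list membership uses __eq__; O(n * k)
                distinct.append(tz)
        return len(distinct)
```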

tz_results[mask] = dta

if len(non_na_timezones) > 1:
warnings.warn(
"In a future version of pandas, parsing datetimes with mixed time "
"zones will raise a warning unless `utc=True`. Please specify `utc=True` "
"to opt in to the new behaviour and silence this warning.",
FutureWarning,
stacklevel=find_stack_level(),
)
return Index(tz_results, name=name)
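In `_return_parsed_timezone_results` the check deliberately ignores zones whose values are all missing (`if not dta.isna().all()`), so an all-NaT group does not by itself make the input "mixed". A stdlib sketch of that grouping logic, using a hypothetical `localize_by_zone` helper with `None` standing in for NaT; it is an illustration, not the pandas implementation.

```python
import warnings
from datetime import datetime, timedelta, timezone

def localize_by_zone(naive_values, zones, utc=False):
    """Attach parsed zones group by group.

    naive_values: naive datetimes, or None for missing entries.
    zones: parallel list of tzinfo objects (None where no zone was parsed).
    A zone counts toward the mixed-zone check only if at least one of its
    values is present, mirroring the non_na_timezones idea in the diff.
    """
    out = [None] * len(naive_values)
    non_na_timezones = set()
    for zone in set(zones):
        positions = [i for i, z in enumerate(zones) if z == zone]
        for i in positions:
            dt = naive_values[i]
            if dt is None:
                continue
            if zone is None:
                # No parsed zone: treat as UTC when utc=True, else stay naive.
                aware = dt.replace(tzinfo=timezone.utc) if utc else dt
            else:
                aware = dt.replace(tzinfo=zone)
                if utc:
                    aware = aware.astimezone(timezone.utc)
            out[i] = aware
        if not utc and any(naive_values[i] is not None for i in positions):
            non_na_timezones.add(zone)

    if len(non_na_timezones) > 1:
        warnings.warn("mixed time zones found; pass utc=True",
                      FutureWarning, stacklevel=2)
    return out

# Example: an all-missing zone group does not trigger the warning.
plus1 = timezone(timedelta(hours=1))
demo = localize_by_zone([datetime(2010, 1, 1, 12), None], [plus1, None])
```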


@@ -749,6 +761,13 @@ def to_datetime(
offsets (typically, daylight savings), see :ref:`Examples
<to_datetime_tz_examples>` section for details.

.. warning::

In a future version of pandas, parsing datetimes with mixed time
zones will raise a warning unless `utc=True`.
Please specify `utc=True` to opt in to the new behaviour
and silence this warning.

See also: pandas general documentation about `timezone conversion and
localization
<https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html
@@ -965,19 +984,29 @@ def to_datetime(

- However, timezone-aware inputs *with mixed time offsets* (for example
issued from a timezone with daylight savings, such as Europe/Paris)
are **not successfully converted** to a :class:`DatetimeIndex`. Instead a
simple :class:`Index` containing :class:`datetime.datetime` objects is
returned:

>>> pd.to_datetime(['2020-10-25 02:00 +0200', '2020-10-25 04:00 +0100'])
are **not successfully converted** to a :class:`DatetimeIndex`.
Parsing datetimes with mixed time zones will raise a warning unless
`utc=True`. If you specify `utc=False` the warning below will be raised
MarcoGorelli marked this conversation as resolved.
and a simple :class:`Index` containing :class:`datetime.datetime`
objects will be returned:

>>> pd.to_datetime(['2020-10-25 02:00 +0200',
... '2020-10-25 04:00 +0100']) # doctest: +SKIP
FutureWarning: In a future version of pandas, parsing datetimes with mixed
time zones will raise a warning unless `utc=True`. Please specify `utc=True`
to opt in to the new behaviour and silence this warning.
Index([2020-10-25 02:00:00+02:00, 2020-10-25 04:00:00+01:00],
dtype='object')

- A mix of timezone-aware and timezone-naive inputs is also converted to
a simple :class:`Index` containing :class:`datetime.datetime` objects:

>>> from datetime import datetime
>>> pd.to_datetime(["2020-01-01 01:00:00-01:00", datetime(2020, 1, 1, 3, 0)])
>>> pd.to_datetime(["2020-01-01 01:00:00-01:00",
... datetime(2020, 1, 1, 3, 0)]) # doctest: +SKIP
FutureWarning: In a future version of pandas, parsing datetimes with mixed
time zones will raise a warning unless `utc=True`. Please specify `utc=True`
to opt in to the new behaviour and silence this warning.
Index([2020-01-01 01:00:00-01:00, 2020-01-01 03:00:00], dtype='object')

|
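The last docstring example mixes a tz-aware string with a naive `datetime`. Plain Python shows why such values cannot share one timezone-aware column without an explicit choice such as `utc=True`: equality between naive and aware datetimes is simply `False`, while ordering them raises `TypeError`. A small stdlib sketch; treating the naive value as UTC at the end is an assumption made only for illustration.

```python
from datetime import datetime, timezone

aware = datetime.fromisoformat("2020-01-01 01:00:00-01:00")
naive = datetime(2020, 1, 1, 3, 0)

# Equality does not raise; a naive and an aware value are never equal.
equal = aware == naive

# Ordering between naive and aware values raises TypeError.
try:
    _ = aware < naive
    ordering_error = None
except TypeError as exc:
    ordering_error = str(exc)

# One explicit resolution, assuming the naive value is meant as UTC:
naive_as_utc = naive.replace(tzinfo=timezone.utc)
later = aware.astimezone(timezone.utc) < naive_as_utc  # 02:00 UTC < 03:00 UTC
```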
4 changes: 3 additions & 1 deletion pandas/tests/indexes/datetimes/test_constructors.py
@@ -300,8 +300,10 @@ def test_construction_index_with_mixed_timezones(self):
assert not isinstance(result, DatetimeIndex)

msg = "DatetimeIndex has mixed timezones"
msg_depr = "parsing datetimes with mixed time zones will raise a warning"
with pytest.raises(TypeError, match=msg):
DatetimeIndex(["2013-11-02 22:00-05:00", "2013-11-03 22:00-06:00"])
with tm.assert_produces_warning(FutureWarning, match=msg_depr):
DatetimeIndex(["2013-11-02 22:00-05:00", "2013-11-03 22:00-06:00"])

# length = 1
result = Index([Timestamp("2011-01-01")], name="idx")
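The updated test nests `tm.assert_produces_warning` inside `pytest.raises` because the same call first emits the `FutureWarning` and then raises `TypeError`. A stdlib-only sketch of a helper with that property, assuming a hypothetical `warns_matching` context manager (not a pandas or pytest API): it checks the recorded warnings in a `finally` block, so the check still runs when the body raises, and the original exception then propagates.

```python
import re
import warnings
from contextlib import contextmanager

@contextmanager
def warns_matching(category, pattern):
    """Fail unless a warning of `category` whose text matches `pattern` is emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        try:
            yield caught
        finally:
            # Runs even if the body raised, so the warning is still verified.
            hits = [w for w in caught
                    if issubclass(w.category, category)
                    and re.search(pattern, str(w.message))]
            assert hits, f"no {category.__name__} matching {pattern!r}"

def build_index():
    # Hypothetical stand-in for DatetimeIndex([...]) with mixed offsets.
    warnings.warn("parsing datetimes with mixed time zones will raise an error",
                  FutureWarning)
    raise TypeError("DatetimeIndex has mixed timezones")
```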
4 changes: 3 additions & 1 deletion pandas/tests/io/json/test_readlines.py
@@ -54,7 +54,9 @@ def test_read_datetime(request, engine):
if engine == "pyarrow":
result = read_json(StringIO(json_line), engine=engine)
else:
result = read_json(StringIO(json_line), engine=engine)
msg = "parsing datetimes with mixed time zones will raise a warning"
with tm.assert_produces_warning(FutureWarning, match=msg):
result = read_json(StringIO(json_line), engine=engine)
mroeschke marked this conversation as resolved.
expected = DataFrame(
[[1, "2020-03-05", "hector"], [2, "2020-04-08T09:58:49+00:00", "hector"]],
columns=["accounts", "date", "name"],
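Some commits in this PR ("add catch_warnings to filter the warning in test_read_datetime", "add catch_warnings to converter") silence the new warning instead of asserting on it. With the stdlib `warnings` module that looks roughly like the sketch below; `noisy_parse` is a hypothetical stand-in for the pandas call. One subtlety: the `message` argument of `warnings.filterwarnings` is a regular expression matched against the start of the warning text, so a pattern aimed at the middle of the message needs a leading `.*`.

```python
import warnings

def noisy_parse():
    # Hypothetical stand-in that emits the deprecation warning.
    warnings.warn(
        "In a future version of pandas, parsing datetimes with mixed time "
        "zones will raise an error unless `utc=True`.",
        FutureWarning,
        stacklevel=2,
    )
    return "parsed"

def parse_quietly():
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        # Inserted at the front of the filter list, so it wins over "always".
        warnings.filterwarnings(
            "ignore", message=".*mixed time zones", category=FutureWarning
        )
        result = noisy_parse()
    return result, caught

def parse_with_anchored_pattern():
    # Without the leading ".*" the pattern must match the start of the text,
    # so this filter does not apply and the warning is still recorded.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        warnings.filterwarnings(
            "ignore", message="mixed time zones", category=FutureWarning
        )
        noisy_parse()
    return caught
```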