BUG: to_datetime raises "AttributeError: 'NoneType' object has no attribute 'total_seconds'" even with errors='coerce' #59769

enemyleft opened this issue Sep 10, 2024 · 5 comments
Labels
Bug Datetime Datetime data dtype

@enemyleft

enemyleft commented Sep 10, 2024

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
# next line OK
date = pd.to_datetime("Wed, 1 Dec 2021 08:00:00 -0600 (CST)", errors='coerce', utc=True, format='mixed')
# next line raises Exception -> AttributeError: 'NoneType' object has no attribute 'total_seconds'
date = pd.to_datetime("Sun, 14 Apr 2024 20:00:00 +0200 (CET)", errors='coerce', utc=True, format='mixed')

Issue Description

I have many dates to parse. Some have a time zone abbreviation like "(CET)", "(CST)" and many others, and some do not. The format is not predictable, so I cannot pass a predefined format string. The examples shown here may look similar, but in real life they are not. After some hours of analysis I finally found one specific date that actually raises an exception:

Sun, 14 Apr 2024 20:00:00 +0200 (CET)

Expected Behavior

First, I would expect that with errors='coerce' no error is raised, even if the format is completely wrong. Instead it should return NaT, as the documentation suggests.

Second, to me there is no "big" difference between the working date string Wed, 1 Dec 2021 08:00:00 -0600 (CST) and the one that raises an error, Sun, 14 Apr 2024 20:00:00 +0200 (CET). Both have the same format; the main difference is the time zone abbreviation, which is present in both cases but different. In fact, if I omit (CET), the string is parsed correctly.
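As a cross-check that the two strings really share one layout: both follow the RFC 2822 date format used in email headers, with the zone abbreviation as a trailing parenthesized comment. The sketch below uses Python's standard-library `email.utils.parsedate_to_datetime`, which is not a pandas API and only applies if every input really has this layout. It parses both strings using only the numeric offset, so the abbreviation plays no role.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime

# The two strings from the report: RFC 2822 layout plus a trailing "(ABBR)" comment.
samples = [
    "Wed, 1 Dec 2021 08:00:00 -0600 (CST)",
    "Sun, 14 Apr 2024 20:00:00 +0200 (CET)",
]

for s in samples:
    # The trailing comment is dropped; only the numeric offset sets tzinfo.
    dt = parsedate_to_datetime(s)
    print(dt.isoformat(), "->", dt.astimezone(timezone.utc).isoformat())
```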

As a workaround I could manually check whether a time zone abbreviation is present and remove it before calling to_datetime. Especially when the UTC offset is present as well, this information is redundant, i.e. it should not even matter to to_datetime. But in my opinion this workaround should not be necessary, as I think this is a bug that should be resolved in to_datetime.
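A minimal sketch of that workaround, assuming the abbreviation always appears as a trailing parenthesized group of 2 to 5 uppercase letters. The pattern and the helper name `strip_tz_abbrev` are hypothetical and not part of pandas. Stripping is only safe when a numeric offset is also present; for a string whose only zone information is the abbreviation, this discards it.

```python
import re

# Assumption: the abbreviation is 2-5 uppercase letters in parentheses at the end.
_TRAILING_ABBREV = re.compile(r"\s*\([A-Z]{2,5}\)\s*$")


def strip_tz_abbrev(text: str) -> str:
    """Drop a trailing '(ABBR)' comment; leave every other string unchanged."""
    return _TRAILING_ABBREV.sub("", text)


raw = [
    "Wed, 1 Dec 2021 08:00:00 -0600 (CST)",
    "Sun, 14 Apr 2024 20:00:00 +0200 (CET)",
    "2024-04-14 20:00:00+02:00",  # no abbreviation: returned as-is
]
cleaned = [strip_tz_abbrev(s) for s in raw]
print(cleaned)
# The cleaned strings can then be passed to pandas with the same arguments, e.g.
# pd.to_datetime(cleaned, errors="coerce", utc=True, format="mixed")
```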

See also a similar issue: #54479, although I cannot reproduce that one in my environment.

Installed Versions

INSTALLED VERSIONS

commit : d9cdd2e
python : 3.12.5.final.0
python-bits : 64
OS : Linux
OS-release : 6.10.7-arch1-1
Version : #1 SMP PREEMPT_DYNAMIC Thu, 29 Aug 2024 16:48:57 +0000
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.2.2
numpy : 2.1.1
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : None
pip : 24.2
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : None
qtpy : None
pyqt5 : None

@enemyleft enemyleft added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Sep 10, 2024
@enemyleft enemyleft changed the title BUG: BUG: to_datetime raises "AttributeError: 'NoneType' object has no attribute 'total_seconds'" even with errors='coerce' Sep 10, 2024
@rhshadrach
Member

Thanks for the report. I cannot reproduce this on 64-bit Linux with pandas 2.2.2 or the 2.2.x branch, using the same versions of NumPy, pytz, and dateutil. Can you post a full stack trace of the error?

@rhshadrach rhshadrach added Needs Info Clarification about behavior needed to assess issue datetime.date stdlib datetime.date support Datetime Datetime data dtype and removed Needs Triage Issue that has not been reviewed by a pandas team member datetime.date stdlib datetime.date support labels Sep 15, 2024
@enemyleft
Author

Thanks for the reply. I just reproduced it with a colleague who uses macOS, and he ran into the same error. Here is the stack trace from macOS:

test/import pandas as pd.py:10: FutureWarning: Parsing 'CET' as tzlocal (dependent on system timezone) is deprecated and will raise in a future version. Pass the 'tz' keyword or call tz_localize after construction instead
date = pd.to_datetime("Sun, 14 Apr 2024 20:00:00 +0200 (CET)", errors='coerce', utc=True, format=None)
/Users/test/import pandas as pd.py:10: UserWarning: Could not infer format, so each element will be parsed individually, falling back to dateutil. To ensure parsing is consistent and as-expected, please specify a format.
date = pd.to_datetime("Sun, 14 Apr 2024 20:00:00 +0200 (CET)", errors='coerce', utc=True, format=None)
Traceback (most recent call last):
File "/Users/test/import pandas as pd.py", line 10, in <module>
date = pd.to_datetime("Sun, 14 Apr 2024 20:00:00 +0200 (CET)", errors='coerce', utc=True, format=None)
File "/Users/test/.venv/lib/python3.9/site-packages/pandas/core/tools/datetimes.py", line 1099, in to_datetime
result = convert_listlike(argc, format)
File "/Users/test/.venv/lib/python3.9/site-packages/pandas/core/tools/datetimes.py", line 435, in _convert_listlike_datetimes
result, tz_parsed = objects_to_datetime64(
File "/Users/test/.venv/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py", line 2398, in objects_to_datetime64
result, tz_parsed = tslib.array_to_datetime(
File "tslib.pyx", line 414, in pandas._libs.tslib.array_to_datetime
File "tslib.pyx", line 578, in pandas._libs.tslib.array_to_datetime
AttributeError: 'NoneType' object has no attribute 'total_seconds'

@rhshadrach
Member

Thanks. Per the Python docs, the call to utcoffset here:

nsecs = tz.utcoffset(None).total_seconds()

can return None if the UTC offset isn't known. I'm not sure when this might happen, but it appears the logic should handle this case.
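To illustrate the case described in the Python docs: a tzinfo whose offset depends on the date has no single answer for `utcoffset(None)`, and returning None is allowed. The sketch uses a hypothetical `ToyDSTZone` instead of the real zone object, and a hypothetical guard `fixed_offset_seconds` in place of the unguarded one-liner. Possibly relevant, though not verified here: the FutureWarning in the trace says 'CET' is parsed as dateutil's tzlocal, and as far as I can tell dateutil's `tzlocal().utcoffset(None)` returns None when the local zone observes DST.

```python
from datetime import timedelta, timezone, tzinfo


class ToyDSTZone(tzinfo):
    """Hypothetical zone: UTC+1 in winter, UTC+2 from April to September.

    Its offset depends on the date, so it cannot answer utcoffset(None)
    and returns None, which the tzinfo contract permits.
    """

    def utcoffset(self, dt):
        if dt is None:
            return None
        return timedelta(hours=2 if 4 <= dt.month <= 9 else 1)


def fixed_offset_seconds(tz):
    """Guarded version of `tz.utcoffset(None).total_seconds()`.

    Returns None instead of raising AttributeError when the offset is unknown.
    """
    offset = tz.utcoffset(None)
    return None if offset is None else offset.total_seconds()


print(fixed_offset_seconds(timezone(timedelta(hours=2))))  # fixed-offset zone
print(fixed_offset_seconds(ToyDSTZone()))                  # date-dependent zone
```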

cc @jbrockmendel

@rhshadrach rhshadrach removed the Needs Info Clarification about behavior needed to assess issue label Sep 22, 2024
@jbrockmendel
Member

What kind of tzinfo object are you getting back? It might be fixable by passing an appropriate pydatetime object to utcoffset, but we wouldn't want to pay the cost of constructing that in the general case.
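A sketch of that suggestion, assuming the parsed wall time is available where the offset is needed. `offset_seconds` and `ToyDSTZone` are hypothetical illustrations, not pandas internals. Fixed-offset zones keep the cheap `utcoffset(None)` path, and a datetime is only consulted when that returns None. Wall times near a DST transition can be ambiguous, which this sketch does not address.

```python
from datetime import datetime, timedelta, timezone, tzinfo


class ToyDSTZone(tzinfo):
    """Hypothetical date-dependent zone whose utcoffset(None) is None."""

    def utcoffset(self, dt):
        if dt is None:
            return None
        return timedelta(hours=2 if 4 <= dt.month <= 9 else 1)


def offset_seconds(tz, wall_time):
    """UTC offset of tz in seconds, consulting wall_time only when needed."""
    offset = tz.utcoffset(None)  # fast path: fixed-offset zones answer directly
    if offset is None:
        # Slow path: ask about the concrete (naive) wall time being parsed.
        offset = tz.utcoffset(wall_time)
    if offset is None:
        raise ValueError(f"UTC offset of {tz!r} is unknown")
    return offset.total_seconds()


print(offset_seconds(timezone(timedelta(hours=-6)), datetime(2021, 12, 1, 8)))
print(offset_seconds(ToyDSTZone(), datetime(2024, 4, 14, 20)))
```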

@HolzmanoLagrene

I did some debugging, and it seems that the problem only arises if my own time zone matches the time zone abbreviation in the parentheses. For example, my own time zone is CET/CEST, and any string with either of those abbreviations in the parentheses fails. If I switch my system time zone, everything works.

So if my local time zone is ('CET', 'CEST') according to print(time.tzname), the line

pd.to_datetime("Sun, 14 Apr 2024 20:00:00 +0200 (CET)", errors='coerce', utc=True, format='mixed')

fails whereas

pd.to_datetime("Wed, 1 Dec 2021 08:00:00 -0600 (CST)", errors='coerce', utc=True, format='mixed')

works fine.

Interestingly, if I switch my system time zone to CST, both lines work fine. So is the issue specific to CET?
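Based on the observation above (not a confirmed root cause), a quick diagnostic can flag inputs whose trailing abbreviation matches one of the local zone names in `time.tzname`, i.e. the inputs that seem to hit the failing path. The helper `abbrev_is_local` is hypothetical; `local_names` can be passed explicitly so the check does not depend on the machine it runs on.

```python
import re
import time

# Captures the letters inside a trailing "(ABBR)".
_ABBREV = re.compile(r"\(([A-Za-z]+)\)\s*$")


def abbrev_is_local(text, local_names=None):
    """True if the trailing '(ABBR)' in text is one of the local zone names.

    local_names defaults to time.tzname, e.g. ('CET', 'CEST') on a machine
    set to Central European Time.
    """
    if local_names is None:
        local_names = time.tzname
    match = _ABBREV.search(text)
    return bool(match) and match.group(1) in local_names


print(time.tzname)
print(abbrev_is_local("Sun, 14 Apr 2024 20:00:00 +0200 (CET)"))
```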
