BUG: problem with dt.hour/second/etc with timestamp [pyarrow] with fixed offset timezones #55322

Closed
2 of 3 tasks
daleschyov opened this issue Sep 29, 2023 · 3 comments
Labels: Arrow, Bug, Timezones, Upstream issue

daleschyov commented Sep 29, 2023

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd


df = pd.DataFrame(
    [
        '2023-05-04 10:06:10+00:00',
        '2023-05-04 10:06:11+00:00',
        '2023-05-04 10:06:13+00:00',
        '2023-05-04 10:06:15+00:00',
    ],
    columns=['time'],
)

df['time'] = (
    df['time'].apply(pd.Timestamp).astype('timestamp[us, tz=UTC][pyarrow]')
)

df['time'] = df['time'].dt.tz_convert('UTC+09:00')

print(df['time'].dt.hour)

Issue Description

After a pyarrow timestamp column is converted to a fixed-offset timezone with dt.tz_convert, accessors such as dt.hour, dt.second, and dt.month raise an error:
ArrowInvalid: Cannot locate timezone 'UTC+09:00': UTC+09:00 not found in timezone database
If you comment out the tz_convert line in the code above, these accessors work as expected.

Expected Behavior

dt.hour, dt.second, and the other datetime accessors should work with all time zones, including fixed offsets such as 'UTC+09:00'.
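
For reference, the values the accessors would be expected to return for the example data can be worked out with the standard library alone. This is a minimal sketch, not pandas or pyarrow code; it assumes 'UTC+09:00' is meant as a fixed offset nine hours ahead of UTC, and the variable names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# The same timestamps as in the reproducible example.
values = [
    '2023-05-04 10:06:10+00:00',
    '2023-05-04 10:06:11+00:00',
    '2023-05-04 10:06:13+00:00',
    '2023-05-04 10:06:15+00:00',
]

# Assumption: 'UTC+09:00' denotes a fixed offset of +9 hours from UTC.
plus_nine = timezone(timedelta(hours=9))

# Parse each string as an aware datetime, then shift it to the fixed offset.
converted = [datetime.fromisoformat(v).astimezone(plus_nine) for v in values]

hours = [ts.hour for ts in converted]
seconds = [ts.second for ts in converted]

print(hours)    # [19, 19, 19, 19]
print(seconds)  # [10, 11, 13, 15]
```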

Installed Versions

INSTALLED VERSIONS

commit : e86ed37
python : 3.10.11.final.0
python-bits : 64
OS : Darwin
OS-release : 22.3.0
Version : Darwin Kernel Version 22.3.0: Mon Jan 30 20:38:37 PST 2023; root:xnu-8792.81.3~2/RELEASE_ARM64_T6000
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.1.1
numpy : 1.24.4
pytz : 2022.7.1
dateutil : 2.8.2
setuptools : 68.0.0
pip : 23.2.1
Cython : None
pytest : 7.4.0
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.9.7
jinja2 : 3.1.2
IPython : 8.12.2
pandas_datareader : None
bs4 : None
bottleneck : None
dataframe-api-compat: None
fastparquet : None
fsspec : 2022.11.0
gcsfs : None
matplotlib : 3.4.3
numba : 0.57.1
numexpr : None
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : 11.0.0
pyreadstat : None
pyxlsb : None
s3fs : 2022.11.0
scipy : 1.10.1
sqlalchemy : 1.4.49
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None

daleschyov added the Bug and Needs Triage labels on Sep 29, 2023
mroeschke (Member) commented

Thanks for the report. From the traceback, it doesn't seem like pyarrow supports fixed timezone offsets in its compute functions yet.
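
Until fixed offsets are supported upstream, one possible workaround is to extract the fields with pandas' default (NumPy-backed) datetime dtype instead of the pyarrow-backed one. The sketch below is an assumption about what may be acceptable for the reporter's use case, not a fix proposed in this thread. It passes a `datetime.timezone` object rather than the string 'UTC+09:00', and the variable names are illustrative.

```python
from datetime import timedelta, timezone

import pandas as pd

values = [
    '2023-05-04 10:06:10+00:00',
    '2023-05-04 10:06:11+00:00',
    '2023-05-04 10:06:13+00:00',
    '2023-05-04 10:06:15+00:00',
]

# Parse into a timezone-aware, NumPy-backed datetime Series (UTC),
# rather than the timestamp[us, tz=UTC][pyarrow] dtype from the report.
utc_times = pd.Series(pd.to_datetime(values, utc=True), name='time')

# Assumption: a fixed +09:00 offset is what 'UTC+09:00' was meant to express.
plus_nine = timezone(timedelta(hours=9))
local_times = utc_times.dt.tz_convert(plus_nine)

# Field extraction happens inside pandas here, so no pyarrow
# timezone-database lookup is involved.
hours = local_times.dt.hour
print(hours.tolist())
```

The trade-off is that the column no longer uses the pyarrow dtype for this step, which may or may not matter depending on the rest of the pipeline.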

mroeschke added the Timezones, Upstream issue, and Arrow labels and removed the Needs Triage label on Sep 29, 2023
mroeschke changed the title from "BUG: problem with dt.hour/second/etc with timestamp [pyarrow] with not UTC tz" to "BUG: problem with dt.hour/second/etc with timestamp [pyarrow] with fixed offset timezones" on Sep 29, 2023

Jiang15 commented Oct 2, 2023

Hi, can I work on this? Could you assign the issue to me?

mroeschke (Member) commented

Closing as an upstream pyarrow issue for now.
