ENH: pandas.Timestamp.round infer correct timezone around DST #37592
Comments
This is expected as the |
@mroeschke I don't fully understand why the error is expected (a colleague also ran into this, and I couldn't explain why it would be ambiguous). I certainly understand that the tz-naive rounded timestamp is ambiguous:
So the rounded timestamp of "2020-11-01 01:00:00" without timezone is clearly ambiguous.
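To make the ambiguity concrete, here is a minimal sketch. The zone name America/New_York is an assumption (the thread only shows the -04:00 and -05:00 offsets). It localizes the same naive wall time to each side of the fall-back transition with the `ambiguous` flag of `tz_localize`:

```python
import pandas as pd

# Assumption: America/New_York stands in for the zone discussed in the thread.
tz = "America/New_York"
naive = pd.Timestamp("2020-11-01 01:00:00")

# ambiguous=True selects the DST reading, ambiguous=False the standard-time one.
as_dst = naive.tz_localize(tz, ambiguous=True)
as_std = naive.tz_localize(tz, ambiguous=False)

print(as_dst.isoformat())
print(as_std.isoformat())
```

The two results share the same wall-clock time but are one hour apart as absolute instants.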
And of course there are two hours with "01:00:00", but since the original timestamp was "01:00:01.167000-0400", we could know that the correct result is "01:00:00-04:00" and not "01:00:00-05:00"? But so the reason that |
@mroeschke |
@jorisvandenbossche So I made an assertion that we probably couldn't convert to UTC first before rounding in #22560 (comment), and I believe the reason is that some UTC offsets are not one full hour. e.g. for UTC +10:30 (https://en.wikipedia.org/wiki/List_of_UTC_time_offsets#UTC+10:30,_K%E2%80%A0) and a
The general recommendation to round via wall time was made in #22560 (comment), though that comment also discusses rounding via absolute time when the frequency is an hour or less (like in this example). I do agree with both of your sentiments that choosing the |
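The point about UTC offsets that are not whole hours can be sketched as follows. The timestamp is illustrative and not taken from the thread; it uses a fixed +10:30 offset so that no DST rules are involved. Rounding to the hour on the wall clock and rounding the underlying UTC instant give different answers:

```python
import pandas as pd

# Illustrative timestamp with a fixed +10:30 offset (not from the thread).
ts = pd.Timestamp("2020-06-01 12:10:00+10:30")

# Rounding on the wall clock: 12:10 local becomes 12:00 local.
via_wall = ts.round("h")

# Rounding the UTC instant (01:40 UTC becomes 02:00 UTC), then converting back.
via_utc = ts.tz_convert("UTC").round("h").tz_convert(ts.tz)

print(via_wall.isoformat())
print(via_utc.isoformat())
```

With a whole-hour offset the two approaches would agree for hourly or finer frequencies, which is why the UTC shortcut looks tempting in the common case.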
Thanks for those pointers! I understand now that it is indeed not possible in general to round on the underlying UTC values, and that the conversion to wall time is required. I have the feeling it would still be nice to support some common cases where it is possible, though, because timezones with whole-hour offsets are the more common ones, and rounding to hour / minute / second seems to be a typical use case of the |
Yeah, time zones are definitely subject to change, so detecting when to correctly round via wall time with hour-or-less frequencies might get tricky. I'll reopen to garner more feedback as a new enhancement proposal to round with wall-time-and-timezone in these cases, though I'll state my -0 position: the logic will get more complex, and explicit > implicit, though at the cost of convenience. |
Stumbled upon this today. Reading the comments here, I now understand what happens inside (conversion to naive local -> round -> localization to aware fails), but I was indeed surprised that pandas would treat my aware datetime as ambiguous, as said in #37592 (comment). Looks like I get it working with timestamp.round(freq, ambiguous=timestamp.fold). Are there cases where this would fail? If not, is there a downside to doing it internally? |
I don't think it's possible to do this in such a way as to avoid surprises completely. I can think of two ways to do the truncation: A. unlocalize, truncate, localize. Let's consider the examples:
Using logic A. in example 1. would look like:
and in example 2.:
Using logic B. in example 1. would look like:
and in example 2.:
To summarise:
So, I'm not really sure where to go from here, other than to suggest that users pass the 'ambiguous' argument |
This doesn't work with pandas 2.0. I changed to timestamp.round(freq, ambiguous=not bool(timestamp.fold)). Dates are only ambiguous on fall DST where
|
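A minimal sketch of the fold-based workaround discussed above, assuming pandas 2.0 or later (where `fold` is populated for timestamps obtained via `tz_convert`) and assuming America/New_York as the zone. The helper name `round_keeping_offset` is hypothetical and not part of pandas:

```python
import pandas as pd

def round_keeping_offset(ts, freq):
    """Hypothetical helper: round a tz-aware Timestamp and, if the rounded
    wall time is ambiguous, pick the reading that matches the input's fold
    (fold=0 is the first, DST, occurrence; fold=1 the second)."""
    return ts.round(freq, ambiguous=not bool(ts.fold))

# Built from UTC so that constructing the inputs is itself unambiguous.
tz = "America/New_York"  # assumption, see lead-in
first = pd.Timestamp("2020-11-01 05:00:01.167", tz="UTC").tz_convert(tz)
second = pd.Timestamp("2020-11-01 06:00:01.167", tz="UTC").tz_convert(tz)

first_rounded = round_keeping_offset(first, "min")
second_rounded = round_keeping_offset(second, "min")

print(first_rounded.isoformat())
print(second_rounded.isoformat())
```

The flag only matters when the rounded wall time falls in the repeated hour. It does not change the fact that rounding happens on wall time, so the other surprises described earlier in the thread can still occur.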
Revisiting this several months later - I don't think I see downsides to doing
, as suggested by @lafrech. Maybe we could just do that, and document it? |
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.
Code Sample, a copy-pastable example
Problem description
pandas.Timestamp.round method crashes.
Expected Output
Output of
pd.show_versions()
INSTALLED VERSIONS
commit : 67a3d42
python : 3.8.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-52-generic
Version : #57-Ubuntu SMP Thu Oct 15 10:57:00 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8
pandas : 1.1.4
numpy : 1.19.2
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.3
setuptools : 50.3.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.8.3 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None