BUG: .rolling() returns incorrect values when ts index is not nano seconds (#55173)

* Fix rolling microseconds for sum

* - adding a test for rounding sum

* Update pandas/tests/window/test_rolling.py

Co-authored-by: Joris Van den Bossche <[email protected]>

* Update test_rolling.py

the df variable name fixed in the test

* Reverted version variable in doc/source/conf.py

* Units generalised to us/ms/s and test parameterised

* Rolling max tests added, related to #55026

* whatsnew note for 2.1.2 added

* Update doc/source/whatsnew/v2.1.2.rst

Co-authored-by: Matthew Roeschke <[email protected]>

* UTC timezone for _index_array of rolling

* Validating tz-aware data
Data conversion removed
Tests merged

* Conversion replaced by Timedelta.as_unit

* fixes for failing tests

* update whatsnew

* type checking

---------

Co-authored-by: Joris Van den Bossche <[email protected]>
Co-authored-by: Matthew Roeschke <[email protected]>
3 people authored Oct 26, 2023
1 parent 074ab2f commit fe17818
Showing 3 changed files with 42 additions and 1 deletion.
1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.1.2.rst
@@ -14,6 +14,7 @@ including other versions of pandas.
 Fixed regressions
 ~~~~~~~~~~~~~~~~~
 - Fixed regression in :meth:`DataFrame.join` where result has missing values and dtype is arrow backed string (:issue:`55348`)
+- Fixed regression in :meth:`~DataFrame.rolling` where non-nanosecond index or ``on`` column would produce incorrect results (:issue:`55026`, :issue:`55106`, :issue:`55299`)
 - Fixed regression in :meth:`DataFrame.resample` which was extrapolating back to ``origin`` when ``origin`` was outside its bounds (:issue:`55064`)
 - Fixed regression in :meth:`DataFrame.sort_index` which was not sorting correctly when the index was a sliced :class:`MultiIndex` (:issue:`55379`)
 - Fixed regression in :meth:`DataFrameGroupBy.agg` and :meth:`SeriesGroupBy.agg` where if the option ``compute.use_numba`` was set to True, groupby methods not supported by the numba engine would raise a ``TypeError`` (:issue:`55520`)
10 changes: 9 additions & 1 deletion pandas/core/window/rolling.py
@@ -21,6 +21,7 @@

 from pandas._libs.tslibs import (
     BaseOffset,
+    Timedelta,
     to_offset,
 )
 import pandas._libs.window.aggregations as window_aggregations
@@ -112,6 +113,8 @@
     from pandas.core.generic import NDFrame
     from pandas.core.groupby.ops import BaseGrouper

+from pandas.core.arrays.datetimelike import dtype_to_unit
+

 class BaseWindow(SelectionMixin):
     """Provides utilities for performing windowing operations."""
@@ -1887,7 +1890,12 @@ def _validate(self):
                     self._on.freq.nanos / self._on.freq.n
                 )
             else:
-                self._win_freq_i8 = freq.nanos
+                try:
+                    unit = dtype_to_unit(self._on.dtype)  # type: ignore[arg-type]
+                except TypeError:
+                    # if not a datetime dtype, eg for empty dataframes
+                    unit = "ns"
+                self._win_freq_i8 = Timedelta(freq.nanos).as_unit(unit)._value

             # min_periods must be an integer
             if self.min_periods is None:
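
The effect of the hunk above can be modelled in plain Python. This is only an illustrative sketch, not pandas code: `TICKS_PER_SECOND`, `window_in_unit` and `rolling_sum` are hypothetical names, and the window is assumed to be closed on the right, as in the default time-based `rolling`. It shows why a window length expressed in nanoseconds is far too wide when compared against integer timestamps stored in microseconds.

```python
# Illustrative model only; these helpers are hypothetical and not part of pandas.

# Number of integer ticks per second for each index resolution.
TICKS_PER_SECOND = {"s": 1, "ms": 10**3, "us": 10**6, "ns": 10**9}


def window_in_unit(window_ns: int, unit: str) -> int:
    """Convert a window length given in nanoseconds to ticks of ``unit``."""
    return window_ns * TICKS_PER_SECOND[unit] // 10**9


def rolling_sum(timestamps, values, window):
    """Naive rolling sum over the half-open interval (end - window, end]."""
    out = []
    for end in timestamps:
        out.append(
            sum(v for t, v in zip(timestamps, values) if end - window < t <= end)
        )
    return out


# Five rows spaced one second apart, stored as microsecond ticks.
ts_us = [i * 10**6 for i in range(5)]
vals = [0, 1, 2, 3, 4]

one_second_ns = 10**9

# Window left in nanoseconds while the index is in microseconds:
# every earlier row falls inside the window, so the result is a running total.
buggy = rolling_sum(ts_us, vals, one_second_ns)

# Window converted to the unit of the index first: each window holds one row.
fixed = rolling_sum(ts_us, vals, window_in_unit(one_second_ns, "us"))

print(buggy)  # [0, 1, 3, 6, 10]
print(fixed)  # [0, 1, 2, 3, 4]
```

In the commit itself the conversion is done with `Timedelta(freq.nanos).as_unit(unit)._value`, with `"ns"` as the fallback unit when the dtype is not a datetime dtype.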
32 changes: 32 additions & 0 deletions pandas/tests/window/test_rolling.py
@@ -1950,3 +1950,35 @@ def test_numeric_only_corr_cov_series(kernel, use_arg, numeric_only, dtype):
     op2 = getattr(rolling2, kernel)
     expected = op2(*arg2, numeric_only=numeric_only)
     tm.assert_series_equal(result, expected)
+
+
+@pytest.mark.parametrize("unit", ["s", "ms", "us", "ns"])
+@pytest.mark.parametrize("tz", [None, "UTC", "Europe/Prague"])
+def test_rolling_timedelta_window_non_nanoseconds(unit, tz):
+    # Test Sum, GH#55106
+    df_time = DataFrame(
+        {"A": range(5)}, index=date_range("2013-01-01", freq="1s", periods=5, tz=tz)
+    )
+    sum_in_nanosecs = df_time.rolling("1s").sum()
+    # microseconds / milliseconds should not break the correct rolling
+    df_time.index = df_time.index.as_unit(unit)
+    sum_in_microsecs = df_time.rolling("1s").sum()
+    sum_in_microsecs.index = sum_in_microsecs.index.as_unit("ns")
+    tm.assert_frame_equal(sum_in_nanosecs, sum_in_microsecs)
+
+    # Test max, GH#55026
+    ref_dates = date_range("2023-01-01", "2023-01-10", unit="ns", tz=tz)
+    ref_series = Series(0, index=ref_dates)
+    ref_series.iloc[0] = 1
+    ref_max_series = ref_series.rolling(Timedelta(days=4)).max()
+
+    dates = date_range("2023-01-01", "2023-01-10", unit=unit, tz=tz)
+    series = Series(0, index=dates)
+    series.iloc[0] = 1
+    max_series = series.rolling(Timedelta(days=4)).max()
+
+    ref_df = DataFrame(ref_max_series)
+    df = DataFrame(max_series)
+    df.index = df.index.as_unit("ns")
+
+    tm.assert_frame_equal(ref_df, df)
