-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DEPR offsets: rename 'M' to 'ME' #52064
DEPR offsets: rename 'M' to 'ME' #52064
Conversation
The Now we can apply both, “M” and “ME”. The example below raises one warning for freq=‘3M’
In order to reduce the number of test errors I also changed “M” to “ME” in pandas/tests/arrays/test_datetimelike.py:35
This reduced number of test errors from 1400 to 68. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, thanks for having started this!
So, this correctly throws a warning for date_range
and resample
with 'ME'
However, it also throws a warning for PeriodIndex(..., freq='M')
, which as per #9586 (comment) we need to keep working
So, we need to figure out how to pass 'M'
to PeriodIndex without it warning, but such that passing 'M'
to date_range
would warn
Maybe pandas/pandas/core/arrays/period.py Line 965 in 52c653b
And then, instead of warning if |
Thanks for helping me with the issue. I noticed the same inconsistency for Period. It also throws a warning for
I will work on fixing it. |
we might need to update pandas/pandas/_libs/tslibs/period.pyx Lines 2304 to 2307 in 072402b
so that, when it's printed, it still shows |
b07c569
to
e1d82cb
Compare
Looks like this one might need updating pandas/pandas/_libs/tslibs/dtypes.pyx Lines 143 to 155 in 8e2b974
|
Thank you, I updated pandas/pandas/_libs/tslibs/dtypes.pyx Line 104 in 8e2b974
and pandas/pandas/_libs/tslibs/dtypes.pyx Lines 306 to 325 in 8e2b974
and pandas/pandas/_libs/tslibs/dtypes.pyx Lines 342 to 345 in 8e2b974
as well and it fixed my tests. but I still have failures in some other tests and I keep working on fixing them. |
thanks @natmokval - looks like CI didn't run for some reason, could you fetch and merge upstream/main please so I can take a look at any test failures? or just let me know which tests are failing for you and I'll try running them |
Because I didn't want to run CI, I added '[skip ci]' to the commit message. Maybe it's better to run CI for every commit. I did fetch, merge main, and made a new commit. Could you please take a look at the tests: |
You'll need to update pandas/pandas/core/resample.py Line 1602 in 8e2b974
too For how I found this: the failing test has |
warnings.warn( | ||
"\'M\' will be deprecated, please use \'ME\' " | ||
"for \'month end\'", | ||
UserWarning, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry I'd missed this when reviewing, this should have been FutureWarning
This patch may have induced a performance regression. If it was a necessary behavior change, this may have been expected and everything is okay. Please check the links below. If any ASVs are parameterized, the combinations of parameters that a regression has been detected for appear as subbullets.
Subsequent benchmarks may have skipped some commits. The link below lists the commits that are between the two benchmark runs where the regression was identified. |
@MarcoGorelli sorry for taking a while to get back on this — I have one more kind of example that seems to have been tripped up by this PR (note the presence of the Prior to this PR
On
|
Hey @spencerkclark thanks again for the report - you trying things out on So, let's have a look at what's going on. We've put
Looks like this matches the current output, and that the previous one was incorrect? Furthermore, Polars agrees with the current pandas output: In [31]: dfpl = pl.from_pandas(series.to_frame().reset_index().rename(columns={0: 'value'}))
In [32]: dfpl.sort('index').group_by_dynamic(
...: 'index',
...: every='2d',
...: offset=timedelta(seconds=1)-timedelta(days=1),
...: closed='right',
...: label='right').agg(pl.col('value'))
Out[32]:
shape: (5, 2)
┌─────────────────────┬───────────┐
│ index ┆ value │
│ --- ┆ --- │
│ datetime[ns] ┆ list[i64] │
╞═════════════════════╪═══════════╡
│ 2000-01-01 00:00:01 ┆ [0] │
│ 2000-01-07 00:00:01 ┆ [1] │
│ 2000-01-11 00:00:01 ┆ [2] │
│ 2000-01-17 00:00:01 ┆ [3] │
│ 2000-01-21 00:00:01 ┆ [4] │
└─────────────────────┴───────────┘ Anyway, I think was most likely fixed by #55283, not by this PR |
Ah this all makes sense now — I hadn't fully appreciated that #55283 also provided a bug fix and didn't think to question the original output. Thanks for that detailed explanation. Indeed #55283 is what I should be looking at and provides an illustration for how we should fix this in our implementation in xarray — I can confirm locally that the previously failing tests pass! |
@@ -2729,6 +2743,9 @@ def asfreq( | |||
if how is None: | |||
how = "E" | |||
|
|||
if isinstance(freq, BaseOffset): | |||
freq = freq_to_period_freqstr(freq.n, freq.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think if a user tries to use a period-invalid freq with a PeriodIndex we would want to raise/warn here wouldn't we?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it already raises doesn't it?
In [1]: In [10]: df = pd.DataFrame({'ts': pd.period_range('2000', periods=3), 'val': [1,2,3]})
...:
...: In [11]: df.set_index('ts').resample('MS').mean()
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
File period.pyx:2854, in pandas._libs.tslibs.period.freq_to_dtype_code()
AttributeError: 'pandas._libs.tslibs.offsets.MonthBegin' object has no attribute '_period_dtype_code'
The above exception was the direct cause of the following exception:
ValueError Traceback (most recent call last)
Cell In[1], line 3
1 df = pd.DataFrame({'ts': pd.period_range('2000', periods=3), 'val': [1,2,3]})
----> 3 df.set_index('ts').resample('MS').mean()
File ~/tmp/.venv/lib/python3.10/site-packages/pandas/core/generic.py:9788, in NDFrame.resample(self, rule, axis, closed, label, convention, kind, on, level, origin, offset, group_keys)
9785 else:
9786 kind = None
-> 9788 return get_resampler(
9789 cast("Series | DataFrame", self),
9790 freq=rule,
9791 label=label,
9792 closed=closed,
9793 axis=axis,
9794 kind=kind,
9795 convention=convention,
9796 key=on,
9797 level=level,
9798 origin=origin,
9799 offset=offset,
9800 group_keys=group_keys,
9801 )
File ~/tmp/.venv/lib/python3.10/site-packages/pandas/core/resample.py:2038, in get_resampler(obj, kind, **kwds)
2034 """
2035 Create a TimeGrouper and return our resampler.
2036 """
2037 tg = TimeGrouper(obj, **kwds) # type: ignore[arg-type]
-> 2038 return tg._get_resampler(obj, kind=kind)
File ~/tmp/.venv/lib/python3.10/site-packages/pandas/core/resample.py:2227, in TimeGrouper._get_resampler(self, obj, kind)
2218 return DatetimeIndexResampler(
2219 obj,
2220 timegrouper=self,
(...)
2224 gpr_index=ax,
2225 )
2226 elif isinstance(ax, PeriodIndex) or kind == "period":
-> 2227 return PeriodIndexResampler(
2228 obj,
2229 timegrouper=self,
2230 kind=kind,
2231 axis=self.axis,
2232 group_keys=self.group_keys,
2233 gpr_index=ax,
2234 )
2235 elif isinstance(ax, TimedeltaIndex):
2236 return TimedeltaIndexResampler(
2237 obj,
2238 timegrouper=self,
(...)
2241 gpr_index=ax,
2242 )
File ~/tmp/.venv/lib/python3.10/site-packages/pandas/core/resample.py:185, in Resampler.__init__(self, obj, timegrouper, axis, kind, gpr_index, group_keys, selection, include_groups)
180 self.include_groups = include_groups
182 self.obj, self.ax, self._indexer = self._timegrouper._set_grouper(
183 self._convert_obj(obj), sort=True, gpr_index=gpr_index
184 )
--> 185 self.binner, self.grouper = self._get_binner()
186 self._selection = selection
187 if self._timegrouper.key is not None:
File ~/tmp/.venv/lib/python3.10/site-packages/pandas/core/resample.py:250, in Resampler._get_binner(self)
244 @final
245 def _get_binner(self):
246 """
247 Create the BinGrouper, assume that self.set_grouper(obj)
248 has already been called.
249 """
--> 250 binner, bins, binlabels = self._get_binner_for_time()
251 assert len(bins) == len(binlabels)
252 bin_grouper = BinGrouper(bins, binlabels, indexer=self._indexer)
File ~/tmp/.venv/lib/python3.10/site-packages/pandas/core/resample.py:1884, in PeriodIndexResampler._get_binner_for_time(self)
1882 if self.kind == "timestamp":
1883 return super()._get_binner_for_time()
-> 1884 return self._timegrouper._get_period_bins(self.ax)
File ~/tmp/.venv/lib/python3.10/site-packages/pandas/core/resample.py:2451, in TimeGrouper._get_period_bins(self, ax)
2447 return binner, bins, labels
2449 freq_mult = self.freq.n
-> 2451 start = ax.min().asfreq(self.freq, how=self.convention)
2452 end = ax.max().asfreq(self.freq, how="end")
2453 bin_shift = 0
File period.pyx:1945, in pandas._libs.tslibs.period._Period.asfreq()
File period.pyx:2856, in pandas._libs.tslibs.period.freq_to_dtype_code()
ValueError: Invalid frequency: <MonthBegin>
|
||
if isinstance(index, PeriodIndex): | ||
orig_freq = to_offset(index.freq) | ||
if freq != orig_freq: | ||
assert orig_freq is not None # for mypy | ||
raise ValueError( | ||
f"Given freq {freq.rule_code} does not match " | ||
f"PeriodIndex freq {orig_freq.rule_code}" | ||
f"Given freq {freq_to_period_freqstr(freq.n, freq.name)} " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we avoid the freq_to_period_freqstr by using is_period=True in to_offset a few lines up?
} | ||
cdef dict c_OFFSET_TO_PERIOD_FREQSTR = OFFSET_TO_PERIOD_FREQSTR | ||
|
||
cpdef freq_to_period_freqstr(freq_n, freq_name): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it be viable to get away from this entirely and use PeriodDtype(Base)._freqstr?
The PRs pandas-dev#55792 (Y), pandas-dev#52064 (Q), and pandas-dev#55553 (M) deprecated the single letter version of the aliases in favour of the -end version of them. This table was missed as part of that work.
The PRs pandas-dev#55792 (Y), pandas-dev#52064 (Q), and pandas-dev#55553 (M) deprecated the single letter version of the aliases in favour of the -end version of them. Add a note to the offset table about deprecations.
This pr is related to issue #9586.
Added warning to to_offset(freq) in case freq='M'. This is the first step to deprecate 'M' for MonthEnd.
TODO: let 'M' keep working for Period
example to reproduce:
both
date_range
andresample
produce warnings.