DEPR offsets: rename 'M' to 'ME' (#52064)
* Frequency: raise warnings when using ‘M’ frequency

* Frequency: raise warnings when using ‘M’ frequency II

* remove is_period and change str representation for freq in Period [skip ci]

* remove is_period and fix some tests [skip ci]

* fix some tests

* fix some tests II

* fix tests in pandas/tests/indexes/period/ [skip ci]

* fix tests in pandas/tests/indexes/period/ and correct timedeltas.pyx

* update frequencies.py, resample.py, and fix some tests

* modify pandas/tseries/frequencies.py

* fix tests

* fix tests II

* fix tests III

* rename 'M' to 'ME' in docs

* rename 'M' to 'ME' in docs II

* rename 'M' to 'ME' in docs III

* rename 'M' to 'ME' in docs IV

* rename 'M' to 'ME' in docs V

* add is_period to to_offset I

* add is_period to to_offset II

* correct the definition of period_array(…) and fix 19 tests

* add is_period to _parse_dtype_strict() and fix tests

* add constant OFFSET_TO_PERIOD_FREQSTR to period.pyx and fix tests

* correct definitions of extract_ordinals() and _round(), fix tests

* add replacement ME to M in _require_matching_freq, _parsed_string_to_bounds, and fix tests

* add the constant PERIOD_TO_OFFSET_FREQSTR to period.pyx, correct definition of _resolution_obj and fix tests

* fix tests

* add the conversion ME to M to _from_datetime64, period_index, raise_on_incompatible and fix tests

* fix some tests with resample

* correct definitions of to_period, freqstr and get_period_alias, fix tests for plotting

* correct pre-commit failures

* add key from Grouper to the constructor of TimeGrouper and fix tests

* add to asfreq() from resampler the conversion ME to M, fix tests

* fix tests for PeriodIndex and base tests for resample

* correct the constructor of TimeGrouper and fix tests for resample and plotting

* correct the definition of use_dynamic_x() and fix tests for plotting

* correct the definition of the method use_dynamic_x, fix tests

* correct the definition of the asfreq for PeriodArray, _get_period_alias, and fix tests

* correct documentation, fix tests

* correct docs: rename ME to M for periods

* add pytest.mark.xfail to test_to_timestamp_quarterly_bug

* correct mypy error attr-defined

* correct the definition of variables which convert M/ME to ME/M in dtypes.pyx, declare to_offset in offsets.pyi, fix mypy errors

* created the c version for dicts which convert M/ME to ME/M and fix mypy errors

* fix doc build error in 09_timeseries.rst and mypy error

* correct the constructor of Period, fix mypy errors

* replace in _attrname_to_abbrevs ME with M and correct the constructor of Period

* add conversion ME/M to Period constructor, add conversion M/ME to maybe_resample and reverse changes in _attrname_to_abbrevs

* correct dict “time rules”, correct the definition of _parsed_string_to_bounds, remove is_period from definition _parse_weekly_str and _parse_dtype_strict

* remove the argument is_period from _parse_dtype_strict

* add to is_subperiod, is_superperiod and _is_monthly both M and ME, correct definitions of _downsample and _maybe_cast_slice_bound

* add dict ME to M to the definition of freqstr, constructor of Period and remove pytest.mark.xfail from test_round_trip_current

* refactor freqstr, extract_ordinals, and _require_matching_freq for Period, asfreq for resample and _parsed_string_to_bounds for datetimes

* refactor _resolution_obj in dtypes.pyx and freqstr in /indexes/datetimelike.py

* define a new function freq_to_period_freqstr in dtypes to convert ME to M

* refactor use_dynamic_x for plotting and to_period in arrays/datetimes.py

* refactor def _check_plot_works in plotting and test_to_period in class TestDatetimeArray

* refactor name method of PeriodDtype, refactor __arrow_array__ and add test for ValueError in test_period.py

* in PeriodArray refactor _from_datetime64 and remove redundant if in asfreq, add test for ValueError in test_period_index.py and ignore mypy error

* correct def _resolution_obj in DatetimeLikeArrayMixin, refactor def freqstr in PeriodArray and add tests ValueError for ME

* correct def _resolution_obj in DatetimeLikeArrayMixin and def to_offset, refactor def freqstr in PeriodArray and add tests for ‘ValueError’ and 'UserWarning'

* add tests for 'UserWarning'

* refactor methods to_period in DatetimeArray, _from_datetime64 in PeriodArray, fix test in plotting

* add freq_to_offset_freqstr to convert M to ME, refactor _resolution_obj, add tests for ‘ValueError’ and 'UserWarning'

* fix pre-commit failures

* correct the definition of to_period in DatetimeArray, refactor _check_plot_works, fix test_asfreq_2M

* correct definitions of _resolution_obj in dtypes.pyx and in DatetimeLikeArrayMixin, _attrname_to_abbrevs and fix test_get_attrname_from_abbrev

* correct def asfreq in PeriodArray, remove unused function freq_to_offset_freqstr, fix tests

* roll back in test_fillna_period dtype Period[M] with capital P

* refactor the function raise_on_incompatible

* fix mypy error in pandas/core/arrays/period.py

* fix ruff error in pandas/tests/arrays/period/test_constructors.py

* remove ME from definitions of is_monthly, is_subperiod, correct _maybe_coerce_freq and test_period_ordinal_start_values

* fix test_dti_to_period_2monthish

* update whatsnew/v2.1.0.rst

* add an example for old/new behavior in whatsnew/v2.1.0.rst

* corrected typo

* replace name of section Deprecations with Other Deprecations

* remove ME from is_superperiod, refactor tests

* correct a test

* move some tests to a new place

* correct def asfreq for resampling, refactor asfreq for Period, fix tests

* correct tests

* correct def _shift_with_freq and fix test for shift

* correct docs for asfreq in PeriodArray

* correct def _shift_with_freq

* add ‘me’ to _dont_uppercase, correct _require_matching_freq, fix tests

* minor corrections

* correct whatsnew

* correct an example in user_guide/reshaping.rst

* fix tests for plotting

* correct tests for plotting

* remove from OFFSET_TO_PERIOD_FREQSTR deprecated freqstr, fix tests
natmokval authored Sep 20, 2023
1 parent 61a6335 commit a98be06
Showing 96 changed files with 726 additions and 397 deletions.
@@ -295,7 +295,7 @@ Aggregate the current hourly time series values to the monthly maximum value in

.. ipython:: python
-    monthly_max = no_2.resample("M").max()
+    monthly_max = no_2.resample("ME").max()
monthly_max
A very powerful method on time series data with a datetime index, is the
2 changes: 1 addition & 1 deletion doc/source/user_guide/cookbook.rst
@@ -771,7 +771,7 @@ To create year and month cross tabulation:
df = pd.DataFrame(
{"value": np.random.randn(36)},
-    index=pd.date_range("2011-01-01", freq="M", periods=36),
+    index=pd.date_range("2011-01-01", freq="ME", periods=36),
)
pd.pivot_table(
6 changes: 3 additions & 3 deletions doc/source/user_guide/groupby.rst
@@ -1416,7 +1416,7 @@ Groupby a specific column with the desired frequency. This is like resampling.

.. ipython:: python
-    df.groupby([pd.Grouper(freq="1M", key="Date"), "Buyer"])[["Quantity"]].sum()
+    df.groupby([pd.Grouper(freq="1ME", key="Date"), "Buyer"])[["Quantity"]].sum()
When ``freq`` is specified, the object returned by ``pd.Grouper`` will be an
instance of ``pandas.api.typing.TimeGrouper``. You have an ambiguous specification
@@ -1426,9 +1426,9 @@ in that you have a named index and a column that could be potential groupers.
df = df.set_index("Date")
df["Date"] = df.index + pd.offsets.MonthEnd(2)
-    df.groupby([pd.Grouper(freq="6M", key="Date"), "Buyer"])[["Quantity"]].sum()
+    df.groupby([pd.Grouper(freq="6ME", key="Date"), "Buyer"])[["Quantity"]].sum()
-    df.groupby([pd.Grouper(freq="6M", level="Date"), "Buyer"])[["Quantity"]].sum()
+    df.groupby([pd.Grouper(freq="6ME", level="Date"), "Buyer"])[["Quantity"]].sum()
Taking the first rows of each group
2 changes: 1 addition & 1 deletion doc/source/user_guide/reshaping.rst
@@ -136,7 +136,7 @@ Also, you can use :class:`Grouper` for ``index`` and ``columns`` keywords. For d

.. ipython:: python
-    pd.pivot_table(df, values="D", index=pd.Grouper(freq="M", key="F"), columns="C")
+    pd.pivot_table(df, values="D", index=pd.Grouper(freq="ME", key="F"), columns="C")
.. _reshaping.pivot.margins:

18 changes: 9 additions & 9 deletions doc/source/user_guide/timeseries.rst
@@ -107,7 +107,7 @@ data however will be stored as ``object`` data.
pd.Series(pd.period_range("1/1/2011", freq="M", periods=3))
pd.Series([pd.DateOffset(1), pd.DateOffset(2)])
-    pd.Series(pd.date_range("1/1/2011", freq="M", periods=3))
+    pd.Series(pd.date_range("1/1/2011", freq="ME", periods=3))
Lastly, pandas represents null date times, time deltas, and time spans as ``NaT`` which
is useful for representing missing or null date like values and behaves similar
@@ -450,7 +450,7 @@ variety of :ref:`frequency aliases <timeseries.offset_aliases>`:

.. ipython:: python
-    pd.date_range(start, periods=1000, freq="M")
+    pd.date_range(start, periods=1000, freq="ME")
pd.bdate_range(start, periods=250, freq="BQS")
@@ -882,7 +882,7 @@ into ``freq`` keyword arguments. The available date offsets and associated frequ
:class:`~pandas.tseries.offsets.Week`, ``'W'``, "one week, optionally anchored on a day of the week"
:class:`~pandas.tseries.offsets.WeekOfMonth`, ``'WOM'``, "the x-th day of the y-th week of each month"
:class:`~pandas.tseries.offsets.LastWeekOfMonth`, ``'LWOM'``, "the x-th day of the last week of each month"
-    :class:`~pandas.tseries.offsets.MonthEnd`, ``'M'``, "calendar month end"
+    :class:`~pandas.tseries.offsets.MonthEnd`, ``'ME'``, "calendar month end"
:class:`~pandas.tseries.offsets.MonthBegin`, ``'MS'``, "calendar month begin"
:class:`~pandas.tseries.offsets.BMonthEnd` or :class:`~pandas.tseries.offsets.BusinessMonthEnd`, ``'BM'``, "business month end"
:class:`~pandas.tseries.offsets.BMonthBegin` or :class:`~pandas.tseries.offsets.BusinessMonthBegin`, ``'BMS'``, "business month begin"
@@ -1246,7 +1246,7 @@ frequencies. We will refer to these aliases as *offset aliases*.
"C", "custom business day frequency"
"D", "calendar day frequency"
"W", "weekly frequency"
-    "M", "month end frequency"
+    "ME", "month end frequency"
"SM", "semi-month end frequency (15th and end of month)"
"BM", "business month end frequency"
"CBM", "custom business month end frequency"
@@ -1690,7 +1690,7 @@ the end of the interval.
.. warning::

The default values for ``label`` and ``closed`` is '**left**' for all
-    frequency offsets except for 'M', 'A', 'Q', 'BM', 'BA', 'BQ', and 'W'
+    frequency offsets except for 'ME', 'A', 'Q', 'BM', 'BA', 'BQ', and 'W'
which all have a default of 'right'.

This might unintendedly lead to looking ahead, where the value for a later
@@ -1856,15 +1856,15 @@ to resample based on datetimelike column in the frame, it can passed to the
),
)
df
-    df.resample("M", on="date")[["a"]].sum()
+    df.resample("ME", on="date")[["a"]].sum()
Similarly, if you instead want to resample by a datetimelike
level of ``MultiIndex``, its name or location can be passed to the
``level`` keyword.

.. ipython:: python
-    df.resample("M", level="d")[["a"]].sum()
+    df.resample("ME", level="d")[["a"]].sum()
.. _timeseries.iterating-label:

@@ -2137,7 +2137,7 @@ The ``period`` dtype can be used in ``.astype(...)``. It allows one to change th
pi.astype("datetime64[ns]")
# convert to PeriodIndex
-    dti = pd.date_range("2011-01-01", freq="M", periods=3)
+    dti = pd.date_range("2011-01-01", freq="ME", periods=3)
dti
dti.astype("period[M]")
@@ -2256,7 +2256,7 @@ and vice-versa using ``to_timestamp``:

.. ipython:: python
-    rng = pd.date_range("1/1/2012", periods=5, freq="M")
+    rng = pd.date_range("1/1/2012", periods=5, freq="ME")
ts = pd.Series(np.random.randn(len(rng)), index=rng)
18 changes: 14 additions & 4 deletions doc/source/whatsnew/v0.14.0.rst
@@ -860,10 +860,20 @@ Enhancements
datetime.datetime(2013, 9, 5, 10, 0)]})
df
-    df.pivot_table(values='Quantity',
-                   index=pd.Grouper(freq='M', key='Date'),
-                   columns=pd.Grouper(freq='M', key='PayDay'),
-                   aggfunc="sum")
+.. code-block:: ipython
+    In [75]: df.pivot_table(values='Quantity',
+       ....: index=pd.Grouper(freq='M', key='Date'),
+       ....: columns=pd.Grouper(freq='M', key='PayDay'),
+       ....: aggfunc="sum")
+    Out[75]:
+    PayDay      2013-09-30  2013-10-31  2013-11-30
+    Date
+    2013-09-30         NaN         3.0         NaN
+    2013-10-31         6.0         NaN         1.0
+    2013-11-30         NaN         9.0         NaN
+    [3 rows x 3 columns]
- Arrays of strings can be wrapped to a specified width (``str.wrap``) (:issue:`6999`)
- Add :meth:`~Series.nsmallest` and :meth:`Series.nlargest` methods to Series, See :ref:`the docs <basics.nsorted>` (:issue:`3960`)
19 changes: 17 additions & 2 deletions doc/source/whatsnew/v0.18.0.rst
@@ -837,9 +837,24 @@ Previously
New API

-.. ipython:: python
+.. code-block:: ipython
-    s.resample('M').ffill()
+    In [91]: s.resample('M').ffill()
+    Out[91]:
+    2010-03-31    0
+    2010-04-30    0
+    2010-05-31    0
+    2010-06-30    1
+    2010-07-31    1
+    2010-08-31    1
+    2010-09-30    2
+    2010-10-31    2
+    2010-11-30    2
+    2010-12-31    3
+    2011-01-31    3
+    2011-02-28    3
+    2011-03-31    4
+    Freq: M, Length: 13, dtype: int64
.. note::

22 changes: 20 additions & 2 deletions doc/source/whatsnew/v0.19.0.rst
@@ -498,8 +498,26 @@ Other enhancements
),
)
df
-    df.resample("M", on="date")[["a"]].sum()
-    df.resample("M", level="d")[["a"]].sum()
+.. code-block:: ipython
+    In [74]: df.resample("M", on="date")[["a"]].sum()
+    Out[74]:
+                a
+    date
+    2015-01-31  6
+    2015-02-28  4
+    [2 rows x 1 columns]
+    In [75]: df.resample("M", level="d")[["a"]].sum()
+    Out[75]:
+                a
+    d
+    2015-01-31  6
+    2015-02-28  4
+    [2 rows x 1 columns]
- The ``.get_credentials()`` method of ``GbqConnector`` can now first try to fetch `the application default credentials <https://developers.google.com/identity/protocols/application-default-credentials>`__. See the docs for more details (:issue:`13577`).
- The ``.tz_localize()`` method of ``DatetimeIndex`` and ``Timestamp`` has gained the ``errors`` keyword, so you can potentially coerce nonexistent timestamps to ``NaT``. The default behavior remains to raising a ``NonExistentTimeError`` (:issue:`13057`)
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v2.0.0.rst
@@ -76,7 +76,7 @@ Below is a possibly non-exhaustive list of changes:

.. ipython:: python
-    idx = pd.date_range(start='1/1/2018', periods=3, freq='M')
+    idx = pd.date_range(start='1/1/2018', periods=3, freq='ME')
idx.array.year
idx.year
25 changes: 25 additions & 0 deletions doc/source/whatsnew/v2.2.0.rst
@@ -177,6 +177,31 @@ Other API changes

Deprecations
~~~~~~~~~~~~

Deprecate alias ``M`` in favour of ``ME`` for offsets
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The alias ``M`` is deprecated in favour of ``ME`` for offsets, please use ``ME`` for "month end" instead of ``M`` (:issue:`9586`)

For example:

*Previous behavior*:

.. code-block:: ipython
In [7]: pd.date_range('2020-01-01', periods=3, freq='M')
Out [7]:
DatetimeIndex(['2020-01-31', '2020-02-29', '2020-03-31'],
dtype='datetime64[ns]', freq='M')
*Future behavior*:

.. ipython:: python
pd.date_range('2020-01-01', periods=3, freq='ME')
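The deprecation mechanics can be illustrated with a small self-contained sketch. This is hypothetical code, not pandas' actual implementation; the name ``resolve_offset_alias`` and the exact message are invented for illustration:

```python
import warnings

# Hypothetical mapping of deprecated offset aliases to their replacements,
# mirroring the 'M' -> 'ME' rename this PR introduces.
DEPRECATED_OFFSET_ALIASES = {"M": "ME"}


def resolve_offset_alias(freq: str) -> str:
    """Return the canonical offset alias, warning if a deprecated one is used."""
    if freq in DEPRECATED_OFFSET_ALIASES:
        replacement = DEPRECATED_OFFSET_ALIASES[freq]
        warnings.warn(
            f"'{freq}' is deprecated, please use '{replacement}' instead.",
            FutureWarning,
            stacklevel=2,
        )
        return replacement
    return freq
```

Calling ``resolve_offset_alias("M")`` returns ``"ME"`` after emitting a ``FutureWarning``, while ``"ME"`` passes through silently.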
Other Deprecations
^^^^^^^^^^^^^^^^^^
- Changed :meth:`Timedelta.resolution_string` to return ``min``, ``s``, ``ms``, ``us``, and ``ns`` instead of ``T``, ``S``, ``L``, ``U``, and ``N``, for compatibility with respective deprecations in frequency aliases (:issue:`52536`)
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_clipboard`. (:issue:`54229`)
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_csv` except ``path_or_buf``. (:issue:`54229`)
2 changes: 2 additions & 0 deletions pandas/_libs/tslibs/dtypes.pxd
@@ -11,6 +11,8 @@ cpdef int64_t periods_per_second(NPY_DATETIMEUNIT reso) except? -1
cpdef NPY_DATETIMEUNIT get_supported_reso(NPY_DATETIMEUNIT reso)
cpdef bint is_supported_unit(NPY_DATETIMEUNIT reso)

cpdef freq_to_period_freqstr(freq_n, freq_name)
cdef dict c_OFFSET_TO_PERIOD_FREQSTR
cdef dict c_DEPR_ABBREVS
cdef dict attrname_to_abbrevs
cdef dict npy_unit_to_attrname
2 changes: 2 additions & 0 deletions pandas/_libs/tslibs/dtypes.pyi
@@ -6,6 +6,7 @@ from pandas._libs.tslibs.timedeltas import UnitChoices
# are imported in tests.
_attrname_to_abbrevs: dict[str, str]
_period_code_map: dict[str, int]
OFFSET_TO_PERIOD_FREQSTR: dict[str, str]
DEPR_ABBREVS: dict[str, UnitChoices]

def periods_per_day(reso: int) -> int: ...
@@ -14,6 +15,7 @@ def is_supported_unit(reso: int) -> bool: ...
def npy_unit_to_abbrev(reso: int) -> str: ...
def get_supported_reso(reso: int) -> int: ...
def abbrev_to_npy_unit(abbrev: str) -> int: ...
def freq_to_period_freqstr(freq_n: int, freq_name: str) -> str: ...

class PeriodDtypeBase:
_dtype_code: int # PeriodDtypeCode
40 changes: 39 additions & 1 deletion pandas/_libs/tslibs/dtypes.pyx
@@ -13,7 +13,6 @@ from pandas._libs.tslibs.np_datetime cimport (

import_pandas_datetime()


cdef class PeriodDtypeBase:
"""
Similar to an actual dtype, this contains all of the information
@@ -186,6 +185,45 @@ _attrname_to_abbrevs = {
cdef dict attrname_to_abbrevs = _attrname_to_abbrevs
cdef dict _abbrev_to_attrnames = {v: k for k, v in attrname_to_abbrevs.items()}

OFFSET_TO_PERIOD_FREQSTR: dict = {
"WEEKDAY": "D",
"EOM": "M",
"BM": "M",
"BQS": "Q",
"QS": "Q",
"BQ": "Q",
"BA": "A",
"AS": "A",
"BAS": "A",
"MS": "M",
"D": "D",
"B": "B",
"min": "min",
"s": "s",
"ms": "ms",
"us": "us",
"ns": "ns",
"H": "H",
"Q": "Q",
"A": "A",
"W": "W",
"ME": "M",
"Y": "A",
"BY": "A",
"YS": "A",
"BYS": "A",
}
cdef dict c_OFFSET_TO_PERIOD_FREQSTR = OFFSET_TO_PERIOD_FREQSTR

cpdef freq_to_period_freqstr(freq_n, freq_name):
if freq_n == 1:
freqstr = f"""{c_OFFSET_TO_PERIOD_FREQSTR.get(
freq_name, freq_name)}"""
else:
freqstr = f"""{freq_n}{c_OFFSET_TO_PERIOD_FREQSTR.get(
freq_name, freq_name)}"""
return freqstr
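The new ``cpdef`` function behaves like this pure-Python sketch (illustrative only; it uses a hand-picked subset of the ``OFFSET_TO_PERIOD_FREQSTR`` mapping above):

```python
# Subset of OFFSET_TO_PERIOD_FREQSTR, for illustration: offset alias -> period alias.
OFFSET_TO_PERIOD = {"ME": "M", "BM": "M", "MS": "M", "Q": "Q", "Y": "A", "A": "A"}


def freq_to_period_freqstr(freq_n: int, freq_name: str) -> str:
    """Build a period frequency string from an offset's multiplier and name.

    Unknown names fall through unchanged; the multiplier is prepended
    only when it is not 1, matching the Cython version above.
    """
    name = OFFSET_TO_PERIOD.get(freq_name, freq_name)
    return name if freq_n == 1 else f"{freq_n}{name}"
```

For example, the offset alias ``"ME"`` maps to the period alias ``"M"``, and a multiplier of 3 yields ``"3M"``.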

# Map deprecated resolution abbreviations to correct resolution abbreviations
DEPR_ABBREVS: dict[str, str]= {
"T": "min",
2 changes: 1 addition & 1 deletion pandas/_libs/tslibs/offsets.pxd
@@ -1,7 +1,7 @@
from numpy cimport int64_t


-cpdef to_offset(object obj)
+cpdef to_offset(object obj, bint is_period=*)
cdef bint is_offset_object(object obj)
cdef bint is_tick_object(object obj)

6 changes: 3 additions & 3 deletions pandas/_libs/tslibs/offsets.pyi
@@ -103,11 +103,11 @@ class SingleConstructorOffset(BaseOffset):
def __reduce__(self): ...

@overload
-def to_offset(freq: None) -> None: ...
+def to_offset(freq: None, is_period: bool = ...) -> None: ...
@overload
-def to_offset(freq: _BaseOffsetT) -> _BaseOffsetT: ...
+def to_offset(freq: _BaseOffsetT, is_period: bool = ...) -> _BaseOffsetT: ...
@overload
-def to_offset(freq: timedelta | str) -> BaseOffset: ...
+def to_offset(freq: timedelta | str, is_period: bool = ...) -> BaseOffset: ...

class Tick(SingleConstructorOffset):
_creso: int