CLN: enforce deprecation of frequencies deprecated for offsets #57986

Conversation

natmokval
Contributor

@natmokval natmokval commented Mar 24, 2024

xref #52064, #55792, #55553, #55496
Enforced deprecation of aliases M, Q, Y, etc. in favour of ME, QE, YE, etc. for offsets. Now the aliases are case-sensitive.

P.S. Corrected a note in v3.0.0 related to PR #57627
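
As a rough illustration of what enforcing the deprecation means, here is a minimal pure-Python sketch. The names `REMOVED_OFFSET_ALIASES` and `check_offset_alias` are hypothetical and are not the pandas internals; only the three alias pairs named above are listed, and the error wording is modelled on the message used in this PR.

```python
# Hypothetical stand-in for the removed-alias table; only the pairs named
# in the PR description are listed here.
REMOVED_OFFSET_ALIASES = {"M": "ME", "Q": "QE", "Y": "YE"}


def check_offset_alias(name: str) -> str:
    """Return `name` unchanged, or raise if it is a removed offset alias."""
    # Upper-casing before the lookup means "m" is rejected with the same
    # hint as "M"; "ME" and "min" are not keys, so they pass through.
    replacement = REMOVED_OFFSET_ALIASES.get(name.upper())
    if replacement is not None:
        raise ValueError(
            f"'{name}' is no longer supported for offsets. Please "
            f"use '{replacement}' instead."
        )
    return name
```

Under these assumptions, `check_offset_alias("ME")` returns the alias unchanged, while `check_offset_alias("M")` raises a `ValueError` that points the user at `'ME'`.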

@natmokval natmokval added Clean Frequency DateOffsets labels Mar 25, 2024
@natmokval natmokval marked this pull request as ready for review March 25, 2024 15:42
@natmokval natmokval requested a review from MarcoGorelli as a code owner March 25, 2024 15:42
@natmokval
Contributor Author

I enforced deprecation of aliases M, Q, Y, etc. in favour of ME, QE, YE, etc. for offsets; CI is green. @MarcoGorelli, could you please take a look at this PR?

@MarcoGorelli
Member

/preview

Contributor

Website preview of this PR available at: https://pandas.pydata.org/preview/pandas-dev/pandas/57986/

Member

@MarcoGorelli MarcoGorelli left a comment

Thanks for updating!

I've left a comment, but also, in general, this still looks very complex...if we're enforcing deprecations, then this might be a good chance to simplify the logic here?

OK with adding complexity to give a good error message if someone passes freq='M' instead of 'ME', as that's probably still fairly common, but periods are far less used

The code is currently very hard to read - which is OK as a temporary phase during which we're enforcing a deprecation - but ultimately the goal should be to end up with something that's cleaner than it was when we started. Is that possible here?

pandas/_libs/tslibs/offsets.pyx Outdated
Member

@MarcoGorelli MarcoGorelli left a comment

Thanks for updating! A few things

  • good to have the c_PERIOD_TO_OFFSET_FREQSTR dict, but I don't see why it's being used here:
    if not is_period:
        if name.upper() in c_OFFSET_REMOVED_FREQSTR:
            raise ValueError(
                f"\'{name}\' is no longer supported for offsets. Please "
                f"use \'{c_OFFSET_REMOVED_FREQSTR.get(name.upper())}\' "
                f"instead."
            )
        # below we raise for lowrecase monthly and bigger frequencies
        if (name.upper() != name and
                name.lower() not in {"h", "min", "s", "ms", "us", "ns"} and
                name.upper() not in c_PERIOD_TO_OFFSET_FREQSTR and
                name.upper() in c_OFFSET_TO_PERIOD_FREQSTR):
            raise ValueError(INVALID_FREQ_ERR_MSG.format(name))
    If you've already checked if not is_period, then why do you need to check if it's in c_PERIOD_TO_OFFSET_FREQSTR?
  • lowrecase typo
  • is this part temporary
    elif name in {"d", "b"}:
        name = name.upper()
    elif (name.upper() not in {"B", "D"} and
            not name.upper().startswith("W")):
    ?
    If so, could you add a comment explaining why it needs to be there, possibly linking to an open issue? Ideally we should get to the point where we can get rid of all this complexity, so let's make it clear what the road towards that endpoint is

@natmokval
Contributor Author

natmokval commented Apr 12, 2024

Thanks for updating! A few things

* good to have the `c_PERIOD_TO_OFFSET_FREQSTR` dict, but I don't see why it's being used here:
  ```python
  if not is_period:
      if name.upper() in c_OFFSET_REMOVED_FREQSTR:
          raise ValueError(
              f"\'{name}\' is no longer supported for offsets. Please "
              f"use \'{c_OFFSET_REMOVED_FREQSTR.get(name.upper())}\' "
              f"instead."
          )
      # below we raise for lowrecase monthly and bigger frequencies
      if (name.upper() != name and
              name.lower() not in {"h", "min", "s", "ms", "us", "ns"} and
              name.upper() not in c_PERIOD_TO_OFFSET_FREQSTR and
              name.upper() in c_OFFSET_TO_PERIOD_FREQSTR):
          raise ValueError(INVALID_FREQ_ERR_MSG.format(name))
  ```

  If you've already checked `if not is_period`, then why do you need to check if it's in `c_PERIOD_TO_OFFSET_FREQSTR`?

We need the check that the name isn't in c_PERIOD_TO_OFFSET_FREQSTR because we have not deprecated the lowercase frequencies "d", "b", "w", "weekday", "w-sun", and so on. We don't want to raise a ValueError for these frequencies, for either offsets or periods. Once these lowercase frequencies are deprecated, we can remove the check (only uppercase will be valid).

For example, without this check test_reindex_axes in pandas/tests/frame/methods/test_reindex.py raises a ValueError for freq = "d".
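
To show how that extra clause changes the outcome, here is a small pure-Python sketch of the condition quoted above. The three tables are hypothetical, heavily trimmed stand-ins for `c_OFFSET_TO_PERIOD_FREQSTR`, `c_PERIOD_TO_OFFSET_FREQSTR` and the sub-daily set; the assumption that "D", "B" and "W" appear in both mappings follows the discussion in this thread rather than the actual file contents.

```python
# Hypothetical, heavily trimmed stand-ins for the real mapping tables.
OFFSET_TO_PERIOD = {"ME": "M", "QE": "Q", "YE": "Y", "D": "D", "B": "B", "W": "W"}
PERIOD_TO_OFFSET = {"M": "ME", "Q": "QE", "Y": "YE", "D": "D", "B": "B", "W": "W"}
SUB_DAILY = {"h", "min", "s", "ms", "us", "ns"}


def is_rejected_lowercase_offset(name: str) -> bool:
    """Mirror the quoted condition for a non-period (offset) alias."""
    return (
        name.upper() != name                      # not already uppercase
        and name.lower() not in SUB_DAILY         # sub-daily aliases are lowercase by design
        and name.upper() not in PERIOD_TO_OFFSET  # spares "d", "b", "w" for now
        and name.upper() in OFFSET_TO_PERIOD      # a known offset alias
    )
```

With these tables, "me" is rejected because "ME" is an offset alias that is not a period alias, whereas "d" survives only because "D" is also a key of the period mapping. Dropping the third clause would make "d" fail, which is the situation described for test_reindex_axes.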

@natmokval
Contributor Author

natmokval commented Apr 12, 2024

* `lowrecase` typo 

thanks, I corrected the typo

@natmokval
Contributor Author

* is this part temporary
  ```python
  elif name in {"d", "b"}:
      name = name.upper()
  elif (name.upper() not in {"B", "D"} and
          not name.upper().startswith("W")):
  ```
  If so, could you add a comment explaining why it needs to be there, possibly linking to an open issue? Ideally we should get to the point where we can get rid of all this complexity, so let's make it clear what the road towards that endpoint is

Yes, it's the temporary part. I left the comment below.

@MarcoGorelli
Member

thanks for explaining - is there a way to do that part without using c_PERIOD_TO_OFFSET_FREQSTR? let's try to separate them out a bit more, if c_PERIOD_TO_OFFSET_FREQSTR is for mapping period aliases to offset aliases, then we probably shouldn't be using it for offset aliases

@natmokval
Contributor Author

thanks for explaining - is there a way to do that part without using c_PERIOD_TO_OFFSET_FREQSTR? let's try to separate them out a bit more, if c_PERIOD_TO_OFFSET_FREQSTR is for mapping period aliases to offset aliases, then we probably shouldn't be using it for offset aliases

but what should we do with aliases that are the same for both periods and offsets, such as "D", "B", "W", "WEEKDAY", "W-SUN", etc.?
I think we need to keep them in both dictionaries, c_PERIOD_TO_OFFSET_FREQSTR and c_OFFSET_TO_PERIOD_FREQSTR, since we use them when we check case correctness. Oops, I forgot to add "WEEKDAY" to c_PERIOD_TO_OFFSET_FREQSTR, should I add it?

After enforcing the deprecation of "d", "b", "w", "weekday", "w-sun", etc., we can simplify this logic.

@MarcoGorelli
Member

but what should we do with aliases that are the same for both periods and offsets, such as "D", "B", "W", "WEEKDAY", "W-SUN", etc.?
I think we need to keep them in both dictionaries, c_PERIOD_TO_OFFSET_FREQSTR and c_OFFSET_TO_PERIOD_FREQSTR, since we use them when we check case correctness.

I'd suggest either that, or to add a set which contains aliases which are valid for both
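
The second option could look roughly like the sketch below. All names here (`SHARED_ALIASES`, `rejects_lowercase_offset`, `period_to_offset_alias`) are hypothetical and the alias lists are trimmed; the point is only that the period-to-offset mapping is left with a single job, translation, while the case check consults a dedicated set.

```python
# Hypothetical names; the alias lists are trimmed for illustration.
PERIOD_TO_OFFSET = {"M": "ME", "Q": "QE", "Y": "YE"}
OFFSET_TO_PERIOD = {"ME": "M", "QE": "Q", "YE": "Y"}
SHARED_ALIASES = {"D", "B", "W"}  # spelled the same for periods and offsets
SUB_DAILY = {"h", "min", "s", "ms", "us", "ns"}


def period_to_offset_alias(name: str) -> str:
    """Translate a period alias to its offset alias; shared aliases map to themselves."""
    return PERIOD_TO_OFFSET.get(name, name)


def rejects_lowercase_offset(name: str) -> bool:
    """Case check for offsets that never touches the period mapping."""
    upper = name.upper()
    if upper == name or name.lower() in SUB_DAILY:
        return False
    if upper in SHARED_ALIASES:
        # Lowercase "d", "b", "w" are still tolerated until they are deprecated.
        return False
    return upper in OFFSET_TO_PERIOD
```

Once the lowercase shared aliases are deprecated and removed, the `SHARED_ALIASES` branch is the only part that would need to go.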

@natmokval
Contributor Author

natmokval commented Apr 12, 2024

but what should we do with aliases that are the same for both periods and offsets, such as "D", "B", "W", "WEEKDAY", "W-SUN", etc.?
I think we need to keep them in both dictionaries, c_PERIOD_TO_OFFSET_FREQSTR and c_OFFSET_TO_PERIOD_FREQSTR, since we use them when we check case correctness.

I'd suggest either that, or to add a set which contains aliases which are valid for both

thanks, then maybe we can leave it as it is?

@MarcoGorelli
Member

cool, I think this is on the right track

I think there's a logic error somewhere, as it currently gives

In [7]: pd.period_range('2000', periods=3, freq='s')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File offsets.pyx:4853, in pandas._libs.tslibs.offsets.to_offset()

ValueError: 's' is not supported as period frequency.

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[7], line 1
----> 1 pd.period_range('2000', periods=3, freq='s')

File ~/pandas-dev/pandas/core/indexes/period.py:585, in period_range(start, end, periods, freq, name)
    582 if freq is None and (not isinstance(start, Period) and not isinstance(end, Period)):
    583     freq = "D"
--> 585 data, freq = PeriodArray._generate_range(start, end, periods, freq)
    586 dtype = PeriodDtype(freq)
    587 data = PeriodArray(data, dtype=dtype)

File ~/pandas-dev/pandas/core/arrays/period.py:321, in PeriodArray._generate_range(cls, start, end, periods, freq)
    318 periods = dtl.validate_periods(periods)
    320 if freq is not None:
--> 321     freq = Period._maybe_convert_freq(freq)
    323 if start is not None or end is not None:
    324     subarr, freq = _get_ordinal_range(start, end, periods, freq)

File period.pyx:1768, in pandas._libs.tslibs.period._Period._maybe_convert_freq()

File offsets.pyx:4914, in pandas._libs.tslibs.offsets.to_offset()

ValueError: Invalid frequency: s, failed to parse with error message: ValueError("'s' is not supported as period frequency.")

but 's' should definitely be supported here, right?

pandas/_libs/tslibs/offsets.pyx Outdated
pandas/_libs/tslibs/offsets.pyx Outdated
pandas/_libs/tslibs/dtypes.pyx Outdated
pandas/_libs/tslibs/offsets.pyx Outdated
@MarcoGorelli
Member

Thanks for updating, looking better - still got a comment though, else dicts' responsibilities are being mixed

I like the good error messages you're giving here. Perhaps pd.date_range('2000', periods=2, freq='T') should also give a good error message, advising to use 'min'? (as a separate pr, though)

Member

@MarcoGorelli MarcoGorelli left a comment

thanks for sticking with this, we might not be far off

pandas/_libs/tslibs/offsets.pyx Outdated
pandas/_libs/tslibs/offsets.pyx Outdated
@MarcoGorelli
Member

thanks! this is probably good, will do another pass over tomorrow / in the week

@natmokval
Contributor Author

thanks! this is probably good, will do another pass over tomorrow / in the week

thank you for helping me with this PR!

Member

@MarcoGorelli MarcoGorelli left a comment

looks good to me

some of the tests may look redundant, but we've seen issues in the past here with unsupported aliases silently getting converted to the wrong one, and none of these tests look expensive to run, so IMO it's OK to have them

leaving open a bit in case anyone has objections

msg = f"'{freq[1:]}' is deprecated and will be removed in a "
f"future version. Please use '{freq.upper()[1:]}' instead."

with tm.assert_produces_warning(FutureWarning, match=msg):
Member

Will this deprecation be enforced by 3.0 as well?

Contributor Author

@natmokval natmokval Jun 7, 2024

I am unsure if we should enforce the deprecation of lowercase 'w' by 3.0. Because we want to deprecate the lowercase 'd', 'b', and 'c' frequencies in favor of the uppercase 'D', 'B', and 'C' in 3.0, we can keep 'w' as deprecated along with 'd', 'b', and 'c'. I think it might improve code readability.

Do you think it would be better to keep 'w' as deprecated or remove 'w' and deprecate the others?

Member

OK I see. If the plan is to also deprecate the lower case aliases in the future we can keep this as is
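
For contrast with the enforced aliases, which now raise, keeping 'w' as merely deprecated could be sketched like this. The function `normalize_weekly_alias` is a hypothetical illustration, not pandas code; the warning text follows the message asserted in the test above.

```python
import warnings


def normalize_weekly_alias(freq: str) -> str:
    """Accept lowercase 'w' for now, but warn that it is deprecated."""
    if freq == "w":
        warnings.warn(
            f"'{freq}' is deprecated and will be removed in a "
            f"future version. Please use '{freq.upper()}' instead.",
            FutureWarning,
            stacklevel=2,  # attribute the warning to the caller
        )
        return freq.upper()
    return freq
```

Enforcing the deprecation later would amount to replacing the `warnings.warn` call with a `ValueError`, as was done for the other aliases in this PR.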

# GH#54939
msg = "'w' is deprecated and will be removed in a future version"

with tm.assert_produces_warning(FutureWarning, match=msg):
Member

Will this deprecation be enforced by 3.0 as well?

Contributor Author

It is the same as the comment above.

@mroeschke mroeschke added this to the 3.0 milestone Jun 7, 2024
@mroeschke mroeschke merged commit c95716a into pandas-dev:main Jun 7, 2024
47 checks passed
@mroeschke
Member

Thanks @natmokval
