Skip to content

Commit

Permalink
Merge branch 'main' into enh-stata-non-nano
Browse files Browse the repository at this point in the history
  • Loading branch information
jbrockmendel committed Oct 29, 2023
2 parents c50bc10 + adae693 commit 38fad02
Show file tree
Hide file tree
Showing 120 changed files with 3,213 additions and 2,530 deletions.
5 changes: 5 additions & 0 deletions .circleci/config.yml
Original file line number Diff line number Diff line change
Expand Up @@ -49,7 +49,12 @@ jobs:
no_output_timeout: 30m # Sometimes the tests won't generate any output, make sure the job doesn't get killed by that
command: |
pip3 install cibuildwheel==2.15.0
# When this is a nightly wheel build, allow picking up NumPy 2.0 dev wheels:
if [[ "$IS_SCHEDULE_DISPATCH" == "true" || "$IS_PUSH" != 'true' ]]; then
export CIBW_ENVIRONMENT="PIP_EXTRA_INDEX_URL=https://pypi.anaconda.org/scientific-python-nightly-wheels/simple"
fi
cibuildwheel --prerelease-pythons --output-dir wheelhouse
environment:
CIBW_BUILD: << parameters.cibw-build >>

Expand Down
15 changes: 14 additions & 1 deletion .github/workflows/wheels.yml
Original file line number Diff line number Diff line change
Expand Up @@ -137,14 +137,27 @@ jobs:
shell: bash -el {0}
run: echo "sdist_name=$(cd ./dist && ls -d */)" >> "$GITHUB_ENV"

- name: Build wheels
- name: Build normal wheels
if: ${{ (env.IS_SCHEDULE_DISPATCH != 'true' || env.IS_PUSH == 'true') }}
uses: pypa/[email protected]
with:
package-dir: ./dist/${{ matrix.buildplat[1] == 'macosx_*' && env.sdist_name || needs.build_sdist.outputs.sdist_file }}
env:
CIBW_PRERELEASE_PYTHONS: True
CIBW_BUILD: ${{ matrix.python[0] }}-${{ matrix.buildplat[1] }}

- name: Build nightly wheels (with NumPy pre-release)
if: ${{ (env.IS_SCHEDULE_DISPATCH == 'true' && env.IS_PUSH != 'true') }}
uses: pypa/[email protected]
with:
package-dir: ./dist/${{ matrix.buildplat[1] == 'macosx_*' && env.sdist_name || needs.build_sdist.outputs.sdist_file }}
env:
# The nightly wheels should be build witht he NumPy 2.0 pre-releases
# which requires the additional URL.
CIBW_ENVIRONMENT: PIP_EXTRA_INDEX_URL=https://pypi.anaconda.org/scientific-python-nightly-wheels/simple
CIBW_PRERELEASE_PYTHONS: True
CIBW_BUILD: ${{ matrix.python[0] }}-${{ matrix.buildplat[1] }}

- name: Set up Python
uses: mamba-org/setup-micromamba@v1
with:
Expand Down
4 changes: 4 additions & 0 deletions asv_bench/benchmarks/groupby.py
Original file line number Diff line number Diff line change
Expand Up @@ -73,6 +73,8 @@
"ffill",
"first",
"head",
"idxmax",
"idxmin",
"last",
"median",
"nunique",
Expand Down Expand Up @@ -588,6 +590,8 @@ class GroupByCythonAgg:
"prod",
"min",
"max",
"idxmin",
"idxmax",
"mean",
"median",
"var",
Expand Down
3 changes: 2 additions & 1 deletion ci/run_tests.sh
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,8 @@ echo PYTHONHASHSEED=$PYTHONHASHSEED

COVERAGE="-s --cov=pandas --cov-report=xml --cov-append --cov-config=pyproject.toml"

PYTEST_CMD="MESONPY_EDITABLE_VERBOSE=1 PYTHONDEVMODE=1 PYTHONWARNDEFAULTENCODING=1 pytest -r fEs -n $PYTEST_WORKERS --dist=loadfile $TEST_ARGS $COVERAGE $PYTEST_TARGET"
# TODO: Support NEP 50 and remove NPY_PROMOTION_STATE
PYTEST_CMD="NPY_PROMOTION_STATE=legacy MESONPY_EDITABLE_VERBOSE=1 PYTHONDEVMODE=1 PYTHONWARNDEFAULTENCODING=1 pytest -r fEs -n $PYTEST_WORKERS --dist=loadfile $TEST_ARGS $COVERAGE $PYTEST_TARGET"

if [[ "$PATTERN" ]]; then
PYTEST_CMD="$PYTEST_CMD -m \"$PATTERN\""
Expand Down
7 changes: 7 additions & 0 deletions doc/source/reference/general_functions.rst
Original file line number Diff line number Diff line change
Expand Up @@ -73,6 +73,13 @@ Top-level evaluation

eval

Datetime formats
~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: api/

tseries.api.guess_datetime_format

Hashing
~~~~~~~
.. autosummary::
Expand Down
4 changes: 2 additions & 2 deletions doc/source/user_guide/groupby.rst
Original file line number Diff line number Diff line change
Expand Up @@ -517,8 +517,8 @@ listed below, those with a ``*`` do *not* have a Cython-optimized implementation
:meth:`~.DataFrameGroupBy.count`;Compute the number of non-NA values in the groups
:meth:`~.DataFrameGroupBy.cov` * ;Compute the covariance of the groups
:meth:`~.DataFrameGroupBy.first`;Compute the first occurring value in each group
:meth:`~.DataFrameGroupBy.idxmax` *;Compute the index of the maximum value in each group
:meth:`~.DataFrameGroupBy.idxmin` *;Compute the index of the minimum value in each group
:meth:`~.DataFrameGroupBy.idxmax`;Compute the index of the maximum value in each group
:meth:`~.DataFrameGroupBy.idxmin`;Compute the index of the minimum value in each group
:meth:`~.DataFrameGroupBy.last`;Compute the last occurring value in each group
:meth:`~.DataFrameGroupBy.max`;Compute the maximum value in each group
:meth:`~.DataFrameGroupBy.mean`;Compute the mean of each group
Expand Down
1 change: 1 addition & 0 deletions doc/source/whatsnew/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@ Version 2.1
.. toctree::
:maxdepth: 2

v2.1.3
v2.1.2
v2.1.1
v2.1.0
Expand Down
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v2.1.1.rst
Original file line number Diff line number Diff line change
Expand Up @@ -52,4 +52,4 @@ Other
Contributors
~~~~~~~~~~~~

.. contributors:: v2.1.0..v2.1.1|HEAD
.. contributors:: v2.1.0..v2.1.1
19 changes: 18 additions & 1 deletion doc/source/whatsnew/v2.1.2.rst
Original file line number Diff line number Diff line change
@@ -1,30 +1,45 @@
.. _whatsnew_212:

What's new in 2.1.2 (October 25, 2023)
What's new in 2.1.2 (October 26, 2023)
---------------------------------------

These are the changes in pandas 2.1.2. See :ref:`release` for a full changelog
including other versions of pandas.

{{ header }}

.. ---------------------------------------------------------------------------
.. _whatsnew_212.deprecations:

Deprecations
~~~~~~~~~~~~

- Reverted deprecation of ``fill_method=None`` in :meth:`DataFrame.pct_change`, :meth:`Series.pct_change`, :meth:`DataFrameGroupBy.pct_change`, and :meth:`SeriesGroupBy.pct_change`; the values ``'backfill'``, ``'bfill'``, ``'pad'``, and ``'ffill'`` are still deprecated (:issue:`53491`)

.. ---------------------------------------------------------------------------
.. _whatsnew_212.regressions:

Fixed regressions
~~~~~~~~~~~~~~~~~
- Fixed regression in :meth:`DataFrame.join` where result has missing values and dtype is arrow backed string (:issue:`55348`)
- Fixed regression in :meth:`~DataFrame.rolling` where non-nanosecond index or ``on`` column would produce incorrect results (:issue:`55026`, :issue:`55106`, :issue:`55299`)
- Fixed regression in :meth:`DataFrame.resample` which was extrapolating back to ``origin`` when ``origin`` was outside its bounds (:issue:`55064`)
- Fixed regression in :meth:`DataFrame.sort_index` which was not sorting correctly when the index was a sliced :class:`MultiIndex` (:issue:`55379`)
- Fixed regression in :meth:`DataFrameGroupBy.agg` and :meth:`SeriesGroupBy.agg` where if the option ``compute.use_numba`` was set to True, groupby methods not supported by the numba engine would raise a ``TypeError`` (:issue:`55520`)
- Fixed performance regression with wide DataFrames, typically involving methods where all columns were accessed individually (:issue:`55256`, :issue:`55245`)
- Fixed regression in :func:`merge_asof` raising ``TypeError`` for ``by`` with datetime and timedelta dtypes (:issue:`55453`)
- Fixed regression in :func:`read_parquet` when reading a file with a string column consisting of more than 2 GB of string data and using the ``"string"`` dtype (:issue:`55606`)
- Fixed regression in :meth:`DataFrame.to_sql` not roundtripping datetime columns correctly for sqlite when using ``detect_types`` (:issue:`55554`)
- Fixed regression in construction of certain DataFrame or Series subclasses (:issue:`54922`)

.. ---------------------------------------------------------------------------
.. _whatsnew_212.bug_fixes:

Bug fixes
~~~~~~~~~
- Fixed bug in :class:`.DataFrameGroupBy` reductions not preserving object dtype when ``infer_string`` is set (:issue:`55620`)
- Fixed bug in :meth:`.DataFrameGroupBy.min()` and :meth:`.DataFrameGroupBy.max()` not preserving extension dtype for empty object (:issue:`55619`)
- Fixed bug in :meth:`.SeriesGroupBy.value_counts` returning incorrect dtype for string columns (:issue:`55627`)
- Fixed bug in :meth:`Categorical.equals` if other has arrow backed string dtype (:issue:`55364`)
- Fixed bug in :meth:`DataFrame.__setitem__` not inferring string dtype for zero-dimensional array with ``infer_string=True`` (:issue:`55366`)
- Fixed bug in :meth:`DataFrame.idxmin` and :meth:`DataFrame.idxmax` raising for arrow dtypes (:issue:`55368`)
Expand Down Expand Up @@ -52,3 +67,5 @@ Other

Contributors
~~~~~~~~~~~~

.. contributors:: v2.1.1..v2.1.2
41 changes: 41 additions & 0 deletions doc/source/whatsnew/v2.1.3.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
.. _whatsnew_213:

What's new in 2.1.3 (November ??, 2023)
---------------------------------------

These are the changes in pandas 2.1.3. See :ref:`release` for a full changelog
including other versions of pandas.

{{ header }}

.. ---------------------------------------------------------------------------
.. _whatsnew_213.regressions:

Fixed regressions
~~~~~~~~~~~~~~~~~
-
-

.. ---------------------------------------------------------------------------
.. _whatsnew_213.bug_fixes:

Bug fixes
~~~~~~~~~
-
-

.. ---------------------------------------------------------------------------
.. _whatsnew_213.other:

Other
~~~~~
-
-

.. ---------------------------------------------------------------------------
.. _whatsnew_213.contributors:

Contributors
~~~~~~~~~~~~

.. contributors:: v2.1.2..v2.1.3|HEAD
12 changes: 9 additions & 3 deletions doc/source/whatsnew/v2.2.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -79,6 +79,7 @@ Other enhancements
- :func:`read_csv` now supports ``on_bad_lines`` parameter with ``engine="pyarrow"``. (:issue:`54480`)
- :func:`read_spss` now returns a :class:`DataFrame` that stores the metadata in :attr:`DataFrame.attrs`. (:issue:`54264`)
- :func:`read_stata` now returns ``datetime64`` resolutions better matching those natively stored in the stata format (:issue:`55642`)
- :func:`tseries.api.guess_datetime_format` is now part of the public API (:issue:`54727`)
- :meth:`ExtensionArray._explode` interface method added to allow extension type implementations of the ``explode`` method (:issue:`54833`)
- :meth:`ExtensionArray.duplicated` added to allow extension type implementations of the ``duplicated`` method (:issue:`55255`)
- DataFrame.apply now allows the usage of numba (via ``engine="numba"``) to JIT compile the passed function, allowing for potential speedups (:issue:`54666`)
Expand Down Expand Up @@ -289,21 +290,21 @@ Other Deprecations
- Deprecated the option ``mode.data_manager`` and the ``ArrayManager``; only the ``BlockManager`` will be available in future versions (:issue:`55043`)
- Deprecated the previous implementation of :class:`DataFrame.stack`; specify ``future_stack=True`` to adopt the future version (:issue:`53515`)
- Deprecating downcasting the results of :meth:`DataFrame.fillna`, :meth:`Series.fillna`, :meth:`DataFrame.ffill`, :meth:`Series.ffill`, :meth:`DataFrame.bfill`, :meth:`Series.bfill` in object-dtype cases. To opt in to the future version, use ``pd.set_option("future.no_silent_downcasting", True)`` (:issue:`54261`)
-

.. ---------------------------------------------------------------------------
.. _whatsnew_220.performance:

Performance improvements
~~~~~~~~~~~~~~~~~~~~~~~~
- Performance improvement in :func:`concat` with ``axis=1`` and objects with unaligned indexes (:issue:`55084`)
- Performance improvement in :func:`merge_asof` when ``by`` contains more than one key (:issue:`55580`)
- Performance improvement in :func:`merge_asof` when ``by`` is not ``None`` (:issue:`55580`, :issue:`55678`)
- Performance improvement in :func:`read_stata` for files with many variables (:issue:`55515`)
- Performance improvement in :func:`to_dict` on converting DataFrame to dictionary (:issue:`50990`)
- Performance improvement in :meth:`DataFrame.groupby` when aggregating pyarrow timestamp and duration dtypes (:issue:`55031`)
- Performance improvement in :meth:`DataFrame.sort_index` and :meth:`Series.sort_index` when indexed by a :class:`MultiIndex` (:issue:`54835`)
- Performance improvement in :meth:`Index.difference` (:issue:`55108`)
- Performance improvement in :meth:`Series.duplicated` for pyarrow dtypes (:issue:`55255`)
- Performance improvement in :meth:`SeriesGroupBy.idxmax`, :meth:`SeriesGroupBy.idxmin`, :meth:`DataFrameGroupBy.idxmax`, :meth:`DataFrameGroupBy.idxmin` (:issue:`54234`)
- Performance improvement when indexing with more than 4 keys (:issue:`54550`)
- Performance improvement when localizing time to UTC (:issue:`55241`)

Expand All @@ -323,8 +324,10 @@ Datetimelike
^^^^^^^^^^^^
- Bug in :func:`concat` raising ``AttributeError`` when concatenating all-NA DataFrame with :class:`DatetimeTZDtype` dtype DataFrame. (:issue:`52093`)
- Bug in :meth:`DatetimeIndex.union` returning object dtype for tz-aware indexes with the same timezone but different units (:issue:`55238`)
- Bug in :meth:`Index.view` to a datetime64 dtype with non-supported resolution incorrectly raising (:issue:`55710`)
- Bug in :meth:`Tick.delta` with very large ticks raising ``OverflowError`` instead of ``OutOfBoundsTimedelta`` (:issue:`55503`)
- Bug in adding or subtracting a :class:`Week` offset to a ``datetime64`` :class:`Series`, :class:`Index`, or :class:`DataFrame` column with non-nanosecond resolution returning incorrect results (:issue:`55583`)
- Bug in addition or subtraction of :class:`BusinessDay` offset with ``offset`` attribute to non-nanosecond :class:`Index`, :class:`Series`, or :class:`DataFrame` column giving incorrect results (:issue:`55608`)
- Bug in addition or subtraction of :class:`DateOffset` objects with microsecond components to ``datetime64`` :class:`Index`, :class:`Series`, or :class:`DataFrame` columns with non-nanosecond resolution (:issue:`55595`)
- Bug in addition or subtraction of very large :class:`Tick` objects with :class:`Timestamp` or :class:`Timedelta` objects raising ``OverflowError`` instead of ``OutOfBoundsTimedelta`` (:issue:`55503`)
-
Expand Down Expand Up @@ -359,6 +362,7 @@ Strings
Interval
^^^^^^^^
- Bug in :class:`Interval` ``__repr__`` not displaying UTC offsets for :class:`Timestamp` bounds. Additionally the hour, minute and second components will now be shown. (:issue:`55015`)
- Bug in :meth:`IntervalIndex.from_arrays` when passed ``datetime64`` or ``timedelta64`` arrays with mismatched resolutions constructing an invalid ``IntervalArray`` object (:issue:`55714`)
- Bug in :meth:`IntervalIndex.get_indexer` with datetime or timedelta intervals incorrectly matching on integer targets (:issue:`47772`)
- Bug in :meth:`IntervalIndex.get_indexer` with timezone-aware datetime intervals incorrectly matching on a sequence of timezone-naive targets (:issue:`47772`)
- Bug in setting values on a :class:`Series` with an :class:`IntervalIndex` using a slice incorrectly raising (:issue:`54722`)
Expand Down Expand Up @@ -404,14 +408,16 @@ Plotting
Groupby/resample/rolling
^^^^^^^^^^^^^^^^^^^^^^^^
- Bug in :class:`.Rolling` where duplicate datetimelike indexes are treated as consecutive rather than equal with ``closed='left'`` and ``closed='neither'`` (:issue:`20712`)
- Bug in :meth:`.DataFrameGroupBy.idxmin`, :meth:`.DataFrameGroupBy.idxmax`, :meth:`.SeriesGroupBy.idxmin`, and :meth:`.SeriesGroupBy.idxmax` would not retain :class:`.Categorical` dtype when the index was a :class:`.CategoricalIndex` that contained NA values (:issue:`54234`)
- Bug in :meth:`.DataFrameGroupBy.transform` and :meth:`.SeriesGroupBy.transform` when ``observed=False`` and ``f="idxmin"`` or ``f="idxmax"`` would incorrectly raise on unobserved categories (:issue:`54234`)
- Bug in :meth:`DataFrame.resample` not respecting ``closed`` and ``label`` arguments for :class:`~pandas.tseries.offsets.BusinessDay` (:issue:`55282`)
- Bug in :meth:`DataFrame.resample` where bin edges were not correct for :class:`~pandas.tseries.offsets.BusinessDay` (:issue:`55281`)
- Bug in :meth:`DataFrame.resample` where bin edges were not correct for :class:`~pandas.tseries.offsets.MonthBegin` (:issue:`55271`)
-

Reshaping
^^^^^^^^^
- Bug in :func:`concat` ignoring ``sort`` parameter when passed :class:`DatetimeIndex` indexes (:issue:`54769`)
- Bug in :func:`merge_asof` raising ``TypeError`` when ``by`` dtype is not ``object``, ``int64``, or ``uint64`` (:issue:`22794`)
- Bug in :func:`merge` returning columns in incorrect order when left and/or right is empty (:issue:`51929`)
- Bug in :meth:`pandas.DataFrame.melt` where it would not preserve the datetime (:issue:`55254`)
-
Expand Down
12 changes: 12 additions & 0 deletions pandas/_libs/groupby.pyi
Original file line number Diff line number Diff line change
Expand Up @@ -181,6 +181,18 @@ def group_min(
mask: np.ndarray | None = ...,
result_mask: np.ndarray | None = ...,
) -> None: ...
def group_idxmin_idxmax(
out: npt.NDArray[np.intp],
counts: npt.NDArray[np.int64],
values: np.ndarray, # ndarray[groupby_t, ndim=2]
labels: npt.NDArray[np.intp],
min_count: int = ...,
is_datetimelike: bool = ...,
mask: np.ndarray | None = ...,
name: str = ...,
skipna: bool = ...,
result_mask: np.ndarray | None = ...,
) -> None: ...
def group_cummin(
out: np.ndarray, # groupby_t[:, ::1]
values: np.ndarray, # ndarray[groupby_t, ndim=2]
Expand Down
Loading

0 comments on commit 38fad02

Please sign in to comment.