
Commit

Merge remote-tracking branch 'upstream/main' into string-dtype-index-engine-non-strict-missing
jorisvandenbossche committed Jan 2, 2025
2 parents a49f843 + 3bc44d4 commit 88cc28f
Showing 81 changed files with 1,019 additions and 1,030 deletions.
1 change: 0 additions & 1 deletion .circleci/config.yml
@@ -34,7 +34,6 @@ jobs:
fi
python -m pip install --no-build-isolation -ve . -Csetup-args="--werror"
PATH=$HOME/miniconda3/envs/pandas-dev/bin:$HOME/miniconda3/condabin:$PATH
sudo apt-get update && sudo apt-get install -y libegl1 libopengl0
ci/run_tests.sh
test-linux-musl:
docker:
6 changes: 4 additions & 2 deletions .github/workflows/unit-tests.yml
@@ -385,10 +385,12 @@ jobs:
nogil: true

- name: Build Environment
# TODO: Once numpy 2.2.1 is out, don't install nightly version
# Tests segfault with numpy 2.2.0: https://github.com/numpy/numpy/pull/27955
run: |
python --version
python -m pip install --upgrade pip setuptools wheel numpy meson[ninja]==1.2.1 meson-python==0.13.1
python -m pip install --pre --extra-index-url https://pypi.anaconda.org/scientific-python-nightly-wheels/simple cython
python -m pip install --upgrade pip setuptools wheel meson[ninja]==1.2.1 meson-python==0.13.1
python -m pip install --pre --extra-index-url https://pypi.anaconda.org/scientific-python-nightly-wheels/simple cython numpy
python -m pip install versioneer[toml]
python -m pip install python-dateutil pytz tzdata hypothesis>=6.84.0 pytest>=7.3.2 pytest-xdist>=3.4.0 pytest-cov
python -m pip install -ve . --no-build-isolation --no-index --no-deps -Csetup-args="--werror"
8 changes: 0 additions & 8 deletions ci/code_checks.sh
@@ -81,10 +81,8 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
-i "pandas.Timestamp.resolution PR02" \
-i "pandas.Timestamp.tzinfo GL08" \
-i "pandas.arrays.ArrowExtensionArray PR07,SA01" \
-i "pandas.arrays.IntervalArray.length SA01" \
-i "pandas.arrays.NumpyExtensionArray SA01" \
-i "pandas.arrays.TimedeltaArray PR07,SA01" \
-i "pandas.core.groupby.DataFrameGroupBy.boxplot PR07,RT03,SA01" \
-i "pandas.core.groupby.DataFrameGroupBy.plot PR02" \
-i "pandas.core.groupby.SeriesGroupBy.plot PR02" \
-i "pandas.core.resample.Resampler.max PR01,RT03,SA01" \
@@ -95,14 +93,8 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
-i "pandas.core.resample.Resampler.std SA01" \
-i "pandas.core.resample.Resampler.transform PR01,RT03,SA01" \
-i "pandas.core.resample.Resampler.var SA01" \
-i "pandas.errors.NullFrequencyError SA01" \
-i "pandas.errors.NumbaUtilError SA01" \
-i "pandas.errors.PerformanceWarning SA01" \
-i "pandas.errors.UndefinedVariableError PR01,SA01" \
-i "pandas.errors.ValueLabelTypeMismatch SA01" \
-i "pandas.io.json.build_table_schema PR07,RT03,SA01" \
-i "pandas.plotting.andrews_curves RT03,SA01" \
-i "pandas.plotting.scatter_matrix PR07,SA01" \
-i "pandas.tseries.offsets.BDay PR02,SA01" \
-i "pandas.tseries.offsets.BQuarterBegin.is_on_offset GL08" \
-i "pandas.tseries.offsets.BQuarterBegin.n GL08" \
2 changes: 1 addition & 1 deletion doc/source/development/maintaining.rst
@@ -488,7 +488,7 @@ Post-Release
for reference):

- The pandas-dev and pydata mailing lists
- Twitter, Mastodon, Telegram and LinkedIn
- X, Mastodon, Telegram and LinkedIn

7. Update these release instructions to fix anything incorrect and to note any
changes since the last release.
2 changes: 0 additions & 2 deletions doc/source/reference/frame.rst
@@ -185,7 +185,6 @@ Reindexing / selection / label manipulation
DataFrame.duplicated
DataFrame.equals
DataFrame.filter
DataFrame.head
DataFrame.idxmax
DataFrame.idxmin
DataFrame.reindex
@@ -196,7 +195,6 @@ Reindexing / selection / label manipulation
DataFrame.sample
DataFrame.set_axis
DataFrame.set_index
DataFrame.tail
DataFrame.take
DataFrame.truncate

4 changes: 2 additions & 2 deletions doc/source/user_guide/cookbook.rst
@@ -459,7 +459,7 @@ Unlike agg, apply's callable is passed a sub-DataFrame which gives you access to
df
# List the size of the animals with the highest weight.
df.groupby("animal").apply(lambda subf: subf["size"][subf["weight"].idxmax()], include_groups=False)
df.groupby("animal").apply(lambda subf: subf["size"][subf["weight"].idxmax()])
`Using get_group
<https://stackoverflow.com/questions/14734533/how-to-access-pandas-groupby-dataframe-by-key>`__
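The updated cookbook snippet above drops the now-removed ``include_groups`` keyword. A self-contained sketch of that call, using assumed example data (the ``animal``/``size``/``weight`` frame is not shown in this diff):

```python
import pandas as pd

# hypothetical example data; the cookbook's actual frame is elided from this hunk
df = pd.DataFrame(
    {
        "animal": ["cat", "dog", "cat", "fish", "dog"],
        "size": ["S", "S", "M", "XS", "M"],
        "weight": [8, 10, 11, 1, 20],
    }
)

# list the size of the animal with the highest weight in each group
result = df.groupby("animal").apply(lambda subf: subf["size"][subf["weight"].idxmax()])
```

``idxmax()`` returns the row label of the heaviest animal in each sub-DataFrame, and indexing ``subf["size"]`` with that label picks out the matching size.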
@@ -482,7 +482,7 @@ Unlike agg, apply's callable is passed a sub-DataFrame which gives you access to
return pd.Series(["L", avg_weight, True], index=["size", "weight", "adult"])
expected_df = gb.apply(GrowUp, include_groups=False)
expected_df = gb.apply(GrowUp)
expected_df
`Expanding apply
8 changes: 4 additions & 4 deletions doc/source/user_guide/groupby.rst
@@ -1074,7 +1074,7 @@ missing values with the ``ffill()`` method.
).set_index("date")
df_re
df_re.groupby("group").resample("1D", include_groups=False).ffill()
df_re.groupby("group").resample("1D").ffill()
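A runnable sketch of the resample-then-ffill pattern above, with assumed example data (the guide's real ``df_re`` is not shown in this hunk):

```python
import pandas as pd

# assumed example data: two groups, each sampled every other day
df_re = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2024-01-01", "2024-01-03", "2024-01-05", "2024-01-07"]
        ),
        "group": [1, 1, 2, 2],
        "val": [5, 6, 7, 8],
    }
).set_index("date")

# upsample each group to daily frequency and forward-fill the gaps
filled = df_re.groupby("group").resample("1D").ffill()
```

Each group is resampled independently, so the forward fill never bleeds values from one group into another.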
.. _groupby.filter:

@@ -1252,13 +1252,13 @@ the argument ``group_keys`` which defaults to ``True``. Compare

.. ipython:: python
df.groupby("A", group_keys=True).apply(lambda x: x, include_groups=False)
df.groupby("A", group_keys=True).apply(lambda x: x)
with

.. ipython:: python
df.groupby("A", group_keys=False).apply(lambda x: x, include_groups=False)
df.groupby("A", group_keys=False).apply(lambda x: x)
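The difference the two calls above illustrate can be checked directly; a minimal sketch with assumed data:

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "a", "b"], "B": [1, 2, 3]})

# group_keys=True prepends the group labels as an extra index level
kept = df.groupby("A", group_keys=True).apply(lambda x: x)

# group_keys=False returns the result with the original index unchanged
dropped = df.groupby("A", group_keys=False).apply(lambda x: x)
```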
Numba accelerated routines
@@ -1742,7 +1742,7 @@ column index name will be used as the name of the inserted column:
result = {"b_sum": x["b"].sum(), "c_mean": x["c"].mean()}
return pd.Series(result, name="metrics")
result = df.groupby("a").apply(compute_metrics, include_groups=False)
result = df.groupby("a").apply(compute_metrics)
result
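The ``compute_metrics`` pattern above, fleshed out into a self-contained sketch (the input frame is assumed, not taken from the diff):

```python
import pandas as pd

# assumed example data with a grouping column "a" and two value columns
df = pd.DataFrame({"a": [1, 1, 2], "b": [3, 4, 5], "c": [6.0, 7.0, 8.0]})

def compute_metrics(x):
    # the returned Series' index becomes the result's columns,
    # and its name becomes the column index name
    result = {"b_sum": x["b"].sum(), "c_mean": x["c"].mean()}
    return pd.Series(result, name="metrics")

result = df.groupby("a").apply(compute_metrics)
```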
5 changes: 0 additions & 5 deletions doc/source/user_guide/visualization.rst
@@ -1210,11 +1210,6 @@ You may set the ``xlabel`` and ``ylabel`` arguments to give the plot custom labels
for x and y axis. By default, pandas will pick up index name as xlabel, while leaving
it empty for ylabel.

.. ipython:: python
:suppress:
plt.figure();
.. ipython:: python
df.plot();
5 changes: 5 additions & 0 deletions doc/source/whatsnew/v3.0.0.rst
@@ -56,6 +56,7 @@ Other enhancements
- :meth:`DataFrame.plot.scatter` argument ``c`` now accepts a column of strings, where rows with the same string are colored identically (:issue:`16827` and :issue:`16485`)
- :func:`read_parquet` accepts ``to_pandas_kwargs`` which are forwarded to :meth:`pyarrow.Table.to_pandas` which enables passing additional keywords to customize the conversion to pandas, such as ``maps_as_pydicts`` to read the Parquet map data type as python dictionaries (:issue:`56842`)
- :meth:`DataFrameGroupBy.transform`, :meth:`SeriesGroupBy.transform`, :meth:`DataFrameGroupBy.agg`, :meth:`SeriesGroupBy.agg`, :meth:`RollingGroupby.apply`, :meth:`ExpandingGroupby.apply`, :meth:`Rolling.apply`, :meth:`Expanding.apply`, :meth:`DataFrame.apply` with ``engine="numba"`` now supports positional arguments passed as kwargs (:issue:`58995`)
- :meth:`Rolling.agg`, :meth:`Expanding.agg` and :meth:`ExponentialMovingWindow.agg` now accept :class:`NamedAgg` aggregations through ``**kwargs`` (:issue:`28333`)
- :meth:`Series.map` can now accept ``**kwargs`` which are passed on to ``func`` (:issue:`59814`)
- :meth:`pandas.concat` will raise a ``ValueError`` when ``ignore_index=True`` and ``keys`` is not ``None`` (:issue:`59274`)
- :meth:`str.get_dummies` now accepts a ``dtype`` parameter to specify the dtype of the resulting DataFrame (:issue:`47872`)
@@ -553,6 +554,7 @@ Other Removals
- Removed the ``method`` keyword in ``ExtensionArray.fillna``, implement ``ExtensionArray._pad_or_backfill`` instead (:issue:`53621`)
- Removed the attribute ``dtypes`` from :class:`.DataFrameGroupBy` (:issue:`51997`)
- Enforced deprecation of ``argmin``, ``argmax``, ``idxmin``, and ``idxmax`` returning a result when ``skipna=False`` and an NA value is encountered or all values are NA values; these operations will now raise in such cases (:issue:`33941`, :issue:`51276`)
- Removed specifying ``include_groups=True`` in :meth:`.DataFrameGroupBy.apply` and :meth:`.Resampler.apply` (:issue:`7155`)

.. ---------------------------------------------------------------------------
.. _whatsnew_300.performance:
@@ -626,6 +628,7 @@ Datetimelike
- Bug in :meth:`DatetimeIndex.union` and :meth:`DatetimeIndex.intersection` when ``unit`` was non-nanosecond (:issue:`59036`)
- Bug in :meth:`Series.dt.microsecond` producing incorrect results for pyarrow backed :class:`Series`. (:issue:`59154`)
- Bug in :meth:`to_datetime` not respecting dayfirst if an uncommon date string was passed. (:issue:`58859`)
- Bug in :meth:`to_datetime` with a float32 :class:`DataFrame` of ``year``, ``month``, ``day`` etc. columns leading to precision issues and incorrect results. (:issue:`60506`)
- Bug in :meth:`to_datetime` reporting an incorrect index when conversion fails. (:issue:`58298`)
- Bug in :meth:`to_datetime` wrongly converting when ``arg`` is a ``np.datetime64`` object with unit of ``ps``. (:issue:`60341`)
- Bug in setting scalar values with mismatched resolution into arrays with non-nanosecond ``datetime64``, ``timedelta64`` or :class:`DatetimeTZDtype` incorrectly truncating those scalars (:issue:`56410`)
@@ -798,7 +801,9 @@ Other
- Bug in :meth:`read_csv` where chained fsspec TAR file and ``compression="infer"`` fails with ``tarfile.ReadError`` (:issue:`60028`)
- Bug in Dataframe Interchange Protocol implementation was returning incorrect results for data buffers' associated dtype, for string and datetime columns (:issue:`54781`)
- Bug in ``Series.list`` methods not preserving the original :class:`Index`. (:issue:`58425`)
- Bug in ``Series.list`` methods not preserving the original name. (:issue:`60522`)
- Bug in printing a :class:`DataFrame` with a :class:`DataFrame` stored in :attr:`DataFrame.attrs` raised a ``ValueError`` (:issue:`60455`)
- Bug in printing a :class:`Series` with a :class:`DataFrame` stored in :attr:`Series.attrs` raised a ``ValueError`` (:issue:`60568`)

.. ***DO NOT USE THIS SECTION***
2 changes: 1 addition & 1 deletion pandas/_libs/tslibs/nattype.pyx
@@ -704,7 +704,7 @@ class NaTType(_NaT):
difference between the current timezone and UTC.
Returns
--------
-------
timedelta
The difference between UTC and the local time as a `timedelta` object.
2 changes: 1 addition & 1 deletion pandas/_libs/tslibs/timestamps.pyx
@@ -2217,7 +2217,7 @@ class Timestamp(_Timestamp):
difference between the current timezone and UTC.

Returns
--------
-------
timedelta
The difference between UTC and the local time as a `timedelta` object.

24 changes: 20 additions & 4 deletions pandas/core/arrays/arrow/accessors.py
@@ -117,7 +117,10 @@ def len(self) -> Series:

value_lengths = pc.list_value_length(self._pa_array)
return Series(
value_lengths, dtype=ArrowDtype(value_lengths.type), index=self._data.index
value_lengths,
dtype=ArrowDtype(value_lengths.type),
index=self._data.index,
name=self._data.name,
)

def __getitem__(self, key: int | slice) -> Series:
@@ -162,7 +165,10 @@ def __getitem__(self, key: int | slice) -> Series:
# key = pc.add(key, pc.list_value_length(self._pa_array))
element = pc.list_element(self._pa_array, key)
return Series(
element, dtype=ArrowDtype(element.type), index=self._data.index
element,
dtype=ArrowDtype(element.type),
index=self._data.index,
name=self._data.name,
)
elif isinstance(key, slice):
if pa_version_under11p0:
@@ -181,7 +187,12 @@ def __getitem__(self, key: int | slice) -> Series:
if step is None:
step = 1
sliced = pc.list_slice(self._pa_array, start, stop, step)
return Series(sliced, dtype=ArrowDtype(sliced.type), index=self._data.index)
return Series(
sliced,
dtype=ArrowDtype(sliced.type),
index=self._data.index,
name=self._data.name,
)
else:
raise ValueError(f"key must be an int or slice, got {type(key).__name__}")

@@ -223,7 +234,12 @@ def flatten(self) -> Series:
counts = pa.compute.list_value_length(self._pa_array)
flattened = pa.compute.list_flatten(self._pa_array)
index = self._data.index.repeat(counts.fill_null(pa.scalar(0, counts.type)))
return Series(flattened, dtype=ArrowDtype(flattened.type), index=index)
return Series(
flattened,
dtype=ArrowDtype(flattened.type),
index=index,
name=self._data.name,
)


class StructAccessor(ArrowAccessor):
14 changes: 14 additions & 0 deletions pandas/core/arrays/interval.py
@@ -1306,6 +1306,20 @@ def length(self) -> Index:
"""
Return an Index with entries denoting the length of each Interval.
The length of an interval is calculated as the difference between
its `right` and `left` bounds. This property is particularly useful
when working with intervals where the size of the interval is an important
attribute, such as in time-series analysis or spatial data analysis.
See Also
--------
arrays.IntervalArray.left : Return the left endpoints of each Interval in
the IntervalArray as an Index.
arrays.IntervalArray.right : Return the right endpoints of each Interval in
the IntervalArray as an Index.
arrays.IntervalArray.mid : Return the midpoint of each Interval in the
IntervalArray as an Index.
Examples
--------
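The new docstring text above defines ``length`` as the difference between the ``right`` and ``left`` bounds; a small illustration:

```python
import pandas as pd

# three intervals: (0, 1], (1, 3], (3, 6]
arr = pd.arrays.IntervalArray.from_breaks([0, 1, 3, 6])

# the length of each interval is right - left: 1, 2, 3
lengths = arr.length
```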
4 changes: 2 additions & 2 deletions pandas/core/computation/eval.py
@@ -190,8 +190,8 @@ def eval(
.. warning::
``eval`` can run arbitrary code which can make you vulnerable to code
injection and untrusted data.
This function can run arbitrary code which can make you vulnerable to code
injection if you pass user input to this function.
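The rewritten warning above concerns untrusted input; evaluating a fixed, trusted expression remains a normal use of the function. A trivial sketch:

```python
import pandas as pd

# fine: a hard-coded, trusted expression
result = pd.eval("2 + 3 * 4")

# never do: pd.eval(user_supplied_string) -- it can execute arbitrary code
```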
Parameters
----------
