Commit dfc66f6

Backport PR pandas-dev#57474 on branch 2.2.x (REGR: DataFrame.transpose resulting in not contiguous data on nullable EAs) (pandas-dev#57496)

Backport PR pandas-dev#57474: REGR: DataFrame.transpose resulting in not contiguous data on nullable EAs

rhshadrach authored Feb 19, 2024
1 parent b79fe7e commit dfc66f6
Showing 3 changed files with 32 additions and 3 deletions.
1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.2.1.rst

@@ -32,6 +32,7 @@ Fixed regressions
 - Fixed regression in :meth:`DataFrame.to_dict` with ``orient='list'`` and datetime or timedelta types returning integers (:issue:`54824`)
 - Fixed regression in :meth:`DataFrame.to_json` converting nullable integers to floats (:issue:`57224`)
 - Fixed regression in :meth:`DataFrame.to_sql` when ``method="multi"`` is passed and the dialect type is not Oracle (:issue:`57310`)
+- Fixed regression in :meth:`DataFrame.transpose` with nullable extension dtypes not having F-contiguous data potentially causing exceptions when used (:issue:`57315`)
 - Fixed regression in :meth:`DataFrame.update` emitting incorrect warnings about downcasting (:issue:`57124`)
 - Fixed regression in :meth:`DataFrameGroupBy.idxmin`, :meth:`DataFrameGroupBy.idxmax`, :meth:`SeriesGroupBy.idxmin`, :meth:`SeriesGroupBy.idxmax` ignoring the ``skipna`` argument (:issue:`57040`)
 - Fixed regression in :meth:`DataFrameGroupBy.idxmin`, :meth:`DataFrameGroupBy.idxmax`, :meth:`SeriesGroupBy.idxmin`, :meth:`SeriesGroupBy.idxmax` where values containing the minimum or maximum value for the dtype could produce incorrect results (:issue:`57040`)
17 changes: 14 additions & 3 deletions pandas/core/arrays/masked.py

@@ -1621,13 +1621,24 @@ def transpose_homogeneous_masked_arrays(
     same dtype. The caller is responsible for ensuring validity of input data.
     """
     masked_arrays = list(masked_arrays)
+    dtype = masked_arrays[0].dtype
+
     values = [arr._data.reshape(1, -1) for arr in masked_arrays]
-    transposed_values = np.concatenate(values, axis=0)
+    transposed_values = np.concatenate(
+        values,
+        axis=0,
+        out=np.empty(
+            (len(masked_arrays), len(masked_arrays[0])),
+            order="F",
+            dtype=dtype.numpy_dtype,
+        ),
+    )

     masks = [arr._mask.reshape(1, -1) for arr in masked_arrays]
-    transposed_masks = np.concatenate(masks, axis=0)
+    transposed_masks = np.concatenate(
+        masks, axis=0, out=np.empty_like(transposed_values, dtype=bool)
+    )

-    dtype = masked_arrays[0].dtype
     arr_type = dtype.construct_array_type()
     transposed_arrays: list[BaseMaskedArray] = []
     for i in range(transposed_values.shape[1]):
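The core of the fix above is that ``np.concatenate`` normally allocates a C-contiguous result, while pandas expects column-major (Fortran-ordered) data after a transpose; passing a preallocated F-ordered buffer via ``out=`` fixes the layout. A minimal NumPy-only sketch of that trick (the shapes and dtype here are illustrative, not taken from the patch):

```python
import numpy as np

# Each source array becomes one row of the transposed result,
# mirroring the reshape(1, -1) in transpose_homogeneous_masked_arrays.
rows = [np.arange(3).reshape(1, -1), np.arange(3, 6).reshape(1, -1)]

# Default concatenate: the result is C-contiguous (row-major).
c_result = np.concatenate(rows, axis=0)

# Concatenating into a preallocated Fortran-ordered buffer instead
# yields F-contiguous (column-major) data, as the patch does.
f_result = np.concatenate(
    rows, axis=0, out=np.empty((2, 3), order="F", dtype=np.int64)
)

print(c_result.flags["F_CONTIGUOUS"])  # False
print(f_result.flags["F_CONTIGUOUS"])  # True
```

Note that ``np.empty_like(transposed_values, dtype=bool)`` in the patch inherits the prototype's memory layout (``order="K"`` is the default), so the mask buffer comes out F-ordered as well without spelling it out.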
17 changes: 17 additions & 0 deletions pandas/tests/frame/methods/test_transpose.py

@@ -3,6 +3,7 @@

 import pandas.util._test_decorators as td

+import pandas as pd
 from pandas import (
     DataFrame,
     DatetimeIndex,

@@ -190,3 +191,19 @@ def test_transpose_not_inferring_dt_mixed_blocks(self):
             dtype=object,
         )
         tm.assert_frame_equal(result, expected)
+
+    @pytest.mark.parametrize("dtype1", ["Int64", "Float64"])
+    @pytest.mark.parametrize("dtype2", ["Int64", "Float64"])
+    def test_transpose(self, dtype1, dtype2):
+        # GH#57315 - transpose should have F contiguous blocks
+        df = DataFrame(
+            {
+                "a": pd.array([1, 1, 2], dtype=dtype1),
+                "b": pd.array([3, 4, 5], dtype=dtype2),
+            }
+        )
+        result = df.T
+        for blk in result._mgr.blocks:
+            # When dtypes are unequal, we get NumPy object array
+            data = blk.values._data if dtype1 == dtype2 else blk.values
+            assert data.flags["F_CONTIGUOUS"]
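The regression test above inspects pandas internals (`_mgr.blocks`, `._data`). A standalone reproduction of the same scenario, runnable outside the pandas test suite, might look like this (the contiguity check still reaches into internals, so it is a sketch tied to pandas' current block layout rather than public API):

```python
import pandas as pd

# Transpose a frame whose columns share a nullable extension dtype,
# the case GH#57315 reported as producing non-contiguous data.
df = pd.DataFrame(
    {
        "a": pd.array([1, 1, 2], dtype="Int64"),
        "b": pd.array([3, 4, 5], dtype="Int64"),
    }
)
result = df.T

# Internal check: with the fix, each block's backing ndarray is
# Fortran-ordered (prints True on a patched pandas).
for blk in result._mgr.blocks:
    print(blk.values._data.flags["F_CONTIGUOUS"])
```

Regardless of layout, the transposed values themselves are unchanged; the fix only affects the memory order of the backing arrays.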
