BUG: Index.getitem returning wrong result with negative step for arrow #55832

Merged on Nov 22, 2023 (13 commits)
1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.1.3.rst
@@ -21,6 +21,7 @@ Fixed regressions
 Bug fixes
 ~~~~~~~~~
 - Bug in :meth:`DatetimeIndex.diff` raising ``TypeError`` (:issue:`55080`)
+- Bug in :meth:`Index.__getitem__` returning wrong result for Arrow dtypes and negative stepsize (:issue:`55832`)
 - Bug in :meth:`Index.isin` raising for Arrow backed string and ``None`` value (:issue:`55821`)
 - Fix :func:`read_parquet` and :func:`read_feather` for `CVE-2023-47248 <https://www.cve.org/CVERecord?id=CVE-2023-47248>`__ (:issue:`55894`)

11 changes: 11 additions & 0 deletions pandas/core/arrays/arrow/array.py
@@ -553,6 +553,17 @@ def __getitem__(self, item: PositionalIndexer):
)
         # We are not an array indexer, so maybe e.g. a slice or integer
         # indexer. We dispatch to pyarrow.
+        if isinstance(item, slice):
+            if item.start == item.stop:
+                pass
+            elif (
+                item.stop is not None
+                and item.stop < -len(self)
+                and item.step is not None
+                and item.step < 0
+            ):
Comment on lines +561 to +565
Member:

Can you add a comment about why we do this (or refer to a pyarrow bug report or this PR)?

Member Author:

Added the arrow ref.

+                item = slice(item.start, None, item.step)
+
         value = self._pa_array[item]
         if isinstance(value, pa.ChunkedArray):
             return type(self)(value)
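
The guard in this hunk can be sketched in isolation. The helper name `clamp_negative_step_stop` and its `length` parameter are assumptions made for illustration; in the PR the check is written inline in `__getitem__` and uses `len(self)`. The sketch applies the rewritten slice to a plain list rather than a pyarrow array, so it only shows which slice the guard produces, not the pyarrow behaviour itself.

```python
def clamp_negative_step_stop(item: slice, length: int) -> slice:
    """Hypothetical helper mirroring the inline check in the diff."""
    if item.start == item.stop:
        # Same branch as the `pass` in the diff: the slice is left alone.
        return item
    if (
        item.stop is not None
        and item.stop < -length
        and item.step is not None
        and item.step < 0
    ):
        # A stop below -length with a negative step is replaced by None,
        # i.e. "keep going until the first element".
        return slice(item.start, None, item.step)
    return item


data = list("bcdxy")
fixed = clamp_negative_step_stop(slice(4, -10, -1), len(data))
print(fixed)        # slice(4, None, -1)
print(data[fixed])  # ['y', 'x', 'd', 'c', 'b']
```

Slices with a positive step, or with a stop inside the valid range, pass through unchanged, which matches the narrow scope of the condition in the diff.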
25 changes: 22 additions & 3 deletions pandas/tests/indexes/object/test_indexing.py
@@ -4,6 +4,7 @@
 import pytest

 from pandas._libs.missing import is_matching_na
+import pandas.util._test_decorators as td

 import pandas as pd
 from pandas import Index
@@ -144,6 +145,13 @@ def test_get_indexer_non_unique_np_nats(self, np_nat_fixture, np_nat_fixture2):


 class TestSliceLocs:
+    @pytest.mark.parametrize(
+        "dtype",
+        [
+            "object",
+            pytest.param("string[pyarrow_numpy]", marks=td.skip_if_no("pyarrow")),
+        ],
+    )
     @pytest.mark.parametrize(
         "in_slice,expected",
         [
@@ -167,12 +175,23 @@ class TestSliceLocs:
             (pd.IndexSlice["m":"m":-1], ""), # type: ignore[misc]
         ],
     )
-    def test_slice_locs_negative_step(self, in_slice, expected):
-        index = Index(list("bcdxy"))
+    def test_slice_locs_negative_step(self, in_slice, expected, dtype):
+        index = Index(list("bcdxy"), dtype=dtype)

         s_start, s_stop = index.slice_locs(in_slice.start, in_slice.stop, in_slice.step)
         result = index[s_start : s_stop : in_slice.step]
-        expected = Index(list(expected))
+        expected = Index(list(expected), dtype=dtype)
         tm.assert_index_equal(result, expected)
+
+    @td.skip_if_no("pyarrow")
+    def test_slice_locs_negative_step_oob(self):
+        index = Index(list("bcdxy"), dtype="string[pyarrow_numpy]")
+
+        result = index[-10:5:1]
+        tm.assert_index_equal(result, index)
+
+        result = index[4:-10:-1]
+        expected = Index(list("yxdcb"), dtype="string[pyarrow_numpy]")
+        tm.assert_index_equal(result, expected)

     def test_slice_locs_dup(self):
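
For reference, built-in Python sequences already clamp out-of-bounds slice endpoints, and the expected values in `test_slice_locs_negative_step_oob` match that behaviour. The sketch below uses a plain list in place of the index, which is an assumption for illustration only; it does not exercise pandas or pyarrow.

```python
letters = list("bcdxy")

# Positive step: a start below -len(letters) is clamped to the beginning.
forward = letters[-10:5:1]

# Negative step: a stop below -len(letters) is clamped so the slice
# runs back through index 0.
backward = letters[4:-10:-1]

# slice.indices reports the clamped (start, stop, step) for a given length.
clamped = slice(4, -10, -1).indices(len(letters))

print(forward)   # ['b', 'c', 'd', 'x', 'y']
print(backward)  # ['y', 'x', 'd', 'c', 'b']
print(clamped)   # (4, -1, -1)
```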