Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support multiplication of pd.ArrowDtype(pa.string()) and integral value where integral value is a series #56538

Merged
merged 7 commits into from
Dec 19, 2023
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
28 changes: 18 additions & 10 deletions pandas/core/arrays/arrow/array.py
Original file line number Diff line number Diff line change
Expand Up @@ -695,22 +695,23 @@ def _evaluate_op_method(self, other, op, arrow_funcs):
other = self._box_pa(other)

if pa.types.is_string(pa_type) or pa.types.is_binary(pa_type):
if op in [operator.add, roperator.radd, operator.mul, roperator.rmul]:
if op in [operator.add, roperator.radd]:
sep = pa.scalar("", type=pa_type)
if op is operator.add:
result = pc.binary_join_element_wise(self._pa_array, other, sep)
elif op is roperator.radd:
result = pc.binary_join_element_wise(other, self._pa_array, sep)
else:
if not (
isinstance(other, pa.Scalar) and pa.types.is_integer(other.type)
):
raise TypeError("Can only string multiply by an integer.")
result = pc.binary_join_element_wise(
*([self._pa_array] * other.as_py()), sep
)
return type(self)(result)

elif op in [operator.mul, roperator.rmul]:
result = type(self)._evaluate_binary_repeat(self._pa_array, other)
return type(self)(result)
elif (
mroeschke marked this conversation as resolved.
Show resolved Hide resolved
pa.types.is_integer(pa_type)
and (pa.types.is_string(other.type) or pa.types.is_binary(other.type))
and op in [operator.mul, roperator.rmul]
):
result = type(self)._evaluate_binary_repeat(other, self._pa_array)
return type(self)(result)
if (
isinstance(other, pa.Scalar)
and pc.is_null(other).as_py()
Expand All @@ -726,6 +727,13 @@ def _evaluate_op_method(self, other, op, arrow_funcs):
result = pc_func(self._pa_array, other)
return type(self)(result)

@staticmethod
def _evaluate_binary_repeat(binary, integral):
if not pa.types.is_integer(integral.type):
rohanjain101 marked this conversation as resolved.
Show resolved Hide resolved
raise TypeError("Can only string multiply by an integer.")
pa_integral = pc.if_else(pc.less(integral, 0), 0, integral)
return pc.binary_repeat(binary, pa_integral)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you inline these where needed. I don't think a whole new function is needed for it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mroeschke, can do, but there's 2 callers of it, so if we remove the function, the logic for calculating the repeat count to pass to binary_repeat would need to be repeated in both callers. Additionally, I believe the helper function simplifies the code because there's 2 scenarios:

  1. binary is lhs and int is rhs
  2. binary is rhs and int is lhs

When we call binary_repeat, we need to always ensure the binary is the left arg and the integral is the right arg, imo the helper function simplifies the code and makes it a bit more readable. Let me know if you would still prefer logic of _evaluate_binary_repeat to be inlined, or if I misunderstood your suggestion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still prefer to inline this for now. We don't really use staticmethods in the codebase

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, inlined.


def _logical_method(self, other, op):
# For integer types `^`, `|`, `&` are bitwise operators and return
# integer types. Otherwise these are boolean ops.
Expand Down
20 changes: 20 additions & 0 deletions pandas/tests/extension/test_arrow.py
Original file line number Diff line number Diff line change
Expand Up @@ -1330,6 +1330,26 @@ def test_arrowdtype_construct_from_string_type_only_one_pyarrow():
pd.Series(range(3), dtype=invalid)


def test_arrow_string_multiplication():
# GH 56537
binary = pd.Series(["abc", "defg"], dtype=ArrowDtype(pa.string()))
repeat = pd.Series([2, -2], dtype="int64[pyarrow]")
result = binary * repeat
expected = pd.Series(["abcabc", ""], dtype=ArrowDtype(pa.string()))
tm.assert_series_equal(result, expected)
reflected_result = repeat * binary
tm.assert_series_equal(result, reflected_result)


def test_arrow_string_multiplication_scalar_repeat():
binary = pd.Series(["abc", "defg"], dtype=ArrowDtype(pa.string()))
result = binary * 2
expected = pd.Series(["abcabc", "defgdefg"], dtype=ArrowDtype(pa.string()))
tm.assert_series_equal(result, expected)
reflected_result = 2 * binary
tm.assert_series_equal(reflected_result, expected)


@pytest.mark.parametrize(
"interpolation", ["linear", "lower", "higher", "nearest", "midpoint"]
)
Expand Down