-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API: behaviour for the "str.center()" string method for the pyarrow-backed string dtype #54807
Comments
xref apache/arrow#15053 |
I added a (at least for the new NaN variant of the String dtype under PDEP 14) |
i just came across this in trying to see if ArrowStringArray._str_pad could re-use ArrowEA._str_pad. The |
I am fine with doing that across the board, but up to @mroeschke to decide for ArrowExtensionArray |
I'm not super keen on having ArrowEA falling back in general, but if this would just be temporary until pyarrow 17 is our minimum version (where it sounds like we no longer need the fallback) then I would be OK with the logic sharing for code-deduplication. |
I guess this makes sense to do as a breaking change in 3.0
…On Mon, Aug 26, 2024 at 11:17 AM Matthew Roeschke ***@***.***> wrote:
I'm not super keen on having ArrowEA falling back in general, but if this
would just be temporary until pyarrow 17 is our minimum version (where it
sounds like we no longer need the fallback) then I would be OK with the
logic sharing for code-deduplication.
—
Reply to this email directly, view it on GitHub
<#54807 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AB5UM6FGACDS4FYINHOCDJTZTNWJZAVCNFSM6AAAAABLULQKYWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMJQG44TENJUGI>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
From #54533 (comment) (and relevant for new String dtype #54792)
In general the pyarrow backed string dtype is meant to behave the same as the existing object-dtype or non-pyarrow backed StringDtype, so users in generally should have a smooth way of upgrading. In general most string compute functions in pyarrow have similar behaviour, or we still have a few places where we fall back to the original object-based methods.
One specific method that currently has a different behaviour is
str.center
:The first three dtypes all give the same result, but for the last one using
StringDtype(storage="pyarrow_numpy")
, the strings are "aligned" to the left instead of right in case an uneven fill chars have to be used.This is a behaviour we inherited from the Python
str
center method, but the pyarrowpyarrow.compute.ascii_center
function apparently made the opposite choice (I don't think was necessarily an intentional choice to deviate from this; there was no discussion about this on the PR (apache/arrow#10586), which did add the comment "If odd number of spaces, put the extra space on the left" to the relevant place in the code that handles this).This is a potentially breaking change for users, so to start with should decide if we want to keep the current behaviour or not for the future string dtype. On the short term, we can fall back to the python string objects method (like we do for the other string dtypes). But we can also ask pyarrow to add a keyword to control this aspect of the compute function (that should be an easy change), so that in the future we can again use the faster pyarrow function.
The text was updated successfully, but these errors were encountered: