Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: ExtensionArray._from_sequence_of_strings assumes true_values and false_values in signature when ExtensionDtype._is_boolean=True #47534

Open
mroeschke opened this issue Jun 28, 2022 · 2 comments
Labels
API Design Constructors Series/DataFrame/Index/pd.array Constructors ExtensionArray Extending pandas with custom dtypes or arrays.

Comments

@mroeschke
Copy link
Member

mroeschke commented Jun 28, 2022

ExtensionArray._from_sequence_of_strings has a signature

def _from_sequence_of_strings(
        cls, strings, *, dtype: Dtype | None = None, copy=False
    ):

BooleanArray._from_sequence_of_strings has a signature

def _from_sequence_of_strings(
        cls,
        strings: list[str],
        *,
        dtype: Dtype | None = None,
        copy: bool = False,
        true_values: list[str] | None = None,
        false_values: list[str] | None = None,
    )

This is causing issues toward when trying to develop an ArrowExtensionArray with ArrowDtype that can support multiple underlying dtypes: #46008

elif is_extension_array_dtype(cast_type):
# ensure cast_type is an actual dtype and not a string
cast_type = pandas_dtype(cast_type)
array_type = cast_type.construct_array_type()
try:
if is_bool_dtype(cast_type):
return array_type._from_sequence_of_strings(
values,
dtype=cast_type,
true_values=self.true_values,
false_values=self.false_values,
)
else:
return array_type._from_sequence_of_strings(values, dtype=cast_type)

Possible paths forward:

  1. Document this behavior (But this makes parameterized ExtensionDtypes difficult to implement)
  2. Generalize & add a new keyword to _from_sequence_of_strings to provide a string conversion e.g.
def _from_sequence_of_strings(
        cls,
        strings: list[str],
        *,
        dtype: Dtype | None = None,
        copy: bool = False,
        string_map: dict[str, Any] | None = None OR conversion_func: Callable[str, Any] | None = None,
    )
@mroeschke mroeschke added API Design ExtensionArray Extending pandas with custom dtypes or arrays. labels Jun 28, 2022
@jbrockmendel jbrockmendel added the Constructors Series/DataFrame/Index/pd.array Constructors label Oct 17, 2022
@jbrockmendel
Copy link
Member

i like the string_map idea, though id prefer to avoid the callable option if that is viable

@jbrockmendel
Copy link
Member

I guess the downside of string_map would be that authors/readers might incorrectly think that anything other than true_values/false_values are actually used. Might be better to just add true_values/false_values to the base class signature and document how this is only used deep inside read_csv (and remove any other extraneous uses)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Constructors Series/DataFrame/Index/pd.array Constructors ExtensionArray Extending pandas with custom dtypes or arrays.
Projects
None yet
Development

No branches or pull requests

2 participants