-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: EA._from_scalars #53089
ENH: EA._from_scalars #53089
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
decide whether dtype should be required.
The reason to not do this would be that when starting with an int64[pyarrow], you could end up with float64[pyarrow]?
However, given that this is essentially a full type system (all possible types are available in ArrowDtype), that would essentially make this non-strict for those?
(so I would maybe start with requiring a dtype
)
There are also some corner cases listed in my original PR: #38315 (comment) (need to read through that again myself as well)
pandas/core/dtypes/cast.py
Outdated
if hasattr(cls, "_from_scalars"): | ||
# TODO: get this everywhere! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can probably add a simple _from_scalars
in the base class that just calls _from_sequence
. We will need some fallback like that for external EAs anyway (unless we keep this hasattr check, but since we fallback to _from_sequence
below anyway, can better do this in _from_scalars
I think)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed this makes sense.
ATM _from_sequence can raise anything, so we catch anything, which isn't an ideal pattern. Thoughts on documenting _from_scalars as only raising ValueError/TypeError and eventually enforcing that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts on documenting _from_scalars as only raising ValueError/TypeError and eventually enforcing that?
Yes, I think that's certainly fine
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this. |
@jbrockmendel , I have done some investigation based on your PR. I have found one issue: |
@jorisvandenbossche i think the last round of comments and OP issues have been addressed. any other thoughts? |
@@ -280,6 +283,38 @@ def _from_sequence(cls, scalars, *, dtype: Dtype | None = None, copy: bool = Fal | |||
""" | |||
raise AbstractMethodError(cls) | |||
|
|||
@classmethod | |||
def _from_scalars(cls, scalars, *, dtype: DtypeObj) -> Self: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking aloud. Once we allow EA authors to specify their "is_scalar" passing equivalent, should that be incorporated here somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Assuming at some point we add to the EADtype something like a _is_recognized_scalar
method, the default implementation of that method might look like (assuming away NAs for the moment)
def _is_recognized_scalar(self, scalar) -> bool:
cls = self.construct_array_class()
try:
cls._from_scalars([scalar], dtype=self)
except TypeError:
return False
return True
We could alternately implement a default from_scalars in terms of _is_recognized_scalar. _find_compatible_dtype from #53106 could also be the primitive from which the other two are constructed (though i dont think it can be constructed from either).
If there are no further comments, I'm planning to merge this at the end of the week. |
xref #33254 _from_sequence is generally not very strict. This implements _from_scalars for a few EAs as a POC to get rid of an ugly kluge in
maybe_cast_pointwise_result
. The changed behavior in test_resample_categorical_data_with_timedeltaindex looks like a clear improvement to me.cc @jorisvandenbossche
Things to work out before this leaves the "WIP" zone: