-
Yeah, When we've talked about implementing Categorical is also its own thing, applying only to The special behavior we're pulling out of generic I see that you're trying to pull together similar things to give them a common implementation, but
Why can't they be three special cases? They all apply to different sets of nodes:
They seem to be three different things to me.
-
I've used this (as a user), but I am in agreement that this is unlikely.
I think they mostly should be — at least, that's what I propose here.
I'd like for users to be able to add their own methods to these things via behaviors. That means either
In particular, I think units, strings, and categoricals have group-properties that should transcend their nominal type. That's what a custom string should have; now that strings are built in, users should have a way to add their own named strings without needing to reimplement everything.
-
One of the challenges here is the coupling between behavior classes (via `__array__` and `__record__` parameters) and array types (`__array__ = "string"`, ...). One motivation is making string classes user-customisable without a significant rework of how far behaviors propagate into the codebase; another is working out more generally how we think about the `__array__` parameter versus other type parameters.

I think it would be helpful to start with describing some use cases:
To compare "types" between two arrays, we have the following:

- `NumpyArray`: do both arrays have the same primitive?
- `RecordArray`: do both arrays have the same `__record__`, and are their structures type-comparable?
- `ListOffsetArray`, ...: do both arrays have the same `__array__`, and are their contents type-comparable?

It's clear to me that `__array__` and `__record__` are quite different, beyond their association with different content classes. Unlike `__record__`, `__array__` is used to implement non-behavior customisation that precludes the use of custom behaviors.

I propose we introduce a new `__kind__` parameter that can currently be one of `("string", "categorical")`. We can introduce constraints upon which contents can be assigned with which values of `__kind__`. Introducing this parameter would allow us to separate user-provided names from built-in names, permitting custom strings and categoricals, e.g.

The nominal type precedence is then:
1. `__array__`
2. `__kind__`

This precedence is used to resolve behaviors. Meanwhile, string-specific features check `__kind__`, or fall back upon `__array__` for legacy strings.

In this new formulation, we have the following interpretation of each parameter:
- `__array__`: the nominal type of a list (OR, the kind of a "legacy" string or categorical)
- `__record__`: the nominal type of a record
- `__kind__`: the built-in nominal type of a list (e.g. `string` or `categorical`)

Built-in types (strings, categoricals) look at `__kind__`, but high-level features like behaviors look at the resolved nominal type, i.e. via the precedence above.

Units (#2468) are a related problem; we can either implement them as a user-behavior, or as a low-level integration.
If units were implemented using behaviors, e.g.
then it would not be possible to define a custom class for this array without re-overloading all of the ufunc methods that implement the units system. I can't immediately see what that would be useful for, but it doesn't feel that the behavior class is strongly related to the units system. Just as string features check `__kind__` in this new system, I think units conversion should happen in the ufunc dispatch, and look exclusively at `__unit__`, i.e. `__array__` → `__kind__`)?

Ultimately, all of this represents a different mindset: behaviors are for users, and we should use behaviors internally only if the new features aren't intended to compose with other features. I feel that strings are orthogonal to custom user types, and that units are also orthogonal to custom user types.
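To make the lookup rules in this proposal concrete, here is a minimal sketch in plain Python. It is only an illustration under assumptions: the names `resolve_nominal_type`, `resolve_kind`, and `BUILTIN_KINDS` are hypothetical, the parameters are modelled as a plain dict rather than a real layout, and none of this is existing Awkward Array API.

```python
from typing import Mapping, Optional

# Assumption: the built-in kinds named in the proposal; the real set may differ.
BUILTIN_KINDS = ("string", "categorical")


def resolve_nominal_type(parameters: Mapping[str, str]) -> Optional[str]:
    """Name used to resolve behaviors: `__array__` first, then `__kind__`."""
    for key in ("__array__", "__kind__"):
        name = parameters.get(key)
        if name is not None:
            return name
    return None


def resolve_kind(parameters: Mapping[str, str]) -> Optional[str]:
    """Built-in kind used by string/categorical features.

    Checks `__kind__` first, then falls back upon `__array__` when it
    holds a built-in name (a "legacy" string or categorical).
    """
    kind = parameters.get("__kind__")
    if kind is not None:
        return kind
    legacy = parameters.get("__array__")
    if legacy in BUILTIN_KINDS:
        return legacy
    return None


# Hypothetical parameter sets ("my-string" and "my-list" are made-up names).
custom_string = {"__array__": "my-string", "__kind__": "string"}
legacy_string = {"__array__": "string"}
user_list = {"__array__": "my-list"}

for params in (custom_string, legacy_string, user_list):
    print(params, resolve_nominal_type(params), resolve_kind(params))
```

Under these assumptions, the custom string resolves behaviors under its user-provided name while string-specific features still see `string`; a legacy string resolves to `string` for both; and a list with only a user-provided `__array__` has no built-in kind at all.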
Tagging @jpivarski for visibility