Move `year` to `.dt.year`, and other namespace-specific function #341

MarcoGorelli · 2023-12-20T11:13:49Z

We'd originally decided not to bother with temporal (and other) namespaces, and to just have `.year` be a column method rather than `.dt.year`.

I'm suggesting we go back on this decision because of the risk of conflicts. I'll explain.

pandas is adding nested datatypes (e.g. pyarrow struct in pandas-dev/pandas#54977, and pyarrow list in pandas-dev/pandas#55777).

Some methods on those datatypes will clash with the column/dataframe ones. For example:

`Column.get_value(2)` means "get the value from the second row"
`Column.list.get_value(2)` means "for each row, get the second element in that row's list"

Likewise for `Column.mean` vs `Column.list.mean`: the former is a reduction, the latter a transformation (it preserves the input shape).
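
To make the clash concrete, here is a minimal pure-Python sketch. The `Column` and `ListNamespace` classes are toy stand-ins invented for illustration (they are not the standard's classes, nor the pandas or pyarrow API), and positions are assumed to be zero-based.

```python
from statistics import fmean


class ListNamespace:
    """Hypothetical `.list` accessor: every method works inside each row's list."""

    def __init__(self, values):
        self._values = values

    def get_value(self, i):
        # Per-row access: take position i from every row's list.
        return Column([row[i] for row in self._values])

    def mean(self):
        # Transformation: one mean per row, so the output keeps the input length.
        return Column([fmean(row) for row in self._values])


class Column:
    """Toy column used only to illustrate the name clash."""

    def __init__(self, values):
        self.values = values

    def get_value(self, row):
        # Row access: return whatever is stored at the given row.
        return self.values[row]

    def mean(self):
        # Reduction: collapse the whole column to a single number.
        return fmean(self.values)

    @property
    def list(self):
        return ListNamespace(self.values)


col = Column([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(col.get_value(1))              # the whole list stored at row position 1
print(col.list.get_value(1).values)  # position 1 of each row's list
print(col.list.mean().values)        # one mean per row
```

Without the namespace, both meanings of `get_value` and `mean` would have to live on the same object under the same name.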

Presumably, we'll eventually have nested datatypes in the standard too?

So, I'd suggest we mirror existing dataframe libraries and have namespaces for functionality which is limited to certain datatypes:

`.dt` for temporal functions
`.str` for string manipulation functions
later, if/when we have nested datatypes, `.list` / `.struct`, and `.cat` for categorical
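
As a rough sketch of what datatype-limited namespaces could look like, the toy classes below expose `.dt` and `.str` as properties that refuse to work on columns of the wrong type. All names here (`Column`, `DtNamespace`, `StrNamespace`, and the choice of `year()` and `upper()` as methods) are assumptions made for illustration, not the standard's actual API.

```python
from datetime import date


class DtNamespace:
    """Hypothetical `.dt` accessor holding temporal-only functions."""

    def __init__(self, values):
        self._values = values

    def year(self):
        return Column([v.year for v in self._values])


class StrNamespace:
    """Hypothetical `.str` accessor holding string-only functions."""

    def __init__(self, values):
        self._values = values

    def upper(self):
        return Column([v.upper() for v in self._values])


class Column:
    """Toy column; the dtype is inferred from the Python values it holds."""

    def __init__(self, values):
        self.values = values

    @property
    def dt(self):
        # Only hand out the namespace when every value is temporal.
        if not all(isinstance(v, date) for v in self.values):
            raise TypeError("`.dt` is only available on temporal columns")
        return DtNamespace(self.values)

    @property
    def str(self):
        # Inside a method, `str` still resolves to the builtin type.
        if not all(isinstance(v, str) for v in self.values):
            raise TypeError("`.str` is only available on string columns")
        return StrNamespace(self.values)


dates = Column([date(2023, 12, 20), date(2024, 3, 20)])
print(dates.dt.year().values)
print(Column(["a", "b"]).str.upper().values)
```

One side effect of this layout is that a misuse such as calling `.dt` on a string column fails at the accessor, before any temporal function is looked up.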


jorisvandenbossche · 2023-12-21T16:46:52Z

I think there is also an argument for considering nested data types separately, and for having those kinds of accessors there but not for other data types.

In contrast to primitive / non-nested data types, nested data types will indeed have a whole category of methods that deal specifically with their nested nature (accessing sub-fields, applying a scalar function to each element in the list, applying an aggregation across the list). That might indeed warrant an accessor, because I think the alternative would be to prefix all those methods with, for example, `list_` (`list_mean`, `list_get`, ...), and at that point an accessor makes sense.

But personally I would say the same reasoning doesn't really apply to other data types (e.g. there haven't yet been any methods where we would consider prefixing with `datetime_`...?)

MarcoGorelli added the API design label Dec 20, 2023

MarcoGorelli closed this as completed Mar 20, 2024
