ENH: Series.struct accessor with Series.struct.field("sub-column name") for ArrowDtype #54938
Closed
Labels: Accessors (accessor registration mechanism, not .str, .dt, .cat), Arrow (pyarrow functionality), Enhancement
Feature Type: Adding new functionality to pandas
Problem Description
When I have a `Series` of type `ArrowDtype(struct(...))`, I'd like to be able to extract its sub-fields. For example, I have a pandas `Series` with the dtype `ArrowDtype(pyarrow.struct([("int_col", pyarrow.int64()), ("string_col", pyarrow.string())]))`. I'd like to extract just the `int_col` field from this `Series` as another `Series`.
Feature Description
Add a `struct` accessor, available on any `Series` with `ArrowDtype(struct(...))`. This `struct` accessor provides a `field()` method that returns a `Series` containing only the specified sub-field.
Alternative Solutions
I can currently do this by calling `pyarrow.compute.struct_field` on the underlying pyarrow array.
Additional Context
This issue is particularly relevant when working with data sources that support struct fields, such as BigQuery or Parquet.