[Bug]: DataFramegroupby does not support named aggregation #27278

Open
1 of 15 tasks
robmoore opened this issue Jun 27, 2023 · 12 comments

Comments

@robmoore

What happened?

Attempting to use named aggregation in a groupby raises a TypeError: DeferredGroupBy.agg() missing 1 required positional argument: 'fn'.

Example case:

# Same error occurs when using explicit pd.NamedAggs instead of tuples
df.groupby(['quarter', 'program']).agg(total_spend=('revenue', 'sum'), avg_spend=('revenue', 'mean'))
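
The error message suggests that agg declares a required positional parameter named fn, so a call made only of keyword arguments never binds it. The sketch below reproduces that failure mode in plain Python. The class is a hypothetical stand-in, not Beam's implementation, and its signature is an assumption inferred from the message.

```python
class DeferredGroupBy:
    """Hypothetical stand-in; only the parameter name `fn` comes from the report."""

    def agg(self, fn, *args, **kwargs):
        # Assumed signature: `fn` is required, keyword arguments land in kwargs.
        return fn, args, kwargs


message = None
try:
    # Named aggregation passes keyword arguments only, so `fn` is never supplied.
    DeferredGroupBy().agg(
        total_spend=("revenue", "sum"),
        avg_spend=("revenue", "mean"),
    )
except TypeError as exc:
    message = str(exc)
    print(message)
```

Under this assumption, supporting named aggregation would mean making fn optional and interpreting the keyword arguments when it is absent.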

Issue Priority

Priority: 3 (minor)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@tvalentyn
Contributor

tvalentyn commented Aug 3, 2023

Thanks for reporting. It should be possible to support this. Would you be interested in taking a closer look and contributing a PR?

@SiddharthJadhav99

@tvalentyn can you assign this issue to me? I'll be able to support this.

@tvalentyn
Contributor

Hi @SiddharthJadhav99, just checking whether you have any questions or need help.

@SiddharthJadhav99
Copy link

SiddharthJadhav99 commented Sep 6, 2023

Hey @tvalentyn & @robmoore, it would be really helpful if you could send sample code that reproduces the error, or elaborate a little on this bug.

@robmoore
Author

robmoore commented Sep 6, 2023

@SiddharthJadhav99 Please see the example in the pd.NamedAgg examples for Beam issue 27278 Colab notebook. The error is reproduced in the section entitled "Example using Beam Interactive".

@SiddharthJadhav99

Hey @tvalentyn, I tried to solve this issue but was unable to do so. You may unassign me from it. Thanks, @robmoore, for your cooperation and help!

SiddharthJadhav99 removed their assignment Oct 2, 2023
@artemyushko

.take-issue

@vineetg3

Hi @artemyushko, are you working on this issue as of today?

@tvalentyn
Contributor

Given that we haven't heard from @artemyushko for a while, I'll go ahead and unassign the issue. @artemyushko, please don't hesitate to take it again if/when you plan to continue working on this.

@artemyushko

Hi @tvalentyn, I looked into this issue a while ago, and it turns out that DataFrameGroupBy.groupby does not really support tuples the same way NamedAgg in pandas does, and I haven't figured out a solution. I have been trying to make DataFrameGroupBy represent the SQL call of f(column) as my_column_name, but had no success. If anybody is willing to take this further, please feel free to!
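
For readers picking this up, the "f(column) as my_column_name" mapping can be sketched in plain Python: each keyword becomes an output name, and its (column, aggfunc) pair says what to compute. This is only an illustration over dict rows, not Beam or pandas code. NamedAgg here is a hypothetical stand-in namedtuple that mirrors the field names of pd.NamedAgg, and AGG_FUNCS, normalize_named_aggs, and named_agg are made-up names.

```python
from collections import defaultdict, namedtuple
from statistics import mean

# Hypothetical stand-in for pd.NamedAgg (same field names, plain namedtuple).
NamedAgg = namedtuple("NamedAgg", ["column", "aggfunc"])

# Hypothetical registry mapping aggregation names to plain Python functions.
AGG_FUNCS = {"sum": sum, "mean": mean, "min": min, "max": max}


def normalize_named_aggs(**kwargs):
    """Turn output_name=(column, aggfunc) keywords into a dict of NamedAgg."""
    specs = {}
    for out_name, spec in kwargs.items():
        # Accepts both bare 2-tuples and NamedAgg instances (also tuples).
        if not (isinstance(spec, tuple) and len(spec) == 2):
            raise TypeError(f"{out_name!r} must be a (column, aggfunc) pair")
        specs[out_name] = NamedAgg(*spec)
    return specs


def named_agg(rows, by, **kwargs):
    """Group dict rows by the `by` keys and compute each named aggregation."""
    specs = normalize_named_aggs(**kwargs)
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[key] for key in by)].append(row)
    return {
        group_key: {
            out_name: AGG_FUNCS[spec.aggfunc]([r[spec.column] for r in members])
            for out_name, spec in specs.items()
        }
        for group_key, members in groups.items()
    }


rows = [
    {"quarter": "Q1", "program": "a", "revenue": 10.0},
    {"quarter": "Q1", "program": "a", "revenue": 30.0},
    {"quarter": "Q2", "program": "b", "revenue": 5.0},
]
result = named_agg(
    rows,
    by=["quarter", "program"],
    total_spend=("revenue", "sum"),
    avg_spend=("revenue", "mean"),
)
print(result)
```

The normalization step is the part that carries over: once every keyword is reduced to an output name plus a (column, aggfunc) pair, each pair is an ordinary single-column aggregation followed by a rename.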

@vineetg3

Hi @tvalentyn, do you think this should still be labelled good-first-issue? I am planning to take it up, but it seems like a tough one.

@tvalentyn
Contributor

Thanks all. Yes, it might be a bit more involved, although I haven't looked very closely. At minimum, we should probably defer this until we finish adding pandas 2 support, the work @caneff is doing now.
