-
Notifications
You must be signed in to change notification settings - Fork 166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Initial support for Window function #599
Conversation
Co-authored-by: comphead <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about all the corner cases, but looks good
WindowFrameBound::Preceding(ScalarValue::UInt64(None)) | ||
} | ||
LowerFrameBoundStruct::Preceding(offset) => { | ||
let offset_value = if offset.offset < 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we can use abs()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed. Thanks!
@@ -541,6 +542,17 @@ class CometSparkSessionExtensions | |||
withInfo(s, Seq(info1, info2).flatten.mkString(",")) | |||
s | |||
|
|||
case w: WindowExec => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add isCometOperatorEnabled()
to enable/disable the feature?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I saw some of the isCometOperatorEnabled
are checked in CometSparkSessionExtensions
, but some of them are in QueryPlanSerde
. I followed HashAggregateExec
and added the check in QueryPlanSerde
withTable("t1") { | ||
val numRows = 10 | ||
spark | ||
.range(numRows) | ||
.selectExpr("id AS a", s"$numRows - id AS b") // Todo: Test Nulls | ||
.repartition(3) // Force repartition to test data will come to single partition | ||
.write | ||
.saveAsTable("t1") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be simplified a little with withParquetTable()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to withParquetTable()
. Thanks
sort_exprs, | ||
window_frame.into(), | ||
&input_schema, | ||
false, // TODO: Ignore nulls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll create a follow up PR for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
val aggregateFunctions = | ||
List("COUNT(_1)", "MAX(_1)", "MIN(_1)") // TODO: Test all the aggregates | ||
|
||
aggregateFunctions.foreach { function => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making tests generic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm thanks @huaxingao
withSQLConf( | ||
CometConf.COMET_EXEC_SHUFFLE_ENABLED.key -> "true", | ||
SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> aqeEnabled) { | ||
withParquetTable((0 until 10).map(i => (i, 10 - i)), "t1") { // TODO: test nulls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
File a ticket to support test nulls for window functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> aqeEnabled) { | ||
withParquetTable((0 until 10).map(i => (i, 10 - i)), "t1") { // TODO: test nulls | ||
val aggregateFunctions = | ||
List("COUNT(_1)", "MAX(_1)", "MIN(_1)") // TODO: Test all the aggregates |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Followup to test all the aggregates functions for window clause
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still have problems with other aggregates functions. I will look one by one.
For example:
sum: Error from DataFusion: Internal error: Builtin Sum will be removed
avg: org.apache.comet.CometNativeException: AvgAccumulator for (Int32 --> Float64)
also cc @andygrove @viirya |
case agg: AggregateExpression => | ||
Some(agg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to fall back to Spark for aggregate expressions that we do not support, rather than fail during query execution.
case agg: AggregateExpression => | |
Some(agg) | |
case agg: AggregateExpression => agg.aggregateFunction match { | |
case _: Min | _: Max | _: Count => Some(agg) | |
case _ => | |
withInfo(windowExpr, "Unsupported aggregate", expr) | |
None | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @huaxingao
@huaxingao We should also update the user guide to show that we now support window functions |
I will update the user doc to add window function support. |
Thanks everyone! |
* feat: initial support for Window function Co-authored-by: comphead <[email protected]> * fix style * fix style * address comments * abs()->unsigned_abs() * address comments --------- Co-authored-by: comphead <[email protected]>
Co-authored-by: comphead [email protected]
Co-authored-by: Huaxin Gao [email protected]
Which issue does this PR close?
Closes #.
Rationale for this change
What changes are included in this PR?
This PR has the initial changes for window function. Currently, only
Count
,Max
andMin
are supported. In the following PRs, we will add all the other aggregate functions support, built-in window function support, ignoreNulls support, ect.How are these changes tested?