Describe the bug
I noticed a special case while debugging test failures in #250.
org.apache.spark.sql.DataFrameWindowFunctionsSuite: SPARK-38237: require all cluster keys for child required distribution for window query:
== Physical Plan ==
*(3) Project [lead(key1, 1, NULL) OVER (PARTITION BY key1, key2 ORDER BY value ASC NULLS FIRST ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING)#3854, lead(value, 1, NULL) OVER (PARTITION BY key1, key2 ORDER BY value ASC NULLS FIRST ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING)#3855]
+- Window [lead(key1#3848, 1, null) windowspecdefinition(key1#3848, key2#3849, value#3850 ASC NULLS FIRST, specifiedwindowframe(RowFrame, 1, 1)) AS lead(key1, 1, NULL) OVER (PARTITION BY key1, key2 ORDER BY value ASC NULLS FIRST ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING)#3854, lead(value#3850, 1, null) windowspecdefinition(key1#3848, key2#3849, value#3850 ASC NULLS FIRST, specifiedwindowframe(RowFrame, 1, 1)) AS lead(value, 1, NULL) OVER (PARTITION BY key1, key2 ORDER BY value ASC NULLS FIRST ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING)#3855], [key1#3848, key2#3849], [value#3850 ASC NULLS FIRST]
   +- *(2) ColumnarToRow
      +- CometSort [key1#3848, key2#3849, value#3850], [key1#3848 ASC NULLS FIRST, key2#3849 ASC NULLS FIRST, value#3850 ASC NULLS FIRST]
         +- CometColumnarExchange hashpartitioning(key1#3848, key2#3849, 5), ENSURE_REQUIREMENTS, CometColumnarShuffle, [plan_id=8988]
            +- CometColumnarExchange hashpartitioning(key1#3848, 5), REPARTITION_BY_COL, CometColumnarShuffle, [plan_id=8987]
               +- RowToColumnar
                  +- *(1) Project [_1#3841 AS key1#3848, _2#3842 AS key2#3849, _3#3843 AS value#3850]
                     +- *(1) LocalTableScan [_1#3841, _2#3842, _3#3843]
The query contains two consecutive shuffle operators. It currently fails with:
[info] java.lang.UnsupportedOperationException: CometShuffleExchangeExec.doExecute should not be executed.
[info] at org.apache.spark.sql.comet.execution.shuffle.CometShuffleExchangeExec.doExecute(CometShuffleExchangeExec.scala:169)
[info] at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:195)
[info] at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:246)
[info] at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
[info] at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:243)
[info] at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:191)
[info] at org.apache.spark.sql.comet.execution.shuffle.CometShuffleExchangeExec.inputRDD$lzycompute(CometShuffleExchangeExec.scala:98)
[info] at org.apache.spark.sql.comet.execution.shuffle.CometShuffleExchangeExec.inputRDD(CometShuffleExchangeExec.scala:91)
[info] at org.apache.spark.sql.comet.execution.shuffle.CometShuffleExchangeExec.shuffleDependency$lzycompute(CometShuffleExchangeExec.scala:150)
[info] at org.apache.spark.sql.comet.execution.shuffle.CometShuffleExchangeExec.shuffleDependency(CometShuffleExchangeExec.scala:133)
[info] at org.apache.spark.sql.comet.execution.shuffle.CometShuffleExchangeExec.doExecuteColumnar(CometShuffleExchangeExec.scala:188)
This is because the upper CometShuffleExchangeExec calls doExecute on the bottom CometShuffleExchangeExec, since CometShuffleExchangeExec takes row inputs.
To fix it, we could add a ColumnarToRow on top of the bottom CometShuffleExchangeExec, but I don't think that is efficient, as the resulting sequence of shuffles would have too many row-to-column/column-to-row conversions.
I think it is more reasonable to skip Comet shuffle in such cases.
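
To make the proposal concrete, here is a minimal sketch of such a skip rule on a toy plan model. Everything in it is hypothetical: `Plan`, `Scan`, `Shuffle`, `CometShuffle` and `convert` are illustration-only names, not Spark or Comet classes. It also assumes that the upper shuffle is the one left unconverted, which the description above does not specify.

```scala
// Toy model only: these types are hypothetical and are not Spark or Comet classes.
object ShuffleFallbackSketch {
  sealed trait Plan
  // Leaf node standing in for any non-shuffle input.
  case class Scan(name: String) extends Plan
  // A shuffle as originally planned, before any Comet rewrite.
  case class Shuffle(keys: Seq[String], input: Plan) extends Plan
  // A shuffle that has been replaced by a Comet columnar shuffle.
  case class CometShuffle(keys: Seq[String], input: Plan) extends Plan

  // Bottom-up rewrite: convert a shuffle to a Comet shuffle unless its
  // (already rewritten) child is itself a Comet shuffle.
  def convert(plan: Plan): Plan = plan match {
    case Shuffle(keys, input) =>
      convert(input) match {
        // Assumption: skip the conversion when a Comet shuffle sits directly below.
        case below: CometShuffle => Shuffle(keys, below)
        case below               => CometShuffle(keys, below)
      }
    case other => other
  }
}
```

With this rule, the two stacked shuffles from the plan above (keys `key1, key2` on top of `key1`) would come out as a regular shuffle on top of a single Comet shuffle, rather than two Comet shuffles in a row.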
Steps to reproduce
No response
Expected behavior
No response
Additional context
No response