
feat: Add extended explain info to Comet plan #255

Merged
7 commits merged into apache:main on Apr 22, 2024

Conversation

@parthchandra (Contributor) commented Apr 10, 2024

Which issue does this PR close?

Closes #253

Rationale for this change

Adds additional planning information to a Spark plan, which allows us to track the reasons why a Spark plan was not fully converted to Comet.

What changes are included in this PR?

This PR adds a CometExplainInfo structure that is returned by every step of the planning process. Eventually, the structure is attached to the Spark plan being returned by adding it as a Comet-specific tag.

How are these changes tested?

Added additional unit tests, and tested with all TPCH and TPCDS queries on both Spark 3.4 and Spark 3.4 with the extended plan support backported.
Also regenerated the CometTPC*QueriesList output for both TPCH and TPCDS run against a 1TB dataset.
Note: Since the extended plan support does not exist before Spark 4, the output of CometTPC*QueriesList was updated with a build of Spark 3.4 with the feature backported.

@parthchandra changed the title from "Add extended explain info to Comet plan" to "feat: Add extended explain info to Comet plan" on Apr 10, 2024
* under the License.
*/

package org.apache.comet
Member

Unrelated to this PR, but we should probably be using org.apache.datafusion.comet once the repositories get renamed?

Contributor Author

Agreed. But this won't be the only class needing that change, so perhaps it's better to wait until then and do one bulk update?

Member

Yes, absolutely. The move will likely happen next week, so let's discuss after that.

Comment on lines 104 to 107
val info1 = if (isSchemaSupported(requiredSchema)) {
CometExplainInfo(s"Schema $requiredSchema is not supported")
} else {
CometExplainInfo.none
}
Member
@andygrove andygrove Apr 10, 2024

We could use Scala's Option type here rather than defining our own none constant. For example:

val info1 = if (isSchemaSupported(requiredSchema)) {
  Some(CometExplainInfo(s"Schema $requiredSchema is not supported"))
} else {
  None
}

We would also need to update the call to pass the list of reasons to opWithInfo to use flatten so that we only pass the valid reasons.

opWithInfo(scanExec, CometExplainInfo("SCAN", Seq(info1, info2, info3).flatten))

Contributor Author

Good idea. I may have a cleaner way of doing this though, which would make this unnecessary. Give me a little time since I need to change a lot of places to try out the idea. If that doesn't work, I'll change this to use Scala's Option.

CometTakeOrderedAndProjectExec
.isSupported(s)
._1 =>
// TODO: support offset for Spark 3.4
Member

Do we need to file an issue to track this?

Contributor Author

This got copied from some other branch, so I've removed it. I'm not too sure what else we need to support for CometTakeOrderedAndProjectExec, but let me find out and log an issue.

Comment on lines 697 to 699
case b: CometExec => b
case b: CometBroadcastExchangeExec => b
case b: CometShuffleExchangeExec => b
Member
@andygrove andygrove Apr 10, 2024

Very minor nit: we could combine these if we want to, and also avoid the extra variable b.

Suggested change
case b: CometExec => b
case b: CometBroadcastExchangeExec => b
case b: CometShuffleExchangeExec => b
case _: CometExec | _: CometBroadcastExchangeExec | _: CometShuffleExchangeExec => op

if (s.nonEmpty) {
info = info :+ s
}
})
Member

For longer lambda expressions, we can omit the parentheses. Also, why do we have a map(t => t)?

sorted.foreach { p => 
      val s = getActualPlan(p).getTagValue(CometExplainInfo.EXTENSION_INFO).map(t => t).getOrElse("")
      if (s.nonEmpty) {
        info = info :+ s
      }
    }

Contributor Author

Changed.

case _: SparkPlan => traversed.enqueue(getActualPlan(c.asInstanceOf[SparkPlan]))
case _ =>
}
()
Member

Ditto, remove the unneeded parentheses.

Contributor Author

Changed.

@parthchandra (Contributor Author)

Looking at the CI failures; will address the comments as well.

@parthchandra (Contributor Author)

@andygrove I changed the core of the implementation. Instead of setting information in a CometExplainInfo structure and bubbling it up in the plan, I now set the explain information as a Spark tag on the plan or expression. This makes the changes easier to implement as we add support for more operators and expressions.
Please take a look.
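
For reference, a minimal sketch of the tagging approach (an illustration only; apart from the withInfo signature quoted later in this thread, the names here are assumptions, not the merged code):

import org.apache.spark.sql.catalyst.trees.{TreeNode, TreeNodeTag}

object ExplainTagSketch {
  // Hypothetical tag; the PR defines its own (CometExplainInfo.EXTENSION_INFO).
  val EXTENSION_INFO: TreeNodeTag[String] = TreeNodeTag[String]("CometExtensionInfo")

  // Attach a reason to any TreeNode (SparkPlan, Expression, ...) and return the node.
  def withInfo[T <: TreeNode[_]](node: T, info: String): T = {
    node.setTagValue(EXTENSION_INFO, info)
    node
  }

  // Read the reason back when rendering the extended explain output.
  def infoOf(node: TreeNode[_]): Option[String] = node.getTagValue(EXTENSION_INFO)
}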

Contributor
@advancedxy advancedxy left a comment

Did a quick review, this is neat work. Thanks.

IIUC, the required Spark feature is SPARK-47289? I think it would be helpful to put this link in the PR summary or commit message, so that users know which PR/patch to apply to their own 3.x Spark.

Query: q2 TPCH Snappy. Comet Exec: Enabled (CometSort, CometSortMergeJoin, CometProject, CometFilter)
Query: q2 TPCH Snappy: ExplainInfo:
ObjectHashAggregate is not supported
might_contain is not supported
Contributor

Hmm, might_contain (org.apache.spark.sql.catalyst.expressions.BloomFilterMightContain) should already be supported since #179. Its child expression might be an XXHash64, which is not supported yet.

I'm wondering if this output is a false positive? However, I cannot reproduce the might_contain case in my local 1GB TPCH runs.

Contributor Author

Great catch! I was catching this but in FilterExec I was not propagating the correct error. Fixed.

Comment on lines 9 to 10
BroadcastHashJoin is not supported
SortMergeJoin is not supported
Contributor

BroadcastHashJoin and SortMergeJoin should also already be supported? There should be a more specific reason why these are not supported.

Contributor Author

Another good catch! I reviewed everything and found a few more paths where the error was not being caught. Fixed.

@@ -98,6 +105,11 @@ trait CometTPCQueryListBase
} else {
out.println(s"Query: $name$nameSuffix. Comet Exec: Disabled")
}
if (supportsExtendedExplainInfo(df.queryExecution)) {
Contributor

I think the if should be removed? Otherwise, when running with vanilla Spark 3.x, the ExplainInfo is not printed.

(P.S. I ran the TPCH queries and found no ExplainInfo with open-source Spark 3.4.)

Contributor Author

Good point. Done

Comment on lines 104 to 107
var info1: Option[String] = None
if (isSchemaSupported(requiredSchema)) {
info1 = Some(s"Schema $requiredSchema is not supported")
}
Member
@andygrove andygrove Apr 18, 2024

I wonder if we could add a helper method to reduce the boilerplate for each of these sections for creating the Option[String].

We could add a method like this:

def createMessage(condition: Boolean, message: => String): Option[String] = {
  if (condition) {
    Some(message)
  } else {
    None
  }
}

The call site could then be a one-liner:

val info1 = createMessage(isSchemaSupported(requiredSchema), s"Schema $requiredSchema is not supported")

What do you think?

Member

Edit: I just updated the suggested code to use the pass-by-name message: => String syntax so that the message is evaluated lazily.
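
As a small illustration of the lazy evaluation (hypothetical values, not from the PR):

def createMessage(condition: Boolean, message: => String): Option[String] = {
  if (condition) Some(message) else None
}

val requiredSchema = "struct<a:int>" // placeholder schema string
// The interpolated message is never built here because the condition is false.
val info = createMessage(false, s"Schema $requiredSchema is not supported")
assert(info.isEmpty)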

Contributor Author

Done.

* @return
* The node with information (if any) attached
*/
def withInfo[T <: TreeNode[_]](node: T, info: String, exprs: T*): T = {
Member

I love the idea of storing the metadata on the plan using Spark's tagging mechanism (we did the same thing in Spark RAPIDS).

@parthchandra (Contributor Author)

@advancedxy, @andygrove Addressed your comments. I was missing a few paths where the explanation was not being propagated correctly. I've fixed the ones I could find.
Also, updated the results in CometTPC*QueriesList. I had previously run this on a 1TB dataset and had not re-run it after I rebased on main. I've now run it again on a 1GB dataset, but had to explicitly set some Bloom filter configuration options to simulate the conditions of the larger dataset. Otherwise, the settings are the same as the defaults we expect most users to have.

Member
@andygrove andygrove left a comment

LGTM. Thanks @parthchandra

Contributor
@advancedxy advancedxy left a comment

LGTM

@parthchandra (Contributor Author)

Looking into the CI failures ...

@parthchandra (Contributor Author)

@andygrove could you please trigger a re-run of the failed CI workflow? (Only committers can do that.) I cannot reproduce the failure locally and am pretty sure it has nothing to do with this PR.

@viirya (Member)

viirya commented Apr 20, 2024

Re-triggered.

@parthchandra (Contributor Author)

Could we merge this please? @viirya @andygrove

s"Schema $requiredSchema is not supported")
val info2 = createMessage(
!isSchemaSupported(partitionSchema),
s"Schema $partitionSchema is not supported")
Member

Suggested change
s"Schema $partitionSchema is not supported")
s"Partition schema $partitionSchema is not supported")

Contributor Author

Done

info.distinct.mkString("\n").trim
}

private def getActualPlan(node: TreeNode[_]): TreeNode[_] = {
Member

Actually, this only works on SparkPlan. I'm wondering why we use TreeNode[_] in so many places?

Contributor Author

I used TreeNode because we can add the extended info tag to any TreeNode (SparkPlan, Expression, or AggregateExpression). This particular method only operates on SparkPlan, but I couldn't get the compiler to agree with me :( so I finally left it as a TreeNode.

info.filter(!_.contentEquals("\n"))
}

// get all plan nodes, breadth first, leaf nodes first
Member

breadth first and leaf nodes first seem conflicting?

Contributor Author

Reversed breadth-first would be a more accurate description, I suppose. The traversal is breadth-first, but the order is reversed at the end.
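
A minimal sketch of that traversal (simplified; the actual code also unwraps nodes via getActualPlan):

import scala.collection.mutable
import org.apache.spark.sql.execution.SparkPlan

// Visit nodes breadth-first from the root, then reverse so leaves come before parents.
def plansLeafFirst(root: SparkPlan): Seq[SparkPlan] = {
  val ordered = mutable.ArrayBuffer.empty[SparkPlan]
  val queue = mutable.Queue(root)
  while (queue.nonEmpty) {
    val node = queue.dequeue()
    ordered += node
    node.children.foreach(queue.enqueue(_))
  }
  ordered.reverse.toSeq
}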


def supportsExtendedExplainInfo(qe: QueryExecution): Boolean = {
try {
// Look for QueryExecution.extendedExplainInfo(scala.Function1[String, Unit], SparkPlan)
Member

Can we add a clear reference here to the Spark version in which this is added, e.g., Spark xx+?
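
For context, a sketch of such a reflection-based check (an assumption based on the comment in the snippet above, not necessarily the PR's exact code); extendedExplainInfo is only available once SPARK-47289 lands (Spark 4.0.0):

import org.apache.spark.sql.execution.{QueryExecution, SparkPlan}

def supportsExtendedExplainInfo(qe: QueryExecution): Boolean = {
  try {
    // Look for QueryExecution.extendedExplainInfo(scala.Function1[String, Unit], SparkPlan)
    qe.getClass.getDeclaredMethod(
      "extendedExplainInfo",
      classOf[String => Unit],
      classOf[SparkPlan])
    true
  } catch {
    case _: NoSuchMethodException | _: SecurityException => false
  }
}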

Comment on lines 24 to 29
/**
* A trait for a session extension to implement that provides addition explain plan information.
*/

Member

Suggested change
/**
* A trait for a session extension to implement that provides addition explain plan information.
*/
/**
* A trait for a session extension to implement that provides addition explain plan information.
* We copy this from Spark 4.0 since this trait is not available in Spark 3.x. We can remove this
* after dropping Spark 3.x support.
*/

Contributor Author

Done

Comment on lines +91 to +93
"spark.sql.optimizer.runtime.bloomFilter.creationSideThreshold" -> "1MB",
"spark.sql.optimizer.runtime.bloomFilter.applicationSideScanSizeThreshold" -> "1MB") {
Member

Why do we need these bloomFilter configs?

Contributor Author

This allows us to simulate the plan produced at larger loads. When we run this on a 1TB dataset, bloom filters are enabled because the thresholds are met. However, for smaller datasets, we need to lower the thresholds so that bloom filters are enabled.
Added this as a comment.
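
A rough usage sketch (assumes an existing SparkSession named spark; not the test's exact code) of lowering the thresholds so that runtime bloom filters, and hence might_contain, show up even on a small dataset:

// Lower the size thresholds that gate runtime bloom filter injection.
spark.conf.set("spark.sql.optimizer.runtime.bloomFilter.enabled", "true")
spark.conf.set("spark.sql.optimizer.runtime.bloomFilter.creationSideThreshold", "1MB")
spark.conf.set("spark.sql.optimizer.runtime.bloomFilter.applicationSideScanSizeThreshold", "1MB")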

@@ -917,6 +1026,7 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde {
.setStringSpace(builder)
.build())
} else {
withInfo(expr, null, child)
Member
@viirya viirya Apr 21, 2024

Can we override this method for this kind of usage so we can do withInfo(expr, child)? There are a lot of withInfo(expr, null, ...) calls.

Can be in a follow up PR.

Contributor Author

Done. There are one or two cases where the second param is not null. But mostly it is null.
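
A minimal sketch of such an overload (an assumption; the merged code may differ in how it treats the message):

import org.apache.spark.sql.catalyst.trees.TreeNode

object PlanInfoSketch {
  // Existing form quoted earlier in this thread (body elided here).
  def withInfo[T <: TreeNode[_]](node: T, info: String, exprs: T*): T = { /* attach tag */ node }

  // Convenience overload so call sites can drop the explicit null/empty message.
  def withInfo[T <: TreeNode[_]](node: T, exprs: T*): T =
    withInfo(node, "", exprs: _*)
}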

@viirya (Member)

viirya commented Apr 21, 2024

There is a conflict that needs to be resolved. And I have a few minor comments, like code comment changes.

@advancedxy (Contributor)

could you please trigger a re-run of the failed ci workflow? (Only committers can do that). I cannot reproduce the failure locally and am pretty sure it has nothing to do with this PR.

BTW, you can close and re-open the PR to re-trigger the CI workflow. Or you can push an empty commit to re-trigger.

@viirya (Member)

viirya commented Apr 22, 2024

Thank you @parthchandra. I will merge this once CI passes.

@parthchandra (Contributor Author)

Thanks for the review @viirya @andygrove @advancedxy

@viirya viirya merged commit 6d01f6a into apache:main Apr 22, 2024
28 checks passed
@viirya (Member)

viirya commented Apr 22, 2024

Merged. Thanks all.

himadripal pushed a commit to himadripal/datafusion-comet that referenced this pull request Sep 7, 2024
* feat: Add extended explain info to Comet plan

Requires Spark 4.0.0 for the explain info to be visible in Spark UI.
(see: https://issues.apache.org/jira/browse/SPARK-47289)

* spotless apply

* Address review comments

* fix ci

* Add one more explanation

* address review comments

* fix formatting after rebase