Add a plugin overview page to the contributors guide

andygrove committed Apr 28, 2024
1 parent a340748 commit 30c20b2
Showing 3 changed files with 54 additions and 3 deletions.
2 changes: 1 addition & 1 deletion docs/source/contributor-guide/debugging.md
@@ -99,7 +99,7 @@ https://mail.openjdk.org/pipermail/hotspot-dev/2019-September/039429.html
Detecting the debugger
https://stackoverflow.com/questions/5393403/can-a-java-application-detect-that-a-debugger-is-attached#:~:text=No.,to%20let%20your%20app%20continue.&text=I%20know%20that%20those%20are,meant%20with%20my%20first%20phrase).

-# Verbose debug
+## Verbose debug

By default, Comet outputs the exception details specific for Comet.

50 changes: 50 additions & 0 deletions docs/source/contributor-guide/plugin_overview.md
@@ -0,0 +1,50 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Comet Plugin Overview

The entry point to Comet is the `org.apache.comet.CometSparkSessionExtensions` class, which can be registered with Spark by adding the following setting to the Spark configuration when launching `spark-shell` or `spark-submit`:

```
--conf spark.sql.extensions=org.apache.comet.CometSparkSessionExtensions
```
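
For reference, the same registration can be done programmatically when constructing a `SparkSession` in application code. This is a minimal sketch: the application name and master URL are placeholders, and it assumes the Comet jar is already on the classpath.

```scala
import org.apache.spark.sql.SparkSession

// Programmatic equivalent of the --conf flag above. The app name and master
// URL are placeholders; the Comet jar must be on the classpath.
val spark = SparkSession
  .builder()
  .appName("comet-example")
  .master("local[*]")
  .config("spark.sql.extensions", "org.apache.comet.CometSparkSessionExtensions")
  .getOrCreate()
```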

On initialization, this class registers two physical plan optimization rules with Spark: `CometScanRule` and `CometExecRule`. These rules run whenever a query stage is being planned.
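
The sketch below shows the general shape of a `SparkSessionExtensions` entry point that injects plan-rewrite rules. It is illustrative only: the rules are no-op stand-ins for `CometScanRule` and `CometExecRule`, and it assumes `injectColumnar` as the injection point, which may differ from the mechanism Comet actually uses.

```scala
import org.apache.spark.sql.{SparkSession, SparkSessionExtensions}
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.execution.{ColumnarRule, SparkPlan}

// Illustrative shape of an extensions entry point; not Comet's actual code.
class ExampleSessionExtensions extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    extensions.injectColumnar { session =>
      new ColumnarRule {
        // Runs on the physical plan before Spark inserts row/columnar transitions.
        override def preColumnarTransitions: Rule[SparkPlan] = new ExampleNoOpRule(session)
      }
    }
  }
}

// A rule that leaves the plan unchanged; a real rule would rewrite nodes here.
class ExampleNoOpRule(session: SparkSession) extends Rule[SparkPlan] {
  override def apply(plan: SparkPlan): SparkPlan = plan
}
```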

## CometScanRule

`CometScanRule` replaces any Parquet scans with Comet Parquet scan classes.

When the V1 data source API is being used, `FileSourceScanExec` is replaced with `CometScanExec`.

When the V2 data source API is being used, `BatchScanExec` is replaced with `CometBatchScanExec`.
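
A rough sketch of what such a scan-replacement rule looks like is shown below. The `isSupported`, `toCometScan`, and `toCometBatchScan` helpers are hypothetical placeholders standing in for the logic that constructs `CometScanExec` and `CometBatchScanExec`.

```scala
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.execution.{FileSourceScanExec, SparkPlan}
import org.apache.spark.sql.execution.datasources.v2.BatchScanExec

// Illustrative only: the helpers below are placeholders, not Comet's real API.
class ExampleScanReplacementRule extends Rule[SparkPlan] {
  override def apply(plan: SparkPlan): SparkPlan = plan.transform {
    // V1 data source API
    case scan: FileSourceScanExec if isSupported(scan) => toCometScan(scan)
    // V2 data source API
    case scan: BatchScanExec if isSupported(scan) => toCometBatchScan(scan)
  }

  private def isSupported(scan: SparkPlan): Boolean = true               // placeholder check
  private def toCometScan(scan: FileSourceScanExec): SparkPlan = scan    // placeholder
  private def toCometBatchScan(scan: BatchScanExec): SparkPlan = scan    // placeholder
}
```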

## CometExecRule

`CometExecRule` attempts to transform a Spark physical plan into a Comet plan.

This rule traverses bottom-up from the original Spark plan and attempts to replace each node with a Comet equivalent. For example, a `ProjectExec` will be replaced by `CometProjectExec`.

When replacing a node, various checks are performed to determine whether Comet can support the operator and its expressions. If an operator or expression is not supported, the reason is stored in a tag on the underlying Spark node, and running `explain` on the query will show the reasons that prevented the plan from being executed natively in Comet. If any part of the plan is not supported, the original Spark plan is returned.
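
To check whether a query ran natively, the plan can be inspected with `explain`. This assumes an active `SparkSession` named `spark` with the extension registered (as in the earlier sketch) and a placeholder Parquet path:

```scala
import org.apache.spark.sql.functions.col

// The path is a placeholder; the explain output shows Comet operators where
// the plan was replaced, or the recorded reasons where it was not.
val df = spark.read.parquet("/tmp/example.parquet").filter(col("x") > 0)
df.explain("extended")
```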

Comet does not support replacing only a subset of the plan, because that would require inserting transitions to convert between row-based and columnar data at the boundaries between Spark and Comet operators, and the overhead of these transitions could outweigh the benefit of running parts of the plan natively in Comet.
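
Putting these pieces together, the sketch below shows the overall shape of such a rule: a bottom-up transform, a support check per operator, a tag recording why a node was not replaced, and an all-or-nothing fallback at the end. All of the helper methods are hypothetical placeholders rather than Comet's actual implementation.

```scala
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.execution.{ProjectExec, SparkPlan}

// Illustrative only: the placeholders below stand in for Comet's real logic.
class ExampleExecReplacementRule extends Rule[SparkPlan] {
  override def apply(plan: SparkPlan): SparkPlan = {
    // Walk the plan bottom-up, replacing supported operators.
    val transformed = plan.transformUp {
      case p: ProjectExec if isSupported(p) => toCometProject(p)
      case other =>
        // Record why this node was not replaced so that explain() can surface it.
        tagUnsupportedReason(other, "operator not supported")
        other
    }
    // All-or-nothing: if any operator could not be replaced, return the
    // original Spark plan to avoid mid-plan row/columnar transitions.
    if (fullyNative(transformed)) transformed else plan
  }

  private def isSupported(p: ProjectExec): Boolean = true                     // placeholder
  private def toCometProject(p: ProjectExec): SparkPlan = p                   // placeholder
  private def tagUnsupportedReason(p: SparkPlan, reason: String): Unit = ()   // placeholder
  private def fullyNative(p: SparkPlan): Boolean = true                       // placeholder
}
```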

Once the plan has been transformed, it is serialized into Comet's protocol buffer format by the `QueryPlanSerde` class, and the serialized plan is passed to the native code by `CometExecIterator`.

In the native code, the `PhysicalPlanner` struct (in `planner.rs`) converts the serialized plan into an Apache DataFusion physical plan. In some cases, Comet provides specialized physical operators and expressions that override the DataFusion versions to ensure compatibility with Apache Spark.
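
The following is a purely hypothetical sketch of the JVM-side handoff, using placeholder types rather than the real `QueryPlanSerde` and `CometExecIterator` APIs: the transformed plan is serialized once, and an iterator-style wrapper drives the native engine and pulls result batches back into the JVM.

```scala
// All types and method names here are hypothetical placeholders.
final case class SerializedPlan(bytes: Array[Byte])   // stand-in for the protobuf plan
final case class Batch(numRows: Int)                  // stand-in for a columnar batch

// Stand-in for the native entry point (in Comet this crosses a JNI boundary).
trait NativeEngine {
  def createPlan(plan: SerializedPlan): Long    // returns a native handle
  def next(handle: Long): Option[Batch]         // pull one output batch, None when done
  def release(handle: Long): Unit               // free native resources
}

// Iterator-style wrapper, analogous in spirit to CometExecIterator.
final class ExampleExecIterator(engine: NativeEngine, plan: SerializedPlan)
    extends Iterator[Batch] {
  private val handle = engine.createPlan(plan)
  private var pending: Option[Batch] = fetch()

  private def fetch(): Option[Batch] = {
    val batch = engine.next(handle)
    if (batch.isEmpty) engine.release(handle)   // release once the stream is exhausted
    batch
  }

  override def hasNext: Boolean = pending.isDefined

  override def next(): Batch = {
    val batch = pending.getOrElse(throw new NoSuchElementException("no more batches"))
    pending = fetch()
    batch
  }
}
```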
5 changes: 3 additions & 2 deletions docs/source/index.rst
@@ -49,9 +49,10 @@ as a native runtime to achieve improvement in terms of query efficiency and quer
:caption: Contributor Guide

Getting Started <contributor-guide/contributing>
+Comet Plugin Overview <contributor-guide/plugin_overview>
+Development Guide <contributor-guide/development>
+Debugging Guide <contributor-guide/debugging>
Github and Issue Tracker <https://github.com/apache/datafusion-comet>
-contributor-guide/development
-contributor-guide/debugging

.. _toc.asf-links:
.. toctree::
