diff --git a/README.md b/README.md
index fb17535aaa..b0b72fdb6f 100644
--- a/README.md
+++ b/README.md
@@ -19,58 +19,86 @@ under the License.
# Apache DataFusion Comet
-Apache DataFusion Comet is an Apache Spark plugin that uses [Apache DataFusion](https://datafusion.apache.org/)
-as native runtime to achieve improvement in terms of query efficiency and query runtime.
+Apache DataFusion Comet is a high-performance accelerator for Apache Spark, built on top of the powerful
+[Apache DataFusion](https://datafusion.apache.org) query engine. Comet is designed to significantly enhance the
+performance of Apache Spark workloads while leveraging commodity hardware and seamlessly integrating with the
+Spark ecosystem without requiring any code changes.
-Comet runs Spark SQL queries using the native DataFusion runtime, which is
-typically faster and more resource efficient than JVM based runtimes.
+# Benefits of Using Comet
-
+## Run Spark Queries at DataFusion Speeds
-Comet aims to support:
+Comet delivers a performance speedup for many queries, enabling faster data processing and shorter time-to-insights.
-- a native Parquet implementation, including both reader and writer
-- full implementation of Spark operators, including
- Filter/Project/Aggregation/Join/Exchange etc.
-- full implementation of Spark built-in expressions
-- a UDF framework for users to migrate their existing UDF to native
+The following chart shows the time it takes to run the 22 TPC-H queries against 100 GB of data in Parquet format
+using a single executor with 8 cores. See the [Comet Benchmarking Guide](https://datafusion.apache.org/comet/contributor-guide/benchmarking.html)
+for details of the environment used for these benchmarks.
-## Architecture
+When using Comet, the overall run time is reduced from 649 seconds to 440 seconds, a 1.5x speedup.
-The following diagram illustrates the architecture of Comet:
+Running the same queries with DataFusion standalone (without Spark) using the same number of cores results in a 3.9x
+speedup compared to Spark.
-
+Comet is not yet achieving full DataFusion speeds in all cases, but with future work we aim to provide a 2x-4x speedup
+for many use cases.
-## Current Status
+![](docs/source/_static/images/tpch_allqueries.png)
-The project is currently integrated into Apache Spark 3.2, 3.3, and 3.4.
+Here is a breakdown showing relative performance of Spark, Comet, and DataFusion for each TPC-H query.
-## Feature Parity with Apache Spark
+![](docs/source/_static/images/tpch_queries_compare.png)
-The project strives to keep feature parity with Apache Spark, that is,
-users should expect the same behavior (w.r.t features, configurations,
-query results, etc) with Comet turned on or turned off in their Spark
-jobs. In addition, Comet extension should automatically detect unsupported
-features and fallback to Spark engine.
+The following chart shows how much Comet currently accelerates each query from the benchmark. Performance optimization
+is an ongoing task, and we welcome contributions from the community to help achieve even greater speedups in the future.
-To achieve this, besides unit tests within Comet itself, we also re-use
-Spark SQL tests and make sure they all pass with Comet extension
-enabled.
+![](docs/source/_static/images/tpch_queries_speedup.png)
-## Supported Platforms
+These benchmarks can be reproduced in any environment using the documentation in the
+[Comet Benchmarking Guide](https://datafusion.apache.org/comet/contributor-guide/benchmarking.html). We encourage
+you to run your own benchmarks.
-Linux, Apple OSX (Intel and M1)
+## Use Commodity Hardware
-## Requirements
+Comet leverages commodity hardware, eliminating the need for costly hardware upgrades or
+specialized hardware accelerators, such as GPUs or FPGAs. By maximizing the utilization of commodity hardware, Comet
+ensures cost-effectiveness and scalability for your Spark deployments.
-- Apache Spark 3.2, 3.3, or 3.4
-- JDK 8, 11 and 17 (JDK 11 recommended because Spark 3.2 doesn't support 17)
-- GLIBC 2.17 (Centos 7) and up
+## Spark Compatibility
-## Getting started
+Comet aims for 100% compatibility with all supported versions of Apache Spark, allowing you to integrate Comet into
+your existing Spark deployments and workflows seamlessly. With no code changes required, you can immediately harness
+the benefits of Comet's acceleration capabilities without disrupting your Spark applications.
-See the [DataFusion Comet User Guide](https://datafusion.apache.org/comet/user-guide/installation.html) for installation instructions.
+## Tight Integration with Apache DataFusion
+
+Comet tightly integrates with the core Apache DataFusion project, leveraging its powerful execution engine. With
+seamless interoperability between Comet and DataFusion, you can achieve optimal performance and efficiency in your
+Spark workloads.
+
+## Active Community
+
+Comet boasts a vibrant and active community of developers, contributors, and users dedicated to advancing the
+capabilities of Apache DataFusion and accelerating the performance of Apache Spark.
+
+## Getting Started
+
+To get started with Apache DataFusion Comet, follow the
+[installation instructions](https://datafusion.apache.org/comet/user-guide/installation.html). Join the
+[DataFusion Slack and Discord channels](https://datafusion.apache.org/contributor-guide/communication.html) to connect
+with other users, ask questions, and share your experiences with Comet.
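+
+As a minimal illustrative example (the jar path and job script are placeholders; see the installation guide for the
+exact steps for your Spark version), Comet is enabled by adding the Comet jar and session extension to a Spark job:
+
+```shell
+$SPARK_HOME/bin/spark-submit \
+    --jars $COMET_JAR \
+    --conf spark.driver.extraClassPath=$COMET_JAR \
+    --conf spark.executor.extraClassPath=$COMET_JAR \
+    --conf spark.sql.extensions=org.apache.comet.CometSparkSessionExtensions \
+    --conf spark.comet.enabled=true \
+    --conf spark.comet.exec.enabled=true \
+    your-spark-job.py
+```
+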
## Contributing
-See the [DataFusion Comet Contribution Guide](https://datafusion.apache.org/comet/contributor-guide/contributing.html)
-for information on how to get started contributing to the project.
+
+We welcome contributions from the community to help improve and enhance Apache DataFusion Comet. Whether it's fixing
+bugs, adding new features, writing documentation, or optimizing performance, your contributions are invaluable in
+shaping the future of Comet. Check out our
+[contributor guide](https://datafusion.apache.org/comet/contributor-guide/contributing.html) to get started.
+
+## License
+
+Apache DataFusion Comet is licensed under the Apache License 2.0. See the [LICENSE.txt](LICENSE.txt) file for details.
+
+## Acknowledgments
+
+We would like to express our gratitude to the Apache DataFusion community for their support and contributions to
+Comet. Together, we're building a faster, more efficient future for big data processing with Apache Spark.
diff --git a/docs/source/_static/images/tpch_allqueries.png b/docs/source/_static/images/tpch_allqueries.png
new file mode 100644
index 0000000000..a6788d5a40
Binary files /dev/null and b/docs/source/_static/images/tpch_allqueries.png differ
diff --git a/docs/source/_static/images/tpch_queries_compare.png b/docs/source/_static/images/tpch_queries_compare.png
new file mode 100644
index 0000000000..927680612f
Binary files /dev/null and b/docs/source/_static/images/tpch_queries_compare.png differ
diff --git a/docs/source/_static/images/tpch_queries_speedup.png b/docs/source/_static/images/tpch_queries_speedup.png
new file mode 100644
index 0000000000..fb417ff1db
Binary files /dev/null and b/docs/source/_static/images/tpch_queries_speedup.png differ
diff --git a/docs/source/contributor-guide/benchmark-results/2024-05-30/comet-8-exec-5-runs.json b/docs/source/contributor-guide/benchmark-results/2024-05-30/comet-8-exec-5-runs.json
new file mode 100644
index 0000000000..38142151d1
--- /dev/null
+++ b/docs/source/contributor-guide/benchmark-results/2024-05-30/comet-8-exec-5-runs.json
@@ -0,0 +1,201 @@
+{
+ "engine": "datafusion-comet",
+ "benchmark": "tpch",
+ "data_path": "/mnt/bigdata/tpch/sf100/",
+ "query_path": "../../tpch/queries",
+ "spark_conf": {
+ "spark.comet.explainFallback.enabled": "true",
+ "spark.jars": "file:///home/andy/git/apache/datafusion-comet/spark/target/comet-spark-spark3.4_2.12-0.1.0-SNAPSHOT.jar",
+ "spark.comet.cast.allowIncompatible": "true",
+ "spark.executor.extraClassPath": "/home/andy/git/apache/datafusion-comet/spark/target/comet-spark-spark3.4_2.12-0.1.0-SNAPSHOT.jar",
+ "spark.executor.memory": "8G",
+ "spark.comet.exec.shuffle.enabled": "true",
+ "spark.app.name": "DataFusion Comet Benchmark derived from TPC-H / TPC-DS",
+ "spark.driver.port": "36573",
+ "spark.sql.adaptive.coalescePartitions.enabled": "false",
+ "spark.app.startTime": "1716923498046",
+ "spark.comet.batchSize": "8192",
+ "spark.app.id": "app-20240528131138-0043",
+ "spark.serializer.objectStreamReset": "100",
+ "spark.app.initial.jar.urls": "spark://woody.lan:36573/jars/comet-spark-spark3.4_2.12-0.1.0-SNAPSHOT.jar",
+ "spark.submit.deployMode": "client",
+ "spark.sql.autoBroadcastJoinThreshold": "-1",
+ "spark.comet.exec.all.enabled": "true",
+ "spark.eventLog.enabled": "false",
+ "spark.driver.host": "woody.lan",
+ "spark.driver.extraJavaOptions": "-Djava.net.preferIPv6Addresses=false -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED -Djdk.reflect.useDirectMethodHandle=false",
+ "spark.sql.warehouse.dir": "file:/home/andy/git/apache/datafusion-benchmarks/runners/datafusion-comet/spark-warehouse",
+ "spark.shuffle.manager": "org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager",
+ "spark.comet.exec.enabled": "true",
+ "spark.repl.local.jars": "file:///home/andy/git/apache/datafusion-comet/spark/target/comet-spark-spark3.4_2.12-0.1.0-SNAPSHOT.jar",
+ "spark.executor.id": "driver",
+ "spark.master": "spark://woody:7077",
+ "spark.executor.instances": "8",
+ "spark.comet.exec.shuffle.mode": "auto",
+ "spark.sql.extensions": "org.apache.comet.CometSparkSessionExtensions",
+ "spark.driver.memory": "8G",
+ "spark.driver.extraClassPath": "/home/andy/git/apache/datafusion-comet/spark/target/comet-spark-spark3.4_2.12-0.1.0-SNAPSHOT.jar",
+ "spark.rdd.compress": "True",
+ "spark.executor.extraJavaOptions": "-Djava.net.preferIPv6Addresses=false -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED -Djdk.reflect.useDirectMethodHandle=false",
+ "spark.cores.max": "8",
+ "spark.comet.enabled": "true",
+ "spark.app.submitTime": "1716923497738",
+ "spark.submit.pyFiles": "",
+ "spark.executor.cores": "1",
+ "spark.comet.parquet.io.enabled": "false"
+ },
+ "1": [
+ 32.121661901474,
+ 27.997092485427856,
+ 27.756758451461792,
+ 28.55236315727234,
+ 28.332542181015015
+ ],
+ "2": [
+ 18.269107580184937,
+ 16.200955629348755,
+ 16.194639682769775,
+ 16.745808839797974,
+ 16.59864115715027
+ ],
+ "3": [
+ 17.265466690063477,
+ 17.069786310195923,
+ 17.12887978553772,
+ 19.33678102493286,
+ 18.182055234909058
+ ],
+ "4": [
+ 8.367004156112671,
+ 8.172023296356201,
+ 8.023266077041626,
+ 8.350765228271484,
+ 8.258736610412598
+ ],
+ "5": [
+ 34.10048794746399,
+ 32.69314408302307,
+ 33.21383595466614,
+ 36.391114473342896,
+ 39.00048065185547
+ ],
+ "6": [
+ 3.1693499088287354,
+ 3.044705390930176,
+ 3.047694206237793,
+ 3.2817511558532715,
+ 3.274174928665161
+ ],
+ "7": [
+ 25.369214296340942,
+ 24.020941257476807,
+ 24.0787034034729,
+ 28.47402787208557,
+ 28.23443365097046
+ ],
+ "8": [
+ 40.06126809120178,
+ 39.828824281692505,
+ 45.250510454177856,
+ 44.406742572784424,
+ 48.98451232910156
+ ],
+ "9": [
+ 62.822797775268555,
+ 61.26328158378601,
+ 64.95581865310669,
+ 69.51708793640137,
+ 73.52380013465881
+ ],
+ "10": [
+ 20.55334782600403,
+ 20.546096324920654,
+ 20.57452392578125,
+ 22.84211039543152,
+ 23.724371671676636
+ ],
+ "11": [
+ 11.068235158920288,
+ 10.715423822402954,
+ 11.353424310684204,
+ 11.37632942199707,
+ 11.530814170837402
+ ],
+ "12": [
+ 10.264788389205933,
+ 8.67864990234375,
+ 8.845952033996582,
+ 8.593009233474731,
+ 8.540803909301758
+ ],
+ "13": [
+ 9.603406190872192,
+ 9.648627042770386,
+ 13.040799140930176,
+ 10.154011249542236,
+ 9.716034412384033
+ ],
+ "14": [
+ 6.20926308631897,
+ 6.0385496616363525,
+ 7.674488544464111,
+ 10.53052043914795,
+ 7.661675691604614
+ ],
+ "15": [
+ 11.466301918029785,
+ 11.473632097244263,
+ 11.279382228851318,
+ 13.291078329086304,
+ 12.81026816368103
+ ],
+ "16": [
+ 8.096073865890503,
+ 7.73410701751709,
+ 7.742897272109985,
+ 8.477537631988525,
+ 7.821273326873779
+ ],
+ "17": [
+ 43.69264578819275,
+ 43.33040428161621,
+ 46.291987657547,
+ 54.654345989227295,
+ 54.37124800682068
+ ],
+ "18": [
+ 27.205485105514526,
+ 26.785916090011597,
+ 27.331408262252808,
+ 29.946768760681152,
+ 28.037617444992065
+ ],
+ "19": [
+ 8.100102186203003,
+ 7.845783472061157,
+ 8.52329158782959,
+ 8.907397985458374,
+ 9.13755488395691
+ ],
+ "20": [
+ 13.09695029258728,
+ 12.683861255645752,
+ 15.612725019454956,
+ 13.361177206039429,
+ 16.614356517791748
+ ],
+ "21": [
+ 43.69623780250549,
+ 43.26758122444153,
+ 46.91650056838989,
+ 47.875754833221436,
+ 57.9763662815094
+ ],
+ "22": [
+ 4.5090577602386475,
+ 4.420571804046631,
+ 4.639787673950195,
+ 5.118046998977661,
+ 5.017346143722534
+ ]
+}
\ No newline at end of file
diff --git a/docs/source/contributor-guide/benchmark-results/2024-05-30/datafusion-python-8-cores.json b/docs/source/contributor-guide/benchmark-results/2024-05-30/datafusion-python-8-cores.json
new file mode 100644
index 0000000000..f032d536df
--- /dev/null
+++ b/docs/source/contributor-guide/benchmark-results/2024-05-30/datafusion-python-8-cores.json
@@ -0,0 +1,73 @@
+{
+ "engine": "datafusion-python",
+ "datafusion-version": "38.0.1",
+ "benchmark": "tpch",
+ "data_path": "/mnt/bigdata/tpch/sf100/",
+ "query_path": "../../tpch/queries/",
+ "1": [
+ 7.410699844360352
+ ],
+ "2": [
+ 2.966364622116089
+ ],
+ "3": [
+ 3.988652467727661
+ ],
+ "4": [
+ 1.8821499347686768
+ ],
+ "5": [
+ 6.957948684692383
+ ],
+ "6": [
+ 1.779731273651123
+ ],
+ "7": [
+ 14.559604167938232
+ ],
+ "8": [
+ 7.062309265136719
+ ],
+ "9": [
+ 14.908353805541992
+ ],
+ "10": [
+ 7.73533296585083
+ ],
+ "11": [
+ 2.346423387527466
+ ],
+ "12": [
+ 2.7248904705047607
+ ],
+ "13": [
+ 6.38663387298584
+ ],
+ "14": [
+ 2.4675676822662354
+ ],
+ "15": [
+ 4.799000024795532
+ ],
+ "16": [
+ 1.9091999530792236
+ ],
+ "17": [
+ 19.230653762817383
+ ],
+ "18": [
+ 25.15683078765869
+ ],
+ "19": [
+ 4.2268781661987305
+ ],
+ "20": [
+ 8.66620659828186
+ ],
+ "21": [
+ 17.696006059646606
+ ],
+ "22": [
+ 1.3805692195892334
+ ]
+}
\ No newline at end of file
diff --git a/docs/source/contributor-guide/benchmark-results/2024-05-30/spark-8-exec-5-runs.json b/docs/source/contributor-guide/benchmark-results/2024-05-30/spark-8-exec-5-runs.json
new file mode 100644
index 0000000000..012b05c3ad
--- /dev/null
+++ b/docs/source/contributor-guide/benchmark-results/2024-05-30/spark-8-exec-5-runs.json
@@ -0,0 +1,184 @@
+{
+ "engine": "datafusion-comet",
+ "benchmark": "tpch",
+ "data_path": "/mnt/bigdata/tpch/sf100/",
+ "query_path": "../../tpch/queries",
+ "spark_conf": {
+ "spark.driver.extraJavaOptions": "-Djava.net.preferIPv6Addresses=false -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED -Djdk.reflect.useDirectMethodHandle=false",
+ "spark.sql.warehouse.dir": "file:/home/andy/git/apache/datafusion-benchmarks/runners/datafusion-comet/spark-warehouse",
+ "spark.app.id": "app-20240528090804-0041",
+ "spark.app.submitTime": "1716908883258",
+ "spark.executor.memory": "8G",
+ "spark.master": "spark://woody:7077",
+ "spark.executor.id": "driver",
+ "spark.executor.instances": "8",
+ "spark.app.name": "DataFusion Comet Benchmark derived from TPC-H / TPC-DS",
+ "spark.driver.memory": "8G",
+ "spark.rdd.compress": "True",
+ "spark.executor.extraJavaOptions": "-Djava.net.preferIPv6Addresses=false -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED -Djdk.reflect.useDirectMethodHandle=false",
+ "spark.serializer.objectStreamReset": "100",
+ "spark.cores.max": "8",
+ "spark.submit.pyFiles": "",
+ "spark.executor.cores": "1",
+ "spark.submit.deployMode": "client",
+ "spark.sql.autoBroadcastJoinThreshold": "-1",
+ "spark.eventLog.enabled": "false",
+ "spark.app.startTime": "1716908883579",
+ "spark.driver.port": "33725",
+ "spark.driver.host": "woody.lan"
+ },
+ "1": [
+ 76.91316103935242,
+ 79.55859923362732,
+ 81.10397529602051,
+ 79.01998662948608,
+ 79.1286551952362
+ ],
+ "2": [
+ 23.977370262145996,
+ 22.214473247528076,
+ 22.686659812927246,
+ 22.016682386398315,
+ 21.766324520111084
+ ],
+ "3": [
+ 22.700742721557617,
+ 21.980144739151,
+ 21.876065969467163,
+ 21.661516189575195,
+ 21.69345998764038
+ ],
+ "4": [
+ 17.377647638320923,
+ 16.249598264694214,
+ 16.15747308731079,
+ 16.128843069076538,
+ 16.04338026046753
+ ],
+ "5": [
+ 44.38863182067871,
+ 45.47764492034912,
+ 45.76063895225525,
+ 45.16393995285034,
+ 60.848369121551514
+ ],
+ "6": [
+ 3.2041075229644775,
+ 2.970944881439209,
+ 2.891291856765747,
+ 2.9719409942626953,
+ 3.0702600479125977
+ ],
+ "7": [
+ 24.369274377822876,
+ 24.684266567230225,
+ 24.146574020385742,
+ 24.023175716400146,
+ 30.56047773361206
+ ],
+ "8": [
+ 46.46081209182739,
+ 45.9838604927063,
+ 46.341185092926025,
+ 45.833823919296265,
+ 46.61182403564453
+ ],
+ "9": [
+ 67.67960548400879,
+ 67.34667444229126,
+ 70.34601259231567,
+ 71.24095153808594,
+ 84.38811421394348
+ ],
+ "10": [
+ 19.16477870941162,
+ 19.081010580062866,
+ 19.501060009002686,
+ 19.165698528289795,
+ 20.216782331466675
+ ],
+ "11": [
+ 17.158706426620483,
+ 17.05184030532837,
+ 17.714542150497437,
+ 17.004602909088135,
+ 17.700096130371094
+ ],
+ "12": [
+ 11.654477834701538,
+ 11.805298805236816,
+ 11.822469234466553,
+ 12.79678750038147,
+ 13.64478850364685
+ ],
+ "13": [
+ 20.430822372436523,
+ 20.18759250640869,
+ 21.26596975326538,
+ 21.234288454055786,
+ 20.189200162887573
+ ],
+ "14": [
+ 5.60215950012207,
+ 5.160705089569092,
+ 5.080057382583618,
+ 4.937625408172607,
+ 5.853632688522339
+ ],
+ "15": [
+ 14.17775845527649,
+ 13.898571729660034,
+ 14.215840578079224,
+ 14.316090106964111,
+ 14.356236457824707
+ ],
+ "16": [
+ 6.252386808395386,
+ 6.010213375091553,
+ 6.054978370666504,
+ 5.886059522628784,
+ 5.923115253448486
+ ],
+ "17": [
+ 71.41593313217163,
+ 70.25399804115295,
+ 72.07622528076172,
+ 72.27566242218018,
+ 72.20579051971436
+ ],
+ "18": [
+ 65.72738265991211,
+ 65.47461080551147,
+ 67.14260482788086,
+ 65.95489883422852,
+ 69.51795554161072
+ ],
+ "19": [
+ 7.1520891189575195,
+ 6.516514301300049,
+ 6.580992698669434,
+ 6.486274242401123,
+ 6.418147087097168
+ ],
+ "20": [
+ 12.619760036468506,
+ 12.235978126525879,
+ 12.116347551345825,
+ 12.161245584487915,
+ 12.30910348892212
+ ],
+ "21": [
+ 60.795483350753784,
+ 60.484593629837036,
+ 61.27316427230835,
+ 60.475560426712036,
+ 81.21473670005798
+ ],
+ "22": [
+ 8.926804065704346,
+ 8.113754034042358,
+ 8.029133796691895,
+ 7.99291467666626,
+ 8.439452648162842
+ ]
+}
\ No newline at end of file
diff --git a/docs/source/contributor-guide/benchmarking.md b/docs/source/contributor-guide/benchmarking.md
index 502b35c29b..3e9a61efbb 100644
--- a/docs/source/contributor-guide/benchmarking.md
+++ b/docs/source/contributor-guide/benchmarking.md
@@ -19,44 +19,87 @@ under the License.
# Comet Benchmarking Guide
-To track progress on performance, we regularly run benchmarks derived from TPC-H and TPC-DS. Benchmarking scripts are
-available in the [DataFusion Benchmarks](https://github.com/apache/datafusion-benchmarks) GitHub repository.
+To track progress on performance, we regularly run benchmarks derived from TPC-H and TPC-DS. Documentation and scripts
+for data generation and benchmarking are available in the [DataFusion Benchmarks](https://github.com/apache/datafusion-benchmarks) GitHub repository.
-Here is an example command for running the benchmarks. This command will need to be adapted based on the Spark
-environment and location of data files.
+Here are example commands for running the benchmarks against a Spark cluster. These commands will need to be
+adapted based on the Spark environment and location of data files.
-This command assumes that `datafusion-benchmarks` is checked out in a parallel directory to `datafusion-comet`.
+These commands are intended to be run from the `runners/datafusion-comet` directory in the `datafusion-benchmarks`
+repository.
+
+## Running Benchmarks Against Apache Spark
```shell
-$SPARK_HOME/bin/spark-submit \
- --master "local[*]" \
- --conf spark.driver.memory=8G \
- --conf spark.executor.memory=64G \
- --conf spark.executor.cores=16 \
- --conf spark.cores.max=16 \
- --conf spark.eventLog.enabled=true \
- --conf spark.sql.autoBroadcastJoinThreshold=-1 \
- --jars $COMET_JAR \
- --conf spark.driver.extraClassPath=$COMET_JAR \
- --conf spark.executor.extraClassPath=$COMET_JAR \
- --conf spark.sql.extensions=org.apache.comet.CometSparkSessionExtensions \
- --conf spark.comet.enabled=true \
- --conf spark.comet.exec.enabled=true \
- --conf spark.comet.exec.all.enabled=true \
- --conf spark.comet.cast.allowIncompatible=true \
- --conf spark.comet.explainFallback.enabled=true \
- --conf spark.comet.parquet.io.enabled=false \
- --conf spark.comet.batchSize=8192 \
- --conf spark.comet.columnar.shuffle.enabled=false \
- --conf spark.comet.exec.shuffle.enabled=true \
- --conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager \
- --conf spark.sql.adaptive.coalescePartitions.enabled=false \
- --conf spark.comet.shuffle.enforceMode.enabled=true \
- ../datafusion-benchmarks/runners/datafusion-comet/tpcbench.py \
- --benchmark tpch \
- --data /mnt/bigdata/tpch/sf100-parquet/ \
- --queries ../datafusion-benchmarks/tpch/queries
+$SPARK_HOME/bin/spark-submit \
+ --master $SPARK_MASTER \
+ --conf spark.driver.memory=8G \
+ --conf spark.executor.memory=32G \
+ --conf spark.executor.cores=8 \
+ --conf spark.cores.max=8 \
+ --conf spark.sql.autoBroadcastJoinThreshold=-1 \
+ tpcbench.py \
+ --benchmark tpch \
+ --data /mnt/bigdata/tpch/sf100/ \
+ --queries ../../tpch/queries \
+ --iterations 5
```
-Comet performance can be compared to regular Spark performance by running the benchmark twice, once with
-`spark.comet.enabled` set to `true` and once with it set to `false`.
\ No newline at end of file
+## Running Benchmarks Against Apache Spark with Apache DataFusion Comet Enabled
+
+```shell
+$SPARK_HOME/bin/spark-submit \
+ --master $SPARK_MASTER \
+ --conf spark.driver.memory=8G \
+ --conf spark.executor.memory=64G \
+ --conf spark.executor.cores=8 \
+ --conf spark.cores.max=8 \
+ --conf spark.sql.autoBroadcastJoinThreshold=-1 \
+ --jars $COMET_JAR \
+ --conf spark.driver.extraClassPath=$COMET_JAR \
+ --conf spark.executor.extraClassPath=$COMET_JAR \
+ --conf spark.sql.extensions=org.apache.comet.CometSparkSessionExtensions \
+ --conf spark.comet.enabled=true \
+ --conf spark.comet.exec.enabled=true \
+ --conf spark.comet.exec.all.enabled=true \
+ --conf spark.comet.cast.allowIncompatible=true \
+ --conf spark.comet.explainFallback.enabled=true \
+ --conf spark.comet.parquet.io.enabled=false \
+ --conf spark.comet.batchSize=8192 \
+ --conf spark.comet.exec.shuffle.enabled=true \
+ --conf spark.comet.exec.shuffle.mode=auto \
+ --conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager \
+ --conf spark.sql.adaptive.coalescePartitions.enabled=false \
+ tpcbench.py \
+ --benchmark tpch \
+ --data /mnt/bigdata/tpch/sf100/ \
+ --queries ../../tpch/queries \
+ --iterations 5
+```
+
+## Current Performance
+
+Comet is not yet achieving full DataFusion speeds in all cases, but with future work we aim to provide a 2x-4x speedup
+for many use cases.
+
+The following benchmarks were performed on a Linux workstation with PCIe 5, AMD 7950X CPU (16 cores), 128 GB RAM, and
+data stored locally on NVMe storage. Performance characteristics will vary in other environments, and we encourage
+you to run these benchmarks in your own environment.
+
+![](../../_static/images/tpch_allqueries.png)
+
+Here is a breakdown showing the relative performance of Spark, Comet, and DataFusion for each TPC-H query.
+
+![](../../_static/images/tpch_queries_compare.png)
+
+The following chart shows how much Comet currently accelerates each query from the benchmark. Performance optimization
+is an ongoing task, and we welcome contributions from the community to help achieve even greater speedups in the future.
+
+![](../../_static/images/tpch_queries_speedup.png)
+
+The raw results of these benchmarks are available in JSON format here:
+
+- [Spark](./benchmark-results/2024-05-30/spark-8-exec-5-runs.json)
+- [Comet](./benchmark-results/2024-05-30/comet-8-exec-5-runs.json)
+- [DataFusion](./benchmark-results/2024-05-30/datafusion-python-8-cores.json)
+
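
The results files above share a simple layout: numeric string keys (`"1"` through `"22"`) map to lists of per-run
times in seconds, alongside metadata keys such as `engine` and `spark_conf`. A minimal sketch for post-processing
them, assuming only that layout; the helper names and the inline sample data are illustrative:

```python
from statistics import median

def query_timings(results: dict) -> dict[int, list[float]]:
    # Numeric string keys hold per-run times in seconds; everything
    # else ("engine", "benchmark", "spark_conf", ...) is metadata.
    return {int(k): v for k, v in results.items() if k.isdigit()}

def total_median_runtime(results: dict) -> float:
    # Sum the median run time of each query, mirroring how an overall
    # benchmark time can be derived from repeated runs.
    return sum(median(runs) for runs in query_timings(results).values())

# Tiny inline sample in the same shape as the linked files; in practice,
# load a real file with json.load(open("spark-8-exec-5-runs.json")).
sample = {
    "engine": "datafusion-comet",
    "benchmark": "tpch",
    "1": [32.1, 28.0, 27.8],
    "2": [18.3, 16.2, 16.2],
}
# Summed per-query medians (28.0 + 16.2, i.e. about 44.2 for this sample).
print(round(total_median_runtime(sample), 1))
```

Comparing `total_median_runtime` across a Spark file and a Comet file gives the overall speedup figure quoted above.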