docs: Generate configuration guide in mvn build #349
Conversation
/**
 * Utility for generating markdown documentation from the configs.
 */
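As a rough idea of what such a utility could look like, here is a minimal sketch. The object name, the output path constant, and the `renderConfigTable` helper are assumptions for illustration, not the actual Comet code.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

/**
 * Utility for generating markdown documentation from the configs.
 */
object GenerateConfigDocs {
  def main(args: Array[String]): Unit = {
    // Render the registered configs as a markdown table and write it to the
    // user guide. The output path matches the file touched by this PR.
    val markdown = renderConfigTable()
    Files.write(
      Paths.get("docs/source/user-guide/configs.md"),
      markdown.getBytes(StandardCharsets.UTF_8))
  }

  // Placeholder so the sketch is self-contained; the real project would derive
  // the rows from its config registry.
  private def renderConfigTable(): String =
    "| Config | Description | Default Value |\n|--------|-------------|---------------|"
}
```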
Would you like to put the usage (i.e., `mvn clean package -DskipTests`) in the code comment here?
Thanks. Added.
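Presumably the comment now reads something like the following; the exact wording is an assumption, only the command itself is taken from the thread.

```scala
/**
 * Utility for generating markdown documentation from the configs.
 *
 * Run `mvn clean package -DskipTests` to regenerate the guide.
 */
```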
docs/source/user-guide/configs.md (outdated)
| Config | Description | Default Value |
|--------|-------------|---------------|
| spark.comet.columnar.shuffle.batch.size | Batch size when writing out sorted spill files on the native side. Note that this should not be larger than batch size (i.e., `spark.comet.batchSize`). Otherwise it will produce larger batches than expected in the native operator after shuffle. | 8192 |
| spark.comet.columnar.shuffle.enabled | Force Comet to only use columnar shuffle for CometScan and Spark regular operators. If this is enabled, Comet native shuffle will not be enabled but only Arrow shuffle. By default, this config is false. | false |
| spark.comet.columnar.shuffle.memory.factor | Fraction of Comet memory to be allocated per executor process for Comet shuffle. Comet memory size is specified by `spark.comet.memoryOverhead` or calculated by `spark.comet.memory.overhead.factor` * `spark.executor.memory`. By default, this config is 1.0. | 1.0 |
| spark.comet.columnar.shuffle.spill.threshold | Number of rows to be spilled used for Comet columnar shuffle. For every configured number of rows, a new spill file will be created. Higher value means more memory requirement to buffer shuffle data before flushing to disk. As Comet uses columnar shuffle which is columnar format, higher value usually helps to improve shuffle data compression ratio. This is internal config for testing purpose or advanced tuning. By default, this config is Int.Max. | 2147483647 |
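For context on the rows above before the discussion continues: these are ordinary Spark configuration keys. A minimal sketch of setting them when building a session follows; the values are illustrative, and `spark.comet.enabled` is assumed to be the usual switch for turning Comet on.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative values only; see the descriptions above for tuning guidance.
val spark = SparkSession.builder()
  .appName("comet-columnar-shuffle-example")
  .config("spark.comet.enabled", "true")
  .config("spark.comet.columnar.shuffle.enabled", "true")
  // Should not exceed spark.comet.batchSize, per the description above.
  .config("spark.comet.columnar.shuffle.batch.size", "8192")
  .getOrCreate()
```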
Wondering if it is ok to publish an internal config like this one, `spark.comet.columnar.shuffle.spill.threshold`?
Great point. Let me fix that.
I pushed an update to fix this.
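The fix presumably filters internal entries out before rendering. Here is a sketch of that check, assuming a config entry type with an `isPublic` flag; the real Comet API may differ.

```scala
// Hypothetical config entry shape; the field names are assumptions.
case class ConfigEntry(key: String, doc: String, defaultValueString: String, isPublic: Boolean)

object PublicConfigDocs {
  // Emit one markdown row per public config; internal configs are skipped entirely.
  def generateConfigTable(configs: Seq[ConfigEntry]): String = {
    val header = Seq(
      "| Config | Description | Default Value |",
      "|--------|-------------|---------------|")
    val rows = configs
      .filter(_.isPublic)   // do not publish internal configs
      .sortBy(_.key)
      .map(c => s"| ${c.key} | ${c.doc} | ${c.defaultValueString} |")
    (header ++ rows).mkString("\n")
  }
}
```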
Thank you, LGTM.
Merged. Thanks @andygrove @kazuyukitanimura
* initial config doc
* Generate configuration guide as part of mvn package
* formatting
* scalafix
* add maven usage to comment
* do not publish internal configs
* improve check for public configs
It looks like the `dev/bump-version.sh` script wasn't used.
Which issue does this PR close?
Closes #315
Rationale for this change
We should publish our configuration settings so that users know about them.
What changes are included in this PR?
A utility that generates the configuration guide (`docs/source/user-guide/configs.md`) from the config definitions as part of the Maven build, excluding internal configs.
How are these changes tested?
Tested manually by running `mvn clean package -DskipTests` and then generating the documentation.