
[SPARK-50310][PYTHON] Apply a flag to disable DataFrameQueryContext for PySpark #48964

Open
Wants to merge 1 commit into base: master
Conversation

itholic
Contributor

@itholic itholic commented Nov 26, 2024

What changes were proposed in this pull request?

We introduced a flag to disable the DataFrameQueryContext in #48827, and we need to apply the same flag to PySpark for the same performance reason.

Why are the changes needed?

To avoid the performance slowdown in cases where DataFrameQueryContext objects are stacked up too deeply, since each one collects a stack trace.
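The overhead comes from the stack trace collection noted in the conf's doc string. A minimal, self-contained sketch of that cost (using the standard-library `traceback` module as a stand-in; PySpark's actual capture mechanism may differ):

```python
# Rough illustration of the per-operation cost of capturing a stack
# trace, which is the overhead DataFrameQueryContext introduces
# (assumption: the internals are comparable to traceback.extract_stack).
import time
import traceback

start = time.time()
for _ in range(10_000):
    traceback.extract_stack()  # capture the current call stack
elapsed_with_capture = time.time() - start

start = time.time()
for _ in range(10_000):
    pass  # same loop, no stack capture
elapsed_without = time.time() - start

print(f"with capture:    {elapsed_with_capture:.3f}s")
print(f"without capture: {elapsed_without:.3f}s")
```

Even this toy loop shows the capture-per-operation pattern dominating the no-capture baseline, which is why a kill switch matters on hot paths.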

Does this PR introduce any user-facing change?

No API changes, but the DataFrameQueryContext will no longer be displayed when the flag is disabled.

How was this patch tested?

Manually tested:

1. FLAG ON (almost 25 sec)
>>> spark.conf.get("spark.sql.dataFrameQueryContext.enabled")
'true'
>>> import time
>>> import pyspark.sql.functions as F
>>>
>>> c = F.col("name")
>>> start = time.time()
>>> for i in range(10000):
...   _ = c.alias("a")
...
>>> print(time.time() - start)
24.78217577934265
2. FLAG OFF (about 1 sec)
>>> spark.conf.set("spark.sql.dataFrameQueryContext.enabled", "false")
>>> spark.conf.get("spark.sql.dataFrameQueryContext.enabled")
'false'
>>> import time
>>> import pyspark.sql.functions as F
>>>
>>> c = F.col("name")
>>> start = time.time()
>>> for i in range(10000):
...   _ = c.alias("a")
...
>>> print(time.time() - start)
1.0222370624542236

Was this patch authored or co-authored using generative AI tooling?

@itholic itholic changed the title [SPARK-50310][PYTHON] Add a flag to disable DataFrameQueryContext for PySpark [SPARK-50310][PYTHON] Apply a flag to disable DataFrameQueryContext for PySpark Nov 26, 2024
if spark is not None:
    _enable_debugging_cache = (
        spark.conf.get(
            "spark.sql.dataFrameQueryContext.enabled", "true"  # type: ignore[union-attr]
Member
spark.sql.dataFrameQueryContext.enabled has to be StaticSQLConf

Contributor Author

I see. Should we update the existing conf to a static conf, or should we add a new PySpark-specific one?

The existing conf is currently defined as a runtime SQLConf:

val DATA_FRAME_QUERY_CONTEXT_ENABLED = buildConf("spark.sql.dataFrameQueryContext.enabled")
  .internal()
  .doc(
    "Enable the DataFrame query context. This feature is enabled by default, but has a " +
    "non-trivial performance overhead because of the stack trace collection.")
  .version("4.0.0")
  .booleanConf
  .createWithDefault(true)

Member

I think so

@xinrong-meng
Member

A comment on the PR description: #48827 doesn't seem to disable DataFrameQueryContext, if I understand correctly :)

@itholic
Contributor Author

itholic commented Nov 27, 2024

A comment on the PR description: #48827 doesn't seem to disable DataFrameQueryContext, if I understand correctly :)

Yes, it doesn't disable DataFrameQueryContext outright; it just gives PySpark users an option to disable it. The default behavior still collects the stack traces. Please let me know if I'm missing the context.
