[SPARK-50310][PYTHON] Apply a flag to disable DataFrameQueryContext for PySpark #48964
base: master
Conversation
```python
if spark is not None:
    _enable_debugging_cache = (
        spark.conf.get(
            "spark.sql.dataFrameQueryContext.enabled", "true"  # type: ignore[union-attr]
```
`spark.sql.dataFrameQueryContext.enabled` has to be a StaticSQLConf.
I see. Should we update the existing conf to static conf or should we add a new one for PySpark specific?
The existing conf is defined as a SQLConf here:
spark/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala, lines 5223 to 5230 in e55511c:
```scala
val DATA_FRAME_QUERY_CONTEXT_ENABLED = buildConf("spark.sql.dataFrameQueryContext.enabled")
  .internal()
  .doc(
    "Enable the DataFrame query context. This feature is enabled by default, but has a " +
    "non-trivial performance overhead because of the stack trace collection.")
  .version("4.0.0")
  .booleanConf
  .createWithDefault(true)
```
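To make the gating concrete, here is a minimal sketch (not PySpark's actual implementation) of how a conf flag like this can short-circuit stack-trace collection on the Python side. It runs without Spark: `conf` is a plain dict standing in for `spark.conf`, and `capture_query_context` is a hypothetical helper name.

```python
# Sketch only: simulates how a session conf flag could gate the costly
# stack-trace collection behind the DataFrame query context.
import traceback
from typing import List, Optional

# Stand-in for spark.conf; the real conf comes from the Spark session.
conf = {"spark.sql.dataFrameQueryContext.enabled": "true"}

def capture_query_context() -> Optional[List[traceback.FrameSummary]]:
    """Collect the user call site only when the flag is enabled."""
    if conf.get("spark.sql.dataFrameQueryContext.enabled", "true").lower() != "true":
        return None  # flag disabled: skip the stack walk entirely
    return list(traceback.extract_stack())

frames = capture_query_context()
assert frames is not None and len(frames) > 0  # enabled by default

conf["spark.sql.dataFrameQueryContext.enabled"] = "false"
assert capture_query_context() is None  # disabled: no stack collection
```

The key point is that the conf is checked before any frame extraction happens, so the disabled path pays essentially nothing.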
I think so
A comment on the PR description: #48827 doesn't seem to disable the DataFrameQueryContext, if I understand correctly :)
Yes, it doesn't disable the DataFrameQueryContext by itself; it just provides PySpark users an option to disable it. The default behavior still collects the stack traces. Please let me know if I'm misreading the context.
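For illustration, opting out would be an explicit user action. Assuming the conf name stays as defined above (and noting the review suggests it may become a StaticSQLConf, which cannot be changed at runtime), disabling it could look like this config fragment:

```
# Illustrative spark-defaults.conf entry (not part of the PR): opt out of
# DataFrame query context collection to skip the stack-trace overhead.
# If the conf remains a runtime SQLConf, calling
# spark.conf.set("spark.sql.dataFrameQueryContext.enabled", "false")
# in a live session would work as well.
spark.sql.dataFrameQueryContext.enabled  false
```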
What changes were proposed in this pull request?
We made the DataFrameQueryContext disableable via a flag in #48827, and we also need to apply the flag in PySpark for the same performance reason.
Why are the changes needed?
To avoid a performance slowdown in cases where too many DataFrameQueryContexts are stacked.
Does this PR introduce any user-facing change?
No API changes, but the DataFrameQueryContext would no longer be displayed when the flag is disabled.
How was this patch tested?
Manually tested:
Was this patch authored or co-authored using generative AI tooling?