build: Enable comet tests with spark-4.0 profile #493
Conversation
assume(isSpark34Plus)
// Spark 4.0 no longer fails for widening types
// https://github.com/apache/spark/commit/3361f25dc0ff6e5233903c26ee105711b79ba967
assume(isSpark34Plus && !isSpark34Plus)
Did you mean assume(isSpark34Plus && !isSpark40Plus)?
Yes, you are right. Thank you!
lgtm
Friendly ping @viirya @andygrove @comphead @huaxingao
@@ -23,7 +23,6 @@ Comet provides the following configuration settings.

| Config | Description | Default Value |
|--------|-------------|---------------|
| spark.comet.ansi.enabled | Comet does not respect ANSI mode in most cases and by default will not accelerate queries when ansi mode is enabled. Enable this setting to test Comet's experimental support for ANSI mode. This should not be used in production. | false |
This highlights that our configs vary depending on the Spark version we build against, yet we currently publish only one version of them.
Ideally we should generate configs per Spark version and publish all of them.
I will file an issue for this
Thanks, I plan to associate this config with spark.sql.ansi.enabled. So when Spark enables ANSI, so does Comet.
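A minimal sketch of what that association could look like, assuming the Comet default is derived from Spark's SQLConf at runtime; the helper name is hypothetical and this is not the actual change:

import org.apache.spark.sql.internal.SQLConf

// Hypothetical helper: fall back to spark.sql.ansi.enabled when
// spark.comet.ansi.enabled has not been set explicitly.
def cometAnsiEnabled(conf: SQLConf): Boolean =
  conf
    .getConfString(
      "spark.comet.ansi.enabled",
      conf.getConfString("spark.sql.ansi.enabled", "false"))
    .toBoolean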
trait ShimCometConf {
  protected val COMET_SCHEMA_EVOLUTION_ENABLED_DEFAULT = true
  protected val COMET_ANSI_MODE_ENABLED_DEFAULT = true
Do we really want to enable this by default when we know that we are not compatible yet?
I suppose we have to do this, otherwise all plans will currently fall back to Spark because ANSI mode is enabled by default in Spark 4.
We should really only fall back to Spark for expressions that are actually affected by ANSI support, rather than falling back for everything.
Perhaps we could add a note to the documentation explaining that ANSI mode in Spark 4 is not fully compatible yet.
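As an illustration of that per-expression idea, a purely hypothetical sketch; the predicate below is not Comet's actual fallback logic, and the set of ANSI-sensitive expressions shown is illustrative:

import org.apache.spark.sql.catalyst.expressions._

// Hypothetical check: fall back only for expressions whose behavior
// changes under ANSI mode (e.g. overflow or invalid-cast errors), and
// keep accelerating everything else.
def supportedUnderAnsi(expr: Expression): Boolean = expr match {
  case _: Add | _: Subtract | _: Multiply | _: Divide | _: Cast => false
  case other => other.children.forall(supportedUnderAnsi)
}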
I am thinking of using Spark 4.0 as a test bed for ANSI readiness. I will send another PR for enabling Spark tests, and if I find any incompatibility, I will disable them per function.
You need it to be enabled for Spark tests, don't you?
LGTM. Thank you @kazuyukitanimura
I think we should add a note to the documentation saying that we support Spark 4 now, and also point out that ANSI mode is not yet compatible. I think it would be fine to do that as a follow-up PR.
package org.apache.comet.shims

trait ShimCometConf {
  protected val COMET_SCHEMA_EVOLUTION_ENABLED_DEFAULT = false
nit: do we need a trailing _DEFAULT in the name?
I would say so, as we have COMET_SCHEMA_EVOLUTION_ENABLED in conf. Also, this is a default, so that users can overwrite it.
Thank you @andygrove, I plan to send a few more PRs on Spark-4.0.
lgtm thanks @kazuyukitanimura great PR
@@ -382,12 +384,13 @@ object CometConf {
    .createWithDefault(Seq("Range,InMemoryTableScan"))

  val COMET_ANSI_MODE_ENABLED: ConfigEntry[Boolean] = conf("spark.comet.ansi.enabled")
    .internal()
Why make it internal?
As discussed in #493 (comment), the doc is not per profile yet. By making it internal, the doc will ignore it.
    .internal()
    .doc(
      "Comet does not respect ANSI mode in most cases and by default will not accelerate " +
        "queries when ansi mode is enabled. Enable this setting to test Comet's experimental " +
        "support for ANSI mode. This should not be used in production.")
    .booleanConf
    .createWithDefault(false)
    .createWithDefault(COMET_ANSI_MODE_ENABLED_DEFAULT)
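For context, a self-contained toy version of the shim pattern these snippets rely on; the wiring is a simplified reconstruction from the diffs above, not Comet's exact ConfigEntry machinery:

// Toy reconstruction: each Spark profile's source tree provides its own
// ShimCometConf, so the same CometConf code compiles against a different
// default per Spark version. Only one shim is on the compile path at a time.
trait ShimCometConf {
  // The pre-4.0 shim sets this to false; the spark-4.0 shim sets it to true.
  protected val COMET_ANSI_MODE_ENABLED_DEFAULT: Boolean = false
}

object CometConf extends ShimCometConf {
  // In Comet this feeds .createWithDefault(...); a plain val keeps the toy runnable.
  val ansiModeEnabledDefault: Boolean = COMET_ANSI_MODE_ENABLED_DEFAULT
}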
We may need to mention that it is enabled by default for Spark 4.0 now. It could produce incorrect results, so we had better not make it implicitly enabled.
Right, Spark 4.0 support is not ready at all. I will send follow-up PRs.
val zero = Literal.default(expression.dataType)
If(EqualTo(expression, zero), Literal.create(null, expression.dataType), expression)
?
The NullIf implementation was changed to use With, which Comet does not support: apache/spark#43623
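To make the context concrete, a hedged standalone sketch of the If-based rewrite shown in the diff above, assuming Spark Catalyst is on the classpath; it mirrors this PR's snippet rather than Spark's new With-based NullIf:

import org.apache.spark.sql.catalyst.expressions._

// Null out the value when it equals its type's default (zero),
// otherwise keep the original expression — nullif(expr, 0) semantics
// expressed with If/EqualTo instead of With.
val expression: Expression = Literal(7)
val zero = Literal.default(expression.dataType)
val nullIfZero =
  If(EqualTo(expression, zero), Literal.create(null, expression.dataType), expression)
// nullIfZero.eval() returns 7 here; with Literal(0) it would return null.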
Thank you @viirya @andygrove @comphead @parthchandra
Which issue does this PR close?
Part of #372
Rationale for this change
To be ready for Spark 4.0
What changes are included in this PR?
This PR enables the comet tests with the spark-4.0 profile
How are these changes tested?
Tests with the spark-4.0 profile should now pass. (But Spark tests do not yet.)