chore: Make ANSI fallback more granular #509

Closed
wants to merge 21 commits into from

andygrove
Member

@andygrove andygrove commented Jun 3, 2024

Which issue does this PR close?

Part of #313

Rationale for this change

Rather than fall back to Spark for all plans when ANSI mode is enabled, it would be better to only fall back to Spark if the plan involves any expressions that have ANSI-specific behavior that we do not yet support.

What changes are included in this PR?

  • Introduce CometEvalMode enum to use instead of string literals
  • Add shim code for getting eval mode for Round and Cast
  • Add code to fall back to Spark for Round in ANSI mode
  • Remove code that falls back to Spark for all plans if ANSI mode is enabled
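
The first bullet could look roughly like the sketch below. This is a minimal, self-contained illustration rather than the code from this PR: the `fromString` helper and its case-insensitive parsing are assumptions made for the example.

```scala
// Hypothetical stand-in for the CometEvalMode enum described above.
object CometEvalMode extends Enumeration {
  val LEGACY = Value("LEGACY")
  val ANSI = Value("ANSI")
  val TRY = Value("TRY")

  // Assumed helper: parse a mode name case-insensitively and fail loudly on
  // anything unexpected instead of silently defaulting to LEGACY.
  def fromString(s: String): Value =
    values
      .find(_.toString.equalsIgnoreCase(s))
      .getOrElse(throw new IllegalArgumentException(s"Unknown eval mode: $s"))
}
```

Compared with passing "ANSI" or "LEGACY" around as strings, a typo in a mode name becomes a compile error at the use site, or an explicit exception at the parsing boundary.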

How are these changes tested?

TBD

@andygrove
Member Author

@kazuyukitanimura I have more work to do in this PR but wanted to make you aware of it since it is related to Spark 4 support.

@andygrove andygrove marked this pull request as ready for review June 4, 2024 14:09
@andygrove andygrove marked this pull request as draft June 4, 2024 14:10
@andygrove
Member Author

I am looking into the test failures:

2024-06-04T17:46:08.5118601Z [info] - SPARK-39175: Query context of Cast should be serialized to executors when WSCG is off *** FAILED *** (168 milliseconds)
2024-06-04T17:47:32.8159859Z [info] - postgreSQL/float8.sql *** FAILED *** (2 seconds, 108 milliseconds)
2024-06-04T17:47:37.1749187Z [info] - postgreSQL/groupingsets.sql *** FAILED *** (4 seconds, 331 milliseconds)
2024-06-04T18:04:31.0221492Z [info] *** 3 TESTS FAILED ***

@andygrove andygrove marked this pull request as ready for review June 4, 2024 21:44
Contributor

@parthchandra parthchandra left a comment

LGTM. Some minor comments.

* Expression evaluation modes.
* - LEGACY: the default evaluation mode, which is compliant to Hive SQL.
* - ANSI: a evaluation mode which is compliant to ANSI SQL standard.
* - TRY: a evaluation mode for `try_*` functions. It is identical to ANSI evaluation mode
Contributor

Is conf.ansiEnabled ignored for try_* functions? Maybe the comment could clarify this.

evalMode match {
case CometEvalMode.LEGACY => ExprOuterClass.EvalMode.LEGACY
case CometEvalMode.TRY => ExprOuterClass.EvalMode.TRY
case CometEvalMode.ANSI => ExprOuterClass.EvalMode.ANSI
Contributor

Add a 'catch all' case for when someone tries to change CometEvalMode and things don't work as planned?

Member Author

Good point. I have added this.
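
For readers following along, a catch-all arm of the kind suggested here might look like the sketch below. It is not the code that was added to the PR: `ProtoEvalMode` is a hypothetical stand-in for the protobuf-generated `ExprOuterClass.EvalMode`, and the exception type and message are assumptions.

```scala
// Stand-ins so the sketch compiles on its own.
object CometEvalMode extends Enumeration {
  val LEGACY = Value("LEGACY")
  val ANSI = Value("ANSI")
  val TRY = Value("TRY")
}

// Hypothetical stand-in for the protobuf-generated ExprOuterClass.EvalMode.
object ProtoEvalMode extends Enumeration {
  val LEGACY = Value("LEGACY")
  val ANSI = Value("ANSI")
  val TRY = Value("TRY")
}

object EvalModeConversion {
  def toProto(evalMode: CometEvalMode.Value): ProtoEvalMode.Value =
    evalMode match {
      case CometEvalMode.LEGACY => ProtoEvalMode.LEGACY
      case CometEvalMode.TRY => ProtoEvalMode.TRY
      case CometEvalMode.ANSI => ProtoEvalMode.ANSI
      // Catch-all: matches on scala.Enumeration values are not checked for
      // exhaustiveness, so a newly added mode fails here with a clear message
      // instead of an opaque MatchError.
      case other =>
        throw new IllegalStateException(s"Unsupported eval mode: $other")
    }
}
```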

case c @ Cast(child, dt, timeZoneId, _) =>
handleCast(child, inputs, dt, timeZoneId, evalMode(c))

case expr: Add if evalMode(expr) == CometEvalMode.ANSI && !cometAnsiEnabled =>
Contributor

Should we move these to be next to the corresponding supported implementations so new developers can discover this case easily?

Member Author

The code has been refactored significantly since this comment. There is now a single match arm for ANSI fallback and there is an isUnsupportedAnsiExpr method which determines if the expression is one where we do not yet have ANSI support.
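
The shape of that refactoring can be sketched as follows. This is only an illustration under stated assumptions: the expression classes are simplified stand-ins for Spark's Catalyst expressions, the `ansi` flag replaces the real eval-mode lookup, the list of affected expressions is illustrative rather than complete, and `convert` is a hypothetical name for the serialization entry point.

```scala
// Simplified stand-ins for Catalyst expressions.
sealed trait Expr
case class Add(ansi: Boolean) extends Expr
case class Multiply(ansi: Boolean) extends Expr
case class Round(ansi: Boolean) extends Expr
case class Literal(value: Int) extends Expr

object AnsiFallback {
  // Assumed shape of the helper: true when the expression is evaluated in
  // ANSI mode and native ANSI semantics are not yet implemented for it.
  def isUnsupportedAnsiExpr(expr: Expr): Boolean = expr match {
    case Add(ansi) => ansi
    case Multiply(ansi) => ansi
    case Round(ansi) => ansi
    case _ => false
  }

  // Single fallback arm: None means "let Spark evaluate this expression".
  def convert(expr: Expr): Option[String] = expr match {
    case e if isUnsupportedAnsiExpr(e) => None
    case e => Some(s"native($e)")
  }
}
```

Keeping the list in one method means a new expression with ANSI-specific behavior needs a single extra case rather than another guarded match arm.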

Contributor

@vaibhawvipul vaibhawvipul left a comment

Looks neat!

withInfo(expr, ansiNotSupported)
None

case expr: Multiply if evalMode(expr) == CometEvalMode.ANSI && !cometAnsiEnabled =>
Member

Maybe we can add a simple method containing all these checks together, instead of scattering them here.

Comment on lines -717 to -724
if (isANSIEnabled(conf)) {
if (COMET_ANSI_MODE_ENABLED.get()) {
logWarning("Using Comet's experimental support for ANSI mode.")
} else {
logInfo("Comet extension disabled for ANSI mode")
return plan
}
}
Contributor

For Spark 3.x, I feel we still need this protection in case users enable ANSI mode as well as native execution.
If I understand correctly we have not set up CI yet with ANSI enabled.

Member Author

The current check makes every plan fall back to Spark when ANSI is enabled, so no native plans run unless COMET_ANSI_MODE_ENABLED is set.

With the changes in this PR we keep the same check, but apply it only when the plan actually contains expressions whose behavior changes under ANSI mode. We still fall back to Spark by default unless COMET_ANSI_MODE_ENABLED is set.

The main risk with this PR is that our list of expressions affected by ANSI mode may be incomplete.

Member Author

@andygrove andygrove Jun 6, 2024

> If I understand correctly we have not set up CI yet with ANSI enabled.

Well, we have the Spark 4 tests where ANSI is enabled and that is catching issues for sure.

I noticed that we were explicitly disabling ANSI mode in CometTestBase and I have removed that so we use the default ANSI mode for whatever Spark version we are running against. This should mean that all of our unit tests will now run with ANSI enabled when running against Spark 4+. Let's see how many issues this finds 😰

Member Author

> If I understand correctly we have not set up CI yet with ANSI enabled.

It would be good to add a CI pipeline for Spark 3.4 with ANSI enabled.

Contributor

I am close to finishing my PR to enable the Spark tests with Spark 4.0 (i.e. with ANSI enabled), and I am disabling the expressions that fail those tests with ANSI enabled.

I would propose holding this part of the change for now. Perhaps after we fix all the Spark 4.0 issues, we can backport what we learn to Spark 3.4.

Member Author

Sure. I have moved this to draft for now.

@andygrove
Member Author

andygrove commented Jun 5, 2024

This is a pretty interesting test failure:

- postgreSQL/groupingsets.sql *** FAILED *** (4 seconds, 331 milliseconds)
 postgreSQL/groupingsets.sql
 Expected "... 1       128
 NULL  1       1       0       1       [1
 NULL  1       1       0       1       2
 NULL  2       1       0       1       4
 NULL  2       1       0       1       8
 NULL  2       1       0       1       16
 NULL  2       1       0       1       32
 NULL  1       1       0       1       64
 NULL  1       1       0       1       128]", but got "...     1       128
 NULL  1       1       0       1       [-1673396112
NULL  2       1       0       1       -1673198320
 NULL  2       1       0       1       -596264016
 NULL  1       1       0       1       -596231120
 NULL  1       1       0       2       65102
 NULL  2       1       0       2       65102]"

I am assuming that this was due to us not supporting the ANSI overflow checks in SUM or AVG. We now fall back for those if ANSI is enabled.
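
As background on what an overflow check changes, the difference between wrapping and checked addition can be shown in plain Scala. This is not Comet's or Spark's aggregate code, and `SumOverflow` and its methods are hypothetical names; it only illustrates why an accumulator without overflow checks can return a wrapped value where ANSI mode would be expected to raise an error.

```scala
object SumOverflow {
  // Unchecked accumulation: Long addition wraps silently on overflow.
  def wrappingSum(xs: Seq[Long]): Long =
    xs.foldLeft(0L)(_ + _)

  // Checked accumulation: Math.addExact throws ArithmeticException on overflow.
  def checkedSum(xs: Seq[Long]): Long =
    xs.foldLeft(0L)((acc, x) => Math.addExact(acc, x))
}
```

With the input `Seq(Long.MaxValue, 1L)`, `wrappingSum` returns `Long.MinValue`, while `checkedSum` throws.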

@@ -69,7 +69,6 @@ abstract class CometTestBase
conf.set("spark.hadoop.fs.file.impl", classOf[DebugFilesystem].getName)
conf.set("spark.ui.enabled", "false")
conf.set(SQLConf.SHUFFLE_PARTITIONS, 10) // reduce parallelism in tests
-conf.set(SQLConf.ANSI_ENABLED.key, "false")
Member Author

We no longer explicitly disable ANSI in our test suites.

case c @ Cast(child, dt, timeZoneId, _) =>
handleCast(child, inputs, dt, timeZoneId, evalMode(c))

case expr if isUnsupportedAnsiExpr(expr) && !CometConf.COMET_ANSI_MODE_ENABLED.get() =>
Contributor

I feel the !CometConf.COMET_ANSI_MODE_ENABLED.get() condition needs to be removed.

When COMET_ANSI_MODE_ENABLED=true, we still need to fall back to Spark if the expression is not ANSI compatible.

Member Author

Sure, let's remove this. It was intended as a short-term workaround, but I agree that we don't really need it now that we have this fallback logic.

@@ -69,7 +69,7 @@ abstract class CometTestBase
conf.set("spark.hadoop.fs.file.impl", classOf[DebugFilesystem].getName)
conf.set("spark.ui.enabled", "false")
conf.set(SQLConf.SHUFFLE_PARTITIONS, 10) // reduce parallelism in tests
-conf.set(SQLConf.ANSI_ENABLED.key, "false")
+conf.set(CometConf.COMET_ANSI_MODE_ENABLED.key, "true")
Contributor

What about leaving COMET_ANSI_MODE_ENABLED at its default value as well, i.e. false for 3.x and true for 4.0?

Member Author

Yes, good point
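
One way to express a version-dependent default like the one proposed above is sketched below. The `AnsiDefaults` object and `cometAnsiDefault` function are hypothetical and are not how CometConf defines its defaults; the sketch only encodes "false for 3.x, true for 4.0" from the comment.

```scala
object AnsiDefaults {
  // Hypothetical helper: derive the default for Comet's ANSI flag from the
  // Spark major version, mirroring "false for 3.x, true for 4.0".
  def cometAnsiDefault(sparkVersion: String): Boolean = {
    // Take the leading digits before the first dot as the major version.
    val major = sparkVersion.takeWhile(_.isDigit).toInt
    major >= 4
  }
}
```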

@andygrove andygrove marked this pull request as draft June 7, 2024 16:36
@andygrove andygrove closed this Jul 18, 2024
@andygrove andygrove deleted the ansi-fallback branch December 3, 2024 04:35

5 participants