Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Use enum to represent CAST eval_mode in expr.proto #415

Merged
merged 6 commits into from
Jun 3, 2024

Conversation

prashantksharma
Copy link
Contributor

Which issue does this PR close?

Closes #361

Rationale for this change

The updates to expr.proto by using enums instead of strings, which prevents potential errors in type interpretation. Modifications in planner.rs and QueryPlanSerde.scala resolve issues found during compilation and style checks.

What changes are included in this PR?

  • expr.proto: Updated the eval_mode field from a string type to an enum type (EvalMode), with defined values LEGACY, TRY, and ANSI.
  • Rust Code: Modified ExprStruct::Cast(expr) in planner.rs to handle eval_mode as integers corresponding to the new protobuf enum. Updated the match statement to directly convert integers to the EvalMode Rust enum and added error handling for invalid integers.
  • QueryPlanSerde.scala:
    • Introduced the stringToEvalMode function, which converts input strings to ExprOuterClass.EvalMode enum values using a case-insensitive match, enhancing our error handling by throwing exceptions for invalid modes.
    • Corrected the use of toUpperCase to include Locale.ROOT to avoid locale-sensitive behavior, aligning with internationalization best practices.
    • Reorganized imports to comply with project's Scalastyle guidelines, clearly separating Java standard library imports from those specific to the project, which aids in code clarity and maintenance.
    • Addressed and resolved Scalastyle violations related to locale-specific string manipulations. Fixed the Scalastyle violation by correcting the usage of toUpperCase without specifying a locale. Reorganized import statements to adhere to the project's coding standards, ensuring that Java standard library imports are correctly grouped.

How are these changes tested?

I have run

  • make test-rust: No failures.
  • make test-jvm: No failures.

"ANSI" => EvalMode::Ansi,
"TRY" => EvalMode::Try,
"LEGACY" => EvalMode::Legacy,
let eval_mode = match expr.eval_mode {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can make use of the generated protobuf enum here rather than hard-code numberic values. This would also allow us to remove the "other" error handling case.

                let eval_mode = match spark_expression::EvalMode::try_from(expr.eval_mode)? {
                    spark_expression::EvalMode::Legacy => EvalMode::Legacy,
                    spark_expression::EvalMode::Try => EvalMode::Try,
                    spark_expression::EvalMode::Ansi => EvalMode::Ansi,
                };

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the suggestion, Andy.
I have made changes to planner.rs as per suggestion.

message Cast {
Expr child = 1;
DataType datatype = 2;
string timezone = 3;
// LEGACY, ANSI, or TRY
string eval_mode = 4;
// Depreciateid: LEGACY, ANSI, or TRY - preserved for backward compatibity
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for backwards compatibility here since we have not published a release yet

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed string eval_mode.

@@ -1207,7 +1223,7 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde {
.newBuilder()
.setChild(e)
.setDatatype(serializeDataType(IntegerType).get)
.setEvalMode("LEGACY") // year is not affected by ANSI mode
.setEvalMode(stringToEvalMode("LEGACY")) // year is not affected by ANSI mode
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just use the enum directly here rather than convert from string.

Suggested change
.setEvalMode(stringToEvalMode("LEGACY")) // year is not affected by ANSI mode
// year is not affected by ANSI mode so we use LEGACY mode here
.setEvalMode(ExprOuterClass.EvalMode.LEGACY)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for another great suggestion.
Updated accordingly to QueryPlanSerde.scala L1226.

case "ANSI" => ExprOuterClass.EvalMode.ANSI
case _ =>
throw new IllegalArgumentException(
"Invalid eval mode"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to include the invalid value in the error message

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the code as follows.

   5   def stringToEvalMode(evalModeStr: String): ExprOuterClass.EvalMode =
   4     evalModeStr.toUpperCase(Locale.ROOT) match {
   3       case "LEGACY" => ExprOuterClass.EvalMode.LEGACY
   2       case "TRY" => ExprOuterClass.EvalMode.TRY
   1       case "ANSI" => ExprOuterClass.EvalMode.ANSI
 536       case invalid =>
   1         throw new IllegalArgumentException(
   2           s"Invalid eval mode '$invalid' "
   3         ) // Assuming we want to catch errors strictly
   4     }

@andygrove
Copy link
Member

@prashantksharma Do you need any assistance with this PR? Let me know if you have questions

@prashantksharma
Copy link
Contributor Author

@prashantksharma Do you need any assistance with this PR? Let me know if you have questions

@andygrove
Apologies, I will definitely finalize this PR based on your comments by May 31st, 2024.

PS: I have been little caught up with office work due to product launch at my current company.

@prashantksharma
Copy link
Contributor Author

@andygrove

Thank you so much for your suggestions and comments, and most importantly patience in helping me with this PR.

I have made the necessary changes and pushed to the same branch i.e issue-361-enum-for-eval_mode.

Kindly, let me know if there are any steps to be done for this PR from my side.

@prashantksharma prashantksharma marked this pull request as ready for review June 1, 2024 04:17
Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM pending CI. Thanks @prashantksharma. There are more places in the Scala code where we could take advantage of the new enum but I think we can handle that as a follow on PR as this one achieves the main goal of using enum in the protobuf definition.

@andygrove
Copy link
Member

@viirya @kazuyukitanimura @parthchandra @huaxingao I plan on merging this one soon unless you want to review?

@andygrove andygrove merged commit 3a83133 into apache:main Jun 3, 2024
43 checks passed
himadripal pushed a commit to himadripal/datafusion-comet that referenced this pull request Sep 7, 2024
* Fixes Issue apache#361: Use enum to represent CAST eval_mode in expr.proto

* Update expr.proto and QueryPlanSerde.scala for handling enum EvalMode for cast message

* issue 361 fixed type issue for eval_mode in planner.rs

* issue 361 refactored QueryPlanSerde.scala for enhanced type safety and localization compliance, including a new string-to-enum conversion function and improved import organization.

* Updated planner.rs, expr.proto, QueryPlanSerde.scala for enum support

* Update spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala

---------

Co-authored-by: Prashant K. Sharma <[email protected]>
Co-authored-by: Andy Grove <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

chore: Use enum to represent CAST eval_mode in expr.proto
4 participants