Skip to content

Commit

Permalink
Merge remote-tracking branch 'upstream/main' into hash_join_build_right
Browse files Browse the repository at this point in the history
  • Loading branch information
viirya committed Jun 7, 2024
2 parents bd3370c + 311e13e commit 1328114
Show file tree
Hide file tree
Showing 13 changed files with 299 additions and 157 deletions.
2 changes: 1 addition & 1 deletion core/src/execution/datafusion/planner.rs
Original file line number Diff line number Diff line change
Expand Up @@ -567,7 +567,7 @@ impl PhysicalPlanner {
let child = self.create_expr(expr.child.as_ref().unwrap(), input_schema)?;
Ok(Arc::new(NotExpr::new(child)))
}
ExprStruct::Negative(expr) => {
ExprStruct::UnaryMinus(expr) => {
let child: Arc<dyn PhysicalExpr> =
self.create_expr(expr.child.as_ref().unwrap(), input_schema.clone())?;
let result = negative::create_negate_expr(child, expr.fail_on_error);
Expand Down
4 changes: 2 additions & 2 deletions core/src/execution/proto/expr.proto
Original file line number Diff line number Diff line change
Expand Up @@ -65,7 +65,7 @@ message Expr {
CaseWhen caseWhen = 38;
In in = 39;
Not not = 40;
Negative negative = 41;
UnaryMinus unary_minus = 41;
BitwiseShiftRight bitwiseShiftRight = 42;
BitwiseShiftLeft bitwiseShiftLeft = 43;
IfExpr if = 44;
Expand Down Expand Up @@ -452,7 +452,7 @@ message Not {
Expr child = 1;
}

message Negative {
message UnaryMinus {
Expr child = 1;
bool fail_on_error = 2;
}
Expand Down
267 changes: 171 additions & 96 deletions docs/source/user-guide/expressions.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,99 +19,174 @@

# Supported Spark Expressions

The following Spark expressions are currently available:

- Literals
- Arithmetic Operators
- UnaryMinus
- Add/Minus/Multiply/Divide/Remainder
- Conditional functions
- Case When
- If
- Cast
- Coalesce
- BloomFilterMightContain
- Boolean functions
- And
- Or
- Not
- EqualTo
- EqualNullSafe
- GreaterThan
- GreaterThanOrEqual
- LessThan
- LessThanOrEqual
- IsNull
- IsNotNull
- In
- String functions
- Substring
- Coalesce
- StringSpace
- Like
- Contains
- Startswith
- Endswith
- Ascii
- Bit_length
- Octet_length
- Upper
- Lower
- Chr
- Initcap
- Trim/Btrim/Ltrim/Rtrim
- Concat_ws
- Repeat
- Length
- Reverse
- Instr
- Replace
- Translate
- Bitwise functions
- Shiftright/Shiftleft
- Date/Time functions
- Year/Hour/Minute/Second
- Hash functions
- Md5
- Sha2
- Hash
- Xxhash64
- Math functions
- Abs
- Acos
- Asin
- Atan
- Atan2
- Cos
- Exp
- Ln
- Log10
- Log2
- Pow
- Round
- Signum
- Sin
- Sqrt
- Tan
- Ceil
- Floor
- Aggregate functions
- Count
- Sum
- Max
- Min
- Avg
- First
- Last
- BitAnd
- BitOr
- BitXor
- BoolAnd
- BoolOr
- CovPopulation
- CovSample
- VariancePop
- VarianceSamp
- StddevPop
- StddevSamp
- Corr
The following Spark expressions are currently available. Any known compatibility issues are noted in the following tables.

## Literal Values

| Expression | Notes |
| -------------------------------------- | ----- |
| Literal values of supported data types | |

## Unary Arithmetic

| Expression | Notes |
| ---------------- | ----- |
| UnaryMinus (`-`) | |

## Binary Arithmeticx

| Expression | Notes |
| --------------- | --------------------------------------------------- |
| Add (`+`) | |
| Subtract (`-`) | |
| Multiply (`*`) | |
| Divide (`/`) | |
| Remainder (`%`) | Comet produces `NaN` instead of `NULL` for `% -0.0` |

## Conditional Expressions

| Expression | Notes |
| ---------- | ----- |
| CaseWhen | |
| If | |

## Comparison

| Expression | Notes |
| ------------------------- | ----- |
| EqualTo (`=`) | |
| EqualNullSafe (`<=>`) | |
| GreaterThan (`>`) | |
| GreaterThanOrEqual (`>=`) | |
| LessThan (`<`) | |
| LessThanOrEqual (`<=`) | |
| IsNull (`IS NULL`) | |
| IsNotNull (`IS NOT NULL`) | |
| In (`IN`) | |

## String Functions

| Expression | Notes |
| --------------- | ----------------------------------------------------------------------------------------------------------- |
| Ascii | |
| BitLength | |
| Chr | |
| ConcatWs | |
| Contains | |
| EndsWith | |
| InitCap | |
| Instr | |
| Length | |
| Like | |
| Lower | |
| OctetLength | |
| Repeat | Negative argument for number of times to repeat causes exception |
| Replace | |
| Reverse | |
| StartsWith | |
| StringSpace | |
| StringTrim | |
| StringTrimBoth | |
| StringTrimLeft | |
| StringTrimRight | |
| Substring | |
| Translate | |
| Upper | |

## Date/Time Functions

| Expression | Notes |
| -------------- | ------------------------ |
| DatePart | Only `year` is supported |
| Extract | Only `year` is supported |
| Hour | |
| Minute | |
| Second | |
| TruncDate | |
| TruncTimestamp | |
| Year | |

## Math Expressions

| Expression | Notes |
| ---------- | ------------------------------------------------------------------- |
| Abs | |
| Acos | |
| Asin | |
| Atan | |
| Atan2 | |
| Ceil | |
| Cos | |
| Exp | |
| Floor | |
| Log | log(0) will produce `-Infinity` unlike Spark which returns `null` |
| Log2 | log2(0) will produce `-Infinity` unlike Spark which returns `null` |
| Log10 | log10(0) will produce `-Infinity` unlike Spark which returns `null` |
| Pow | |
| Round | |
| Signum | Signum does not differentiate between `0.0` and `-0.0` |
| Sin | |
| Sqrt | |
| Tan | |

## Hashing Functions

| Expression | Notes |
| ---------- | ----- |
| Md5 | |
| Hash | |
| Sha2 | |
| XxHash64 | |

## Boolean Expressions

| Expression | Notes |
| ---------- | ----- |
| And | |
| Or | |
| Not | |

## Bitwise Expressions

| Expression | Notes |
| -------------------- | ----- |
| ShiftLeft (`<<`) | |
| ShiftRight (`>>`) | |
| BitAnd (`&`) | |
| BitOr (`\|`) | |
| BitXor (`^`) | |
| BitwiseNot (`~`) | |
| BoolAnd (`bool_and`) | |
| BoolOr (`bool_or`) | |

## Aggregate Expressions

| Expression | Notes |
| ------------- | ----- |
| Avg | |
| BitAndAgg | |
| BitOrAgg | |
| BitXorAgg | |
| Corr | |
| Count | |
| CovPopulation | |
| CovSample | |
| First | |
| Last | |
| Max | |
| Min | |
| StddevPop | |
| StddevSamp | |
| Sum | |
| VariancePop | |
| VarianceSamp | |

## Other

| Expression | Notes |
| ----------------------- | ------------------------------------------------------------------------------- |
| Cast | See compatibility guide for list of supported cast expressions and known issues |
| BloomFilterMightContain | |
| ScalarSubquery | |
| Coalesce | |
| NormalizeNaNAndZero | |
27 changes: 16 additions & 11 deletions docs/source/user-guide/operators.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,15 +19,20 @@

# Supported Spark Operators

The following Spark operators are currently available:
The following Spark operators are currently replaced with native versions. Query stages that contain any operators
not supported by Comet will fall back to regular Spark execution.

- FileSourceScanExec/BatchScanExec for Parquet
- Projection
- Filter
- Sort
- Hash Aggregate
- Limit
- Sort-merge Join
- Hash Join
- Shuffle
- Expand
| Operator | Notes |
| -------------------------------------------- | ----- |
| FileSourceScanExec/BatchScanExec for Parquet | |
| Projection | |
| Filter | |
| Sort | |
| Hash Aggregate | |
| Limit | |
| Sort-merge Join | |
| Hash Join | |
| BroadcastHashJoinExec | |
| Shuffle | |
| Expand | |
| Union | |
6 changes: 3 additions & 3 deletions spark/src/main/scala/org/apache/comet/GenerateDocs.scala
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@ import scala.io.Source

import org.apache.spark.sql.catalyst.expressions.Cast

import org.apache.comet.expressions.{CometCast, Compatible, Incompatible}
import org.apache.comet.expressions.{CometCast, CometEvalMode, Compatible, Incompatible}

/**
* Utility for generating markdown documentation from the configs.
Expand Down Expand Up @@ -72,7 +72,7 @@ object GenerateDocs {
if (Cast.canCast(fromType, toType) && fromType != toType) {
val fromTypeName = fromType.typeName.replace("(10,2)", "")
val toTypeName = toType.typeName.replace("(10,2)", "")
CometCast.isSupported(fromType, toType, None, "LEGACY") match {
CometCast.isSupported(fromType, toType, None, CometEvalMode.LEGACY) match {
case Compatible(notes) =>
val notesStr = notes.getOrElse("").trim
w.write(s"| $fromTypeName | $toTypeName | $notesStr |\n".getBytes)
Expand All @@ -89,7 +89,7 @@ object GenerateDocs {
if (Cast.canCast(fromType, toType) && fromType != toType) {
val fromTypeName = fromType.typeName.replace("(10,2)", "")
val toTypeName = toType.typeName.replace("(10,2)", "")
CometCast.isSupported(fromType, toType, None, "LEGACY") match {
CometCast.isSupported(fromType, toType, None, CometEvalMode.LEGACY) match {
case Incompatible(notes) =>
val notesStr = notes.getOrElse("").trim
w.write(s"| $fromTypeName | $toTypeName | $notesStr |\n".getBytes)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -55,7 +55,7 @@ object CometCast {
fromType: DataType,
toType: DataType,
timeZoneId: Option[String],
evalMode: String): SupportLevel = {
evalMode: CometEvalMode.Value): SupportLevel = {

if (fromType == toType) {
return Compatible()
Expand Down Expand Up @@ -102,7 +102,7 @@ object CometCast {
private def canCastFromString(
toType: DataType,
timeZoneId: Option[String],
evalMode: String): SupportLevel = {
evalMode: CometEvalMode.Value): SupportLevel = {
toType match {
case DataTypes.BooleanType =>
Compatible()
Expand Down
Loading

0 comments on commit 1328114

Please sign in to comment.