Skip to content

Commit

Permalink
docs: Improve user documentation for supported operators and expressi…
Browse files Browse the repository at this point in the history
…ons (#520)

* Improve documentation about supported operators and expressions

* Improve documentation about supported operators and expressions

* more notes

* Add more supported expressions

* rename protobuf Negative to UnaryMinus for consistency

* format

* remove duplicate ASF header

* SMJ not disabled by default

* Update docs/source/user-guide/operators.md

Co-authored-by: Liang-Chi Hsieh <[email protected]>

* Update docs/source/user-guide/operators.md

Co-authored-by: Liang-Chi Hsieh <[email protected]>

* remove RLike

---------

Co-authored-by: Liang-Chi Hsieh <[email protected]>
  • Loading branch information
andygrove and viirya authored Jun 7, 2024
1 parent fd596ed commit f75aeef
Show file tree
Hide file tree
Showing 5 changed files with 192 additions and 112 deletions.
2 changes: 1 addition & 1 deletion core/src/execution/datafusion/planner.rs
Original file line number Diff line number Diff line change
Expand Up @@ -566,7 +566,7 @@ impl PhysicalPlanner {
let child = self.create_expr(expr.child.as_ref().unwrap(), input_schema)?;
Ok(Arc::new(NotExpr::new(child)))
}
ExprStruct::Negative(expr) => {
ExprStruct::UnaryMinus(expr) => {
let child: Arc<dyn PhysicalExpr> =
self.create_expr(expr.child.as_ref().unwrap(), input_schema.clone())?;
let result = negative::create_negate_expr(child, expr.fail_on_error);
Expand Down
4 changes: 2 additions & 2 deletions core/src/execution/proto/expr.proto
Original file line number Diff line number Diff line change
Expand Up @@ -65,7 +65,7 @@ message Expr {
CaseWhen caseWhen = 38;
In in = 39;
Not not = 40;
Negative negative = 41;
UnaryMinus unary_minus = 41;
BitwiseShiftRight bitwiseShiftRight = 42;
BitwiseShiftLeft bitwiseShiftLeft = 43;
IfExpr if = 44;
Expand Down Expand Up @@ -452,7 +452,7 @@ message Not {
Expr child = 1;
}

message Negative {
message UnaryMinus {
Expr child = 1;
bool fail_on_error = 2;
}
Expand Down
267 changes: 171 additions & 96 deletions docs/source/user-guide/expressions.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,99 +19,174 @@

# Supported Spark Expressions

The following Spark expressions are currently available:

- Literals
- Arithmetic Operators
- UnaryMinus
- Add/Minus/Multiply/Divide/Remainder
- Conditional functions
- Case When
- If
- Cast
- Coalesce
- BloomFilterMightContain
- Boolean functions
- And
- Or
- Not
- EqualTo
- EqualNullSafe
- GreaterThan
- GreaterThanOrEqual
- LessThan
- LessThanOrEqual
- IsNull
- IsNotNull
- In
- String functions
- Substring
- Coalesce
- StringSpace
- Like
- Contains
- Startswith
- Endswith
- Ascii
- Bit_length
- Octet_length
- Upper
- Lower
- Chr
- Initcap
- Trim/Btrim/Ltrim/Rtrim
- Concat_ws
- Repeat
- Length
- Reverse
- Instr
- Replace
- Translate
- Bitwise functions
- Shiftright/Shiftleft
- Date/Time functions
- Year/Hour/Minute/Second
- Hash functions
- Md5
- Sha2
- Hash
- Xxhash64
- Math functions
- Abs
- Acos
- Asin
- Atan
- Atan2
- Cos
- Exp
- Ln
- Log10
- Log2
- Pow
- Round
- Signum
- Sin
- Sqrt
- Tan
- Ceil
- Floor
- Aggregate functions
- Count
- Sum
- Max
- Min
- Avg
- First
- Last
- BitAnd
- BitOr
- BitXor
- BoolAnd
- BoolOr
- CovPopulation
- CovSample
- VariancePop
- VarianceSamp
- StddevPop
- StddevSamp
- Corr
The following Spark expressions are currently available. Any known compatibility issues are noted in the following tables.

## Literal Values

| Expression | Notes |
| -------------------------------------- | ----- |
| Literal values of supported data types | |

## Unary Arithmetic

| Expression | Notes |
| ---------------- | ----- |
| UnaryMinus (`-`) | |

## Binary Arithmeticx

| Expression | Notes |
| --------------- | --------------------------------------------------- |
| Add (`+`) | |
| Subtract (`-`) | |
| Multiply (`*`) | |
| Divide (`/`) | |
| Remainder (`%`) | Comet produces `NaN` instead of `NULL` for `% -0.0` |

## Conditional Expressions

| Expression | Notes |
| ---------- | ----- |
| CaseWhen | |
| If | |

## Comparison

| Expression | Notes |
| ------------------------- | ----- |
| EqualTo (`=`) | |
| EqualNullSafe (`<=>`) | |
| GreaterThan (`>`) | |
| GreaterThanOrEqual (`>=`) | |
| LessThan (`<`) | |
| LessThanOrEqual (`<=`) | |
| IsNull (`IS NULL`) | |
| IsNotNull (`IS NOT NULL`) | |
| In (`IN`) | |

## String Functions

| Expression | Notes |
| --------------- | ----------------------------------------------------------------------------------------------------------- |
| Ascii | |
| BitLength | |
| Chr | |
| ConcatWs | |
| Contains | |
| EndsWith | |
| InitCap | |
| Instr | |
| Length | |
| Like | |
| Lower | |
| OctetLength | |
| Repeat | Negative argument for number of times to repeat causes exception |
| Replace | |
| Reverse | |
| StartsWith | |
| StringSpace | |
| StringTrim | |
| StringTrimBoth | |
| StringTrimLeft | |
| StringTrimRight | |
| Substring | |
| Translate | |
| Upper | |

## Date/Time Functions

| Expression | Notes |
| -------------- | ------------------------ |
| DatePart | Only `year` is supported |
| Extract | Only `year` is supported |
| Hour | |
| Minute | |
| Second | |
| TruncDate | |
| TruncTimestamp | |
| Year | |

## Math Expressions

| Expression | Notes |
| ---------- | ------------------------------------------------------------------- |
| Abs | |
| Acos | |
| Asin | |
| Atan | |
| Atan2 | |
| Ceil | |
| Cos | |
| Exp | |
| Floor | |
| Log | log(0) will produce `-Infinity` unlike Spark which returns `null` |
| Log2 | log2(0) will produce `-Infinity` unlike Spark which returns `null` |
| Log10 | log10(0) will produce `-Infinity` unlike Spark which returns `null` |
| Pow | |
| Round | |
| Signum | Signum does not differentiate between `0.0` and `-0.0` |
| Sin | |
| Sqrt | |
| Tan | |

## Hashing Functions

| Expression | Notes |
| ---------- | ----- |
| Md5 | |
| Hash | |
| Sha2 | |
| XxHash64 | |

## Boolean Expressions

| Expression | Notes |
| ---------- | ----- |
| And | |
| Or | |
| Not | |

## Bitwise Expressions

| Expression | Notes |
| -------------------- | ----- |
| ShiftLeft (`<<`) | |
| ShiftRight (`>>`) | |
| BitAnd (`&`) | |
| BitOr (`\|`) | |
| BitXor (`^`) | |
| BitwiseNot (`~`) | |
| BoolAnd (`bool_and`) | |
| BoolOr (`bool_or`) | |

## Aggregate Expressions

| Expression | Notes |
| ------------- | ----- |
| Avg | |
| BitAndAgg | |
| BitOrAgg | |
| BitXorAgg | |
| Corr | |
| Count | |
| CovPopulation | |
| CovSample | |
| First | |
| Last | |
| Max | |
| Min | |
| StddevPop | |
| StddevSamp | |
| Sum | |
| VariancePop | |
| VarianceSamp | |

## Other

| Expression | Notes |
| ----------------------- | ------------------------------------------------------------------------------- |
| Cast | See compatibility guide for list of supported cast expressions and known issues |
| BloomFilterMightContain | |
| ScalarSubquery | |
| Coalesce | |
| NormalizeNaNAndZero | |
27 changes: 16 additions & 11 deletions docs/source/user-guide/operators.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,15 +19,20 @@

# Supported Spark Operators

The following Spark operators are currently available:
The following Spark operators are currently replaced with native versions. Query stages that contain any operators
not supported by Comet will fall back to regular Spark execution.

- FileSourceScanExec/BatchScanExec for Parquet
- Projection
- Filter
- Sort
- Hash Aggregate
- Limit
- Sort-merge Join
- Hash Join
- Shuffle
- Expand
| Operator | Notes |
| -------------------------------------------- | ----- |
| FileSourceScanExec/BatchScanExec for Parquet | |
| Projection | |
| Filter | |
| Sort | |
| Hash Aggregate | |
| Limit | |
| Sort-merge Join | |
| Hash Join | |
| BroadcastHashJoinExec | |
| Shuffle | |
| Expand | |
| Union | |
Original file line number Diff line number Diff line change
Expand Up @@ -1987,13 +1987,13 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde with CometExprShim
case UnaryMinus(child, failOnError) =>
val childExpr = exprToProtoInternal(child, inputs)
if (childExpr.isDefined) {
val builder = ExprOuterClass.Negative.newBuilder()
val builder = ExprOuterClass.UnaryMinus.newBuilder()
builder.setChild(childExpr.get)
builder.setFailOnError(failOnError)
Some(
ExprOuterClass.Expr
.newBuilder()
.setNegative(builder)
.setUnaryMinus(builder)
.build())
} else {
withInfo(expr, child)
Expand Down

0 comments on commit f75aeef

Please sign in to comment.