
minor: refactor to move decodeBatches to broadcast exchange code as private function #1195

Merged
1 commit merged into apache:main on Dec 22, 2024

Conversation

@andygrove (Member) commented on Dec 22, 2024

Which issue does this PR close?

N/A

Rationale for this change

This is a small refactor extracted from #1192.

What changes are included in this PR?

  • Remove the function executeColumnarCollectIterator and its associated test because it is not used anywhere else.
  • Move decodeBatches from CometExec to a private function in CometBroadcastExchangeExec.scala, since that is the only place it is needed (see the sketch after this list).
  • Add some comments
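
For context, here is a minimal sketch of what the moved helper could look like as a private method on CometBroadcastExchangeExec. The exact signature, the use of Spark's CompressionCodec, and the ArrowReaderIterator helper are assumptions based on the description above, not the merged code:

```scala
// Hypothetical sketch only: the signature and the ArrowReaderIterator helper
// are assumptions, not the merged implementation.
import java.io.DataInputStream
import java.nio.channels.Channels

import org.apache.spark.SparkEnv
import org.apache.spark.io.CompressionCodec
import org.apache.spark.sql.vectorized.ColumnarBatch
import org.apache.spark.util.io.ChunkedByteBuffer

private def decodeBatches(bytes: ChunkedByteBuffer, source: String): Iterator[ColumnarBatch] = {
  if (bytes.size == 0) {
    // Nothing was serialized for this broadcast block, so there is nothing to decode.
    Iterator.empty
  } else {
    // Wrap the serialized buffer in a (possibly compressed) input stream and
    // let an Arrow IPC reader yield ColumnarBatch instances lazily.
    val codec = CompressionCodec.createCodec(SparkEnv.get.conf)
    val in = new DataInputStream(codec.compressedInputStream(bytes.toInputStream()))
    new ArrowReaderIterator(Channels.newChannel(in), source)
  }
}
```

Making the helper private keeps the decoding logic next to its only call site and shrinks the public surface of CometExec.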

How are these changes tested?

Existing tests

@codecov-commenter

Codecov Report

Attention: Patch coverage is 71.42857% with 2 lines in your changes missing coverage. Please review.

Project coverage is 34.75%. Comparing base (ea6d205) to head (65f46a1).
Report is 1 commit behind head on main.

Files with missing lines                                  Patch %   Lines
...e/spark/sql/comet/CometBroadcastExchangeExec.scala     71.42%    1 Missing and 1 partial ⚠️
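
For reference, a patch coverage of 71.42857% with 2 uncovered lines corresponds to 5 of the 7 changed lines being hit (5 / 7 ≈ 71.43%).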
Additional details and impacted files
@@              Coverage Diff              @@
##               main    #1195       +/-   ##
=============================================
- Coverage     55.28%   34.75%   -20.53%     
- Complexity      868      958       +90     
=============================================
  Files           112      115        +3     
  Lines         10969    43623    +32654     
  Branches       2116     9517     +7401     
=============================================
+ Hits           6064    15161     +9097     
- Misses         3826    25514    +21688     
- Partials       1079     2948     +1869     


@andygrove andygrove changed the title minor: refactor to move decodeBatches to broacast exchange code as private function minor: refactor to move decodeBatches to broadcast exchange code as private function Dec 22, 2024
@andygrove andygrove merged commit 639fa2f into apache:main Dec 22, 2024
77 checks passed
@andygrove andygrove deleted the minor-refactor-decodeBatches branch December 22, 2024 19:25
dharanad pushed a commit to dharanad/datafusion-comet that referenced this pull request Jan 1, 2025
andygrove added a commit that referenced this pull request Jan 2, 2025
* feat: add support for array_contains expression

* test: add unit test for array_contains function

* Removes unnecessary case expression for handling null values

* chore: Move more expressions from core crate to spark-expr crate (#1152)

* move aggregate expressions to spark-expr crate

* move more expressions

* move benchmark

* normalize_nan

* bitwise not

* comet scalar funcs

* update bench imports

* remove dead code (#1155)

* fix: Spark 4.0-preview1 SPARK-47120 (#1156)

## Which issue does this PR close?

Part of #372 and #551

## Rationale for this change

To be ready for Spark 4.0

## What changes are included in this PR?

This PR fixes the new test SPARK-47120 added in Spark 4.0

## How are these changes tested?

tests enabled

* chore: Move string kernels and expressions to spark-expr crate (#1164)

* Move string kernels and expressions to spark-expr crate

* remove unused hash kernel

* remove unused dependencies

* chore: Move remaining expressions to spark-expr crate + some minor refactoring (#1165)

* move CheckOverflow to spark-expr crate

* move NegativeExpr to spark-expr crate

* move UnboundColumn to spark-expr crate

* move ExpandExec from execution::datafusion::operators to execution::operators

* refactoring to remove datafusion subpackage

* update imports in benches

* fix

* fix

* chore: Add ignored tests for reading complex types from Parquet (#1167)

* Add ignored tests for reading structs from Parquet

* add basic map test

* add tests for Map and Array

* feat: Add Spark-compatible implementation of SchemaAdapterFactory (#1169)

* Add Spark-compatible SchemaAdapterFactory implementation

* remove prototype code

* fix

* refactor

* implement more cast logic

* implement more cast logic

* add basic test

* improve test

* cleanup

* fmt

* add support for casting unsigned int to signed int

* clippy

* address feedback

* fix test

* fix: Document enabling comet explain plan usage in Spark (4.0) (#1176)

* test: enabling Spark tests with offHeap requirement (#1177)

## Which issue does this PR close?

## Rationale for this change

After #1062 we have not been running Spark tests for native execution

## What changes are included in this PR?

Removed the off-heap requirement for testing

## How are these changes tested?

Bringing back Spark tests for native execution

* feat: Improve shuffle metrics (second attempt) (#1175)

* improve shuffle metrics

* docs

* more metrics

* refactor

* address feedback

* fix: stddev_pop should not directly return 0.0 when count is 1.0 (#1184)

* add test

* fix

* fix

* fix

* feat: Make native shuffle compression configurable and respect `spark.shuffle.compress` (#1185)

* Make shuffle compression codec and level configurable

* remove lz4 references

* docs

* update comment

* clippy

* fix benches

* clippy

* clippy

* disable test for miri

* remove lz4 reference from proto

* minor: move shuffle classes from common to spark (#1193)

* minor: refactor decodeBatches to make private in broadcast exchange (#1195)

* minor: refactor prepare_output so that it does not require an ExecutionContext (#1194)

* fix: fix missing explanation for then branch in case when (#1200)

* minor: remove unused source files (#1202)

* chore: Upgrade to DataFusion 44.0.0-rc2 (#1154)

* move aggregate expressions to spark-expr crate

* move more expressions

* move benchmark

* normalize_nan

* bitwise not

* comet scalar funcs

* update bench imports

* save

* save

* save

* remove unused imports

* clippy

* implement more hashers

* implement Hash and PartialEq

* implement Hash and PartialEq

* implement Hash and PartialEq

* benches

* fix ScalarUDFImpl.return_type failure

* exclude test from miri

* ignore correct test

* ignore another test

* remove miri checks

* use return_type_from_exprs

* Revert "use return_type_from_exprs"

This reverts commit febc1f1.

* use DF main branch

* hacky workaround for regression in ScalarUDFImpl.return_type

* fix repo url

* pin to revision

* bump to latest rev

* bump to latest DF rev

* bump DF to rev 9f530dd

* add Cargo.lock

* bump DF version

* no default features

* Revert "remove miri checks"

This reverts commit 4638fe3.

* Update pin to DataFusion e99e02b9b9093ceb0c13a2dd32a2a89beba47930

* update pin

* Update Cargo.toml

Bump to 44.0.0-rc2

* update cargo lock

* revert miri change

---------

Co-authored-by: Andrew Lamb <[email protected]>

* update UT

Signed-off-by: Dharan Aditya <[email protected]>

* fix typo in UT

Signed-off-by: Dharan Aditya <[email protected]>

---------

Signed-off-by: Dharan Aditya <[email protected]>
Co-authored-by: Andy Grove <[email protected]>
Co-authored-by: KAZUYUKI TANIMURA <[email protected]>
Co-authored-by: Parth Chandra <[email protected]>
Co-authored-by: Liang-Chi Hsieh <[email protected]>
Co-authored-by: Raz Luvaton <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>