[do-not-merge] Diff updated comet-parquet-exec feature branch against main #1182

Draft
wants to merge 45 commits into main

@mbutrovich (Contributor) commented Dec 18, 2024

@andygrove suggested that it would help to diff the comet-parquet-exec branch, with main merged into it (see #1183), against upstream/main to check whether the diff looks reasonable. Please do not merge!

mbutrovich and others added 30 commits November 8, 2024 13:54
add partial support for multiple parquet files
"filter with string" test now passes
* wip - CometNativeScan

* fix and make config internal
…e debug logging (apache#1080)

* update tests, remove some debug logging

* update tests, remove some debug logging

* update tests, remove some debug logging

* remove unused import
…che#1081)

* I think serde works. Gonna try removing the old stuff.

* Fixes after merging in upstream.

* Remove previous file_config logic. Clippy.

* Temporary assertion for testing.

* Remove old path proto value.

* Selectively generate projection vector.
…stead of FileScanRDD (apache#1088)

* DataSourceRDD handling (seems to be related to prefetching, so maybe not relevant for our ParquetExec).

* Refactor to reduce duplicate code.
…pache#1106)

* init

* more

* more

* fix clippy

* Use Spark and Arrow types for partition schema
* fix: Use RDD partition index (apache#1112)

* fix: Use RDD partition index

* fix

* fix

* fix

* fix style
…e#1138)

* WIP: (POC2) A Parquet reader that uses the arrow-rs Parquet reader directly

* Change default config

---------

Co-authored-by: Parth Chandra <[email protected]>
…rquet (apache#1075)

* implement basic native code for casting struct to struct

* add another test

* rustdoc

* add scala side

* code cleanup

* clippy

* clippy

* add scala test

* improve test

* simple struct case passes

* save progress

* copy schema adapter code from DataFusion

* more tests pass

* save progress

* remove debug println

* remove debug println
…e#1142)

* Serialize original data schema and required schema, generate projection vector on the Java side.

* Sending over more schema info like column names and nullability.

* Using the new stuff in the proto. About to take the old out.

* Remove old logic.

* remove errant print.

* Serialize original data schema and required schema, generate projection vector on the Java side.

* Sending over more schema info like column names and nullability.

* Using the new stuff in the proto. About to take the old out.

* Remove old logic.

* remove errant print.

* Remove commented print. format.

* Remove commented print. format.

* Fix projection_vector to include partition_schema cols correctly.

* Rename variable.
parthchandra and others added 15 commits December 5, 2024 15:37
* support more timestamp conversions

* improve error handling

* rename projected_table_schema to required_schema

* Save

* save

* save

* code cleanup
…implementation (apache#1170)

* fix: CometScanExec was created for unsupported cases if only COMET_NATIVE_SCAN is enabled

* fix: Another try to fix '  test("Comet native metrics: BroadcastHashJoin")

* fix: some tests are valid only when full native scan is enabled

* Merge pull request #1 from andygrove/fix-tests-spark-cast-options
…or use in iceberg reads (apache#1174)

* wip. Use DF's ParquetExec for Iceberg API

* wip - await??

* wip

* wip -

* fix shading issue

* fix shading issue

* fixes

* refactor to remove arrow based reader

* rename config

* Fix config defaults

---------

Co-authored-by: Andy Grove <[email protected]>
# Conflicts:
#	native/Cargo.lock
#	native/Cargo.toml
#	native/core/src/execution/jni_api.rs
#	native/core/src/execution/planner.rs
#	native/core/src/execution/schema_adapter.rs
#	native/spark-expr/src/cast.rs
#	native/spark-expr/src/lib.rs
#	native/spark-expr/src/test_common/mod.rs
#	native/spark-expr/src/utils.rs
#	spark/src/main/scala/org/apache/comet/CometExecIterator.scala
#	spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala
#	spark/src/main/scala/org/apache/comet/Native.scala
#	spark/src/main/scala/org/apache/spark/sql/comet/operators.scala
#	spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala
#	spark/src/test/scala/org/apache/comet/exec/CometExecSuite.scala
4 participants