
feat: [comet-parquet-exec] Schema adapter fixes #1139

Merged 9 commits into apache:comet-parquet-exec on Dec 6, 2024

Conversation

andygrove (Member):

Which issue does this PR close?

Closes #.

Rationale for this change

What changes are included in this PR?

How are these changes tested?

@@ -146,6 +162,25 @@ fn timestamp_ntz_to_timestamp(
};
Ok(Arc::new(array_with_tz))
}
DataType::Timestamp(TimeUnit::Millisecond, None) => {
let array = as_primitive_array::<TimestampMillisecondType>(&array);
let tz: Tz = tz.parse()?;
Contributor


Is this called frequently (per row)? Timezone parsing is somewhat expensive (and the result does not change for a session).

Member Author


This is once per array, but I think the parsing could happen once during planning rather than per batch/array.

Contributor


Makes sense, we can defer this for the moment.
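The deferred optimization discussed above (parse the timezone once during planning, not once per batch/array) can be sketched as follows. This is a minimal illustration, not Comet's actual code: `Tz` here is a hypothetical stand-in for arrow's timezone type, whose real `parse` walks the tz database and is comparatively expensive.

```rust
use std::str::FromStr;

// Hypothetical stand-in for arrow's `Tz`; the real `"...".parse::<Tz>()`
// is the expensive call the review comment is about.
#[derive(Debug, PartialEq)]
struct Tz(String);

impl FromStr for Tz {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Placeholder validation; the real parse does far more work.
        if s.is_empty() {
            Err("empty timezone".to_string())
        } else {
            Ok(Tz(s.to_string()))
        }
    }
}

// Parse once when the plan is built, then reuse the parsed value for
// every batch/array the operator processes.
struct TimestampConverter {
    tz: Tz, // parsed once at planning time, not per call
}

impl TimestampConverter {
    fn try_new(tz: &str) -> Result<Self, String> {
        Ok(Self { tz: tz.parse()? })
    }

    // Per-array conversion borrows the pre-parsed timezone.
    fn tz(&self) -> &Tz {
        &self.tz
    }
}

fn main() {
    let conv = TimestampConverter::try_new("America/Los_Angeles").unwrap();
    assert_eq!(conv.tz(), &Tz("America/Los_Angeles".to_string()));
    assert!(TimestampConverter::try_new("").is_err());
    println!("ok");
}
```

The design choice is simply to move a string-to-value conversion from the per-batch hot path into plan construction, where it runs exactly once per session.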

@@ -161,7 +163,7 @@ impl SchemaAdapter for CometSchemaAdapter {
pub struct SchemaMapping {
/// The schema of the table. This is the expected schema after conversion
/// and it should match the schema of the query result.
-    projected_table_schema: SchemaRef,
+    required_schema: SchemaRef,
Contributor


+1 :)

@parthchandra (Contributor) left a comment:


I know this is still draft, but we can commit this whenever ready.

@@ -723,7 +728,9 @@ fn cast_array(
timezone,
allow_incompat,
)?),
-        _ if is_datafusion_spark_compatible(from_type, to_type, allow_incompat) => {
+        _ if ugly_hack_for_poc
Contributor


What are the cases (that we know of) where this gets invoked? (If we know we can replace this flag with an explicit check for those cases, perhaps?)

Member Author


I believe that we need to implement specific logic for adapting Parquet schemas, rather than reusing our Spark-compatible cast. There is likely some overlap, so we can refactor the common code out. For example, regular Spark casts do not need to support unsigned integers, but we need this when adapting Parquet schemas.
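One way to replace the `ugly_hack_for_poc` flag with an explicit check, as the reviewer suggests, would be a predicate that recognizes casts needed only for Parquet schema adaptation. The sketch below is hypothetical, not Comet's actual code: `DataType` is a minimal stand-in for arrow's type enum, and the listed pairs assume the usual widening of unsigned Parquet integers into larger signed types (Spark itself has no unsigned types).

```rust
// Minimal stand-in for the relevant `arrow_schema::DataType` variants.
#[derive(Debug)]
enum DataType {
    UInt8,
    UInt16,
    UInt32,
    UInt64,
    Int16,
    Int32,
    Int64,
    Decimal128(u8, i8),
}

// Hypothetical explicit check: casts that arise only when adapting a
// Parquet file schema to the schema Spark expects. Unsigned columns
// widen into the next larger signed type (u64 has no signed container,
// so it is assumed to widen to Decimal128(20, 0)).
fn is_parquet_adapter_cast(from: &DataType, to: &DataType) -> bool {
    use DataType::*;
    matches!(
        (from, to),
        (UInt8, Int16) | (UInt16, Int32) | (UInt32, Int64) | (UInt64, Decimal128(20, 0))
    )
}

fn main() {
    // Unsigned Parquet types require the adapter-specific path.
    assert!(is_parquet_adapter_cast(&DataType::UInt8, &DataType::Int16));
    assert!(is_parquet_adapter_cast(&DataType::UInt64, &DataType::Decimal128(20, 0)));
    // Plain signed widening stays on the normal Spark-compatible cast path.
    assert!(!is_parquet_adapter_cast(&DataType::Int32, &DataType::Int64));
    println!("ok");
}
```

With a predicate like this, the match arm could read `_ if is_parquet_adapter_cast(from_type, to_type)` instead of carrying a catch-all boolean flag.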

@andygrove andygrove marked this pull request as ready for review December 6, 2024 15:10
@andygrove andygrove merged commit bd797f5 into apache:comet-parquet-exec Dec 6, 2024
10 of 70 checks passed
@andygrove andygrove deleted the schema-adapter-fixes branch December 6, 2024 15:11
@viirya (Member) left a comment:


Hmm, I found that the description is empty, so I'm not sure what this PR tries to fix. Could you add some more details there? Thanks.

3 participants