
feat: [comet-parquet-exec] Schema adapter fixes #1139

Merged 9 commits into apache:comet-parquet-exec on Dec 6, 2024

Conversation

andygrove (Member):

Which issue does this PR close?

Closes #.

Rationale for this change

What changes are included in this PR?

How are these changes tested?

@@ -146,6 +162,25 @@ fn timestamp_ntz_to_timestamp(
};
Ok(Arc::new(array_with_tz))
}
DataType::Timestamp(TimeUnit::Millisecond, None) => {
let array = as_primitive_array::<TimestampMillisecondType>(&array);
let tz: Tz = tz.parse()?;
Contributor


Is this called frequently (per row)? Timezone parsing is somewhat expensive (and the result does not change for a session).

Member Author


This is once per array, but I think the parsing could happen once during planning rather than per batch/array.

Contributor


Makes sense, we can defer this for the moment.
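The deferred optimization discussed above (parse the timezone once during planning, not once per batch/array) can be sketched as follows. This is a minimal illustration, not Comet's actual code: `Tz` here is a hypothetical stand-in for arrow's timezone type, whose real `parse` walks the tz database and is comparatively expensive.

```rust
use std::str::FromStr;

// Hypothetical stand-in for arrow's `Tz`; the real `"...".parse::<Tz>()`
// is the expensive call the review comment is about.
#[derive(Debug, PartialEq)]
struct Tz(String);

impl FromStr for Tz {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Placeholder validation; the real parse does far more work.
        if s.is_empty() {
            Err("empty timezone".to_string())
        } else {
            Ok(Tz(s.to_string()))
        }
    }
}

// Parse once when the plan is built, then reuse the parsed value for
// every batch/array the operator processes.
struct TimestampConverter {
    tz: Tz, // parsed once at planning time, not per call
}

impl TimestampConverter {
    fn try_new(tz: &str) -> Result<Self, String> {
        Ok(Self { tz: tz.parse()? })
    }

    // Per-array conversion borrows the pre-parsed timezone.
    fn tz(&self) -> &Tz {
        &self.tz
    }
}

fn main() {
    let conv = TimestampConverter::try_new("America/Los_Angeles").unwrap();
    assert_eq!(conv.tz(), &Tz("America/Los_Angeles".to_string()));
    assert!(TimestampConverter::try_new("").is_err());
    println!("ok");
}
```

The design choice is simply to move a string-to-value conversion from the per-batch hot path into plan construction, where it runs exactly once per session.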

@@ -161,7 +163,7 @@ impl SchemaAdapter for CometSchemaAdapter {
pub struct SchemaMapping {
/// The schema of the table. This is the expected schema after conversion
/// and it should match the schema of the query result.
-    projected_table_schema: SchemaRef,
+    required_schema: SchemaRef,
Contributor


+1 :)

@parthchandra (Contributor) left a comment:


I know this is still draft, but we can commit this whenever ready.

@@ -723,7 +728,9 @@ fn cast_array(
timezone,
allow_incompat,
)?),
-        _ if is_datafusion_spark_compatible(from_type, to_type, allow_incompat) => {
+        _ if ugly_hack_for_poc
Contributor


What are the cases (that we know of) where this gets invoked? (If we know we can replace this flag with an explicit check for those cases, perhaps?)

Member Author


I believe that we need to implement specific logic for adapting Parquet schemas, rather than reusing our Spark-compatible cast. There is likely some overlap, so we can refactor the common code out. For example, regular Spark casts do not need to support unsigned integers, but we need this when adapting Parquet schemas.
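One way to replace the `ugly_hack_for_poc` flag with an explicit check, as the reviewer suggests, would be a predicate that recognizes casts needed only for Parquet schema adaptation. The sketch below is hypothetical, not Comet's actual code: `DataType` is a minimal stand-in for arrow's type enum, and the listed pairs assume the usual widening of unsigned Parquet integers into larger signed types (Spark itself has no unsigned types).

```rust
// Minimal stand-in for the relevant `arrow_schema::DataType` variants.
#[derive(Debug)]
enum DataType {
    UInt8,
    UInt16,
    UInt32,
    UInt64,
    Int16,
    Int32,
    Int64,
    Decimal128(u8, i8),
}

// Hypothetical explicit check: casts that arise only when adapting a
// Parquet file schema to the schema Spark expects. Unsigned columns
// widen into the next larger signed type (u64 has no signed container,
// so it is assumed to widen to Decimal128(20, 0)).
fn is_parquet_adapter_cast(from: &DataType, to: &DataType) -> bool {
    use DataType::*;
    matches!(
        (from, to),
        (UInt8, Int16) | (UInt16, Int32) | (UInt32, Int64) | (UInt64, Decimal128(20, 0))
    )
}

fn main() {
    // Unsigned Parquet types require the adapter-specific path.
    assert!(is_parquet_adapter_cast(&DataType::UInt8, &DataType::Int16));
    assert!(is_parquet_adapter_cast(&DataType::UInt64, &DataType::Decimal128(20, 0)));
    // Plain signed widening stays on the normal Spark-compatible cast path.
    assert!(!is_parquet_adapter_cast(&DataType::Int32, &DataType::Int64));
    println!("ok");
}
```

With a predicate like this, the match arm could read `_ if is_parquet_adapter_cast(from_type, to_type)` instead of carrying a catch-all boolean flag.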

@andygrove andygrove marked this pull request as ready for review December 6, 2024 15:10
@andygrove andygrove merged commit bd797f5 into apache:comet-parquet-exec Dec 6, 2024
10 of 70 checks passed
@andygrove andygrove deleted the schema-adapter-fixes branch December 6, 2024 15:11
@viirya (Member) left a comment:


Hmm, I found that the description is empty, so I'm not sure what this PR tries to fix. Could you add some more details there? Thanks.

3 participants