feat: [comet-parquet-exec] Schema adapter fixes #1139
Conversation
@@ -146,6 +162,25 @@ fn timestamp_ntz_to_timestamp(
         };
         Ok(Arc::new(array_with_tz))
     }
+    DataType::Timestamp(TimeUnit::Millisecond, None) => {
+        let array = as_primitive_array::<TimestampMillisecondType>(&array);
+        let tz: Tz = tz.parse()?;
Is this called frequently (per row)? Timezone parsing is somewhat expensive (and the timezone does not change for a session).
This is once per array, but I think the parsing could happen once during planning rather than per batch/array.
Makes sense, we can defer this for the moment.
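A minimal std-only sketch of the parse-once idea discussed above. The `Tz` type here is a hypothetical stand-in for `chrono_tz::Tz` (it only parses fixed offsets like `"+05:30"`), and `TimestampConverter` is an invented name; the actual change would hold the parsed timezone in plan-time state and reuse it for every batch/array:

```rust
use std::str::FromStr;

// Hypothetical stand-in for `chrono_tz::Tz`: a fixed UTC offset in seconds.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Tz(i32);

impl FromStr for Tz {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Parse offsets of the form "+05:30" / "-08:00" (illustrative only;
        // parsing a named timezone is considerably more work).
        let sign = match s.chars().next() {
            Some('+') => 1,
            Some('-') => -1,
            _ => return Err(format!("invalid tz: {s}")),
        };
        let (h, m) = s[1..]
            .split_once(':')
            .ok_or_else(|| format!("invalid tz: {s}"))?;
        let secs = h.parse::<i32>().map_err(|e| e.to_string())? * 3600
            + m.parse::<i32>().map_err(|e| e.to_string())? * 60;
        Ok(Tz(sign * secs))
    }
}

// Plan-time state: the timezone is parsed once when the plan is built.
struct TimestampConverter {
    tz: Tz,
}

impl TimestampConverter {
    fn try_new(tz: &str) -> Result<Self, String> {
        Ok(Self { tz: tz.parse()? })
    }

    // Per-batch work only applies the pre-parsed offset; no re-parsing.
    fn convert_batch(&self, seconds: &[i64]) -> Vec<i64> {
        seconds.iter().map(|v| v + self.tz.0 as i64).collect()
    }
}

fn main() -> Result<(), String> {
    let conv = TimestampConverter::try_new("+05:30")?;
    println!("{:?}", conv.convert_batch(&[0, 60])); // [19800, 19860]
    Ok(())
}
```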
@@ -161,7 +163,7 @@ impl SchemaAdapter for CometSchemaAdapter {
 pub struct SchemaMapping {
     /// The schema of the table. This is the expected schema after conversion
     /// and it should match the schema of the query result.
-    projected_table_schema: SchemaRef,
+    required_schema: SchemaRef,
+1 :)
I know this is still draft, but we can commit this whenever ready.
native/spark-expr/src/cast.rs
@@ -723,7 +728,9 @@ fn cast_array(
         timezone,
         allow_incompat,
     )?),
-    _ if is_datafusion_spark_compatible(from_type, to_type, allow_incompat) => {
+    _ if ugly_hack_for_poc
What are the cases (that we know of) where this gets invoked? (If we know we can replace this flag with an explicit check for those cases, perhaps?)
I believe we need to implement specific logic for adapting Parquet schemas rather than reusing our Spark-compatible cast. There is likely some overlap, so we can refactor the common code out. For example, regular Spark casts do not need to support unsigned integers, but we do need that support when adapting Parquet schemas.
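A hypothetical std-only illustration of the unsigned-integer point above: Parquet's unsigned physical types have no Spark equivalent, so schema adaptation has to widen them to the next larger signed type, something a Spark-compatible cast path never needs to do. The function names are invented for the example; the real code would go through Arrow cast kernels on arrays rather than plain slices:

```rust
// Invented helper: adapt a Parquet UINT8 column to Spark ShortType (i16).
fn widen_u8(values: &[u8]) -> Vec<i16> {
    values.iter().map(|&v| i16::from(v)).collect()
}

// Invented helper: adapt a Parquet UINT32 column to Spark LongType (i64).
fn widen_u32(values: &[u32]) -> Vec<i64> {
    // u32::MAX (4294967295) does not fit in i32, so the adapted
    // column type must be i64 to preserve all values.
    values.iter().map(|&v| i64::from(v)).collect()
}

fn main() {
    println!("{:?}", widen_u32(&[0, u32::MAX])); // [0, 4294967295]
}
```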
Hmm, I noticed that the PR description is empty, so I'm not sure what this is trying to fix. Could you add some more details there? Thanks.
Which issue does this PR close?
Closes #.
Rationale for this change
What changes are included in this PR?
How are these changes tested?