fix: Only maps FIXED_LEN_BYTE_ARRAY to String for uuid type #238
Conversation
-        || sparkType == DataTypes.StringType) {
+        || sparkType == DataTypes.StringType
+            && descriptor.getPrimitiveType().getLogicalTypeAnnotation()
+                instanceof LogicalTypeAnnotation.UUIDLogicalTypeAnnotation) {
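For context, a self-contained sketch of the tightened condition from the diff; the standalone helper and class names here are illustrative only, not Comet's actual code:

```java
import org.apache.parquet.column.ColumnDescriptor;
import org.apache.parquet.schema.LogicalTypeAnnotation;
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.DataTypes;

final class UuidCheckSketch {
  // FIXED_LEN_BYTE_ARRAY is treated as a Spark String only when the Parquet
  // column is annotated with the UUID logical type (the Iceberg UUID case).
  static boolean readFixedLenByteArrayAsString(ColumnDescriptor descriptor, DataType sparkType) {
    return sparkType == DataTypes.StringType
        && descriptor.getPrimitiveType().getLogicalTypeAnnotation()
            instanceof LogicalTypeAnnotation.UUIDLogicalTypeAnnotation;
  }
}
```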
How can we test this?
Another question: this is the Parquet UUID logical type, so why is it Iceberg-specific?
+1
> How can we test this?

I have made the change locally and used the local build to test UUID in an Iceberg table.

> this is the Parquet UUID logical type, so why is it Iceberg-specific?

I don't think Spark supports a UUID type. If the code reaches this line, the type has to be Iceberg UUID. If we want to be absolutely sure, we can add a flag when creating the ColumnReader in Iceberg.
Hmm, one related question: is the Iceberg reader supported in Comet yet? It seems that Comet doesn't support the Iceberg reader yet; once it's added, we can test this then.
Currently, this can only be tested locally.
@viirya If the SparkType is StringType and the LogicalTypeAnnotation is UUID, then this must be an Iceberg UUID column, because only Iceberg maps UUID to Spark StringType. I feel the change is safe. Alternatively, we can add an extra parameter in getColumnReader to indicate whether the ColumnReader is an Iceberg ColumnReader.
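For illustration, a rough sketch of that alternative; the extra boolean parameter and the standalone helper are hypothetical and not part of Comet's actual getColumnReader signature:

```java
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.DataTypes;

final class IcebergFlagSketch {
  // Hypothetical variant: an explicit flag threaded through reader creation
  // marks the column as coming from Iceberg, instead of (or in addition to)
  // inspecting the Parquet UUID logical type annotation.
  static boolean readFixedLenByteArrayAsString(DataType sparkType, boolean isIcebergColumnReader) {
    return isIcebergColumnReader && sparkType == DataTypes.StringType;
  }
}
```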
Okay.
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@            Coverage Diff            @@
##               main     #238   +/-   ##
=========================================
  Coverage     33.48%   33.48%
  Complexity      776      776
=========================================
  Files           108      108
  Lines         37178    37178
  Branches       8146     8146
=========================================
  Hits          12448    12448
  Misses        22107    22107
  Partials       2623     2623

☔ View full report in Codecov by Sentry.
Merged. Thanks.
Thanks everyone for reviewing.
* Only maps FIXED_LEN_BYTE_ARRAY to String for uuid type
* remove redundant code

---------

Co-authored-by: Huaxin Gao <[email protected]>
Which issue does this PR close?
Closes #.
Rationale for this change
UUID is not a SQL type in Spark, but it is supported by Iceberg, where it is mapped to UTF8String. To support UUID, we mapped FIXED_LEN_BYTE_ARRAY to String, but we should only do this when the FIXED_LEN_BYTE_ARRAY is a UUID. This PR adds a check so that FIXED_LEN_BYTE_ARRAY is mapped to String only for the UUID type.
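As a rough illustration of why this mapping exists: Iceberg stores a UUID as a 16-byte FIXED_LEN_BYTE_ARRAY, and Spark (which has no UUID type) sees it as a string. A minimal sketch of that byte-to-string conversion, not Comet's or Iceberg's actual code:

```java
import java.nio.ByteBuffer;
import java.util.UUID;

final class UuidToStringSketch {
  // Interprets a 16-byte FIXED_LEN_BYTE_ARRAY value as a UUID and renders it
  // in the canonical string form that a Spark StringType column would carry.
  static String uuidBytesToString(byte[] fixedLenBytes) {
    ByteBuffer buf = ByteBuffer.wrap(fixedLenBytes); // big-endian by default
    long mostSigBits = buf.getLong();
    long leastSigBits = buf.getLong();
    return new UUID(mostSigBits, leastSigBits).toString();
  }
}
```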
What changes are included in this PR?
How are these changes tested?