fix: Only maps FIXED_LEN_BYTE_ARRAY to String for uuid type #238

Merged 2 commits on Apr 4, 2024

common/src/main/java/org/apache/comet/parquet/TypeUtil.java (3 additions, 1 deletion)

@@ -196,7 +196,9 @@ && isUnsignedIntTypeMatched(logicalTypeAnnotation, 64)) {
             || canReadAsBinaryDecimal(descriptor, sparkType)
             || sparkType == DataTypes.BinaryType
             // for uuid, since iceberg maps uuid to StringType
-            || sparkType == DataTypes.StringType) {
+            || sparkType == DataTypes.StringType
+                && logicalTypeAnnotation
+                    instanceof LogicalTypeAnnotation.UUIDLogicalTypeAnnotation) {
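A note on what the new condition does: since Java's && binds tighter than ||, StringType is now accepted for a FIXED_LEN_BYTE_ARRAY column only when that column also carries the Parquet UUID logical type annotation. A minimal standalone sketch of just that clause (the class and method names are made up for illustration; only the Parquet and Spark types are real):

```java
import org.apache.parquet.schema.LogicalTypeAnnotation;
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.DataTypes;

// Sketch only -- not the actual Comet TypeUtil code. It isolates the new clause:
// Spark StringType is treated as a valid read target for FIXED_LEN_BYTE_ARRAY
// only when the Parquet column is annotated with the UUID logical type.
final class UuidStringCheckSketch {
  static boolean readableAsString(DataType sparkType, LogicalTypeAnnotation annotation) {
    return sparkType == DataTypes.StringType
        && annotation instanceof LogicalTypeAnnotation.UUIDLogicalTypeAnnotation;
  }
}
```

Before this change, any StringType passed the check for FIXED_LEN_BYTE_ARRAY; the extra annotation test narrows it to UUID columns, which is what the review thread below discusses.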
Member:

How can we test this?

Another question: this is the Parquet uuid logical type, so why is it Iceberg-specific?
One another question is, this is parquet uuid logical type, why it is Iceberg specified?

+1

Contributor Author:

> How can we test this?

I made the change locally and used the local build to test UUID in an Iceberg table.

> this is the Parquet uuid logical type, so why is it Iceberg-specific?

I don't think Spark supports a UUID type. If the code reaches this line, the type has to be an Iceberg UUID. If we want to be absolutely sure, we can add a flag when creating the ColumnReader in Iceberg.

Contributor:

Hmm, one related question: is the Iceberg reader supported in Comet yet?

It seems that Comet doesn't support the Iceberg reader yet? Once it's added, we can test this then?

Contributor Author:

Currently, this can only be tested locally.

Contributor Author:

@viirya If the Spark type is StringType and the LogicalTypeAnnotation is UUID, then this must be an Iceberg UUID column, because only Iceberg maps UUID to Spark StringType. I feel the change is safe. Alternatively, we could add an extra parameter to getColumnReader to indicate whether the ColumnReader is an Iceberg ColumnReader.
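To make that alternative concrete, here is a hypothetical sketch of the explicit-flag approach (the isIcebergUuidColumn parameter and the surrounding class are invented for illustration; this is not the real getColumnReader signature):

```java
import org.apache.parquet.schema.LogicalTypeAnnotation;
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.DataTypes;

// Hypothetical sketch of the "extra parameter" idea: the Iceberg integration
// would set isIcebergUuidColumn when it creates the reader, so the type check
// would not have to infer "Iceberg UUID" from StringType plus the annotation.
final class IcebergUuidFlagSketch {
  static boolean treatAsUuidString(
      DataType sparkType, LogicalTypeAnnotation annotation, boolean isIcebergUuidColumn) {
    return isIcebergUuidColumn
        && sparkType == DataTypes.StringType
        && annotation instanceof LogicalTypeAnnotation.UUIDLogicalTypeAnnotation;
  }
}
```

For current Spark the two approaches behave the same, since Spark itself has no UUID type; the flag would only make the Iceberg assumption explicit.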

Member:

Okay.

           return;
         }
         break;