[WIP][SPARK-46934][SQL][FOLLOWUP] Read/write roundtrip for struct type with special characters with HMS - a backward compatible approach #48986
base: master
Conversation
@@ -1130,10 +1133,6 @@ private[hive] object HiveClientImpl extends Logging {
       Option(hc.getComment).map(field.withComment).getOrElse(field)
     }

-  private def verifyColumnDataType(schema: StructType): Unit = {
This is a misleading step on the write path: the schema here is both produced and verified by Spark itself, yet the check reports a CANNOT_RECOGNIZE_HIVE_TYPE error to the user.
After this change, because the field names are quoted, this check always passes.
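A minimal sketch of the situation described above, using an illustrative nested field name `a.b` (not taken from the PR's tests):

```scala
import org.apache.spark.sql.types._

// A schema built entirely by Spark, with a nested field name ("a.b") that is
// not a plain identifier.
val schema = StructType(Seq(
  StructField("s", StructType(Seq(StructField("a.b", IntegerType))))))

// Before this change the rendering was "struct<a.b:int>", which Hive's type
// parser rejects, so the write path reported CANNOT_RECOGNIZE_HIVE_TYPE even
// though Spark itself produced the schema. With quoting it becomes
// "struct<`a.b`:int>", which is always parseable.
println(schema("s").dataType.catalogString)
```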
@@ -433,7 +433,8 @@ case class StructType(fields: Array[StructField]) extends DataType with Seq[StructField] {
     stringConcat.append("struct<")
     var i = 0
     while (i < len) {
-      stringConcat.append(s"${fields(i).name}:${fields(i).dataType.catalogString}")
+      val name = QuotingUtils.quoteIfNeeded(fields(i).name)
+      stringConcat.append(s"$name:${fields(i).dataType.catalogString}")
cc @zsxwing, this also seems to affect the output schema of queries.
Also, cc @cloud-fan. If this is the right option to go with, shall we create a variant of the catalogString method to minimize the impact?
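For reference, a rough sketch of the quoting behavior this hunk relies on, assuming QuotingUtils here is org.apache.spark.sql.catalyst.util.QuotingUtils and that quoteIfNeeded backtick-quotes any name that is not a simple identifier:

```scala
import org.apache.spark.sql.catalyst.util.QuotingUtils

// Plain identifiers are left untouched, so most schemas render exactly as before.
QuotingUtils.quoteIfNeeded("col1")  // "col1"
// Field names with special characters are wrapped in backticks, so the struct
// rendered by catalogString (and therefore the printed output schema of a
// query) changes only for such names.
QuotingUtils.quoteIfNeeded("a.b")   // "`a.b`"
QuotingUtils.quoteIfNeeded("a b")   // "`a b`"
```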
FYI, the previous discussion thread can be visited at #45039 (comment)
What changes were proposed in this pull request?
A backward-compatible approach for #45039 that lets older versions of Spark correctly read struct-typed columns with special characters created by Spark 4.x or later.
Compared with #45039, only datasource tables are supported now, as we already have a dedicated way to store a Hive-incompatible schema in the table properties. This reduction in scope is a safe removal because no release has shipped with that support.
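A hedged end-to-end illustration of the intended roundtrip for a datasource table (run in something like spark-shell; the table name, field name, and format are made up for the example and are not taken from this PR's tests):

```scala
// Create a parquet datasource table whose struct column contains a field name
// with a special character, write a row, and read the schema back.
spark.sql("CREATE TABLE t (s STRUCT<`a.b`: INT>) USING parquet")
spark.sql("INSERT INTO t SELECT named_struct('a.b', 1)")
spark.table("t").printSchema()
// root
//  |-- s: struct (nullable = true)
//  |    |-- a.b: integer (nullable = true)
```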
Why are the changes needed?
Backward-compatibility improvement.
Does this PR introduce any user-facing change?
Users can store and read struct-typed columns whose field names contain special characters.
How was this patch tested?
Tests provided by SPARK-22431
Tests provided by the previous PR for SPARK-46934
Manual backward-compatibility test
Was this patch authored or co-authored using generative AI tooling?
No