Name mapping is used when the Parquet files in a table don't have field IDs encoded in them. For example, files added through `add_files` during a table migration from Hive carry no field IDs. In this case we want to make use of name mapping (https://iceberg.apache.org/spec/#name-mapping-serialization), a JSON blob that's stored alongside the table in a table property.
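To make the shape of that JSON blob concrete, here is a minimal Python sketch of deserializing it into a recursive in-memory structure. The keys `field-id`, `names`, and `fields` follow the spec linked above; the `MappedField` class, the parser functions, and the sample data are hypothetical names made up for illustration and are not the PyIceberg API.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MappedField:
    """One name-mapping entry; `fields` holds nested entries (hypothetical class)."""
    names: List[str]
    field_id: Optional[int] = None  # "field-id" is optional in the spec
    fields: List["MappedField"] = field(default_factory=list)


def parse_mapped_field(obj: dict) -> MappedField:
    # Recurse into "fields" so nesting in the JSON is kept as nesting in memory.
    return MappedField(
        names=list(obj.get("names", [])),
        field_id=obj.get("field-id"),
        fields=[parse_mapped_field(child) for child in obj.get("fields", [])],
    )


def parse_name_mapping(blob: str) -> List[MappedField]:
    # The top level of a serialized name mapping is a JSON list of entries.
    return [parse_mapped_field(obj) for obj in json.loads(blob)]


# Illustrative blob, as it might appear in the table property.
EXAMPLE = """
[
  {"field-id": 1, "names": ["id", "record_id"]},
  {"field-id": 2, "names": ["location"], "fields": [
    {"field-id": 3, "names": ["latitude", "lat"]},
    {"field-id": 4, "names": ["longitude", "long"]}
  ]}
]
"""

mapping = parse_name_mapping(EXAMPLE)
```

After parsing, `mapping[1].fields` holds the two entries nested under `location`, each with its own list of alternative names.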
Future tip: It is best to store this in a recursive structure so it can be traversed using a `VisitorWithParent`, where both a `Schema` and a `NameMapping` can be traversed at once. This matters because the name mapping cannot be flattened: field names may contain dots, so a flattened key cannot be split reliably into fields and subfields. This is done in PyIceberg here: apache/iceberg-python#1014
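The dot problem can be illustrated with a small Python sketch. All names here (`MappedField`, `find_field_id`, `flatten`) are hypothetical and only meant to show the idea; this is not how PyIceberg implements its visitor. A top-level field literally named `a.b` and a subfield `b` nested under `a` collide once the tree is flattened to dotted keys, while a segment-by-segment walk of the tree keeps them apart.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence


@dataclass
class MappedField:
    """Hypothetical recursive name-mapping entry."""
    names: List[str]
    field_id: Optional[int] = None
    fields: List["MappedField"] = field(default_factory=list)


def find_field_id(fields: List[MappedField], path: Sequence[str]) -> Optional[int]:
    """Resolve a path one segment at a time; segments are never split on dots."""
    current = fields
    found = None
    for segment in path:
        found = next((f for f in current if segment in f.names), None)
        if found is None:
            return None
        current = found.fields
    return found.field_id if found else None


def flatten(fields: List[MappedField], prefix: str = "") -> Dict[str, Optional[int]]:
    """Naive flattening to dotted keys; later entries silently overwrite earlier ones."""
    flat: Dict[str, Optional[int]] = {}
    for f in fields:
        for name in f.names:
            key = prefix + name
            flat[key] = f.field_id
            flat.update(flatten(f.fields, prefix=key + "."))
    return flat


mapping = [
    MappedField(names=["a.b"], field_id=1),  # top-level name containing a dot
    MappedField(names=["a"], field_id=2, fields=[MappedField(names=["b"], field_id=3)]),
]

top_level = find_field_id(mapping, ["a.b"])   # the field literally named "a.b"
nested = find_field_id(mapping, ["a", "b"])   # subfield "b" under "a"
flat = flatten(mapping)                       # both end up under the key "a.b"
```

With the recursive form the two lookups resolve to different field IDs, whereas the flattened dictionary has only one slot for the key `a.b`, so one of the three fields is lost.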
This issue is solely about deserializing the JSON blob into an in-memory structure. Tests can be found here: https://github.com/apache/iceberg-python/blob/main/tests/table/test_name_mapping.py