
Able to parse name-mapping into a recursive structure. #723

Open
Fokko opened this issue Nov 27, 2024 · 2 comments · May be fixed by #740
Labels
good first issue Good for newcomers

Comments

@Fokko
Contributor

Fokko commented Nov 27, 2024

Name mapping is used when the Parquet files in a table don't have field IDs encoded in them. For example, when files are added through add_files during a table migration from Hive, the Parquet files contain no field IDs. In this case we want to make use of name mapping (https://iceberg.apache.org/spec/#name-mapping-serialization): a JSON blob stored alongside the table in a table property.
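To make the storage side concrete, here is a minimal sketch of reading the blob from the table properties and resolving a top-level column name to a field ID. It assumes the default mapping lives under the `schema.name-mapping.default` property; the `table_properties` dict and the `top_level_field_id` helper are hypothetical, and the JSON is modeled on the example in the spec rather than copied from any real table.

```python
import json
from typing import Optional

# Property key under which Iceberg stores the default name mapping.
NAME_MAPPING_PROPERTY = "schema.name-mapping.default"

# Hypothetical table properties holding a blob modeled on the spec's example.
table_properties = {
    NAME_MAPPING_PROPERTY: json.dumps([
        {"field-id": 1, "names": ["id", "record_id"]},
        {"field-id": 2, "names": ["data"]},
        {"field-id": 3, "names": ["location"], "fields": [
            {"field-id": 4, "names": ["latitude", "lat"]},
            {"field-id": 5, "names": ["longitude", "long"]},
        ]},
    ])
}


def top_level_field_id(properties: dict, column_name: str) -> Optional[int]:
    """Return the field ID mapped to a top-level column name, or None.

    Only the top level is searched; nested columns need a recursive structure.
    """
    blob = properties.get(NAME_MAPPING_PROPERTY)
    if blob is None:
        return None  # the table has no name mapping at all
    for entry in json.loads(blob):
        # A single field may be known under several names (aliases).
        if column_name in entry["names"]:
            return entry.get("field-id")
    return None


print(top_level_field_id(table_properties, "record_id"))  # 1
print(top_level_field_id(table_properties, "lat"))        # None: nested under "location"
```

The second lookup fails because `lat` only exists one level down, which is exactly why the parsed form should keep the nesting.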

This issue covers only the deserialization of the JSON blob into an in-memory structure. Tests can be found here: https://github.com/apache/iceberg-python/blob/main/tests/table/test_name_mapping.py
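As a rough illustration of the deserialization step, the sketch below parses the blob into a recursive structure. It follows the spec's field layout (`names` required, `field-id` optional, `fields` optionally holding nested mappings); the `MappedField` type and the `parse_name_mapping` / `parse_mapped_field` functions are hypothetical names, not an existing API.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MappedField:
    """One name-mapping entry (hypothetical type).

    `names` is required, `field_id` is optional, and `fields` holds the
    mappings of nested children (struct fields, list elements, map keys/values).
    """
    names: List[str]
    field_id: Optional[int] = None
    fields: List["MappedField"] = field(default_factory=list)


def parse_mapped_field(obj: dict) -> "MappedField":
    # Recurse into "fields" so each nested column stays at its own level.
    return MappedField(
        names=list(obj["names"]),
        field_id=obj.get("field-id"),
        fields=[parse_mapped_field(child) for child in obj.get("fields", [])],
    )


def parse_name_mapping(blob: str) -> List[MappedField]:
    """Deserialize the JSON blob, which must be a list of field mappings."""
    data = json.loads(blob)
    if not isinstance(data, list):
        raise ValueError("a name mapping must be a JSON list")
    return [parse_mapped_field(obj) for obj in data]


mapping = parse_name_mapping(
    '[{"field-id": 1, "names": ["id", "record_id"]},'
    ' {"field-id": 3, "names": ["location"], "fields": ['
    '   {"field-id": 4, "names": ["latitude", "lat"]}]}]'
)
print(mapping[1].fields[0])
# MappedField(names=['latitude', 'lat'], field_id=4, fields=[])
```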

Future tip: it is best to store this as a recursive structure so that it can be traversed with a VisitorWithParent, which walks a Schema and a NameMapping at the same time. This matters because the name mapping cannot be flattened: field names may contain dots, so a flattened path such as a.b cannot be split unambiguously into a field and its subfield. PyIceberg does this in apache/iceberg-python#1014.
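The dotted-name problem is easiest to see with a small example. The sketch below is a plain recursive co-traversal, not PyIceberg's VisitorWithParent; the `MappedField` and `FileField` types and the `assign_ids` function are hypothetical stand-ins for a parsed name mapping and a Parquet schema without field IDs. Because names are only matched against the mapping entries at the same level, a top-level column literally named `a.b` and a nested column `b` under `a` keep distinct IDs, whereas a flattened lookup keyed by `"a.b"` would collide.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class MappedField:
    # Hypothetical in-memory name-mapping entry.
    names: List[str]
    field_id: Optional[int] = None
    fields: List["MappedField"] = field(default_factory=list)


@dataclass
class FileField:
    # Hypothetical stand-in for a Parquet column that carries no field ID.
    name: str
    children: List["FileField"] = field(default_factory=list)


def assign_ids(
    file_fields: List[FileField],
    mapping: List[MappedField],
    path: Tuple[str, ...] = (),
) -> Dict[Tuple[str, ...], int]:
    """Walk file columns and mapping entries level by level, in lockstep.

    Paths are tuples of names, never dot-joined strings, so a dot inside
    a name is never mistaken for a nesting separator.
    """
    result: Dict[Tuple[str, ...], int] = {}
    for column in file_fields:
        # Only the mapping entries under the current parent are candidates.
        match = next((m for m in mapping if column.name in m.names), None)
        if match is None:
            continue  # unmapped column: no ID, and its children are skipped
        key = path + (column.name,)
        if match.field_id is not None:
            result[key] = match.field_id
        # Descend into the column's children together with the entry's children.
        result.update(assign_ids(column.children, match.fields, key))
    return result


mapping = [
    MappedField(["a.b"], 1),
    MappedField(["a"], 2, [MappedField(["b"], 3)]),
]
file_schema = [FileField("a.b"), FileField("a", [FileField("b")])]
print(assign_ids(file_schema, mapping))
# {('a.b',): 1, ('a',): 2, ('a', 'b'): 3}
```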

Fokko added the good first issue label Nov 27, 2024
Fokko mentioned this issue Nov 27, 2024
@barronw
Contributor

barronw commented Nov 28, 2024

Can I pick this up?

@c-thiel
Collaborator

c-thiel commented Nov 28, 2024

@barronw Gladly! I've assigned the issue to you.
If there are any questions, just post them here or contact us on Slack :)

barronw linked a pull request Nov 28, 2024 that will close this issue
3 participants