feat: support reading partitioned Delta table. #14084

Merged
merged 21 commits into from
Dec 21, 2023


@youngsofun youngsofun commented Dec 19, 2023

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

In a Delta table, partition columns are not stored in the parquet files,
so some extra work is needed to make pushdown work:

  • context:
    • The table stores the partition column names in meta.engine_options.
    • Each partition carries the values of all partition columns, in that same order.
    • Given this order, a PartitionIndex is enough to look up the needed info.
  • pushdown:
    • projections (mask): partition columns are excluded when reading the parquet file and inserted at the end (using a FieldIndex -> PartitionIndex mapping).
    • filter passed to the parquet reader: all partition columns are appended to the filter input columns (ID = PartitionIndex + num_of_non_partition_input_columns).
    • pruner: ColumnRefs of partition columns in the filter expr are replaced with const scalars (matched by name).
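
The index bookkeeping for projections and the filter input can be sketched roughly as below. This is a minimal illustration and not the code in this PR: `split_projection`, `filter_input_id`, and the slice-of-names schema are hypothetical, and the returned parquet indices are table-schema positions (remapping them to positions inside the parquet file is left out).

```rust
// Hypothetical aliases for illustration; not Databend's actual definitions.
type FieldIndex = usize;
type PartitionIndex = usize;

/// Split a projection (indices into the table schema) into
/// - the fields that must be read from the parquet file, and
/// - the partition columns, identified by their position in the
///   ordered list of partition column names.
fn split_projection(
    schema: &[&str],
    partition_columns: &[&str],
    projection: &[FieldIndex],
) -> (Vec<FieldIndex>, Vec<PartitionIndex>) {
    let mut parquet_fields = Vec::new();
    let mut partition_fields = Vec::new();
    for &field in projection {
        let name = schema[field];
        match partition_columns.iter().position(|p| *p == name) {
            // Partition column: its value comes from the partition, not the file.
            Some(partition_index) => partition_fields.push(partition_index),
            // Regular column: read it from the parquet file.
            None => parquet_fields.push(field),
        }
    }
    (parquet_fields, partition_fields)
}

/// Column ID of a partition column inside the filter input, where all
/// partition columns are appended after the non-partition input columns.
fn filter_input_id(
    partition_index: PartitionIndex,
    num_of_non_partition_input_columns: usize,
) -> usize {
    partition_index + num_of_non_partition_input_columns
}
```

For a schema `["id", "dt", "name", "region"]` partitioned by `["dt", "region"]`, projecting fields `[3, 0, 2]` would read fields 0 and 2 from parquet and fill in partition column 1 (`region`) afterwards.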

Partition columns can only have simple primitive types.
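
The pruner step, replacing partition-column references with constants by name, might look like the following sketch. The `Scalar` and `Expr` enums here are simplified stand-ins invented for illustration (a handful of primitive scalar types and two operators); they are not Databend's expression types.

```rust
use std::collections::HashMap;

// Simplified stand-in for primitive partition values (assumption).
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Int64(i64),
    String(String),
    Boolean(bool),
}

// Simplified stand-in for a filter expression tree (assumption).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    ColumnRef(String),
    Constant(Scalar),
    Eq(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

/// Replace every ColumnRef that names a partition column with the constant
/// value that column has in the given partition; other nodes are kept.
fn bind_partition_values(expr: &Expr, values: &HashMap<String, Scalar>) -> Expr {
    match expr {
        Expr::ColumnRef(name) => match values.get(name) {
            Some(value) => Expr::Constant(value.clone()),
            None => expr.clone(),
        },
        Expr::Constant(_) => expr.clone(),
        Expr::Eq(lhs, rhs) => Expr::Eq(
            Box::new(bind_partition_values(lhs, values)),
            Box::new(bind_partition_values(rhs, values)),
        ),
        Expr::And(lhs, rhs) => Expr::And(
            Box::new(bind_partition_values(lhs, values)),
            Box::new(bind_partition_values(rhs, values)),
        ),
    }
}
```

Once the partition columns are bound, a filter that only touches them consists entirely of constants, so a pruner could evaluate it per partition without opening any parquet file.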

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):


@github-actions github-actions bot added the pr-feature this PR introduces a new feature to the codebase label Dec 19, 2023
@youngsofun youngsofun requested review from RinChanNOWWW and sundy-li and removed request for RinChanNOWWW December 19, 2023 17:10
@BohuTANG BohuTANG merged commit 12a9186 into databendlabs:main Dec 21, 2023
68 checks passed
4 participants