refactor: integrate fuse table block pruning into pipeline #16841

Merged on Dec 10, 2024 (27 commits)

dqhl76 (Collaborator) commented Nov 14, 2024

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

part of #16626

Split out the block pruning step and move it into a pipeline processor.
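
For readers unfamiliar with the idea, here is a minimal sketch of what "pruning as a pipeline stage" can look like. It is not the code in this PR: `BlockMeta`, `Segment`, `block_may_match`, and `prune_in_pipeline` are hypothetical names, the min/max check stands in for the real pruners, and a plain thread plus a `std::sync::mpsc` channel stands in for the processor framework.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for one block's min/max statistics on a single column.
#[derive(Debug, Clone, PartialEq)]
pub struct BlockMeta {
    pub id: u32,
    pub min: i64,
    pub max: i64,
}

/// Hypothetical stand-in for a segment: a group of blocks.
pub struct Segment {
    pub blocks: Vec<BlockMeta>,
}

/// Keep a block only if its [min, max] range can contain `value`.
fn block_may_match(block: &BlockMeta, value: i64) -> bool {
    block.min <= value && value <= block.max
}

/// Run pruning as two stages connected by a channel, so that block pruning
/// can start before every segment has been expanded.
pub fn prune_in_pipeline(segments: Vec<Segment>, value: i64) -> Vec<BlockMeta> {
    let (tx, rx) = mpsc::channel::<BlockMeta>();

    // Stage 1: expand segments into block metadata and push it downstream.
    let producer = thread::spawn(move || {
        for segment in segments {
            for block in segment.blocks {
                if tx.send(block).is_err() {
                    return; // The consumer has gone away; stop early.
                }
            }
        }
        // `tx` is dropped when the closure ends, which closes the channel.
    });

    // Stage 2: prune blocks as they arrive; iteration ends when the channel closes.
    let kept: Vec<BlockMeta> = rx
        .into_iter()
        .filter(|block| block_may_match(block, value))
        .collect();

    producer.join().expect("producer thread panicked");
    kept
}
```

With a single producer the surviving blocks come out in the order they were sent, so the result matches what an up-front, non-pipelined pass would return.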

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):


@github-actions github-actions bot added the pr-refactor label (this PR changes the code base without new features or bugfix) Nov 14, 2024

what-the-diff bot commented Nov 14, 2024

PR Summary

  • Development of a Pruning Pipeline: A new 'pruning pipeline' has been added, along with a user-facing setting to enable or disable it and an accessor that reads whether the setting is on.

  • Refining of Data Reading Method: The method to read data has been adapted to better handle initialization of segments and includes the new pruning logic.

  • Enhancements of Pruning Mechanism: The comprehensive pruning mechanism now also includes caching strategies and more sophisticated ways of handling segments.

  • Additional Pruning Operations: More pruning operations have been added to allow asynchronous segment pruning, improving execution speed.

  • Boosted Synchronization Capabilities: A new external library (or "crate") named parking_lot has been included to enhance data synchronization capabilities.

  • Asynchronous and Synchronous Pruning Features: The PR includes new asynchronous and synchronous block pruning features, improving execution speed and flexibility.

  • Metadata Management Techniques: Techniques for handling and representing block metadata have been developed. This includes probability-based sampling of metadata, extraction of metadata from segments, and creation of metadata pairs to represent pruning results.

Note: The PR also adds several new files. Overall, the changes aim at more efficient data handling and better performance. Several methods were made public so that other modules can call them, and some code was lightly refactored for clarity and modularity.
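
As a rough illustration of two points in the summary above (a setting that gates the new path, and pruning results represented as pairs), the sketch below switches between an eager pass and a lazy, pipeline-style pass. Everything here is an assumption for illustration: `Settings`, the `enable_prune_pipeline` field, `BlockIndex`, `PruneResult`, and `read_partitions` are hypothetical names, not the identifiers used in this PR.

```rust
/// Hypothetical stand-in for one block's min/max statistics on a single column.
#[derive(Debug, Clone, PartialEq)]
pub struct BlockMeta {
    pub min: i64,
    pub max: i64,
}

/// Hypothetical position of a block: which segment it is in, and where.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct BlockIndex {
    pub segment: usize,
    pub block: usize,
}

/// Hypothetical settings object; the field name is illustrative only.
pub struct Settings {
    pub enable_prune_pipeline: bool,
}

/// A pruning result as (position, metadata) pairs.
pub type PruneResult = Vec<(BlockIndex, BlockMeta)>;

fn may_match(meta: &BlockMeta, value: i64) -> bool {
    meta.min <= value && value <= meta.max
}

/// Eager path: prune every block up front and return the full result.
fn prune_eagerly(segments: &[Vec<BlockMeta>], value: i64) -> PruneResult {
    let mut kept = Vec::new();
    for (s, blocks) in segments.iter().enumerate() {
        for (b, meta) in blocks.iter().enumerate() {
            if may_match(meta, value) {
                kept.push((BlockIndex { segment: s, block: b }, meta.clone()));
            }
        }
    }
    kept
}

/// Pipeline-style path: a lazy iterator that yields surviving pairs one at a time.
fn prune_lazily<'a>(
    segments: &'a [Vec<BlockMeta>],
    value: i64,
) -> impl Iterator<Item = (BlockIndex, BlockMeta)> + 'a {
    segments.iter().enumerate().flat_map(move |(s, blocks)| {
        blocks.iter().enumerate().filter_map(move |(b, meta)| {
            may_match(meta, value)
                .then(|| (BlockIndex { segment: s, block: b }, meta.clone()))
        })
    })
}

/// Pick the code path from the setting; both paths must agree on the result.
pub fn read_partitions(
    settings: &Settings,
    segments: &[Vec<BlockMeta>],
    value: i64,
) -> PruneResult {
    if settings.enable_prune_pipeline {
        prune_lazily(segments, value).collect()
    } else {
        prune_eagerly(segments, value)
    }
}
```

Keeping both paths behind one entry point makes it easy to check that the new path returns the same pairs as the old one before the setting is turned on by default.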

@dqhl76 dqhl76 added the ci-benchmark label (Benchmark: run all test) Nov 19, 2024

Docker Image for PR

  • tag: pr-16841-291921d-1731997233

note: this image tag is only available for internal use,
please check the internal doc for more details.

@dqhl76 dqhl76 changed the title from "refactor: make block pruning into pipeline" to "refactor: integrate fuse table block pruning into pipeline" Nov 19, 2024
@dqhl76 dqhl76 removed the ci-benchmark label (Benchmark: run all test) Nov 19, 2024
@dqhl76 dqhl76 added the ci-benchmark label (Benchmark: run all test) Nov 26, 2024

Docker Image for PR

  • tag: pr-16841-85c420d-1732594092

note: this image tag is only available for internal use,
please check the internal doc for more details.

@dqhl76 dqhl76 marked this pull request as ready for review November 26, 2024 04:50
@dqhl76 dqhl76 requested a review from zhang2014 November 26, 2024 04:50
@dqhl76 dqhl76 added and removed the ci-benchmark label (Benchmark: run all test) Nov 28, 2024

Docker Image for PR

  • tag: pr-16841-6b2a679-1732769364

note: this image tag is only available for internal use,
please check the internal doc for more details.

@zhang2014 zhang2014 added and removed the ci-benchmark label (Benchmark: run all test) Nov 29, 2024

Docker Image for PR

  • tag: pr-16841-2255adf-1732847244

note: this image tag is only available for internal use,
please check the internal doc for more details.

@zhang2014 zhang2014 added and removed the ci-benchmark label (Benchmark: run all test) Dec 8, 2024
github-actions bot (Contributor) commented Dec 8, 2024

Docker Image for PR

  • tag: pr-16841-ddca9d1-1733640327

note: this image tag is only available for internal use,
please check the internal doc for more details.

@zhang2014 zhang2014 enabled auto-merge December 9, 2024 08:58
@sundy-li sundy-li removed the ci-benchmark label (Benchmark: run all test) Dec 9, 2024
@zhang2014 zhang2014 added this pull request to the merge queue Dec 10, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 10, 2024
@BohuTANG BohuTANG merged commit cf84449 into databendlabs:main Dec 10, 2024
72 checks passed
Labels
pr-refactor this PR changes the code base without new features or bugfix
4 participants