feat: target build optimization for merge into #14066
Merged: BohuTANG merged 86 commits into databendlabs:main from JackTan25:new_hashtable_with_blkinfo_for_target_build on Jan 18, 2024
github-actions bot added the pr-feature label (this PR introduces a new feature to the codebase) on Dec 18, 2023
JackTan25 changed the title from "feat:" to "feat: new hashtable with blkinfo for target build merge into" on Dec 18, 2023
JackTan25 force-pushed the new_hashtable_with_blkinfo_for_target_build branch from 8ad47c7 to 16b0e53 on December 31, 2023 18:04
…new_hashtable_with_blkinfo_for_target_build
…new_hashtable_with_blkinfo_for_target_build
sundy-li reviewed Jan 10, 2024
…de, add check multirows conflict
…new_hashtable_with_blkinfo_for_target_build
…/github.com/JackTan25/databend into new_hashtable_with_blkinfo_for_target_build
Dousir9 reviewed Jan 17, 2024
Dousir9 reviewed Jan 17, 2024
src/query/service/src/pipelines/processors/transforms/hash_join/hash_join_build_state.rs
Dousir9 reviewed Jan 17, 2024
src/query/service/src/pipelines/processors/transforms/hash_join/hash_join_probe_state.rs
Dousir9 reviewed Jan 17, 2024
src/query/service/src/pipelines/processors/transforms/hash_join/probe_join/left_join.rs (Outdated)
Dousir9 reviewed Jan 17, 2024
src/query/service/src/pipelines/processors/transforms/hash_join/transform_hash_join_probe.rs
…/github.com/JackTan25/databend into new_hashtable_with_blkinfo_for_target_build
The part about hash join looks good to me.
Dousir9 approved these changes Jan 17, 2024
…/github.com/JackTan25/databend into new_hashtable_with_blkinfo_for_target_build
zhyass approved these changes Jan 17, 2024
sundy-li approved these changes Jan 18, 2024
Xuanwo pushed a commit to Xuanwo/databend that referenced this pull request on Jan 19, 2024
* init blockinfo hashtable
* add some comments
* add more comments for hash_table interface
* add merge_into_join_type info and block_info index
* add block info hashtable basic implementation
* fix typos
* add RowPrefix for native_deserialize and parquet_deserialize
* fix lint
* add gather_partial_modified and reduce_false_matched
* refactor: remove block info hashtable and build blockinfo index outside, add check multirows conflict
* fix blockinfo index
* gather partial modified blocks and fix lint
* remove rowid when use target table as build side
* support target_build_optimization for merge into pipeline in standalone mode
* add more tests, and enhance explain merge into, add fix add merge status when target table build optimization is triggered
* add probe done output logic and add more tests
* add one chunk ut test for block_info_index
* fix test result
* add more commnnts for merge into strategies, and fix rowid read
* fix test
* fix split
* fix block_info_index init, matched offsets update and add target_table_schema for partial unmodified blocks to append directly, add probe attach for target_build_optimization, fix merge intp explain update order
* fix all matched delete for target build optimization
* fix test
* add info log
* add logs
* add debug logs
* add debug logs
* fix lint
* forbiden native engine for target build optimization
* add logs
* add more log
* add debug log
* fix multi chunks start offset and add skip chunk ut test
* support recieve duplicated block for matched_mutator
* move logic code
* fix flaky matched and fix offset for pointer (chunk_offsets shouldn't minus one)
* add merge_state
* refactor codes
* add more commnets
* refactor codes, split merge into optimziation codes into other files
* remove a.txt
* fix check
* chore: modify function name
* rename variables with merge_into prefix
* rename function
* move merge_into_try_build_block_info_index to front
Labels
ci-benchmark
Benchmark: run all test
ci-cloud
Build docker image for cloud test
pr-feature
this PR introduces a new feature to the codebase
I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/
Summary
This PR waits for #13950.
Introduce a new hash table with block info for target-build merge into. Benefits:
Old implementation:
Now the load step marked in red is eliminated.
The design and usage:
This index can only be used for target-build merge into. It applies to both standalone and distributed mode, but this PR supports standalone only.
Advantages:
Disadvantages:
potentially leading to the target table being unsuitable for use as a build table in the future.
merge into t using source on xxx when matched then update xxx when not matched then insert xxx
Future Enhancement
Other cases we need to enhance:
PR Design
BlockInfoIndex
for hash table. If we don't trigger spill, we will put all data in chunks. For target table as build side, we will read all data blocks from target table. But chunks will merge some blocks into a larger chunk. So we will get the layout like below:
So we use BlockInfoIndex to maintain an index for the block info in chunks.
And we can use BlockInfoIndex to find out the partial modified blocks quickly:
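As a rough illustration of the idea, the sketch below records the exclusive end offset of every block inside the concatenated chunk rows. A row offset can then be mapped back to its block with a binary search, and blocks where only some rows matched can be listed. All names here (`push_block`, `block_id_of`, `mark_matched`, `partially_modified_blocks`) and the field layout are assumptions made for illustration. They are not taken from this PR's code.

```rust
/// Hypothetical sketch of a block-info index over chunk rows.
/// This is an assumption-based illustration, not Databend's implementation.
#[derive(Debug, Default)]
struct BlockInfoIndex {
    /// Exclusive end row offset of each block, strictly increasing.
    block_ends: Vec<usize>,
    /// Identifier of each block, parallel to `block_ends`.
    block_ids: Vec<u64>,
    /// One flag per row: true once the row was matched by the probe side.
    matched: Vec<bool>,
}

impl BlockInfoIndex {
    fn new() -> Self {
        Self::default()
    }

    /// Register the next block in chunk order with its row count.
    fn push_block(&mut self, block_id: u64, num_rows: usize) {
        assert!(num_rows > 0, "empty blocks are not indexed in this sketch");
        let start = self.matched.len();
        self.block_ends.push(start + num_rows);
        self.block_ids.push(block_id);
        self.matched.resize(start + num_rows, false);
    }

    /// Map a row offset to the block that contains it (binary search).
    fn block_id_of(&self, row: usize) -> Option<u64> {
        let pos = self.block_ends.partition_point(|&end| end <= row);
        self.block_ids.get(pos).copied()
    }

    /// Mark one row as matched; returns false if the offset is out of range.
    fn mark_matched(&mut self, row: usize) -> bool {
        match self.matched.get_mut(row) {
            Some(flag) => {
                *flag = true;
                true
            }
            None => false,
        }
    }

    /// Blocks where some, but not all, rows were matched.
    fn partially_modified_blocks(&self) -> Vec<u64> {
        let mut result = Vec::new();
        let mut start = 0;
        for (i, &end) in self.block_ends.iter().enumerate() {
            let hits = self.matched[start..end].iter().filter(|&&m| m).count();
            if hits > 0 && hits < end - start {
                result.push(self.block_ids[i]);
            }
            start = end;
        }
        result
    }
}
```

In this sketch, a block with no matched rows or with every row matched is not reported. Only blocks that mix matched and unmatched rows are returned.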
Optimizations tracking: #12595
Tests
Type of change