feat: target build optimization for merge into #14066

Merged
Changes from 66 commits
86 commits
c600104
init blockinfo hashtable
JackTan25 Jan 2, 2024
52b2929
Merge branch 'main' of https://github.com/datafuselabs/databend into …
JackTan25 Jan 5, 2024
c28de64
add some comments
JackTan25 Jan 8, 2024
c0f4606
Merge branch 'main' of https://github.com/datafuselabs/databend into …
JackTan25 Jan 8, 2024
e4a1814
add more comments for hash_table interface
JackTan25 Jan 8, 2024
ce199d4
add merge_into_join_type info and block_info index
JackTan25 Jan 9, 2024
4bc6e91
Merge branch 'main' into new_hashtable_with_blkinfo_for_target_build
JackTan25 Jan 9, 2024
45f9099
add block info hashtable basic implementation
JackTan25 Jan 9, 2024
f24d328
fix typos
JackTan25 Jan 9, 2024
356404c
Merge branch 'main' into new_hashtable_with_blkinfo_for_target_build
JackTan25 Jan 9, 2024
11dc6d0
add RowPrefix for native_deserialize and parquet_deserialize
JackTan25 Jan 9, 2024
d485bed
Merge branch 'main' into new_hashtable_with_blkinfo_for_target_build
JackTan25 Jan 9, 2024
c4bdedb
fix lint
JackTan25 Jan 9, 2024
81c47bb
add gather_partial_modified and reduce_false_matched
JackTan25 Jan 9, 2024
1676466
refactor: remove block info hashtable and build blockinfo index outsi…
JackTan25 Jan 10, 2024
6a3f722
fix blockinfo index
JackTan25 Jan 10, 2024
8d8a423
gather partial modified blocks and fix lint
JackTan25 Jan 10, 2024
4f7968c
Merge branch 'main' into new_hashtable_with_blkinfo_for_target_build
JackTan25 Jan 10, 2024
7997f73
Merge branch 'main' of https://github.com/datafuselabs/databend into …
JackTan25 Jan 11, 2024
fc6780f
remove rowid when use target table as build side
JackTan25 Jan 11, 2024
859f0e1
support target_build_optimization for merge into pipeline in standalo…
JackTan25 Jan 11, 2024
3726c7e
Merge branch 'main' into new_hashtable_with_blkinfo_for_target_build
JackTan25 Jan 11, 2024
58810d6
add more tests, and enhance explain merge into, add fix add merge sta…
JackTan25 Jan 11, 2024
87ae738
Merge branch 'main' into new_hashtable_with_blkinfo_for_target_build
JackTan25 Jan 11, 2024
d24d81c
Merge branch 'main' of https://github.com/datafuselabs/databend into …
JackTan25 Jan 11, 2024
d316c8a
add probe done output logic and add more tests
JackTan25 Jan 11, 2024
3e62b92
Merge branch 'new_hashtable_with_blkinfo_for_target_build' of https:/…
JackTan25 Jan 11, 2024
f9c8876
add one chunk ut test for block_info_index
JackTan25 Jan 11, 2024
404f9c8
fix test result
JackTan25 Jan 12, 2024
7f991cb
add more commnnts for merge into strategies, and fix rowid read
JackTan25 Jan 12, 2024
25553da
Merge branch 'main' into new_hashtable_with_blkinfo_for_target_build
JackTan25 Jan 12, 2024
575d14c
fix test
JackTan25 Jan 12, 2024
7495e71
Merge branch 'main' into new_hashtable_with_blkinfo_for_target_build
JackTan25 Jan 12, 2024
4e503ff
fix split
JackTan25 Jan 12, 2024
612cebf
fix block_info_index init, matched offsets update and add target_tabl…
JackTan25 Jan 12, 2024
7d4c73c
Merge branch 'main' into new_hashtable_with_blkinfo_for_target_build
JackTan25 Jan 12, 2024
47c44ee
fix all matched delete for target build optimization
JackTan25 Jan 12, 2024
eb16d7d
fix test
JackTan25 Jan 12, 2024
2b99973
Merge branch 'main' of https://github.com/datafuselabs/databend into …
JackTan25 Jan 13, 2024
10879f5
add info log
JackTan25 Jan 13, 2024
ed364da
Merge branch 'main' into new_hashtable_with_blkinfo_for_target_build
JackTan25 Jan 13, 2024
33c0c4d
add logs
JackTan25 Jan 13, 2024
1b3ed9a
add debug logs
JackTan25 Jan 13, 2024
01d8099
add debug logs
JackTan25 Jan 13, 2024
8acc5f7
fix lint
JackTan25 Jan 13, 2024
c0a6927
forbiden native engine for target build optimization
JackTan25 Jan 13, 2024
71ad3d0
Merge branch 'main' into new_hashtable_with_blkinfo_for_target_build
JackTan25 Jan 14, 2024
ababc1a
add logs
JackTan25 Jan 14, 2024
14b3434
Merge branch 'new_hashtable_with_blkinfo_for_target_build' of https:/…
JackTan25 Jan 14, 2024
0901be5
add more log
JackTan25 Jan 14, 2024
16dde6c
add debug log
JackTan25 Jan 14, 2024
2e29d9e
fix multi chunks start offset and add skip chunk ut test
JackTan25 Jan 14, 2024
bea75aa
support recieve duplicated block for matched_mutator
JackTan25 Jan 14, 2024
bbb481d
fix conflict
JackTan25 Jan 15, 2024
8178714
move logic code
JackTan25 Jan 15, 2024
9028850
Merge branch 'main' into new_hashtable_with_blkinfo_for_target_build
JackTan25 Jan 15, 2024
542ba8e
Merge branch 'main' into new_hashtable_with_blkinfo_for_target_build
JackTan25 Jan 15, 2024
f65dec2
Merge branch 'main' into new_hashtable_with_blkinfo_for_target_build
JackTan25 Jan 15, 2024
9131116
fix flaky matched and fix offset for pointer (chunk_offsets shouldn't…
JackTan25 Jan 15, 2024
3c9dbdc
Merge branch 'new_hashtable_with_blkinfo_for_target_build' of https:/…
JackTan25 Jan 15, 2024
0e0e4db
add merge_state
JackTan25 Jan 16, 2024
20b9f58
Merge branch 'main' into new_hashtable_with_blkinfo_for_target_build
JackTan25 Jan 16, 2024
234961c
refactor codes
JackTan25 Jan 16, 2024
42ab0b5
Merge branch 'new_hashtable_with_blkinfo_for_target_build' of https:/…
JackTan25 Jan 16, 2024
ea07f61
Merge branch 'main' into new_hashtable_with_blkinfo_for_target_build
JackTan25 Jan 16, 2024
507f7c2
Merge branch 'main' into new_hashtable_with_blkinfo_for_target_build
JackTan25 Jan 16, 2024
70d42c9
add more commnets
JackTan25 Jan 16, 2024
83d3f35
Merge branch 'new_hashtable_with_blkinfo_for_target_build' of https:/…
JackTan25 Jan 16, 2024
81d7933
fix conflict
JackTan25 Jan 16, 2024
0711dc7
refactor codes, split merge into optimziation codes into other files
JackTan25 Jan 16, 2024
865ea8b
remove a.txt
JackTan25 Jan 16, 2024
7bbedcd
fix check
JackTan25 Jan 17, 2024
d2f56ae
Merge branch 'main' into new_hashtable_with_blkinfo_for_target_build
JackTan25 Jan 17, 2024
f5ca491
chore: modify function name
JackTan25 Jan 17, 2024
aab662c
Merge branch 'new_hashtable_with_blkinfo_for_target_build' of https:/…
JackTan25 Jan 17, 2024
f4bb2cf
rename variables with merge_into prefix
JackTan25 Jan 17, 2024
5f4d9b2
Merge branch 'main' into new_hashtable_with_blkinfo_for_target_build
JackTan25 Jan 17, 2024
fbd7cba
rename function
JackTan25 Jan 17, 2024
8cfa662
Merge branch 'new_hashtable_with_blkinfo_for_target_build' of https:/…
JackTan25 Jan 17, 2024
b04c43b
Merge branch 'main' into new_hashtable_with_blkinfo_for_target_build
JackTan25 Jan 17, 2024
c5cb3a9
move merge_into_try_build_block_info_index to front
JackTan25 Jan 17, 2024
f835518
Merge branch 'new_hashtable_with_blkinfo_for_target_build' of https:/…
JackTan25 Jan 17, 2024
5b50b6e
Merge branch 'main' into new_hashtable_with_blkinfo_for_target_build
JackTan25 Jan 17, 2024
f13e3bd
Merge branch 'main' into new_hashtable_with_blkinfo_for_target_build
JackTan25 Jan 17, 2024
363f37f
Merge branch 'main' into new_hashtable_with_blkinfo_for_target_build
JackTan25 Jan 17, 2024
7858b41
Merge branch 'main' into new_hashtable_with_blkinfo_for_target_build
JackTan25 Jan 18, 2024
10 changes: 6 additions & 4 deletions src/common/hashtable/src/lib.rs
@@ -23,18 +23,18 @@
extern crate core;

mod container;
mod dictionary_string_hashtable;

mod hashjoin_hashtable;
mod hashjoin_string_hashtable;
mod hashtable;
mod keys_ref;
mod lookup_hashtable;
mod stack_hashtable;
mod table0;

mod dictionary_string_hashtable;
mod partitioned_hashtable;
mod short_string_hashtable;
mod stack_hashtable;
mod string_hashtable;
mod table0;
#[allow(dead_code)]
mod table1;
mod table_empty;
@@ -113,3 +113,5 @@ pub use partitioned_hashtable::hash2bucket;
pub type HashJoinHashMap<K> = hashjoin_hashtable::HashJoinHashTable<K>;
pub type BinaryHashJoinHashMap = hashjoin_string_hashtable::HashJoinStringHashTable;
pub use traits::HashJoinHashtableLike;
pub use utils::BlockInfoIndex;
pub use utils::Interval;
22 changes: 22 additions & 0 deletions src/common/hashtable/src/traits.rs
@@ -25,6 +25,7 @@ use ethnum::i256;
use ethnum::U256;
use ordered_float::OrderedFloat;

use crate::utils::Interval;
use crate::RowPtr;

/// # Safety
@@ -508,21 +509,42 @@ pub trait HashJoinHashtableLike {
type Key: ?Sized;

// Probes the hash table with `hashes`, converting them in place to pointers for memory reuse.
// Same as `early_filtering_probe`, but without early filtering.
fn probe(&self, hashes: &mut [u64], bitmap: Option<Bitmap>) -> usize;

// Probes the hash table with `hashes`, converting them in place to pointers for memory reuse.
// 1. Same as `early_filtering_probe_with_selection`, but instead of using a selection to keep the
// unfiltered indexes, we set the filtered hashes to zero.
// 2. Returns the count of unfiltered rows.
fn early_filtering_probe(&self, hashes: &mut [u64], bitmap: Option<Bitmap>) -> usize;

// Probes the hash table with `hashes`, converting them in place to pointers for memory reuse.
// `early_filtering_probe_with_selection` performs the first-round probe.
// 1. `hashes` holds the hash values of the probe block's rows and is used for
// early filtering. If the row at `idx` can't be filtered out early, the pointer stored in
// its bucket is assigned to `hashes[idx]` to reuse the memory.
// 2. `selection` preserves the indexes of the rows that can't be filtered out early.
// 3. Returns the count of indexes preserved in `selection`.
fn early_filtering_probe_with_selection(
&self,
hashes: &mut [u64],
valids: Option<Bitmap>,
selection: &mut [u32],
) -> usize;

// `next_contains` checks whether a matching row exists in the linked chain.
// `ptr` is the head of the chain.
fn next_contains(&self, key: &Self::Key, ptr: u64) -> bool;

/// 1. `key` is the serialized build key from one row.
/// 2. `ptr` points to the `*RawEntry` of the bucket corresponding to `key`, so a first-round
/// probe must run before this method. If `ptr` is zero, there is no corresponding bucket
/// for `key`.
/// 3. `vec_ptr` is a `RowPtr` array that records the matched rows in chunks.
/// 4. `occupied` is the current length of `vec_ptr`.
/// 5. `capacity` is the capacity of `vec_ptr`.
/// 6. Returns the matched row count and the next pointer to test in a later call.
/// If the capacity is sufficient, the next pointer is zero; otherwise it is valid.
fn next_probe(
&self,
key: &Self::Key,
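The two-round probe described in the `HashJoinHashtableLike` comments can be illustrated with a small, self-contained sketch. This is not the code from this PR: `ToyTable`, `Entry`, `toy_hash`, and `probe_with_selection` are hypothetical names, the hash is the identity function, the only "early filter" is an empty-bucket check, and `next_probe` writes into a slice where the real trait works on `RowPtr` storage. It only shows the contract from the comments: round one overwrites surviving hashes with chain-head pointers and records their indexes in `selection`; round two walks the chain, fills the output up to its capacity, and returns a non-zero next pointer when matches remain.

```rust
/// Toy hash function (identity), chosen only so the example is easy to follow.
fn toy_hash(key: u64) -> u64 {
    key
}

/// One chained entry. `next` is the 1-based index of the next entry, 0 = end of chain.
struct Entry {
    key: u64,
    row: u32,
    next: u64,
}

/// Hypothetical chained hash table. `buckets[i]` is the 1-based index of the
/// chain head in `entries`, or 0 when the bucket is empty.
struct ToyTable {
    buckets: Vec<u64>,
    entries: Vec<Entry>,
}

impl ToyTable {
    fn new(bucket_count: usize) -> Self {
        ToyTable { buckets: vec![0; bucket_count], entries: Vec::new() }
    }

    fn slot(&self, hash: u64) -> usize {
        (hash % self.buckets.len() as u64) as usize
    }

    /// Pushes a build row to the front of its bucket's chain.
    fn insert(&mut self, key: u64, row: u32) {
        let slot = self.slot(toy_hash(key));
        let next = self.buckets[slot];
        self.entries.push(Entry { key, row, next });
        self.buckets[slot] = self.entries.len() as u64;
    }

    /// First round: for every row whose bucket is non-empty, overwrite its hash
    /// with the chain-head pointer and record the row index in `selection`.
    /// Returns the number of indexes written to `selection`.
    fn probe_with_selection(&self, hashes: &mut [u64], selection: &mut [u32]) -> usize {
        let mut count = 0;
        for (idx, hash) in hashes.iter_mut().enumerate() {
            let head = self.buckets[self.slot(*hash)];
            if head != 0 {
                *hash = head; // reuse the hash slot to carry the pointer
                selection[count] = idx as u32;
                count += 1;
            }
        }
        count
    }

    /// Second round: walk the chain from `ptr`, writing matched rows to
    /// `out[occupied..]`. Returns (matched count, next pointer); the next
    /// pointer is 0 unless `out` filled up while matches were still pending.
    fn next_probe(&self, key: u64, mut ptr: u64, out: &mut [u32], mut occupied: usize) -> (usize, u64) {
        let start = occupied;
        while ptr != 0 {
            let entry = &self.entries[(ptr - 1) as usize];
            if entry.key == key {
                if occupied == out.len() {
                    return (occupied - start, ptr); // resume from this entry later
                }
                out[occupied] = entry.row;
                occupied += 1;
            }
            ptr = entry.next;
        }
        (occupied - start, 0)
    }
}
```

A caller would loop on `next_probe` while the returned pointer is non-zero, draining `out` between calls, which is how a bounded output buffer can handle a key with many matches.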