refactor: runtime filter #13842
How is the test data for t1 and t2 generated? Is it numbers(1_000_000_000)?
Yeah.
    part: &PartInfoPtr,
    filters: &Vec<Expr<String>>,
    func_ctx: &FunctionContext,
) -> Result<bool> {
Do we need to add the runtime filter stats to EXPLAIN? The current stats are:
├── pruning stats: [segments: <range pruning: 1 to 1>, blocks: <range pruning: 755 to 755, bloom pruning: 0 to 0>]
The current stats are collected before the pipeline runs, but runtime filter stats are generated while the pipeline is running. Maybe we can add runtime filter stats to the query profile later.
How about adding runtime_filter-related stats to the query log? @BohuTANG
Rest LGTM!
I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/
Summary
Intro:
New predicates are derived adaptively at runtime and used to filter the probe side of a join.
These predicates are pushed down through the processors to the probe-side table scan, where they are used for pruning, which improves performance significantly.
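To make the idea concrete, here is a minimal sketch of one kind of runtime predicate: a min/max range derived from the build-side join keys and checked against per-block statistics on the probe side. The names `BlockStats`, `RangeFilter`, `derive_range_filter`, `can_prune` and `prune_blocks` are hypothetical and are not taken from this PR. The real implementation works on `Expr<String>` filters and `PartInfoPtr` parts.

```rust
/// Min/max statistics for one probe-side block (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct BlockStats {
    pub min: i64,
    pub max: i64,
}

/// A range predicate `min <= key <= max` derived at runtime (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct RangeFilter {
    pub min: i64,
    pub max: i64,
}

/// Derive a range filter from the join keys seen on the build side.
/// Returns `None` when the build side is empty; this sketch derives no filter then.
pub fn derive_range_filter(build_keys: &[i64]) -> Option<RangeFilter> {
    let min = *build_keys.iter().min()?;
    let max = *build_keys.iter().max()?;
    Some(RangeFilter { min, max })
}

/// A block can be skipped when its value range does not overlap the filter range.
pub fn can_prune(block: &BlockStats, filter: &RangeFilter) -> bool {
    block.max < filter.min || block.min > filter.max
}

/// Keep only the blocks that may still contain matching join keys.
pub fn prune_blocks(blocks: &[BlockStats], filter: &RangeFilter) -> Vec<BlockStats> {
    blocks
        .iter()
        .copied()
        .filter(|b| !can_prune(b, filter))
        .collect()
}
```

The benefit grows with the selectivity of the build side: the narrower the derived range, the more probe-side blocks are skipped before any data is read.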
Simple benchmark
A simple example that is perfect for runtime filtering
cluster:
single node:
Adaptive:
Others:
Runtime filters are saved in QueryCtx in a HashMap: the key is the table index, and the values are the filters for that table. Before a table starts to read data, it gets its filters from ctx by table index and uses them to prune.
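The storage scheme described above could look roughly like the following. This is a simplified sketch, not the PR's code: `QueryCtxSketch`, `FilterExpr`, `add_runtime_filter` and `get_runtime_filters` are hypothetical names, filters are represented as plain strings instead of `Expr<String>`, and the sketch ignores the synchronization a shared query context would need.

```rust
use std::collections::HashMap;

/// Stand-in for a filter expression (assumption: the PR uses `Expr<String>`).
pub type FilterExpr = String;

/// Hypothetical, simplified query context that holds runtime filters per table.
#[derive(Default)]
pub struct QueryCtxSketch {
    /// Key: table index. Value: runtime filters derived for that table.
    runtime_filters: HashMap<usize, Vec<FilterExpr>>,
}

impl QueryCtxSketch {
    /// Called by the join build side once a new predicate has been derived.
    pub fn add_runtime_filter(&mut self, table_index: usize, filter: FilterExpr) {
        self.runtime_filters
            .entry(table_index)
            .or_default()
            .push(filter);
    }

    /// Called by the probe-side table scan before it starts to read data.
    /// Returns an empty list when no filter exists for the table.
    pub fn get_runtime_filters(&self, table_index: usize) -> Vec<FilterExpr> {
        self.runtime_filters
            .get(&table_index)
            .cloned()
            .unwrap_or_default()
    }
}
```

Keying by table index lets each scan look up only its own filters, so tables that are not on the probe side of any join see an empty list and read as before.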