feat(query): new filter execution framework #13846
Merged: BohuTANG merged 88 commits into databendlabs:main from Dousir9:improve_filter_execution on Dec 29, 2023
github-actions bot added the pr-feature label (this PR introduces a new feature to the codebase) on Nov 29, 2023
sundy-li reviewed on Dec 22, 2023
xudong963 approved these changes on Dec 27, 2023
sundy-li reviewed on Dec 28, 2023
sundy-li reviewed on Dec 28, 2023
sundy-li reviewed on Dec 28, 2023
sundy-li approved these changes on Dec 29, 2023
I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/
Summary
The previous implementation of the Databend filter

In the past, Databend's filter was executed by the `Evaluator`. For the SQL statement `select * from t where a = 3 and (b < 7.2 or b > 12.6) and c < 6;`, the `Evaluator` processes each predicate (`a = 3`, `b < 7.2`, `b > 12.6`, `c < 6`) and generates a bitmap for each. These bitmaps are then combined pairwise using `&` and `|` operations according to the `AND` and `OR` operators, resulting in a final bitmap called `filter`. Finally, the `Evaluator` uses `filter` to invoke `filter_boolean_value` on the DataBlock, generating the filtered DataBlock.

Disadvantages of the old implementation
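The costs listed in this section follow from the old path materialising one bitmap per predicate and one more per `&` or `|`. A minimal sketch of that evaluation for the example query, assuming plain `Vec<bool>` bitmaps and non-nullable columns; `Bitmap`, `compare`, `and`, `or`, `old_filter`, and `filter_rows` are hypothetical names for illustration, not Databend's `Evaluator` API:

```rust
// Illustrative only: every name here is a hypothetical stand-in and does not
// mirror Databend's Evaluator or DataBlock types.
type Bitmap = Vec<bool>;

/// Evaluate one predicate over a whole column, allocating a fresh bitmap.
fn compare<T: Copy>(col: &[T], pred: impl Fn(T) -> bool) -> Bitmap {
    col.iter().map(|&v| pred(v)).collect()
}

/// Combine two bitmaps with `&`, allocating yet another bitmap.
fn and(l: &Bitmap, r: &Bitmap) -> Bitmap {
    l.iter().zip(r).map(|(&x, &y)| x & y).collect()
}

/// Combine two bitmaps with `|`, allocating yet another bitmap.
fn or(l: &Bitmap, r: &Bitmap) -> Bitmap {
    l.iter().zip(r).map(|(&x, &y)| x | y).collect()
}

/// where a = 3 and (b < 7.2 or b > 12.6) and c < 6
/// Every predicate scans every row, even rows already rejected by `a = 3`.
fn old_filter(a: &[i64], b: &[f64], c: &[i64]) -> Bitmap {
    let p_a = compare(a, |v| v == 3);
    let p_b_lo = compare(b, |v| v < 7.2);
    let p_b_hi = compare(b, |v| v > 12.6);
    let p_c = compare(c, |v| v < 6);
    let p_b = or(&p_b_lo, &p_b_hi);
    let p_ab = and(&p_a, &p_b);
    and(&p_ab, &p_c)
}

/// Stand-in for applying the final `filter` bitmap to one column.
fn filter_rows<T: Copy>(col: &[T], filter: &Bitmap) -> Vec<T> {
    col.iter()
        .zip(filter)
        .filter_map(|(&v, &keep)| keep.then_some(v))
        .collect()
}
```

Calling `old_filter` on one block and then `filter_rows` on each column reproduces the flow described above; the temporary bitmaps are dropped as soon as the final `filter` has been built.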
- Frequent construction and destruction of bitmaps leads to significant memory fragmentation: each comparison operator, such as `a = 3`, generates a bitmap for its result. If the column is nullable, an additional bitmap is generated for validity. In other words, for a where condition like `a = 3 and (b < 7.2 or b > 12.6) and c < 6`, the `Evaluator` generates up to 8 bitmaps per DataBlock during execution.
- Executing the filter predicates independently is inefficient: if a row in a DataBlock has already been filtered out by one predicate, the other predicates still evaluate it again, causing unnecessary CPU overhead.
- Using the `filter` bitmap to invoke `filter_boolean_value` on the DataBlock to generate the filtered DataBlock may not always be optimal.

New Filter Execution Framework
We introduce a new concept that we call the "Immutable Index". By combining the Immutable Index with the `SelectStrategy`, we address the drawbacks of the DuckDB filter. 🚀 The Immutable Index lets us avoid generating temporary selection buffers when encountering AND and OR operations. This not only reduces memory fragmentation but also eliminates the repeated copying from temporary selections to the final selection.
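The general idea of filtering with reusable selection buffers instead of bitmaps can be sketched as follows. This is only an illustration under stated assumptions: `select_and`, `select_or`, and `new_filter` are hypothetical helpers, the backward merge that restores row order after an OR is a choice made for this sketch, and none of it reproduces the PR's actual `SelectStrategy` or Immutable Index code. Both buffers are assumed to be preallocated with at least one slot per row.

```rust
// Illustrative sketch only: hypothetical helpers, not Databend's filter
// executor, `SelectStrategy`, or Immutable Index.

/// AND step: keep, in place, the rows in `sel[..len]` that satisfy `pred`.
/// Returns the new length; nothing is allocated.
fn select_and(sel: &mut [u32], len: usize, pred: impl Fn(usize) -> bool) -> usize {
    let mut kept = 0;
    for i in 0..len {
        let row = sel[i];
        if pred(row as usize) {
            sel[kept] = row;
            kept += 1;
        }
    }
    kept
}

/// OR step: rows failing `left` are parked in `false_sel`, and only those
/// rows are tested against `right`. Survivors are merged back in row order.
fn select_or(
    true_sel: &mut [u32],
    false_sel: &mut [u32],
    len: usize,
    left: impl Fn(usize) -> bool,
    right: impl Fn(usize) -> bool,
) -> usize {
    let (mut t, mut f) = (0, 0);
    for i in 0..len {
        let row = true_sel[i];
        if left(row as usize) {
            true_sel[t] = row;
            t += 1;
        } else {
            false_sel[f] = row;
            f += 1;
        }
    }
    let f = select_and(false_sel, f, right);
    // Backward merge of two ascending runs into true_sel[..t + f].
    let (mut i, mut j, mut w) = (t, f, t + f);
    while j > 0 {
        w -= 1;
        if i > 0 && true_sel[i - 1] > false_sel[j - 1] {
            true_sel[w] = true_sel[i - 1];
            i -= 1;
        } else {
            true_sel[w] = false_sel[j - 1];
            j -= 1;
        }
    }
    t + f
}

/// where a = 3 and (b < 7.2 or b > 12.6) and c < 6
/// Returns how many row indices in `true_sel` survive the whole condition.
fn new_filter(
    a: &[i64],
    b: &[f64],
    c: &[i64],
    true_sel: &mut [u32],
    false_sel: &mut [u32],
) -> usize {
    let n = a.len();
    for (i, slot) in true_sel[..n].iter_mut().enumerate() {
        *slot = i as u32;
    }
    let n = select_and(true_sel, n, |r| a[r] == 3);
    let n = select_or(true_sel, false_sel, n, |r| b[r] < 7.2, |r| b[r] > 12.6);
    select_and(true_sel, n, |r| c[r] < 6)
}
```

Because each step only visits rows that are still candidates, a row rejected by `a = 3` is never compared against `b` or `c`, and the same two buffers can be reused for every DataBlock.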
- The new filter execution framework avoids memory allocation by using reusable `true_selection` and `false_selection` buffers instead of bitmaps (`false_selection` is only generated when OR predicates are present).
- The execution of predicates is chained dynamically through `true_selection` and `false_selection`, ensuring that each row in the DataBlock is filtered only once. This significantly improves performance (TPC-H Q12, Q18) and handles complex predicates effectively.
- By employing a heuristic strategy and dynamically choosing between `take` and `take_range` to generate the DataBlock, this approach is more efficient than using the `filter` bitmap to invoke `filter_boolean_value` on the DataBlock.

Benchmark
Q19 External Parquet: 14.5s -> 9.7s
before:
after:
Tests
Type of change