enhance: support iterative filter execution #37363

Merged
merged 5 commits into from
Dec 11, 2024

Conversation

chasingegg

issue: #37360

@sre-ci-robot sre-ci-robot added area/compilation size/XXL Denotes a PR that changes 1000+ lines. labels Nov 1, 2024

mergify bot commented Nov 1, 2024

@chasingegg

Invalid PR Title Format Detected

Your PR submission does not adhere to our required standards. To ensure clarity and consistency, please meet the following criteria:

  1. Title Format: The PR title must begin with one of these prefixes:
  • feat: for introducing a new feature.
  • fix: for bug fixes.
  • enhance: for improvements to existing functionality.
  • test: for adding tests to existing functionality.
  • doc: for modifying documentation.
  • auto: for the pull request from bot.
  2. Description Requirement: The PR must include a non-empty description, detailing the changes and their impact.

Required Title Structure:

[Type]: [Description of the PR]

Where Type is one of feat, fix, enhance, test or doc.

Example:

enhance: improve search performance significantly 

Please review and update your PR to comply with these guidelines.

@chasingegg chasingegg changed the title from "Support post filter" to "enhance: support post filter execution" Nov 1, 2024
@mergify mergify bot added kind/enhancement Issues or changes related to enhancement and removed do-not-merge/invalid-pr-format labels Nov 1, 2024

mergify bot commented Nov 1, 2024

@chasingegg go-sdk check failed, comment rerun go-sdk can trigger the job again.


mergify bot commented Nov 1, 2024

@chasingegg cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.


mergify bot commented Nov 1, 2024

@chasingegg E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

codecov bot commented Nov 1, 2024

Codecov Report

Attention: Patch coverage is 81.86851% with 262 lines in your changes missing coverage. Please review.

Project coverage is 81.06%. Comparing base (618f0cb) to head (6881210).
Report is 36 commits behind head on master.

Files with missing lines Patch % Lines
internal/core/src/exec/expression/Expr.h 52.03% 59 Missing ⚠️
...nal/core/src/exec/operator/IterativeFilterNode.cpp 76.51% 31 Missing ⚠️
internal/core/src/exec/expression/CompareExpr.h 64.61% 23 Missing ⚠️
internal/core/src/exec/expression/ColumnExpr.cpp 65.78% 13 Missing ⚠️
internal/core/src/exec/operator/Utils.h 53.57% 13 Missing ⚠️
internal/core/src/common/Chunk.cpp 8.33% 11 Missing ⚠️
...rnal/core/src/exec/expression/JsonContainsExpr.cpp 96.09% 10 Missing ⚠️
internal/core/src/exec/expression/ColumnExpr.h 0.00% 8 Missing ⚠️
internal/core/src/exec/expression/UnaryExpr.cpp 92.72% 8 Missing ⚠️
internal/core/src/exec/expression/UnaryExpr.h 72.41% 8 Missing ⚠️
... and 18 more
Additional details and impacted files

@@            Coverage Diff             @@
##           master   #37363      +/-   ##
==========================================
- Coverage   81.08%   81.06%   -0.02%     
==========================================
  Files        1372     1375       +3     
  Lines      191564   192216     +652     
==========================================
+ Hits       155324   155825     +501     
- Misses      30740    30901     +161     
+ Partials     5500     5490      -10     
Components Coverage Δ
Client 74.43% <ø> (ø)
Core 68.93% <81.79%> (+0.08%) ⬆️
Go 83.22% <100.00%> (+0.01%) ⬆️
Files with missing lines Coverage Δ
internal/core/src/common/Chunk.h 60.71% <100.00%> (+0.47%) ⬆️
internal/core/src/common/QueryInfo.h 100.00% <ø> (ø)
internal/core/src/exec/Driver.cpp 81.39% <100.00%> (+0.44%) ⬆️
internal/core/src/exec/QueryContext.h 84.61% <100.00%> (+0.30%) ⬆️
...ternal/core/src/exec/expression/AlwaysTrueExpr.cpp 88.23% <100.00%> (+2.52%) ⬆️
...e/src/exec/expression/BinaryArithOpEvalRangeExpr.h 100.00% <100.00%> (ø)
...nternal/core/src/exec/expression/BinaryRangeExpr.h 94.00% <100.00%> (+1.31%) ⬆️
internal/core/src/exec/expression/CallExpr.cpp 100.00% <100.00%> (ø)
internal/core/src/exec/expression/EvalCtx.h 100.00% <100.00%> (ø)
internal/core/src/exec/expression/ExistsExpr.cpp 89.47% <100.00%> (+2.37%) ⬆️
... and 47 more

... and 32 files with indirect coverage changes

std::optional<int64_t>
get_iterator_batch_size() {
    return milvus::index::GetValueFromConfig<int64_t>(
        search_info_.search_params_, "batch_size");

Could this be renamed? The name conflicts with the E2E Iterator parameters.

MoveCursorForIndex();
if (segment_->HasFieldData(field_id_)) {
// when we specify input, do not maintain states
if (has_offset_input_) {

Why doesn't the cursor need to move here when input is specified? Shouldn't it be the other way around, as in internal/core/src/exec/expression/LogicalBinaryExpr.h?

size_t hi,
float dist) {
while (lo < hi) {
size_t mid = (lo + hi) >> 1;

Potential overflow, use size_t mid = lo + ((hi - lo) >> 1)
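
To illustrate the suggested fix, here is a minimal sketch of a binary search that uses the overflow-safe midpoint. The function name `UpperBoundByDistance`, the `std::vector<float>` input, and the `<=` comparison are assumptions for illustration only; the signature and comparison of the function under review are not visible in the excerpt above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the index of the first element in
// sorted[lo, hi) that is strictly greater than dist.
// `sorted` must be in ascending order.
inline std::size_t
UpperBoundByDistance(const std::vector<float>& sorted,
                     std::size_t lo,
                     std::size_t hi,
                     float dist) {
    assert(lo <= hi && hi <= sorted.size());
    while (lo < hi) {
        // (lo + hi) can wrap around for very large indices. Since lo < hi,
        // hi - lo is well defined, and lo plus half of it stays below hi.
        std::size_t mid = lo + ((hi - lo) >> 1);
        if (sorted[mid] <= dist) {
            lo = mid + 1;  // everything up to mid is <= dist
        } else {
            hi = mid;      // mid itself may be the answer
        }
    }
    return lo;
}
```

With `size_t` indices the wraparound only occurs near the top of the address range, so it is unlikely in practice, but the safe form costs nothing extra.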

double scalar_cost =
    std::chrono::duration<double, std::micro>(scalar_end - scalar_start)
        .count();
monitor::internal_core_search_latency_postfilter.Observe(scalar_cost);

Great to see adding metrics as well!

auto col_vec_size = col_vec->size();
TargetBitmapView bitsetview(col_vec->GetRawData(),
                            col_vec_size);
Assert(bitsetview.size() <= batch_size);

Why must this hold? Could a user supply a small batch_size?

}

RowVectorPtr
PhyFilterNode::GetOutput() {

Does it support RangeSearch?


mergify bot commented Nov 5, 2024

@chasingegg cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.

@chasingegg chasingegg force-pushed the support-post-filter branch 3 times, most recently from 909b13b to 82aa236 Compare November 8, 2024 10:14
@mergify mergify bot added the ci-passed label Nov 8, 2024
Comment on lines 106 to 118
    plan_node->search_info_.group_by_field_id_ == std::nullopt) {
    plannode = std::make_shared<milvus::plan::MvccNode>(
        milvus::plan::GetNextPlanNodeId());
    sources = std::vector<milvus::plan::PlanNodePtr>{plannode};
    plannode = std::make_shared<milvus::plan::VectorSearchNode>(
        milvus::plan::GetNextPlanNodeId(), sources);
    sources = std::vector<milvus::plan::PlanNodePtr>{plannode};

    // add filter nodes after vector search node
    auto expr = ParseExprs(anns_proto.predicates());
    plannode = std::make_shared<plan::FilterNode>(
        milvus::plan::GetNextPlanNodeId(), expr, sources);
    sources = std::vector<milvus::plan::PlanNodePtr>{plannode};

Wrap this in a function; the same applies to the else branch below.
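
One way to read this suggestion is sketched below. The `PlanNode` struct, `NextPlanNodeId`, `Chain`, and `BuildSearchThenFilterPlan` are all hypothetical stand-ins, not the project's classes; the sketch only shows how the repeated "make a node, then wrap it in a sources vector" steps could collapse into one helper.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the real plan node hierarchy (assumption for illustration).
struct PlanNode {
    int64_t id;
    std::string kind;
    std::vector<std::shared_ptr<PlanNode>> sources;
};
using PlanNodePtr = std::shared_ptr<PlanNode>;

// Hypothetical id generator, playing the role of GetNextPlanNodeId().
inline int64_t
NextPlanNodeId() {
    static int64_t next_id = 0;
    return next_id++;
}

// Creates a node of the given kind on top of `child` (nullptr for a leaf).
inline PlanNodePtr
Chain(std::string kind, PlanNodePtr child) {
    std::vector<PlanNodePtr> sources;
    if (child) {
        sources.push_back(std::move(child));
    }
    return std::make_shared<PlanNode>(
        PlanNode{NextPlanNodeId(), std::move(kind), std::move(sources)});
}

// Builds mvcc -> vector search -> filter, the order used in the hunk above.
inline PlanNodePtr
BuildSearchThenFilterPlan() {
    auto node = Chain("mvcc", nullptr);
    node = Chain("vector_search", std::move(node));
    node = Chain("filter", std::move(node));
    return node;
}
```

A second helper for the else branch would differ only in the order of the `Chain` calls, which makes the difference between the two plans easy to see at a glance.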

!search_info.search_params_.contains(RADIUS)) {
search_info.post_filter_execution =
search_info.search_params_[POST_FILTER];
}

Does this mean the user decides whether to use the post filter? Could it instead be decided by other means, such as statistics?

// FilterNode will accept offsets array and execute over these and generate result valid offsets
namespace milvus {
namespace exec {
class PhyFilterNode : public Operator {

PhyPosterFilterBitsNode may be a more accurate name. PhyFilterNode should not be tied to the vector search node or limited to returning bits; it is a purer concept. It would be better to add PhyFilterNode once we support the project function. The same applies to FilterNode.h.
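
For readers following the naming discussion, the general idea of filtering after vector search in batches can be sketched as follows. This is not the PR's implementation: `Hit`, `IterativeFilter`, and the sorted `candidates` vector (standing in for a vector index iterator) are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical search hit: a row offset and its distance to the query.
struct Hit {
    int64_t offset;
    float distance;
};

// `candidates` must be sorted by ascending distance; it stands in for an
// iterator over the vector index. The predicate is evaluated only on the
// offsets of each batch, and iteration stops once topk hits have passed.
inline std::vector<Hit>
IterativeFilter(const std::vector<Hit>& candidates,
                const std::function<bool(int64_t)>& predicate,
                std::size_t topk,
                std::size_t batch_size) {
    std::vector<Hit> result;
    if (batch_size == 0) {
        return result;  // nothing can be pulled from the iterator
    }
    std::size_t cursor = 0;
    while (result.size() < topk && cursor < candidates.size()) {
        // Pull the next batch without running past the end.
        std::size_t end =
            cursor + std::min(batch_size, candidates.size() - cursor);
        for (; cursor < end && result.size() < topk; ++cursor) {
            if (predicate(candidates[cursor].offset)) {
                result.push_back(candidates[cursor]);
            }
        }
    }
    return result;
}
```

In this sketch a larger `batch_size` means fewer rounds but more predicate evaluations on candidates that may turn out to be unnecessary.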

@@ -28,17 +28,26 @@ namespace milvus {
namespace exec {

class ExprSet;

using OffsetVector = FixedVector<int64_t>;

int32_t is enough

@mergify mergify bot removed the ci-passed label Nov 14, 2024

mergify bot commented Nov 14, 2024

@chasingegg cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.

@chasingegg

rerun cpp-unit-test

@mergify mergify bot added the ci-passed label Nov 14, 2024
@mergify mergify bot added ci-passed and removed ci-passed labels Nov 14, 2024
@mergify mergify bot added the ci-passed label Dec 2, 2024

mergify bot commented Dec 5, 2024

@chasingegg Thanks for your contribution. Please submit with DCO, see the contributing guide https://github.com/milvus-io/milvus/blob/master/CONTRIBUTING.md#developer-certificate-of-origin-dco.

@mergify mergify bot added needs-dco DCO is missing in this pull request. dco-passed DCO check passed. and removed dco-passed DCO check passed. ci-passed needs-dco DCO is missing in this pull request. labels Dec 5, 2024
Signed-off-by: chasingegg <[email protected]>
Signed-off-by: chasingegg <[email protected]>
Signed-off-by: chasingegg <[email protected]>
Signed-off-by: chasingegg <[email protected]>
Signed-off-by: chasingegg <[email protected]>

mergify bot commented Dec 5, 2024

@chasingegg go-sdk check failed, comment rerun go-sdk can trigger the job again.

@chasingegg

rerun go-sdk


mergify bot commented Dec 5, 2024

@chasingegg go-sdk check failed, comment rerun go-sdk can trigger the job again.

@chasingegg

rerun go-sdk

@mergify mergify bot added the ci-passed label Dec 5, 2024
@chasingegg

chasingegg commented Dec 5, 2024

change some search_params formats and use the name hints

@chasingegg

/unhold

@czs007

czs007 commented Dec 11, 2024

/approve
/lgtm

@sre-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chasingegg, czs007

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sre-ci-robot sre-ci-robot merged commit 994fc54 into milvus-io:master Dec 11, 2024
20 checks passed
@chasingegg chasingegg deleted the support-post-filter branch December 11, 2024 03:54
Labels
approved area/compilation ci-passed dco-passed DCO check passed. kind/enhancement Issues or changes related to enhancement lgtm size/XXL Denotes a PR that changes 1000+ lines.

5 participants