-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
enhance: support iterative filter execution #37363
enhance: support iterative filter execution #37363
Conversation
Invalid PR Title Format Detected Your PR submission does not adhere to our required standards. To ensure clarity and consistency, please meet the following criteria:
Required Title Structure:
Where Example:
Please review and update your PR to comply with these guidelines. |
@chasingegg go-sdk check failed, comment |
@chasingegg cpp-unit-test check failed, comment |
@chasingegg E2e jenkins job failed, comment |
std::optional<int64_t> | ||
get_iterator_batch_size() { | ||
return milvus::index::GetValueFromConfig<int64_t>( | ||
search_info_.search_params_, "batch_size"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to rename it as conflicting to E2E Iterator parameters?
MoveCursorForIndex(); | ||
if (segment_->HasFieldData(field_id_)) { | ||
// when we specify input, do not maintain states | ||
if (has_offset_input_) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why it does not need to move cursor here when input is specified? Not the other way around as in internal/core/src/exec/expression/LogicalBinaryExpr.h
per say?
size_t hi, | ||
float dist) { | ||
while (lo < hi) { | ||
size_t mid = (lo + hi) >> 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential overflow, use size_t mid = lo + ((hi - lo) >> 1)
double scalar_cost = | ||
std::chrono::duration<double, std::micro>(scalar_end - scalar_start) | ||
.count(); | ||
monitor::internal_core_search_latency_postfilter.Observe(scalar_cost); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great to see adding metrics as well!
auto col_vec_size = col_vec->size(); | ||
TargetBitmapView bitsetview(col_vec->GetRawData(), | ||
col_vec_size); | ||
Assert(bitsetview.size() <= batch_size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this must be met? Is it possible that user gives a small batch_size
?
} | ||
|
||
RowVectorPtr | ||
PhyFilterNode::GetOutput() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it support RangeSearch?
6a7ea43
to
83d7a5e
Compare
@chasingegg cpp-unit-test check failed, comment |
909b13b
to
82aa236
Compare
plan_node->search_info_.group_by_field_id_ == std::nullopt) { | ||
plannode = std::make_shared<milvus::plan::MvccNode>( | ||
milvus::plan::GetNextPlanNodeId()); | ||
sources = std::vector<milvus::plan::PlanNodePtr>{plannode}; | ||
plannode = std::make_shared<milvus::plan::VectorSearchNode>( | ||
milvus::plan::GetNextPlanNodeId(), sources); | ||
sources = std::vector<milvus::plan::PlanNodePtr>{plannode}; | ||
|
||
// add filter nodes after vector search node | ||
auto expr = ParseExprs(anns_proto.predicates()); | ||
plannode = std::make_shared<plan::FilterNode>( | ||
milvus::plan::GetNextPlanNodeId(), expr, sources); | ||
sources = std::vector<milvus::plan::PlanNodePtr>{plannode}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wrapper as a function, same as below else
!search_info.search_params_.contains(RADIUS)) { | ||
search_info.post_filter_execution = | ||
search_info.search_params_[POST_FILTER]; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that means user decide to whether using post filter or not ? Could decided by other method like stats info etc
// FilterNode will accept offsets array and execute over these and generate result valid offsets | ||
namespace milvus { | ||
namespace exec { | ||
class PhyFilterNode : public Operator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PhyPosterFilterBitsNode may more accurate. PhyFilterNode should not related with vector search node, not just return bits, it is more pure concept. When we support project function, add PhyFilterNode is better,same is FilterNode.h
@@ -28,17 +28,26 @@ namespace milvus { | |||
namespace exec { | |||
|
|||
class ExprSet; | |||
|
|||
using OffsetVector = FixedVector<int64_t>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int32_t is enough
82aa236
to
8dcc2db
Compare
@chasingegg cpp-unit-test check failed, comment |
rerun cpp-unit-test |
8dcc2db
to
3a9224e
Compare
0af05ee
to
8662f77
Compare
@chasingegg Thanks for your contribution. Please submit with DCO, see the contributing guide https://github.com/milvus-io/milvus/blob/master/CONTRIBUTING.md#developer-certificate-of-origin-dco. |
Signed-off-by: chasingegg <[email protected]>
Signed-off-by: chasingegg <[email protected]>
Signed-off-by: chasingegg <[email protected]>
Signed-off-by: chasingegg <[email protected]>
8662f77
to
90340c3
Compare
Signed-off-by: chasingegg <[email protected]>
@chasingegg go-sdk check failed, comment |
rerun go-sdk |
@chasingegg go-sdk check failed, comment |
rerun go-sdk |
change some search_params formats and use the name |
/unhold |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: chasingegg, czs007 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
issue: #37360