Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support skip-index based on chunk-metrics to accelerate expr filter(#27925) #28297

Merged

Conversation

MrPresent-Han
Copy link
Contributor

related: #27925

@sre-ci-robot sre-ci-robot added size/XL Denotes a PR that changes 500-999 lines. area/compilation labels Nov 9, 2023
@mergify mergify bot added the dco-passed DCO check passed. label Nov 9, 2023
Copy link
Contributor Author

@MrPresent-Han MrPresent-Han left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need review for interface design and basic functions has been testified correctly on my local macOS

@@ -307,4 +313,44 @@ ScalarIndexSort<T>::Reverse_Lookup(size_t idx) const {
auto offset = idx_to_offsets_[idx];
return data_[offset].a_;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aggregate unary and binary judgment into one function

@@ -293,11 +302,18 @@ ExecExprVisitor::ExecRangeVisitorImpl(FieldId field_id,
"[ExecExprVisitor]Data size not equal to size_per_chunk");
results.emplace_back(std::move(data));
}

for (auto chunk_id = indexing_barrier; chunk_id < num_chunk; ++chunk_id) {
auto this_size = chunk_id == num_chunk - 1
? row_count_ - chunk_id * size_per_chunk
: size_per_chunk;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use skip Index here when executing expr

@@ -293,4 +293,19 @@ SegmentInternalInterface::timestamp_filter(BitsetType& bitset,
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at present, SkipIndex is a segment-level component, and relating load is also triggered by segment

skipIndex_.Load(field_id, chunk_id, data_type, chunk_data, count);
}


Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we need a returned value to indicate whether successful or not for loading skipIndex


FieldChunkMetrics() : hasValue_(false){};
};

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SkipIndex is a separate class and can be extended for using fieldChunkMetrics in various ways.

@@ -1381,3 +1381,98 @@ TEST(Sealed, LoadArrayFieldDataWithMMap) {
SealedLoadFieldData(dataset, *segment, {}, true);
segment->Search(plan.get(), ph_group.get());
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

old ut, I ' m changing now

Copy link
Contributor

mergify bot commented Nov 9, 2023

@MrPresent-Han E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Copy link
Contributor

mergify bot commented Nov 9, 2023

@MrPresent-Han ut workflow job failed, comment rerun ut can trigger the job again.

@MrPresent-Han MrPresent-Han force-pushed the support-field-chunk-metrics-new branch 2 times, most recently from 91f064b to 4884e45 Compare November 9, 2023 08:14
Copy link
Contributor

mergify bot commented Nov 9, 2023

@MrPresent-Han E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Copy link
Contributor

mergify bot commented Nov 9, 2023

@MrPresent-Han ut workflow job failed, comment rerun ut can trigger the job again.

@MrPresent-Han MrPresent-Han force-pushed the support-field-chunk-metrics-new branch 3 times, most recently from 569dca0 to 47fb9fa Compare November 9, 2023 10:39
Copy link

codecov bot commented Nov 9, 2023

Codecov Report

Merging #28297 (f0e3aff) into master (f8aa464) will increase coverage by 0.09%.
Report is 12 commits behind head on master.
The diff coverage is 92.45%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #28297      +/-   ##
==========================================
+ Coverage   81.79%   81.89%   +0.09%     
==========================================
  Files         847      839       -8     
  Lines      119750   120076     +326     
==========================================
+ Hits        97950    98336     +386     
+ Misses      18542    18478      -64     
- Partials     3258     3262       +4     
Files Coverage Δ
internal/core/src/index/ScalarIndexSort.h 50.00% <ø> (ø)
...nternal/core/src/query/generated/ExecExprVisitor.h 100.00% <ø> (ø)
internal/core/src/segcore/AckResponder.h 100.00% <ø> (ø)
internal/core/src/segcore/SegmentInterface.cpp 92.63% <100.00%> (-1.41%) ⬇️
internal/core/src/segcore/SegmentInterface.h 100.00% <ø> (ø)
internal/core/src/segcore/SegmentSealedImpl.cpp 76.97% <100.00%> (-0.48%) ⬇️
internal/core/src/segcore/SkipIndex.cpp 100.00% <100.00%> (ø)
internal/querynodev2/segments/segment_loader.go 74.18% <100.00%> (-0.61%) ⬇️
internal/core/src/index/ScalarIndexSort-inl.h 67.50% <83.33%> (+3.25%) ⬆️
internal/core/src/segcore/SkipIndex.h 90.21% <90.21%> (ø)
... and 1 more

... and 177 files with indirect coverage changes

@MrPresent-Han MrPresent-Han force-pushed the support-field-chunk-metrics-new branch from 47fb9fa to 5aa0086 Compare November 9, 2023 12:30
Copy link
Contributor

mergify bot commented Nov 9, 2023

@MrPresent-Han E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Copy link
Contributor

mergify bot commented Nov 9, 2023

@MrPresent-Han ut workflow job failed, comment rerun ut can trigger the job again.

@MrPresent-Han MrPresent-Han force-pushed the support-field-chunk-metrics-new branch from 5aa0086 to aa61c0f Compare November 10, 2023 02:36
Copy link
Contributor

mergify bot commented Nov 10, 2023

@MrPresent-Han ut workflow job failed, comment rerun ut can trigger the job again.

@MrPresent-Han MrPresent-Han force-pushed the support-field-chunk-metrics-new branch from aa61c0f to cc8924b Compare November 10, 2023 03:57
Copy link
Contributor

mergify bot commented Nov 10, 2023

@MrPresent-Han E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@MrPresent-Han MrPresent-Han force-pushed the support-field-chunk-metrics-new branch from cc8924b to 6481542 Compare November 10, 2023 07:14
Copy link
Contributor

mergify bot commented Nov 10, 2023

@MrPresent-Han ut workflow job failed, comment rerun ut can trigger the job again.

@MrPresent-Han MrPresent-Han force-pushed the support-field-chunk-metrics-new branch from 6481542 to 92a0270 Compare November 10, 2023 08:33
Copy link
Contributor

mergify bot commented Nov 10, 2023

@MrPresent-Han ut workflow job failed, comment rerun ut can trigger the job again.

@MrPresent-Han
Copy link
Contributor Author

rerun ut

@mergify mergify bot added the ci-passed label Nov 10, 2023
@@ -1641,8 +1749,11 @@ ExecExprVisitor::visit(UnaryRangeExpr& expr) {
}
case DataType::VARCHAR: {
if (segment_.type() == SegmentType::Growing) {
std::cout << "hc===execute unary range: string" << std::endl;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cleanup it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

@MrPresent-Han MrPresent-Han force-pushed the support-field-chunk-metrics-new branch from 92a0270 to 842396d Compare November 13, 2023 03:00
@mergify mergify bot removed the ci-passed label Nov 13, 2023
Copy link
Contributor

mergify bot commented Nov 13, 2023

@MrPresent-Han E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@MrPresent-Han MrPresent-Han force-pushed the support-field-chunk-metrics-new branch from 842396d to 81e8312 Compare November 13, 2023 03:25
@MrPresent-Han
Copy link
Contributor Author

rerun ut

@mergify mergify bot added the ci-passed label Nov 13, 2023
Copy link
Member

@yah01 yah01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

skip for term expr would come later?

@MrPresent-Han
Copy link
Contributor Author

skip for term expr would come later?

emm, I forget this, good catch

@MrPresent-Han MrPresent-Han force-pushed the support-field-chunk-metrics-new branch from 81e8312 to bffca94 Compare November 14, 2023 09:19
@mergify mergify bot removed the ci-passed label Nov 14, 2023
@MrPresent-Han MrPresent-Han force-pushed the support-field-chunk-metrics-new branch from bffca94 to 1a11a06 Compare November 14, 2023 09:23
@MrPresent-Han MrPresent-Han force-pushed the support-field-chunk-metrics-new branch from 1a11a06 to f0e3aff Compare November 14, 2023 09:24
@mergify mergify bot added the ci-passed label Nov 14, 2023
@MrPresent-Han
Copy link
Contributor Author

skip for term expr would come later?

changed

Copy link
Contributor Author

@MrPresent-Han MrPresent-Han left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

support skip term, need review

@@ -2522,9 +2633,19 @@ ExecExprVisitor::ExecTermVisitorImplTemplate<bool>(TermExpr& expr_raw)
// return std::binary_search(terms.begin(), terms.end(), x);
return term_set.find(x) != term_set.end();
};
auto skip_index_func =
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed, support term @yah01

Copy link
Member

@yah01 yah01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@jiaoew1991
Copy link
Contributor

/approve

@sre-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jiaoew1991, MrPresent-Han

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sre-ci-robot sre-ci-robot merged commit 836f300 into milvus-io:master Nov 15, 2023
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved area/compilation ci-passed dco-passed DCO check passed. lgtm size/XL Denotes a PR that changes 500-999 lines.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants