enhance: all op(Null) is false in expr #35527

Merged
xiaofan-luan merged 4 commits into milvus-io:master from expr-null-1 on Oct 17, 2024

Conversation

smellthemoon
Contributor

@sre-ci-robot sre-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines. label Aug 16, 2024
@mergify mergify bot added dco-passed DCO check passed. kind/enhancement Issues or changes related to enhancement ci-passed labels Aug 16, 2024

codecov bot commented Aug 16, 2024

Codecov Report

Attention: Patch coverage is 67.55556% with 219 lines in your changes missing coverage. Please review.

Project coverage is 81.45%. Comparing base (1bd3228) to head (37ce45b).
Report is 36 commits behind head on master.

Files with missing lines Patch % Lines
internal/core/src/exec/expression/CompareExpr.cpp 34.44% 59 Missing ⚠️
internal/core/src/exec/expression/Expr.h 46.15% 49 Missing ⚠️
internal/core/src/segcore/Utils.cpp 44.44% 35 Missing ⚠️
...src/exec/expression/BinaryArithOpEvalRangeExpr.cpp 80.95% 12 Missing ⚠️
...rnal/core/src/segcore/ChunkedSegmentSealedImpl.cpp 0.00% 10 Missing ⚠️
internal/core/src/exec/expression/CompareExpr.h 52.63% 9 Missing ⚠️
internal/core/src/exec/expression/TermExpr.cpp 82.60% 8 Missing ⚠️
internal/core/src/exec/expression/UnaryExpr.cpp 92.30% 5 Missing ⚠️
...ernal/core/src/exec/expression/BinaryRangeExpr.cpp 86.66% 4 Missing ⚠️
...rnal/core/src/exec/expression/JsonContainsExpr.cpp 93.44% 4 Missing ⚠️
... and 10 more
Additional details and impacted files

@@            Coverage Diff             @@
##           master   #35527      +/-   ##
==========================================
+ Coverage   72.50%   81.45%   +8.94%     
==========================================
  Files        1308     1308              
  Lines      156203   156659     +456     
==========================================
+ Hits       113255   127602   +14347     
+ Misses      37817    23941   -13876     
+ Partials     5131     5116      -15     
Files with missing lines Coverage Δ
internal/core/src/common/FieldData.cpp 73.91% <100.00%> (+73.91%) ⬆️
internal/core/src/common/Vector.h 100.00% <100.00%> (+93.33%) ⬆️
...ternal/core/src/exec/expression/AlwaysTrueExpr.cpp 85.71% <100.00%> (+85.71%) ⬆️
...e/src/exec/expression/BinaryArithOpEvalRangeExpr.h 100.00% <100.00%> (+100.00%) ⬆️
internal/core/src/exec/expression/ExistsExpr.cpp 86.66% <100.00%> (+86.66%) ⬆️
...rnal/core/src/exec/expression/LogicalUnaryExpr.cpp 100.00% <100.00%> (+100.00%) ⬆️
internal/core/src/exec/operator/FilterBitsNode.cpp 96.07% <100.00%> (+96.07%) ⬆️
internal/core/src/index/BitmapIndex.h 55.55% <ø> (+55.55%) ⬆️
internal/core/src/index/ScalarIndex.h 27.77% <ø> (+25.00%) ⬆️
internal/core/src/index/ScalarIndexSort.h 42.85% <ø> (ø)
... and 23 more

... and 212 files with indirect coverage changes

@smellthemoon smellthemoon force-pushed the expr-null-1 branch 3 times, most recently from 04aa42e to 030304b Compare August 27, 2024 06:10
@tedxu
Contributor

tedxu commented Aug 27, 2024

/lgtm

Contributor

mergify bot commented Aug 27, 2024

@smellthemoon E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@smellthemoon
Contributor Author

/run-cpu-e2e

Contributor

mergify bot commented Aug 28, 2024

@smellthemoon E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@tedxu
Contributor

tedxu commented Aug 29, 2024

/lgtm

Contributor

mergify bot commented Sep 12, 2024

@smellthemoon E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@sre-ci-robot sre-ci-robot added the test/integration integration test label Sep 20, 2024
Contributor

mergify bot commented Sep 20, 2024

@smellthemoon E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Contributor

mergify bot commented Sep 20, 2024

@smellthemoon go-sdk check failed, comment rerun go-sdk can trigger the job again.

Contributor

mergify bot commented Sep 23, 2024

@smellthemoon E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@@ -315,7 +315,9 @@ class ThreadSafeValidData {
data_.resize(length_ + num_rows);
}
auto src = data->valid_data().data();
std::copy_n(src, num_rows, data_.data() + length_);
Collaborator

why? std::copy_n seems to be faster and easier to understand
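
For reference, a minimal sketch of the `std::copy_n` append that the comment above prefers. The function name `AppendValidData` and the `std::uint8_t` buffer are assumptions made for illustration; they are not the actual `ThreadSafeValidData` members.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: appends `num_rows` validity flags from `src` to `dst`,
// where `length` is the number of flags already stored in `dst`.
inline void
AppendValidData(std::vector<std::uint8_t>& dst,
                std::size_t& length,
                const bool* src,
                std::size_t num_rows) {
    assert(src != nullptr || num_rows == 0);
    // Grow the buffer only when the new rows do not fit yet.
    if (dst.size() < length + num_rows) {
        dst.resize(length + num_rows);
    }
    // One bulk copy instead of a hand-written element loop.
    std::copy_n(src, num_rows, dst.data() + length);
    length += num_rows;
}
```

Whether the bulk copy is actually faster than an explicit loop depends on the element types and the compiler, so that part would need measuring.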

@@ -70,12 +71,16 @@ ScalarIndexSort<T>::Build(size_t n, const T* values) {
data_.reserve(n);
total_num_rows_ = n;
valid_bitset = TargetBitmap(total_num_rows_, false);
-idx_to_offsets_.resize(n);
+idx_to_offsets_.resize(n, -1);
Collaborator

  1. we should not use -1 as a hacked value; can we use valid_bitset?

Collaborator

this valid_bitset should be renamed to valid_bitset_

Collaborator

we could maintain offset -> index and use the valid bitset to check.
using -1 is a little bit hacky

Collaborator

or, as with the marisa trie, all invalid entries should just be marked with idx_to_offsets_[index] == MARISA_NULL_KEY_ID
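
A minimal sketch of the bitset-based alternative suggested in this thread, assuming a simplified index. `OffsetMap` and `IndexOf` are hypothetical names, and slots are handed out in row order only to keep the example short (a real sorted index would order them by value).

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical, simplified stand-in for a scalar index; not the Milvus class.
class OffsetMap {
 public:
    // valid[i] is true when row i holds a value; null rows get no index slot.
    explicit OffsetMap(const std::vector<bool>& valid)
        : valid_bitset_(valid), offset_to_idx_(valid.size(), 0) {
        std::size_t next = 0;
        for (std::size_t offset = 0; offset < valid.size(); ++offset) {
            if (valid_bitset_[offset]) {
                offset_to_idx_[offset] = next++;
            }
        }
    }

    // Returns the index slot of a row, or std::nullopt for a null row.
    // Validity comes from the bitset, so no -1 sentinel is stored.
    std::optional<std::size_t>
    IndexOf(std::size_t offset) const {
        assert(offset < valid_bitset_.size());
        if (!valid_bitset_[offset]) {
            return std::nullopt;
        }
        return offset_to_idx_[offset];
    }

 private:
    std::vector<bool> valid_bitset_;
    std::vector<std::size_t> offset_to_idx_;
};
```

With this layout the mapping can stay unsigned, and callers cannot mistake a sentinel for a real position.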

is_source_node_
? std::make_shared<ColumnVector>(TargetBitmap(active_count_))
: GetColumnVector(input_);
auto col_input = is_source_node_ ? std::make_shared<ColumnVector>(
Collaborator

add a comment: the first vector is the filtering result and the second bitset is the valid bitset

Collaborator

it is very hard to understand without a proper comment
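
To illustrate the two-bitmap convention described above, here is a hedged sketch. `FilterOutput`, `LogicalNot`, and `Finalize` are hypothetical names, and combining the two bitmaps with AND at the end is an assumption about how "op(null) is false" could be enforced, not a description of the merged code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pair of per-row bitmaps carried between expression nodes.
struct FilterOutput {
    std::vector<bool> result;  // first bitmap: predicate outcome per row
    std::vector<bool> valid;   // second bitmap: false where the input was null
};

// NOT flips only the result bitmap; the validity bitmap is left untouched.
inline FilterOutput
LogicalNot(FilterOutput in) {
    in.result.flip();
    return in;
}

// A row passes the filter only if the predicate held and the row was not null.
inline std::vector<bool>
Finalize(const FilterOutput& out) {
    assert(out.result.size() == out.valid.size());
    std::vector<bool> rows(out.result.size());
    for (std::size_t i = 0; i < rows.size(); ++i) {
        rows[i] = out.result[i] && out.valid[i];
    }
    return rows;
}
```

Keeping validity separate means a null row stays excluded even after its result bit is flipped by NOT.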

cmp_op));
}
};
if (valid_data == nullptr) {
Collaborator

I don't think this is necessary. simply execute by batch and bypass the valid bitset.
for example, for [1, null, 2, 3], the valid data is [1, 2, 3] and the valid bitset is [1, 0, 1, 1]
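
One possible reading of this suggestion, sketched under the assumption that values are stored compactly (nulls omitted) next to a per-row valid bitset. `EvalCompact` is a hypothetical helper, not a function from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Evaluates `pred` on the compact values and writes the outcome back to row
// positions. Null rows consume no value and keep the default result, false.
template <typename Pred>
std::vector<bool>
EvalCompact(const std::vector<int>& values,
            const std::vector<bool>& valid,
            Pred pred) {
    std::vector<bool> res(valid.size(), false);
    std::size_t next = 0;  // position in the compact value array
    for (std::size_t row = 0; row < valid.size(); ++row) {
        if (valid[row]) {
            assert(next < values.size());
            res[row] = pred(values[next++]);
        }
    }
    return res;
}
```

For the example in the comment, rows [1, null, 2, 3] with the predicate `v >= 2` produce [false, false, true, true], so the null row is simply false without a dedicated `valid_data == nullptr` branch.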

@@ -46,12 +47,22 @@ PhyCompareFilterExpr::GetChunkData(FieldId field_id,
auto& indexing = segment_->chunk_scalar_index<T>(field_id, chunk_id);
if (indexing.HasRawData()) {
return [&indexing](int i) -> const number {
-return indexing.Reverse_Lookup(i);
+if (!indexing.Reverse_Lookup(i).has_value()) {
Collaborator

never do two reverse lookups
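
A small sketch of the single-lookup pattern being requested. `ToyIndex` is a hypothetical stand-in that only mimics a `Reverse_Lookup` returning an optional; the lookup counter exists purely so the example can show how often the index is hit.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical index: Reverse_Lookup returns std::nullopt for null rows.
struct ToyIndex {
    std::vector<std::optional<int>> rows;
    mutable int lookups = 0;  // counts calls, for illustration only

    std::optional<int>
    Reverse_Lookup(std::size_t i) const {
        ++lookups;
        return rows[i];
    }
};

// Looks the row up once, keeps the optional, then branches on it.
inline bool
GreaterThan(const ToyIndex& index, std::size_t i, int bound) {
    const std::optional<int> value = index.Reverse_Lookup(i);
    if (!value.has_value()) {
        return false;  // op(null) evaluates to false
    }
    return *value > bound;
}
```

Storing the optional in a local avoids calling `Reverse_Lookup` once for the null check and again for the value.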

@@ -146,8 +173,16 @@ PhyCompareFilterExpr::ExecCompareExprDispatcher(OpType op) {
for (int i = chunk_id == current_chunk_id_ ? current_chunk_pos_ : 0;
i < chunk_size;
++i) {
-res[processed_rows++] = boost::apply_visitor(
-milvus::query::Relational<decltype(op)>{}, left(i), right(i));
+if (!left(i).has_value() || !right(i).has_value()) {
Collaborator

same here; please check everywhere, the reverse lookup should be done only once

@@ -253,7 +257,16 @@ class SegmentExpr : public Expr {
if (!skip_func || !skip_func(skip_index, field_id_, i)) {
auto chunk = segment_->chunk_data<T>(field_id_, i);
const T* data = chunk.data() + data_pos;
-func(data, size, res + processed_size, values...);
+const bool* valid_data = chunk.valid_data();
+if (valid_data != nullptr) {
Collaborator

sometimes we use valid_data != nullptr but elsewhere we use !valid_data
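
A hedged sketch of a batch function that takes a nullable `valid_data` pointer and uses one null-check style throughout. `ApplyGreaterThan` and its parameters are hypothetical, and it assumes row-aligned data where a null `valid_data` means every row is valid.

```cpp
#include <cassert>
#include <cstddef>

// Writes res[i] = (row i is valid) && (data[i] > bound).
// valid_data may be nullptr, meaning the field has no null rows.
inline void
ApplyGreaterThan(const int* data,
                 const bool* valid_data,
                 std::size_t size,
                 int bound,
                 bool* res) {
    assert(data != nullptr && res != nullptr);
    for (std::size_t i = 0; i < size; ++i) {
        // Same comparison style everywhere: explicit check against nullptr.
        const bool is_valid = (valid_data == nullptr) || valid_data[i];
        res[i] = is_valid && data[i] > bound;
    }
}
```

Either spelling of the null check works in C++; the point of the comment is to pick one and apply it consistently across the expression code.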

lixinguo added 4 commits October 15, 2024 19:01
@xiaofan-luan
Collaborator

/lgtm
/approve

@xiaofan-luan xiaofan-luan merged commit eb3e458 into milvus-io:master Oct 17, 2024
14 of 16 checks passed
@sre-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: smellthemoon, xiaofan-luan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Labels
approved area/compilation area/test ci-passed dco-passed DCO check passed. kind/enhancement Issues or changes related to enhancement lgtm sig/testing size/XXL Denotes a PR that changes 1000+ lines. test/integration integration test
4 participants