-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
enhance: all op(Null) is false in expr #35527
Conversation
04aa42e
to
030304b
Compare
/lgtm |
@smellthemoon E2e jenkins job failed, comment |
030304b
to
32a8013
Compare
@smellthemoon E2e jenkins job failed, comment |
1 similar comment
@smellthemoon E2e jenkins job failed, comment |
/run-cpu-e2e |
@smellthemoon E2e jenkins job failed, comment |
32a8013
to
58430a1
Compare
/lgtm |
58430a1
to
a8a500a
Compare
@smellthemoon E2e jenkins job failed, comment |
@smellthemoon E2e jenkins job failed, comment |
@smellthemoon go-sdk check failed, comment |
bd7342f
to
af1bcc4
Compare
@smellthemoon E2e jenkins job failed, comment |
af1bcc4
to
6c93c57
Compare
@@ -315,7 +315,9 @@ class ThreadSafeValidData { | |||
data_.resize(length_ + num_rows); | |||
} | |||
auto src = data->valid_data().data(); | |||
std::copy_n(src, num_rows, data_.data() + length_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why? std::copy_n seems to be faster and easier to understand
@@ -70,12 +71,16 @@ ScalarIndexSort<T>::Build(size_t n, const T* values) { | |||
data_.reserve(n); | |||
total_num_rows_ = n; | |||
valid_bitset = TargetBitmap(total_num_rows_, false); | |||
idx_to_offsets_.resize(n); | |||
idx_to_offsets_.resize(n, -1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- we should not use -1 has hacked value, can we use valid_bitset?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this valid_bitset should be named to valid_bitset_
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could maintain offert -> index and use valid bitset to check.
use -1 is a little bit hacky
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or as marisa trie, all the invalid should be just marked as idx_to_offsets_[index] == MARISA_NULL_KEY_ID
@@ -70,12 +71,16 @@ ScalarIndexSort<T>::Build(size_t n, const T* values) { | |||
data_.reserve(n); | |||
total_num_rows_ = n; | |||
valid_bitset = TargetBitmap(total_num_rows_, false); | |||
idx_to_offsets_.resize(n); | |||
idx_to_offsets_.resize(n, -1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could maintain offert -> index and use valid bitset to check.
use -1 is a little bit hacky
@@ -70,12 +71,16 @@ ScalarIndexSort<T>::Build(size_t n, const T* values) { | |||
data_.reserve(n); | |||
total_num_rows_ = n; | |||
valid_bitset = TargetBitmap(total_num_rows_, false); | |||
idx_to_offsets_.resize(n); | |||
idx_to_offsets_.resize(n, -1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or as marisa trie, all the invalid should be just marked as idx_to_offsets_[index] == MARISA_NULL_KEY_ID
is_source_node_ | ||
? std::make_shared<ColumnVector>(TargetBitmap(active_count_)) | ||
: GetColumnVector(input_); | ||
auto col_input = is_source_node_ ? std::make_shared<ColumnVector>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add comment, the first vector is filtering result and second bitset is a valid bit set
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is very hard to understand without proper comment
cmp_op)); | ||
} | ||
}; | ||
if (valid_data == nullptr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't think this is necessary. simply execute by batch and bypass the valid bitset.
for example, [1, null, 2,3], valid data is [1,2,3] and valid bitset is [1,0,1,1]
@@ -46,12 +47,22 @@ PhyCompareFilterExpr::GetChunkData(FieldId field_id, | |||
auto& indexing = segment_->chunk_scalar_index<T>(field_id, chunk_id); | |||
if (indexing.HasRawData()) { | |||
return [&indexing](int i) -> const number { | |||
return indexing.Reverse_Lookup(i); | |||
if (!indexing.Reverse_Lookup(i).has_value()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
never do two reverse lookup
@@ -146,8 +173,16 @@ PhyCompareFilterExpr::ExecCompareExprDispatcher(OpType op) { | |||
for (int i = chunk_id == current_chunk_id_ ? current_chunk_pos_ : 0; | |||
i < chunk_size; | |||
++i) { | |||
res[processed_rows++] = boost::apply_visitor( | |||
milvus::query::Relational<decltype(op)>{}, left(i), right(i)); | |||
if (!left(i).has_value() || !right(i).has_value()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here, please check for everywhere, reverse lookup can be done only once
@@ -253,7 +257,16 @@ class SegmentExpr : public Expr { | |||
if (!skip_func || !skip_func(skip_index, field_id_, i)) { | |||
auto chunk = segment_->chunk_data<T>(field_id_, i); | |||
const T* data = chunk.data() + data_pos; | |||
func(data, size, res + processed_size, values...); | |||
const bool* valid_data = chunk.valid_data(); | |||
if (valid_data != nullptr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sometimes we used valid_data != nullptr but others we use !valid_data
6c93c57
to
77090b8
Compare
Signed-off-by: lixinguo <[email protected]>
Signed-off-by: lixinguo <[email protected]>
Signed-off-by: lixinguo <[email protected]>
Signed-off-by: lixinguo <[email protected]>
77090b8
to
37ce45b
Compare
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: smellthemoon, xiaofan-luan The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
#31728