Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enhance: Avoid assign too much segment/channels to new querynode #34096

Merged
merged 2 commits into from
Jun 27, 2024

Conversation

weiliu1031
Copy link
Contributor

issue: #34095

When a new query node comes online, the segment_checker, channel_checker, and balance_checker simultaneously attempt to allocate segments to it. If this occurs during the execution of a load task and the distribution of the new query node hasn't been updated, the query coordinator may mistakenly view the new query node as empty. As a result, it assigns segments or channels to it, potentially overloading the new query node with more segments or channels than expected.

This PR measures the workload of the executing tasks on the target query node to prevent assigning an excessive number of segments to it.

@sre-ci-robot sre-ci-robot added the size/L Denotes a PR that changes 100-499 lines. label Jun 24, 2024
@mergify mergify bot added dco-passed DCO check passed. kind/enhancement Issues or changes related to enhancement labels Jun 24, 2024
Copy link
Contributor

mergify bot commented Jun 24, 2024

@weiliu1031 ut workflow job failed, comment rerun ut can trigger the job again.

@mergify mergify bot added the ci-passed label Jun 24, 2024
@weiliu1031
Copy link
Contributor Author

rerun ut

@mergify mergify bot added ci-passed and removed ci-passed labels Jun 25, 2024
@weiliu1031 weiliu1031 force-pushed the refine_balance_score branch from b8ae1aa to cc7bb0a Compare June 25, 2024 09:26
@mergify mergify bot added the ci-passed label Jun 25, 2024
Copy link

codecov bot commented Jun 26, 2024

Codecov Report

Attention: Patch coverage is 96.55172% with 2 lines in your changes missing coverage. Please review.

Project coverage is 84.67%. Comparing base (e04f1f9) to head (6388d23).
Report is 10 commits behind head on master.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #34096      +/-   ##
==========================================
+ Coverage   80.73%   84.67%   +3.93%     
==========================================
  Files        1076      808     -268     
  Lines      137190   113200   -23990     
==========================================
- Hits       110763    95848   -14915     
+ Misses      22206    13134    -9072     
+ Partials     4221     4218       -3     
Files Coverage Δ
internal/querycoordv2/balance/balance.go 96.96% <100.00%> (ø)
...al/querycoordv2/balance/rowcount_based_balancer.go 97.63% <100.00%> (+0.03%) ⬆️
...ernal/querycoordv2/balance/score_based_balancer.go 98.19% <100.00%> (+0.02%) ⬆️
internal/querycoordv2/task/scheduler.go 90.35% <95.65%> (+5.53%) ⬆️

... and 303 files with indirect coverage changes

When a new query node comes online, the segment_checker, channel_checker, and balance_checker simultaneously attempt to allocate segments to it. If this occurs during the execution of a load task and the distribution of the new query node hasn't been updated, the query coordinator may mistakenly view the new query node as empty. As a result, it assigns segments or channels to it, potentially overloading the new query node with more segments or channels than expected.

This PR measures the workload of the executing tasks on the target query node to prevent assigning an excessive number of segments to it.

Signed-off-by: Wei Liu <[email protected]>
Signed-off-by: Wei Liu <[email protected]>
@weiliu1031 weiliu1031 force-pushed the refine_balance_score branch from cc7bb0a to 6388d23 Compare June 26, 2024 10:10
@mergify mergify bot added ci-passed and removed ci-passed labels Jun 26, 2024
Copy link
Contributor

@XuanYang-cn XuanYang-cn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@congqixia
Copy link
Contributor

/approve

@sre-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: congqixia, weiliu1031

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sre-ci-robot sre-ci-robot merged commit 8123bea into milvus-io:master Jun 27, 2024
12 checks passed
weiliu1031 added a commit to weiliu1031/milvus that referenced this pull request Jun 27, 2024
…vus-io#34096)

issue: milvus-io#34095

When a new query node comes online, the segment_checker,
channel_checker, and balance_checker simultaneously attempt to allocate
segments to it. If this occurs during the execution of a load task and
the distribution of the new query node hasn't been updated, the query
coordinator may mistakenly view the new query node as empty. As a
result, it assigns segments or channels to it, potentially overloading
the new query node with more segments or channels than expected.

This PR measures the workload of the executing tasks on the target query
node to prevent assigning an excessive number of segments to it.

---------

Signed-off-by: Wei Liu <[email protected]>
sre-ci-robot pushed a commit that referenced this pull request Jul 1, 2024
) (#34245)

issue: #34095
pr: #34096

When a new query node comes online, the segment_checker,
channel_checker, and balance_checker simultaneously attempt to allocate
segments to it. If this occurs during the execution of a load task and
the distribution of the new query node hasn't been updated, the query
coordinator may mistakenly view the new query node as empty. As a
result, it assigns segments or channels to it, potentially overloading
the new query node with more segments or channels than expected.

This PR measures the workload of the executing tasks on the target query
node to prevent assigning an excessive number of segments to it.

---------

Signed-off-by: Wei Liu <[email protected]>
yellow-shine pushed a commit to yellow-shine/milvus that referenced this pull request Jul 2, 2024
…vus-io#34096)

issue: milvus-io#34095

When a new query node comes online, the segment_checker,
channel_checker, and balance_checker simultaneously attempt to allocate
segments to it. If this occurs during the execution of a load task and
the distribution of the new query node hasn't been updated, the query
coordinator may mistakenly view the new query node as empty. As a
result, it assigns segments or channels to it, potentially overloading
the new query node with more segments or channels than expected.

This PR measures the workload of the executing tasks on the target query
node to prevent assigning an excessive number of segments to it.

---------

Signed-off-by: Wei Liu <[email protected]>
weiliu1031 added a commit to weiliu1031/milvus that referenced this pull request Jul 5, 2024
…vus-io#34096)

issue: milvus-io#34095

When a new query node comes online, the segment_checker,
channel_checker, and balance_checker simultaneously attempt to allocate
segments to it. If this occurs during the execution of a load task and
the distribution of the new query node hasn't been updated, the query
coordinator may mistakenly view the new query node as empty. As a
result, it assigns segments or channels to it, potentially overloading
the new query node with more segments or channels than expected.

This PR measures the workload of the executing tasks on the target query
node to prevent assigning an excessive number of segments to it.

---------

Signed-off-by: Wei Liu <[email protected]>
weiliu1031 added a commit to weiliu1031/milvus that referenced this pull request Jul 6, 2024
…vus-io#34096)

issue: milvus-io#34095

When a new query node comes online, the segment_checker,
channel_checker, and balance_checker simultaneously attempt to allocate
segments to it. If this occurs during the execution of a load task and
the distribution of the new query node hasn't been updated, the query
coordinator may mistakenly view the new query node as empty. As a
result, it assigns segments or channels to it, potentially overloading
the new query node with more segments or channels than expected.

This PR measures the workload of the executing tasks on the target query
node to prevent assigning an excessive number of segments to it.

---------

Signed-off-by: Wei Liu <[email protected]>
weiliu1031 added a commit to weiliu1031/milvus that referenced this pull request Jul 8, 2024
…vus-io#34096)

issue: milvus-io#34095

When a new query node comes online, the segment_checker,
channel_checker, and balance_checker simultaneously attempt to allocate
segments to it. If this occurs during the execution of a load task and
the distribution of the new query node hasn't been updated, the query
coordinator may mistakenly view the new query node as empty. As a
result, it assigns segments or channels to it, potentially overloading
the new query node with more segments or channels than expected.

This PR measures the workload of the executing tasks on the target query
node to prevent assigning an excessive number of segments to it.

---------

Signed-off-by: Wei Liu <[email protected]>
sre-ci-robot pushed a commit that referenced this pull request Jul 10, 2024
) (#34461)

issue: #34095
pr: #34096

When a new query node comes online, the segment_checker,
channel_checker, and balance_checker simultaneously attempt to allocate
segments to it. If this occurs during the execution of a load task and
the distribution of the new query node hasn't been updated, the query
coordinator may mistakenly view the new query node as empty. As a
result, it assigns segments or channels to it, potentially overloading
the new query node with more segments or channels than expected.

This PR measures the workload of the executing tasks on the target query
node to prevent assigning an excessive number of segments to it.

---------

Signed-off-by: Wei Liu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved ci-passed dco-passed DCO check passed. kind/enhancement Issues or changes related to enhancement lgtm size/L Denotes a PR that changes 100-499 lines.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants