
fix: [2.5] Prevent balancer from overloading the same QueryNode #38724

Merged
merged 2 commits into milvus-io:2.5 from fix_assign_too_much_task25
Dec 25, 2024

Conversation

weiliu1031
Contributor

issue: #38718
pr: #38719
The balancer counts the workload of tasks that are still executing as an ongoing score for their target nodes. However, GetSegmentTaskDelta and GetChannelTaskDelta contain a logic error: when called with collectionID=-1, which requests the global score across all collections, they incorrectly return zero instead of the accumulated workload.
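The failure mode can be illustrated with a minimal Go sketch. It assumes a simplified, hypothetical bookkeeping structure (`taskDelta`, keyed by node and collection) that is not the actual Milvus scheduler code; it only shows why treating `-1` as an ordinary map key yields zero, while treating it as "sum over all collections" yields the real executing workload.

```go
package main

import "fmt"

// taskDelta is a hypothetical, simplified stand-in for the scheduler's
// per-node workload bookkeeping; it is not the Milvus implementation.
type taskDelta struct {
	// delta[nodeID][collectionID] = workload of tasks still executing
	delta map[int64]map[int64]int
}

func newTaskDelta() *taskDelta {
	return &taskDelta{delta: make(map[int64]map[int64]int)}
}

// add records d units of executing workload for a collection on a node.
func (t *taskDelta) add(nodeID, collectionID int64, d int) {
	if t.delta[nodeID] == nil {
		t.delta[nodeID] = make(map[int64]int)
	}
	t.delta[nodeID][collectionID] += d
}

// get returns the executing workload on nodeID. collectionID == -1 means
// "all collections", so it sums every entry instead of looking up key -1.
func (t *taskDelta) get(nodeID, collectionID int64) int {
	perCollection := t.delta[nodeID]
	if collectionID != -1 {
		return perCollection[collectionID]
	}
	sum := 0
	for _, d := range perCollection {
		sum += d
	}
	return sum
}

// buggyGet mimics the described bug: a plain lookup with key -1 finds
// no entry and silently returns zero.
func (t *taskDelta) buggyGet(nodeID, collectionID int64) int {
	return t.delta[nodeID][collectionID]
}

func main() {
	td := newTaskDelta()
	td.add(1, 100, 3)
	td.add(1, 200, 2)
	fmt.Println(td.buggyGet(1, -1)) // 0: global workload is lost
	fmt.Println(td.get(1, -1))      // 5: sum over collections 100 and 200
	fmt.Println(td.get(1, 100))     // 3: per-collection lookup still works
}
```

Handling `-1` explicitly keeps per-collection lookups unchanged while giving callers that ask for the global score the full picture.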

Because the global score is wrong, the workload of executing tasks is not reflected across collections. Each collection therefore submits its own balance tasks without seeing the work already pending from other collections, and the balancer ends up assigning too many tasks to the same QueryNode.
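The overloading effect can be reproduced with a toy model. This sketch assumes a hypothetical setup (three nodes named `qn-a`, `qn-b`, `qn-c`, a fixed task size, and a "pick the lowest score" rule) that only approximates the real balancer. Each collection in turn picks a target; when executing work is ignored, as happens when the global delta reads as zero, every collection sees the same stale scores and picks the same node.

```go
package main

import "fmt"

// pickTarget returns the node with the lowest score, where score is the
// node's current load plus the workload already executing on it.
// Hypothetical, simplified model of a least-loaded balancer.
func pickTarget(load, executing map[string]int, nodes []string) string {
	best := ""
	bestScore := 0
	for _, n := range nodes {
		score := load[n] + executing[n]
		if best == "" || score < bestScore {
			best, bestScore = n, score
		}
	}
	return best
}

// balance lets each collection submit one task of size taskSize in turn.
// When countExecuting is false, executing workload is hidden from the
// scorer, mimicking the global delta incorrectly returning zero.
func balance(collections, taskSize int, countExecuting bool) map[string]int {
	nodes := []string{"qn-a", "qn-b", "qn-c"}
	load := map[string]int{"qn-a": 0, "qn-b": 5, "qn-c": 5}
	executing := map[string]int{} // workload assigned so far, per node
	for c := 0; c < collections; c++ {
		view := executing
		if !countExecuting {
			view = map[string]int{} // scorer sees no executing work
		}
		target := pickTarget(load, view, nodes)
		executing[target] += taskSize
	}
	return executing
}

func main() {
	fmt.Println(balance(3, 10, false)) // map[qn-a:30]
	fmt.Println(balance(3, 10, true))  // map[qn-a:10 qn-b:10 qn-c:10]
}
```

In the first run all three tasks land on `qn-a`; once executing work is counted, later collections see `qn-a`'s pending load and spread their tasks across the other nodes.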

@sre-ci-robot sre-ci-robot added the size/L label (a PR that changes 100-499 lines) Dec 24, 2024
@mergify mergify bot added the dco-passed and kind/bug labels Dec 24, 2024

Signed-off-by: Wei Liu <[email protected]>
@weiliu1031 weiliu1031 force-pushed the fix_assign_too_much_task25 branch from 13a8e44 to 08afdfe Compare December 24, 2024 14:21
@weiliu1031 weiliu1031 changed the title fix: Prevent balancer from overloading the same QueryNode fix: [2.5] Prevent balancer from overloading the same QueryNode Dec 24, 2024

codecov bot commented Dec 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.05%. Comparing base (9113090) to head (66b5c0e).
Report is 5 commits behind head on 2.5.


@@            Coverage Diff             @@
##              2.5   #38724      +/-   ##
==========================================
- Coverage   81.06%   81.05%   -0.02%     
==========================================
  Files        1381     1381              
  Lines      195103   195130      +27     
==========================================
- Hits       158156   158155       -1     
- Misses      31390    31407      +17     
- Partials     5557     5568      +11     
Components  Coverage Δ
Client      78.26% <ø>       (ø)
Core        69.33% <ø>       (-0.02%) ⬇️
Go          83.03% <100.00%> (-0.02%) ⬇️

Files with missing lines                  Coverage Δ
internal/querycoordv2/task/scheduler.go   90.12% <100.00%> (+1.08%) ⬆️

... and 30 files with indirect coverage changes

@mergify mergify bot added the ci-passed label Dec 24, 2024
@mergify mergify bot removed the ci-passed label Dec 25, 2024
Contributor

mergify bot commented Dec 25, 2024

@weiliu1031 go-sdk check failed; comment "rerun go-sdk" to trigger the job again.

@weiliu1031
Contributor Author

rerun go-sdk

@mergify mergify bot added the ci-passed label Dec 25, 2024
@tedxu
Contributor

tedxu commented Dec 25, 2024

/lgtm
/approve

@sre-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tedxu, weiliu1031


@sre-ci-robot sre-ci-robot merged commit f441ccd into milvus-io:2.5 Dec 25, 2024
19 of 20 checks passed
Labels
approved, ci-passed, dco-passed, kind/bug, lgtm, size/L