fix: Search may return less result after qn recover #36549

Merged


weiliu1031
Contributor

@weiliu1031 weiliu1031 commented Sep 26, 2024

issue: #36293 #36242
After a query node (qn) recovers, a delegator may be loaded on a new node. Once all of its segments have been loaded, the delegator becomes serviceable, but its target version has not been synced yet. If a search or query arrives in that window, the delegator uses the wrong target version to filter segments and ends up with an empty segment list, which causes an empty search result.

This PR blocks the delegator's serviceable status until its target version has been synced.
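
The gating described above can be illustrated with a minimal Go sketch. It is not the code from this PR: the `delegator` type, its methods, and the `noTargetVersion` sentinel are hypothetical names chosen for illustration, and the real Milvus delegator tracks far more state. The sketch only shows the idea that "segments loaded" alone is not enough to report serviceable.

```go
package main

import (
	"fmt"
	"sync"
)

// noTargetVersion is a hypothetical sentinel meaning "no target version synced yet".
const noTargetVersion int64 = -1

// delegator is a simplified, hypothetical stand-in for a shard delegator.
type delegator struct {
	mu             sync.RWMutex
	segmentsLoaded bool
	targetVersion  int64
}

func newDelegator() *delegator {
	return &delegator{targetVersion: noTargetVersion}
}

// MarkSegmentsLoaded records that every segment for this channel is loaded.
func (d *delegator) MarkSegmentsLoaded() {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.segmentsLoaded = true
}

// SyncTargetVersion records the target version pushed by the coordinator.
func (d *delegator) SyncTargetVersion(v int64) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.targetVersion = v
}

// Serviceable reports true only when segments are loaded AND a target
// version has been synced, so requests never filter with a stale version.
func (d *delegator) Serviceable() bool {
	d.mu.RLock()
	defer d.mu.RUnlock()
	return d.segmentsLoaded && d.targetVersion != noTargetVersion
}

func main() {
	d := newDelegator()
	d.MarkSegmentsLoaded()
	fmt.Println(d.Serviceable()) // false: target version not synced yet
	d.SyncTargetVersion(42)
	fmt.Println(d.Serviceable()) // true
}
```

With this shape, a caller that routes search or query requests only to serviceable delegators would skip the recovering delegator until both conditions hold.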

@sre-ci-robot sre-ci-robot added the size/M Denotes a PR that changes 30-99 lines. label Sep 26, 2024
@mergify mergify bot added dco-passed DCO check passed. kind/enhancement Issues or changes related to enhancement labels Sep 26, 2024
Contributor

mergify bot commented Sep 26, 2024

@weiliu1031 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Contributor

mergify bot commented Sep 26, 2024

@weiliu1031 go-sdk check failed, comment rerun go-sdk can trigger the job again.

@weiliu1031 weiliu1031 force-pushed the fix_search_return_less_result branch from 4721bd6 to deffc47 Compare September 27, 2024 02:24
@sre-ci-robot sre-ci-robot added size/L Denotes a PR that changes 100-499 lines. and removed size/M Denotes a PR that changes 30-99 lines. labels Sep 27, 2024
@weiliu1031 weiliu1031 force-pushed the fix_search_return_less_result branch from deffc47 to bfef2b8 Compare September 27, 2024 02:51
@weiliu1031
Contributor Author

rerun go-sdk

@weiliu1031 weiliu1031 force-pushed the fix_search_return_less_result branch from bfef2b8 to 2e8956b Compare September 27, 2024 06:36
@weiliu1031 weiliu1031 force-pushed the fix_search_return_less_result branch from 2e8956b to 0b5a906 Compare September 27, 2024 06:58
@weiliu1031 weiliu1031 force-pushed the fix_search_return_less_result branch 4 times, most recently from 160a3e3 to 53c1b7e Compare September 30, 2024 08:55
@weiliu1031 weiliu1031 force-pushed the fix_search_return_less_result branch from 91e4ce5 to 04cbc6c Compare November 11, 2024 03:05
@mergify mergify bot added the ci-passed label Nov 11, 2024
Signed-off-by: Wei Liu <[email protected]>
@mergify mergify bot added ci-passed and removed ci-passed labels Nov 11, 2024
@bigsheeper
Contributor

/lgtm

@czs007
Collaborator

czs007 commented Nov 12, 2024

related: #37364

@czs007
Collaborator

czs007 commented Nov 12, 2024

/approve
/lgtm

@sre-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: czs007, weiliu1031

@sre-ci-robot sre-ci-robot merged commit 266f8ef into milvus-io:master Nov 12, 2024
20 checks passed
weiliu1031 added a commit to weiliu1031/milvus that referenced this pull request Nov 12, 2024
issue: milvus-io#36293 milvus-io#36242
After a query node (qn) recovers, a delegator may be loaded on a new
node. Once all segments have been loaded, the delegator becomes
serviceable, but its target version has not been synced yet. If a
search/query arrives, the delegator uses the wrong target version and
filters down to an empty segment list, which causes an empty search
result.

This PR blocks the delegator's serviceable status until the target
version is synced.

---------

Signed-off-by: Wei Liu <[email protected]>
weiliu1031 added a commit to weiliu1031/milvus that referenced this pull request Nov 12, 2024
issue: milvus-io#36293 milvus-io#36242
After a query node (qn) recovers, a delegator may be loaded on a new
node. Once all segments have been loaded, the delegator becomes
serviceable, but its target version has not been synced yet. If a
search/query arrives, the delegator uses the wrong target version and
filters down to an empty segment list, which causes an empty search
result.

This PR blocks the delegator's serviceable status until the target
version is synced.

---------

Signed-off-by: Wei Liu <[email protected]>
sre-ci-robot pushed a commit that referenced this pull request Nov 12, 2024
issue: #36293 #36242
pr: #36549
After a query node (qn) recovers, a delegator may be loaded on a new
node. Once all segments have been loaded, the delegator becomes
serviceable, but its target version has not been synced yet. If a
search/query arrives, the delegator uses the wrong target version and
filters down to an empty segment list, which causes an empty search
result.

This PR blocks the delegator's serviceable status until the target
version is synced.

---------

Signed-off-by: Wei Liu <[email protected]>
sre-ci-robot pushed a commit that referenced this pull request Nov 13, 2024

issue: #37640
pr: #37641
Fixes PR #36549.
Balance channel waits until the new delegator becomes serviceable, but
the new delegator must sync its target version before it becomes
serviceable, and syncing the target version must wait until all replicas
have finished loading. So if the replica number is increased while a
channel balance is in progress, a logical deadlock occurs.

Signed-off-by: Wei Liu <[email protected]>
sre-ci-robot pushed a commit that referenced this pull request Nov 14, 2024

issue: #37640
Fixes PR #36549.
Balance channel waits until the new delegator becomes serviceable, but
the new delegator must sync its target version before it becomes
serviceable, and syncing the target version must wait until all replicas
have finished loading. So if the replica number is increased while a
channel balance is in progress, a logical deadlock occurs.

Signed-off-by: Wei Liu <[email protected]>
weiliu1031 added a commit to weiliu1031/milvus that referenced this pull request Nov 14, 2024
PR milvus-io#36549 introduced a logic error: the current target was
updated when only some of the channels were ready.

This PR fixes the logic error and makes the dist handler keep pulling
distribution from the querynode until all delegators become serviceable.

Signed-off-by: Wei Liu <[email protected]>
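
The follow-up fix in the commit message above can be sketched in Go as two small pieces: a readiness check that requires every channel to be serviceable, and a loop that keeps pulling distribution until that check passes. This is a hedged illustration only. `channelView`, `readyToUpdateTarget`, and `pullUntilAllServiceable` are hypothetical names, the pull is simulated by a callback, and the bounded retry count is an assumption made so the example terminates.

```go
package main

import "fmt"

// channelView is a hypothetical snapshot of one channel's delegator state.
type channelView struct {
	Name        string
	Serviceable bool
}

// readyToUpdateTarget returns true only when every channel is serviceable.
// An empty view is treated as not ready, so a partial or missing
// distribution never triggers a current-target update.
func readyToUpdateTarget(views []channelView) bool {
	if len(views) == 0 {
		return false
	}
	for _, v := range views {
		if !v.Serviceable {
			return false
		}
	}
	return true
}

// pullUntilAllServiceable calls pull up to maxPulls times and stops as soon
// as all channels are serviceable. It returns the number of pulls performed
// and whether readiness was reached.
func pullUntilAllServiceable(pull func() []channelView, maxPulls int) (int, bool) {
	for i := 1; i <= maxPulls; i++ {
		if readyToUpdateTarget(pull()) {
			return i, true
		}
	}
	return maxPulls, false
}

func main() {
	// Simulated distribution: the second channel becomes serviceable on the third pull.
	pulls := 0
	pull := func() []channelView {
		pulls++
		return []channelView{
			{Name: "ch-0", Serviceable: true},
			{Name: "ch-1", Serviceable: pulls >= 3},
		}
	}
	n, ok := pullUntilAllServiceable(pull, 5)
	fmt.Println(n, ok) // 3 true
}
```

The point of the all-channels check is that a collection with one ready channel and one unready channel is still reported as not ready, which is the partial-readiness case the commit message describes.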
sre-ci-robot pushed a commit that referenced this pull request Nov 15, 2024
issue: #37679

PR #36549 introduced a logic error: the current target was updated when
only some of the channels were ready.

This PR fixes the logic error and makes the dist handler keep pulling
distribution from the querynode until all delegators become serviceable.

Signed-off-by: Wei Liu <[email protected]>
weiliu1031 added a commit to weiliu1031/milvus that referenced this pull request Nov 15, 2024
issue: milvus-io#37679

PR milvus-io#36549 introduced a logic error: the current target was
updated when only some of the channels were ready.

This PR fixes the logic error and makes the dist handler keep pulling
distribution from the querynode until all delegators become serviceable.

Signed-off-by: Wei Liu <[email protected]>
sre-ci-robot pushed a commit that referenced this pull request Nov 15, 2024
issue: #37679
pr: #37694

PR #36549 introduced a logic error: the current target was updated when
only some of the channels were ready.

This PR fixes the logic error and makes the dist handler keep pulling
distribution from the querynode until all delegators become serviceable.

Signed-off-by: Wei Liu <[email protected]>
Labels
approved
area/test
ci-passed
dco-passed: DCO check passed.
kind/bug: Issues or changes related a bug
kind/enhancement: Issues or changes related to enhancement
lgtm
sig/testing
size/L: Denotes a PR that changes 100-499 lines.
test/integration: integration test