[Bug]: count* and the num of query_iter have wide gap #33774

Closed
1 task done
lzhin opened this issue Jun 12, 2024 · 15 comments
Labels
kind/bug Issues or changes related a bug stale indicates no udpates for 30 days triage/needs-information Indicates an issue needs more information in order to work on it.

Comments

@lzhin

lzhin commented Jun 12, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version:2.3.5
- Deployment mode(standalone or cluster):cluster
- MQ type(rocksmq, pulsar or kafka):    pulsar
- SDK version(e.g. pymilvus v2.0.0rc2):pymilvus v2.3.7
- OS(Ubuntu or CentOS): Ubuntu 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

With the same expr "448526226933733731<id<448526253755733731", count(*) and the number of entities returned by query_iter differ widely: count(*) returns ["{'count(*)': 190357716}"], while query_iter returns 117086495 entities.

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

@lzhin lzhin added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 12, 2024
@yanliang567
Contributor

@lzhin do you have any duplicated primary keys in the collection? I am asking because count(*) does not do de-dup while query_iter returns results after de-dup.

/assign @lzhin
/unassign

@sre-ci-robot sre-ci-robot assigned lzhin and unassigned yanliang567 Jun 12, 2024
@yanliang567 yanliang567 added triage/needs-information Indicates an issue needs more information in order to work on it. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 12, 2024
@lzhin
Author

lzhin commented Jun 12, 2024

@lzhin do you have any duplicated primary keys in the collection? I am asking because count(*) does not do de-dup while query_iter returns results after de-dup.

/assign @lzhin /unassign

How can I check whether there are duplicated primary keys? I don't think there are any, because I did not set the primary keys myself; they are auto-generated by Milvus.

@yanliang567
Contributor

@lzhin Could you please refer to this doc to export the whole Milvus logs for investigation? For Milvus installed with docker-compose, you can use docker-compose logs > milvus.log to export the logs.
Also, if convenient, please retry on the latest Milvus v2.4.4.

@lzhin
Author

lzhin commented Jun 13, 2024

@lzhin Could you please refer to this doc to export the whole Milvus logs for investigation? For Milvus installed with docker-compose, you can use docker-compose logs > milvus.log to export the logs. Also, if convenient, please retry on the latest Milvus v2.4.4.

I see that the export tool uses kubectl to export the logs, but I did not deploy Milvus with k8s or the operator. I deployed every module with my company's tool and connected them to each other by domain name, so it seems I can't export the logs with export-milvus-log.sh or docker-compose logs.
I tried to upgrade Milvus to 2.3.13 but failed, because MinIO crashed when it was upgraded to 2023-03-20T20-16-18Z. I also tried milvus-backup to back up the collection, but restoring the data to the new Milvus failed; it seems that the new MinIO could not read the backup data. So now I am trying to use query_iter to export all the collection data to a file and then batch insert it into the new Milvus.

@xiaofan-luan
Collaborator

this might be a bug in the query iterator and @MrPresent-Han fix that


stale bot commented Jul 14, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

@stale stale bot added the stale indicates no udpates for 30 days label Jul 14, 2024
@xiaofan-luan
Collaborator

For the counter issue, the recent update is that we saw duplicate data might affect count accuracy.

@stale stale bot removed the stale indicates no udpates for 30 days label Jul 16, 2024
@lzhin
Author

lzhin commented Jul 16, 2024

For the counter issue, the recent update is that we saw duplicate data might affect count accuracy.

What does duplicate data mean? Does it mean the same vector? I guess that the counter is larger than the actual number of vectors. This might be caused by the server crash: some data might not have been sealed to MinIO but was already recorded in etcd.

@MrPresent-Han
Contributor

Hi lzhin, the duplicate data mentioned above refers to entities with duplicated primary keys. The results returned by the query iterator exclude entities with the same PKs, but count(*) does not, so this may cause the gap between the two cases.
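
To make the difference concrete, here is a minimal pure-Python sketch of the two behaviours as described in this thread. It does not call Milvus at all: `count_star` and `iterate_deduplicated` are hypothetical stand-ins for illustration, and the sample rows are made up.

```python
def count_star(rows):
    """Stand-in for count(*) as described above: every stored row is counted,
    including rows that share a primary key."""
    return len(rows)


def iterate_deduplicated(rows, pk_field="id"):
    """Stand-in for the query iterator as described above: yield only the
    first row seen for each primary key."""
    seen = set()
    for row in rows:
        pk = row[pk_field]
        if pk not in seen:
            seen.add(pk)
            yield row


# Made-up data: pk 2 is stored twice and pk 3 three times.
rows = [{"id": 1}, {"id": 2}, {"id": 2}, {"id": 3}, {"id": 3}, {"id": 3}]

total = count_star(rows)
unique = sum(1 for _ in iterate_deduplicated(rows))
gap = total - unique  # rows that only count(*) sees

# The gap reported in this issue, computed the same way.
reported_gap = 190357716 - 117086495

print(total, unique, gap, reported_gap)
```

If duplicated primary keys were the only cause, the reported gap would correspond to the number of extra rows that share a PK with another row. That is an assumption of this sketch, not something confirmed in the thread.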

@lzhin
Author

lzhin commented Jul 26, 2024

Hi lzhin, the duplicate data mentioned above refers to entities with duplicated primary keys. The results returned by the query iterator exclude entities with the same PKs, but count(*) does not, so this may cause the gap between the two cases.

Hi, what is the reason for duplicated primary keys existing? I think duplicated PKs should not appear, since the PK is automatically generated by Milvus.

@xiaofan-luan
Collaborator

@bigsheeper

Can we build a tool for users to analyze whether duplicated PKs exist?
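
As a rough idea of what such a check could look like, the sketch below counts duplicated primary keys in a plain list of PK values. It is not an existing Milvus tool: `find_duplicate_pks` and `extra_rows` are hypothetical helpers, and the sketch assumes the PK values were obtained from a source that does not de-duplicate them (the query iterator would hide duplicates, per the comments above). How to obtain such a raw list is outside the scope of this sketch.

```python
from collections import Counter
from typing import Dict, Iterable


def find_duplicate_pks(pks: Iterable[int]) -> Dict[int, int]:
    """Return {pk: occurrences} for every primary key seen more than once."""
    counts = Counter(pks)
    return {pk: n for pk, n in counts.items() if n > 1}


def extra_rows(duplicates: Dict[int, int]) -> int:
    """Number of rows that a non-de-duplicating count would report on top of
    the number of distinct primary keys."""
    return sum(n - 1 for n in duplicates.values())


# Made-up PK values for illustration only.
sample_pks = [101, 102, 102, 103, 103, 103, 104]

duplicates = find_duplicate_pks(sample_pks)
print(duplicates)
print(extra_rows(duplicates))
```

Holding every key in a `Counter` is fine for small samples. For a collection of the size reported here, a real tool would probably need to process the keys in sorted chunks or per PK range instead.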


stale bot commented Aug 27, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

@stale stale bot added the stale indicates no udpates for 30 days label Aug 27, 2024
@xiaofan-luan
Collaborator

Try 2.3.21 or 2.4.10 and see if the problem has been solved.

@lzhin
Author

lzhin commented Aug 27, 2024

Try 2.3.21 or 2.4.10 and see if the problem has been solved.

The data was recovered a long time ago, and we no longer care about the gap.

@stale stale bot removed the stale indicates no udpates for 30 days label Aug 27, 2024

stale bot commented Sep 29, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

@stale stale bot added the stale indicates no udpates for 30 days label Sep 29, 2024
@stale stale bot closed this as completed Nov 9, 2024
4 participants