[Bug]: the performance of partition keys is far inferior to scalar retrieval #38574
Comments
@douglarek That's an interesting test. Thank you for your updates. Quick questions:
/assign @douglarek
@yanliang567 All information has been provided.
@douglarek Thank you for your quick update. Could you also provide a birdwatcher backup file for investigation? Please refer to this doc to back up etcd with birdwatcher: https://github.com/milvus-io/birdwatcher
What is scalar search? Are you searching by partition, or searching with a filter?
And how is the table defined?
The QPS seems too low in whichever test case you are running.
@douglarek Thank you for the update. I checked the logs, but they only contain INFO-level entries, so we cannot find any search information in them. If convenient, could you please set the log level to debug, reproduce the issue, and collect the logs again for investigation?
Okay, I'll set the log level to debug this afternoon and collect the logs again. I will also provide the birdwatcher information. Thank you for paying attention to this issue.
Perhaps so; I will first provide some necessary information. I am not sure whether my usage is incorrect or whether this is an issue with Milvus, so I reserve judgment on this matter.
@yanliang567 The relevant logs and backup are provided below. milvus log (debug enabled): birdwatcher backup:
@douglarek Thank you for the update. Unfortunately, it seems that the birdwatcher backup file is broken or was not backed up successfully. After I loaded the backup file, it showed 0 collections and 0 segments. Could you please try to create a new backup file for us?
@yanliang567 Before I started debugging, I made a backup (bw_etcd_ALL.241224-073220.bak.gz; perhaps this data is also acceptable?). During debugging, I was too eager and forcibly used kubectl to delete some pods, which left certain etcd nodes in a kill-triggered state. Birdwatcher exported with errors but still generated files; unexpectedly, the files are unusable.
Yeah, my mistake. My test program previously tested the HNSW index, and this parameter might not have been removed. Will this parameter affect IVF_FLAT (yes, that is the index in use)? Perhaps I should correct this parameter and retest?
Yes, the first one I queried when loading the collection in attu.
Is there an existing issue for this?
Environment
Current Behavior
First, I would like to confirm one point: does partition key retrieval improve performance compared with filtering on an ordinary scalar field? If so, is the theoretical performance of partition keys higher than that of purely scalar field retrieval?
In my tests, the retrieval performance of the two differs greatly.
Steps:
Insert data into the collection shown in the figure
collection schema information:
json
Use a benchmark that calls the Go SDK; the key retrieval methods are as follows
The *expr here is used to distinguish between scalar retrieval and partition key retrieval: scalar_colors == "red" for scalar search, scalar_colors_key == "red" for partition-key search.
Search QPS benchmark
milvus 2.3.12
milvus 2.5.0-beta
Expected Behavior
One surprising aspect of milvus-2.5.0 is that the scalar search performance has indeed improved significantly, by about 40%. The gap between partition keys and scalars is too large, which doesn't quite align with the theory in the milvus documentation. Of course, it could also be an issue with my stress testing, which is why I'm raising this issue.
Steps To Reproduce
No response
Milvus Log
No response
Anything else?
No response