Adding Starling Index to Knowhere #907
Comments
/assign @PwzXxm |
@PwzXxm is the author of Starling; he can help with this |
Hi there, thanks for your interest in contributing to Knowhere. May I ask what the motivation is for adding Starling to Knowhere, so I can assist you better? Are you planning to add it to Milvus as well? For adding an index to Knowhere alone, you might take a look at this example, which adds SCANN: https://github.com/zilliztech/knowhere/pull/1/files Another feasible approach might be to add parameters to the DiskANN index rather than adding a new index type to Knowhere. |
Thanks for the information! We will definitely look into those. Our team was experimenting with different vector indices and found Starling. Since Starling was created by Milvus engineers, we felt it would be appropriate to integrate it into Milvus and run experiments in terms of performance, accuracy, and stability. Quick follow-up question: Once an index is added to Knowhere, what does the larger process for registering it in Milvus look like? Is there an analogous pull request like this? Thanks! |
I was wondering what the use case is, and I assume you have already checked out other in-memory indices or DiskANN?
Registering it in Milvus is not a heavy load. |
Hi @PwzXxm, I'm also part of the team that @aawang1999 is on. The use case would be a high-performance index at large scale, the goal being to leverage both the speed of Starling, which is much faster than DiskANN due to its optimizations, and its disk-based scalability. Some questions from looking into the different code repositories:
|
Filtering is approached differently in Milvus, compared to Filtered-DiskANN. In Milvus, the filter condition is evaluated before KNN search, so in Knowhere, the DiskANN only sees a
Indices operate at the segment level, and I think that would be fine. Keep in mind that the offset/ID at the segment level needs to be preserved, so if you relayout the vectors via Starling, mappings are needed to return the correct IDs to Milvus. |
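To make the ID-preservation point concrete, here is a minimal sketch of such a mapping. It is only an illustration under stated assumptions: `RelayoutIdMap` and `ToSegmentOffsets` are hypothetical names, not part of Knowhere or Starling, and the sketch assumes the relayout can be described as a flat array giving, for each on-disk position, the original segment offset of the vector stored there.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper, NOT a Knowhere or Starling API.
// Assumption: layout[i] is the original segment offset of the vector that
// the relayout placed at internal position i.
struct RelayoutIdMap {
    std::vector<int64_t> internal_to_segment;

    explicit RelayoutIdMap(std::vector<int64_t> layout)
        : internal_to_segment(std::move(layout)) {}

    // Translate search results (internal positions) back to segment offsets.
    // Assumption: a negative id marks an empty result slot and is passed
    // through unchanged.
    std::vector<int64_t> ToSegmentOffsets(
        const std::vector<int64_t>& internal_ids) const {
        std::vector<int64_t> out;
        out.reserve(internal_ids.size());
        for (int64_t id : internal_ids) {
            if (id < 0) {
                out.push_back(id);
                continue;
            }
            // An internal id must refer to a stored position.
            assert(static_cast<std::size_t>(id) < internal_to_segment.size());
            out.push_back(internal_to_segment[static_cast<std::size_t>(id)]);
        }
        return out;
    }
};
```

The idea is that the search runs entirely on relayouted positions, and the translation happens once on the final top-k results before they are handed back, so the caller only ever sees segment-level offsets.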
Thank you so much for your input! One question about the DiskANN index creation process in Knowhere: I'm somewhat confused about the distinction between Build() and Deserialize(). To my understanding, Build() normally takes a dataset as input, creates a (usually in-memory) index from it, and can serialize that index to disk/object storage, from which it can later be deserialized and loaded into memory. However, since DiskANN is already disk-based, doesn't Build() already accomplish the purpose of deserialization/loading, since it's configured to read the requisite files from disk to create the index anyway? I'm unsure how loading differs between the two methods, though I assume that after Build() completes the index can clear its in-memory structures, whereas Deserialize() must hold onto them for search? Since Starling has the in-memory graph, I assume that would be loaded in Deserialize(). |
Are you referring to anything in addition to the id-page and page-id mappings that Starling does? Going through the code it seems that when loading the graph partition data these mappings are loaded and then used for page search; is this sufficient, or is there anything in addition one needs to do for the Milvus/Knowhere environment? |
The Yes, the loading of the In-Memory Graph would be in |
Yes, it should have already been taken care of. |
My development team is trying to add the Starling index to Knowhere. I understand that the process of adding indices is briefly outlined on the Milvus Deep Dive page (linked here), but I was wondering if more detailed instructions could be provided on how to modify the Knowhere code? Assistance would be greatly appreciated.