Add support for filtered vector index scan for Milvus integration #1370
This is a pretty big PR, so here is a high-level overview:
Added Features
Index Creation with Extra Columns
When creating a vector index, users can use the INCLUDE keyword to specify which metadata (additional columns) to store alongside each vector in the vector database. This lets the vector database perform filtered similarity search itself, without querying EvaDB’s own internal database: predicates on the included columns are evaluated using the native filtering support of the chosen vector database. A sample EvaQL statement of such a feature would look like this:
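To make the idea concrete independently of EvaQL syntax, here is a minimal Python sketch of what storing included columns next to each vector amounts to. It is not EvaDB or Milvus code: `ToyVectorIndex`, `include_columns`, and the sample rows are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVectorIndex:
    """Hypothetical in-memory stand-in for a vector database collection."""

    include_columns: tuple  # columns named at index-creation time (assumed)
    rows: list = field(default_factory=list)

    def add(self, row_id, vector, row):
        # Store only the included columns as metadata; other columns stay
        # in the source table and are not copied into the index.
        metadata = {col: row[col] for col in self.include_columns}
        self.rows.append(
            {"id": row_id, "vector": list(vector), "metadata": metadata}
        )


index = ToyVectorIndex(include_columns=("label", "year"))
index.add(1, [0.1, 0.9], {"label": "cat", "year": 2021, "path": "a.jpg"})
index.add(2, [0.8, 0.2], {"label": "dog", "year": 2023, "path": "b.jpg"})
```

Because `label` and `year` live next to the vectors, a predicate on them can be answered from the index alone, while a predicate on `path` cannot.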
Filtered Vector Index Scan
Once the index contains the extra metadata, users can add predicates on that metadata when performing similarity search. A similarity search query like the following is then executed, filter included, natively by the vector database:
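As a rough illustration of the semantics (not of how Milvus implements it), the sketch below applies a metadata predicate and then ranks the surviving vectors by distance. It is a brute-force toy: `filtered_search`, the row layout, and the sample data are assumptions made for this example, and a real vector database would use an approximate index rather than a full scan.

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def filtered_search(rows, query, predicate, k):
    """Return ids of the k nearest rows whose metadata satisfies predicate."""
    candidates = [r for r in rows if predicate(r["metadata"])]
    candidates.sort(key=lambda r: l2(r["vector"], query))
    return [r["id"] for r in candidates[:k]]


# Hypothetical indexed rows: each vector carries its included metadata.
rows = [
    {"id": 1, "vector": [0.1, 0.9], "metadata": {"label": "cat"}},
    {"id": 2, "vector": [0.2, 0.8], "metadata": {"label": "dog"}},
    {"id": 3, "vector": [0.9, 0.1], "metadata": {"label": "cat"}},
]

# Row 2 is the closest vector overall, but the predicate excludes it.
result = filtered_search(
    rows, [0.2, 0.8], lambda m: m["label"] == "cat", k=2
)
print(result)  # [1, 3]
```

The point of the example is that the filter and the ranking both happen inside the index, so no lookup in a separate table is needed to decide which neighbors qualify.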
Code Changes
All code related to index creation had to change so that the extra metadata is written to the vector database. This touches the query parser, the query optimizer, and the query executor, each of which now has to carry the included columns through. Below are the changes to EvaDB’s index creation code:
Additionally, code related to vector index scan optimization and execution had to be changed to account for the WHERE clause predicate in similarity search queries. Below is a list of changes:
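One plausible shape for the optimizer side, sketched under stated assumptions: given a WHERE clause already broken into conjuncts, keep the ones that mention only included columns for the vector database and leave the rest to be evaluated elsewhere. This is not taken from the PR's code; `split_predicates`, `to_filter_expr`, the tuple representation, and the expression format (loosely modeled on a boolean filter string) are all hypothetical, and whether the PR handles residual predicates this way is not stated here.

```python
def split_predicates(conjuncts, include_columns):
    """Split (column, op, value) conjuncts into pushable and residual lists."""
    pushed, residual = [], []
    for column, op, value in conjuncts:
        # Only predicates on columns stored in the index can be pushed down.
        target = pushed if column in include_columns else residual
        target.append((column, op, value))
    return pushed, residual


def to_filter_expr(pushed):
    """Render pushed conjuncts as a single boolean filter string."""
    return " and ".join(f"{c} {op} {v!r}" for c, op, v in pushed)


conjuncts = [
    ("label", "==", "cat"),
    ("year", ">", 2020),
    ("path", "==", "a.jpg"),  # not an included column in this example
]
pushed, residual = split_predicates(conjuncts, {"label", "year"})
expr = to_filter_expr(pushed)
print(expr)      # label == 'cat' and year > 2020
print(residual)  # [('path', '==', 'a.jpg')]
```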