Add support for filtered vector index scan for Milvus integration #1370

Open
wants to merge 13 commits into
base: staging

Contributor

@RichardZhangRZ RichardZhangRZ commented Nov 18, 2023

This is a pretty big PR, so here is a high-level overview:

Added Features

Index Creation with Extra Columns

When creating a vector index, users can use the INCLUDE keyword to specify which metadata (additional columns) to store alongside each vector in the vector database. This allows the vector database to perform filtered similarity search without querying EvaDB’s own internal database: users can run predicated searches on the included columns using the native support of the specified vector database. A sample EvaQL statement looks like this:

CREATE INDEX index1 ON table1(features) INCLUDE (metadata_col1, metadata_col2) USING MILVUS;

Filtered Vector Index Scan

With the index containing the extra metadata, users can add predicates on that metadata when performing similarity search. A query like the following is then executed natively by the vector database, including the WHERE predicate:

SELECT data FROM table1 WHERE metadata_col1 >= 3 ORDER BY Similarity(query, features) LIMIT 5;
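
To make the intended semantics concrete, here is a minimal brute-force sketch of what the query above asks for: keep only rows that satisfy the predicate, then return the closest `LIMIT` rows. This is not how Milvus executes the search; the function name `filtered_top_k`, the sample rows, and the use of Euclidean distance (smaller means more similar) are assumptions for illustration only.

```python
import math

def filtered_top_k(rows, query, predicate, k):
    """Return the k rows nearest to `query` among rows that satisfy `predicate`."""
    # Apply the WHERE predicate first, then rank the survivors by distance.
    candidates = [row for row in rows if predicate(row)]
    candidates.sort(key=lambda row: math.dist(row["features"], query))
    return candidates[:k]

# Hypothetical table contents, for illustration only.
rows = [
    {"data": "a", "metadata_col1": 1, "features": [0.0, 0.0]},
    {"data": "b", "metadata_col1": 3, "features": [1.0, 1.0]},
    {"data": "c", "metadata_col1": 5, "features": [0.1, 0.1]},
    {"data": "d", "metadata_col1": 7, "features": [4.0, 4.0]},
]

result = filtered_top_k(
    rows,
    query=[0.0, 0.0],
    predicate=lambda row: row["metadata_col1"] >= 3,
    k=2,
)
print([row["data"] for row in result])  # ['c', 'b']
```

Row "a" is the nearest vector overall but is excluded because it fails the predicate, which is the behavior the filtered scan is meant to provide.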

Code Changes

All code related to index creation had to be changed so that the extra metadata is added to the vector database. This touches the query parser, query optimizer, and query executor. The changes to EvaDB’s index creation code are:

  • EvaDB’s Lark visitor was changed to read the extra columns in CREATE INDEX statements (the columns after the INCLUDE keyword).
  • The create index statement, operator, and plan objects now store the included columns.
  • The create index executor was changed to pass the data of the included columns to the relevant VectorStore object.
  • The MilvusVectorStore was changed to create the collection with the metadata provided by the create index executor.
  • EvaDB’s catalog now stores the included columns of a created index.
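
As a rough illustration of the executor step in the list above, the sketch below pairs each vector with the values of its included columns before handing it to a vector store. The names `IndexEntry` and `build_index_entries` are hypothetical and are not the classes used in this PR; the row layout is also assumed.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class IndexEntry:
    """Hypothetical payload for one vector: id, embedding, and INCLUDE columns."""
    row_id: int
    embedding: List[float]
    metadata: Dict[str, Any] = field(default_factory=dict)

def build_index_entries(rows, vector_column, include_columns):
    """Attach the values of the included columns to each row's vector."""
    entries = []
    for row_id, row in enumerate(rows):
        missing = [col for col in include_columns if col not in row]
        if missing:
            raise KeyError(f"row {row_id} is missing included columns: {missing}")
        metadata = {col: row[col] for col in include_columns}
        entries.append(IndexEntry(row_id, list(row[vector_column]), metadata))
    return entries

# Hypothetical rows matching the CREATE INDEX ... INCLUDE example.
rows = [
    {"features": [0.1, 0.2], "metadata_col1": 3, "metadata_col2": "cat"},
    {"features": [0.3, 0.4], "metadata_col1": 7, "metadata_col2": "dog"},
]
entries = build_index_entries(rows, "features", ["metadata_col1", "metadata_col2"])
print(entries[0].metadata)  # {'metadata_col1': 3, 'metadata_col2': 'cat'}
```

Failing early on a missing included column keeps a partially populated collection from being created.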

Additionally, code related to vector index scan optimization and execution had to be changed to account for the WHERE clause predicate in similarity search queries. Below is a list of changes:

  • A new optimization rule was added that rewrites the combination of ORDER BY on the Similarity function, a LIMIT expression, and a WHERE clause into a filtered vector index scan plan.
  • The vector index scan executor now passes the EvaQL WHERE predicate to the relevant VectorStore object.
  • The MilvusVectorStore was changed to convert the EvaQL WHERE predicate into a predicate string for similarity search that follows Milvus’s Boolean Expression Rules.
  • Since hybrid search is only supported on Milvus, the optimizer rule includes a check that allows hybrid searches only on Milvus indices.
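
The predicate conversion and the Milvus-only check can be sketched roughly as follows. This is a simplified stand-in, not the code in this PR: the expression classes (`Column`, `Constant`, `Compare`, `Logical`) and both function names are hypothetical, only a few operators are handled, and the operator mapping (for example `=` to `==`) and the string escaping are assumptions about the target syntax.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical, minimal expression tree standing in for EvaDB's own classes.
@dataclass
class Column:
    name: str

@dataclass
class Constant:
    value: Any

@dataclass
class Compare:
    op: str
    left: Any
    right: Any

@dataclass
class Logical:
    op: str
    left: Any
    right: Any

# Assumed mapping from SQL-style operators to Milvus-style expression operators.
COMPARE_OPS = {"=": "==", "!=": "!=", "<": "<", "<=": "<=", ">": ">", ">=": ">="}
LOGICAL_OPS = {"AND": "and", "OR": "or"}

def to_milvus_expr(node):
    """Recursively render a predicate tree as a filter expression string."""
    if isinstance(node, Column):
        return node.name
    if isinstance(node, Constant):
        value = node.value
        if isinstance(value, bool) or not isinstance(value, (int, float, str)):
            raise TypeError(f"unsupported constant: {value!r}")
        if isinstance(value, str):
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            return f'"{escaped}"'
        return repr(value)
    if isinstance(node, Compare):
        if node.op not in COMPARE_OPS:
            raise ValueError(f"unsupported comparison operator: {node.op}")
        left, right = to_milvus_expr(node.left), to_milvus_expr(node.right)
        return f"{left} {COMPARE_OPS[node.op]} {right}"
    if isinstance(node, Logical):
        op = LOGICAL_OPS.get(node.op.upper())
        if op is None:
            raise ValueError(f"unsupported logical operator: {node.op}")
        # Parenthesize both sides so operator precedence never matters.
        return f"({to_milvus_expr(node.left)}) {op} ({to_milvus_expr(node.right)})"
    raise TypeError(f"unsupported expression node: {node!r}")

def supports_filtered_scan(vector_store_type):
    """Mirror of the optimizer check: only Milvus indices qualify."""
    return vector_store_type.upper() == "MILVUS"

predicate = Logical(
    "AND",
    Compare(">=", Column("metadata_col1"), Constant(3)),
    Compare("=", Column("metadata_col2"), Constant("cat")),
)
if supports_filtered_scan("MILVUS"):
    print(to_milvus_expr(predicate))
    # (metadata_col1 >= 3) and (metadata_col2 == "cat")
```

Raising on unsupported operators or constants, rather than emitting a best-effort string, lets the optimizer fall back to a regular plan when a predicate cannot be pushed down.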

@RichardZhangRZ RichardZhangRZ marked this pull request as ready for review November 25, 2023 20:33