Vespa Multi-Vector Indexing with HNSW

This sample application demonstrates multi-vector indexing with Vespa, a feature introduced in Vespa 8.144.19. Read the blog post announcing multi-vector indexing for background.

Go to multi-vector-indexing to run this sample application using pyvespa.

The app uses a small sample of Wikipedia articles, where each paragraph of an article is embedded into a vector space.

Quick start

The following is a quick start recipe for this application. Requirements:

  • Docker Desktop installed and running. 4 GB available memory for Docker is recommended. Refer to Docker memory for details and troubleshooting.
  • Alternatively, deploy using Vespa Cloud.
  • Operating system: Linux, macOS or Windows 10 Pro (Docker requirement).
  • Architecture: x86_64 or arm64.
  • Homebrew to install Vespa CLI, or download a Vespa CLI release from GitHub releases.

Validate the Docker resource settings; at least 4 GB of memory should be available:

$ docker info | grep "Total Memory"
or
$ podman info | grep "memTotal"

Install Vespa CLI:

$ brew install vespa-cli

For local deployment using the Docker image:

$ vespa config set target local

Pull and start the Vespa Docker container image:

$ docker pull vespaengine/vespa
$ docker run --detach --name vespa --hostname vespa-container \
  --publish 127.0.0.1:8080:8080 --publish 127.0.0.1:19071:19071 \
  vespaengine/vespa

Verify that the configuration service (deploy API) is ready:

$ vespa status deploy --wait 300

Download this sample application:

$ vespa clone multi-vector-indexing my-app && cd my-app

Deploy the application:

$ vespa deploy --wait 300

Deployment note

It is possible to deploy this app to Vespa Cloud.

Indexing sample Wikipedia articles

Index the Wikipedia articles. This embeds all the paragraphs using the native embedding model, which is computationally expensive on a CPU. For production use cases, use Vespa Cloud with GPU instances and autoscaling enabled.

$ zstdcat ext/articles.jsonl.zst | vespa feed -

Query and ranking examples

The examples below use the Vespa CLI. Add -v to see the equivalent curl command against the HTTP API.

Simple retrieval of all articles with undefined ranking

$ vespa query 'yql=select * from wiki where true' \
  'ranking=unranked'
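
As a rough illustration of what the CLI does under the hood, the query above can be expressed as an HTTP GET against the query API. The sketch below only builds the request URL and does not contact Vespa. It assumes the container from the quick start is listening on localhost:8080 and that queries are served under /search/. The helper name build_query_url is hypothetical and not part of any Vespa library.

```python
from urllib.parse import urlencode


def build_query_url(params, endpoint="http://localhost:8080/search/"):
    """Return a GET URL for the given query parameters.

    The endpoint default is an assumption based on the port published
    in the quick start; adjust it for other deployments.
    """
    # urlencode percent-encodes spaces and special characters in the YQL string.
    return f"{endpoint}?{urlencode(params)}"


# Same parameters as the "retrieve all articles" CLI example.
url = build_query_url({
    "yql": "select * from wiki where true",
    "ranking": "unranked",
})
print(url)
```

The printed URL can be passed to curl or any HTTP client. Running the CLI command with -v shows the request the CLI actually sends, which is the authoritative form.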

Traditional keyword search with BM25 ranking on the article level

$ vespa query 'yql=select * from wiki where userQuery()' \
  'query=24' \
  'ranking=bm25'

Notice the relevance, which is assigned by the rank-profile expression. Also, note that the matched keywords are highlighted in the paragraphs field.

Semantic vector search on the paragraph level

$ vespa query 'yql=select * from wiki where {targetHits:1}nearestNeighbor(paragraph_embeddings,q)' \
  'input.query(q)=embed(what does 24 mean in the context of railways)' \
  'ranking=semantic'

The closest (best semantic match) paragraph has index 4.

"matchfeatures": {
    "closest(paragraph_embeddings)": {"4": 1.0}
}

This index corresponds to the following paragraph:

"In railway timetables 24:00 means the \"end\" of the day. For example, a train due to arrive at a station during the last minute of a day arrives at 24:00; but trains which depart during the first minute of the day go at 00:00."
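
To make the closest(paragraph_embeddings) match feature more concrete, here is a minimal sketch of the idea: each article holds one vector per paragraph, and the feature reports the label of the paragraph vector nearest to the query vector, with value 1.0. The toy two-dimensional vectors, the use of cosine similarity as the closeness measure, and the function names are assumptions for illustration. The application's actual distance metric is defined in its schema.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def closest(paragraph_embeddings, query):
    """Return {label: 1.0} for the paragraph vector most similar to the query.

    Mimics the shape of the match feature shown above; this is an
    illustrative re-implementation, not Vespa's code.
    """
    best_label = max(
        paragraph_embeddings,
        key=lambda label: cosine_similarity(paragraph_embeddings[label], query),
    )
    return {best_label: 1.0}


# Hypothetical article with three paragraph vectors, keyed by paragraph index.
paragraph_embeddings = {
    "0": [1.0, 0.0],
    "1": [0.0, 1.0],
    "2": [0.6, 0.8],
}
query = [0.5, 0.9]

print(closest(paragraph_embeddings, query))  # {'2': 1.0}
```

The returned label is the paragraph index, which is how the result above can be mapped back to the matching paragraph text.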

The tensor presentation format is overridden in this sample application to shorten the output.

Hybrid search and ranking

Hybrid search combines keyword search on the article level with vector search in the paragraph index:

$ vespa query 'yql=select * from wiki where userQuery() or ({targetHits:1}nearestNeighbor(paragraph_embeddings,q))' \
  'input.query(q)=embed(@query)' \
  'query=what does 24 mean in the context of railways' \
  'ranking=hybrid' \
  'hits=1'

This case combines keyword search with vector (nearestNeighbor) search. The hybrid rank-profile also calculates several additional features using tensor expressions:

  • firstPhase is the score of the first ranking phase, configured in the hybrid profile as cos(distance(field, paragraph_embeddings)).
  • all_paragraph_similarities returns the similarity score of every paragraph in the article.
  • avg_paragraph_similarity is the average similarity score across all the paragraphs.

See the hybrid rank-profile in the schema for details. The Vespa Tensor Playground is useful to play with tensor expressions.

These additional features are calculated during second-phase ranking to limit the number of vector computations.
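
The features listed above can be approximated outside Vespa as follows. This is a sketch under stated assumptions, not the schema's actual tensor expressions: it assumes cosine similarity between the query vector and each paragraph vector, and an angular distance metric, in which case cos(distance(field, paragraph_embeddings)) equals the highest per-paragraph similarity. The toy vectors and function names are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def all_paragraph_similarities(paragraph_embeddings, query):
    """Similarity of the query to every paragraph, keyed by paragraph index."""
    return {
        label: cosine_similarity(vector, query)
        for label, vector in paragraph_embeddings.items()
    }


def avg_paragraph_similarity(similarities):
    """Mean similarity across all paragraphs of one article."""
    return sum(similarities.values()) / len(similarities)


# Hypothetical article with two paragraph vectors.
paragraph_embeddings = {"0": [1.0, 0.0], "1": [0.0, 1.0]}
query = [1.0, 0.0]

similarities = all_paragraph_similarities(paragraph_embeddings, query)
average = avg_paragraph_similarity(similarities)
# Assumption: with an angular metric, the first-phase score corresponds
# to the similarity of the closest paragraph.
first_phase = max(similarities.values())

print(similarities, average, first_phase)
```

Computing every per-paragraph similarity costs one vector operation per paragraph, which is why it makes sense to evaluate such features only for the top hits rather than for every matched document.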

Hybrid search and filter

Filtering is also supported. This example also disables bolding of matched terms:

$ vespa query 'yql=select * from wiki where url contains "9985" and userQuery() or ({targetHits:1}nearestNeighbor(paragraph_embeddings,q))' \
  'input.query(q)=embed(@query)' \
  'query=what does 24 mean in the context of railways' \
  'ranking=hybrid' \
  'bolding=false'

Cleanup

Tear down the running container:

$ docker rm -f vespa