This sample application is used to demonstrate multi-vector indexing with Vespa. Multi-vector indexing was introduced in Vespa 8.144.19. Read the blog post announcing multi-vector indexing.
Go to multi-vector-indexing to run this sample application using pyvespa.
The app uses a small sample of Wikipedia articles, where each paragraph is embedded into the same vector space.
The following is a quick start recipe on how to get started with this application.
- Docker Desktop installed and running. 4 GB available memory for Docker is recommended. Refer to Docker memory for details and troubleshooting
- Alternatively, deploy using Vespa Cloud
- Operating system: Linux, macOS or Windows 10 Pro (Docker requirement)
- Architecture: x86_64 or arm64
- Homebrew to install Vespa CLI, or download a Vespa CLI release from GitHub releases.
Validate Docker resource settings, which should be minimum 4 GB:

$ docker info | grep "Total Memory"

or

$ podman info | grep "memTotal"
Install Vespa CLI:
$ brew install vespa-cli
For local deployment using docker image:
$ vespa config set target local
Pull and start the vespa docker container image:
$ docker pull vespaengine/vespa
$ docker run --detach --name vespa --hostname vespa-container \
  --publish 127.0.0.1:8080:8080 --publish 127.0.0.1:19071:19071 \
  vespaengine/vespa
Verify that the configuration service (deploy API) is ready:
$ vespa status deploy --wait 300
Download this sample application:
$ vespa clone multi-vector-indexing my-app && cd my-app
Deploy the application:
$ vespa deploy --wait 300
It is possible to deploy this app to Vespa Cloud.
Index the Wikipedia articles. This embeds all the paragraphs using the native embedding model, which is computationally expensive for CPU. For production use cases, use Vespa Cloud with GPU instances and autoscaling enabled.
$ zstdcat ext/articles.jsonl.zst | vespa feed -
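Each line in the feed file is a Vespa document operation in JSON. A minimal sketch of what one such operation might look like (field names other than paragraphs and url, which appear in the queries in this guide, are assumptions, and the field values are illustrative; the paragraph embeddings are computed by the embedder at feed time, so the feed contains only text):

```python
import json

# Illustrative feed operation for the wiki document type.
# The embedder in the application package turns each paragraph
# into a vector at feed time - no embeddings in the feed itself.
doc = {
    "put": "id:wiki:wiki::9985",
    "fields": {
        "title": "24-hour clock",
        "url": "https://en.wikipedia.org/wiki?curid=9985",
        "paragraphs": [
            "The 24-hour clock is a way of telling the time ...",
            "In railway timetables 24:00 means the \"end\" of the day.",
        ],
    },
}
line = json.dumps(doc)  # one operation per line in a JSONL feed
print(line)
```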
We demonstrate using the Vespa CLI; use -v to see the curl equivalent using the HTTP API.
$ vespa query 'yql=select * from wiki where true' \
  'ranking=unranked'
$ vespa query 'yql=select * from wiki where userQuery()' \
  'query=24' \
  'ranking=bm25'
Notice the relevance, which is assigned by the rank-profile expression. Also note that the matched keywords are highlighted in the paragraphs field.
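For reference, the bm25 rank feature follows the standard Okapi BM25 formula: an inverse-document-frequency weight multiplied by a saturated, length-normalized term frequency. A plain-Python sketch of the per-term score (k1=1.2 and b=0.75 are the usual defaults; this is a textbook rendering, not Vespa's implementation):

```python
import math

# Per-term Okapi BM25: idf * saturated term frequency with
# document-length normalization.
def bm25(tf, doc_len, avg_doc_len, n_docs, n_docs_with_term, k1=1.2, b=0.75):
    idf = math.log(1 + (n_docs - n_docs_with_term + 0.5) / (n_docs_with_term + 0.5))
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / (tf + norm)

# More occurrences score higher, but with diminishing returns.
print(bm25(tf=1, doc_len=100, avg_doc_len=120, n_docs=1000, n_docs_with_term=10))
print(bm25(tf=5, doc_len=100, avg_doc_len=120, n_docs=1000, n_docs_with_term=10))
```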
$ vespa query 'yql=select * from wiki where {targetHits:1}nearestNeighbor(paragraph_embeddings,q)' \
  'input.query(q)=embed(what does 24 mean in the context of railways)' \
  'ranking=semantic'
The closest (best semantic match) paragraph has index 4.
"matchfeatures": {
"closest(paragraph_embeddings)": {"4": 1.0}
}
This index corresponds to the following paragraph:
"In railway timetables 24:00 means the \"end\" of the day. For example, a train due to arrive at a station during the last minute of a day arrives at 24:00; but trains which depart during the first minute of the day go at 00:00."
The tensor presentation format is overridden in this sample application to shorten the output.
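Conceptually, closest(paragraph_embeddings) can be thought of as a one-hot tensor over the paragraph index, marking the paragraph whose embedding is nearest to the query vector. A plain-Python sketch (not Vespa's implementation; assumes cosine similarity as the nearness measure):

```python
import math

# Cosine similarity between two dense vectors.
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Stand-in for closest(): one-hot over the paragraph index of the
# paragraph most similar to the query vector.
def closest(paragraph_embeddings, query):
    best = max(paragraph_embeddings,
               key=lambda i: cosine_similarity(paragraph_embeddings[i], query))
    return {best: 1.0}

# Toy 2-dimensional embeddings keyed by paragraph index.
paragraph_embeddings = {"0": [1.0, 0.0], "4": [0.6, 0.8]}
print(closest(paragraph_embeddings, [0.5, 0.9]))  # paragraph "4" is nearest
```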
Hybrid combining keyword search on the article level with vector search in the paragraph index:
$ vespa query 'yql=select * from wiki where userQuery() or ({targetHits:1}nearestNeighbor(paragraph_embeddings,q))' \
  'input.query(q)=embed(@query)' \
  'query=what does 24 mean in the context of railways' \
  'ranking=hybrid' \
  'hits=1'
This case combines keyword search with vector (nearestNeighbor) search.
The hybrid rank-profile also calculates several additional features using tensor expressions:

- firstPhase is the score of the first ranking phase, configured in the hybrid profile as cos(distance(field, paragraph_embeddings)).
- all_paragraph_similarities returns the similarity scores for all paragraphs.
- avg_paragraph_similarity is the average similarity score across all the paragraphs.
See the hybrid rank-profile in the schema for details. The Vespa Tensor Playground is useful for experimenting with tensor expressions.
These additional features are calculated during second-phase ranking to limit the number of vector computations.
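As a rough sketch, these features amount to simple operations over per-paragraph cosine similarities (with the angle distance metric, cos(distance(...)) equals cosine similarity). Plain Python, not the actual tensor expressions:

```python
import math

# Cosine similarity between two dense vectors.
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Stand-in for the hybrid profile's extra features: one similarity per
# paragraph, plus the average across paragraphs.
def paragraph_features(paragraph_embeddings, query):
    sims = [cosine_similarity(p, query) for p in paragraph_embeddings]
    return {
        "all_paragraph_similarities": sims,
        "avg_paragraph_similarity": sum(sims) / len(sims),
    }

# Toy 2-dimensional paragraph embeddings and query vector.
features = paragraph_features([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0])
print(features)
```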
Filtering is also supported; this example additionally disables bolding.
$ vespa query 'yql=select * from wiki where url contains "9985" and userQuery() or ({targetHits:1}nearestNeighbor(paragraph_embeddings,q))' \
  'input.query(q)=embed(@query)' \
  'query=what does 24 mean in the context of railways' \
  'ranking=hybrid' \
  'bolding=false'
Tear down the running container:
$ docker rm -f vespa