This sample application demonstrates three ways of efficiently applying Transformer-based ranking models for text ranking in Vespa.
Blog posts with more details:
- Post one: Introduction to neural ranking and the MS Marco passage ranking dataset.
- Post two: Efficient retrievers, sparse, dense, and hybrid retrievers.
- Post three: Re-ranking using multi-representation models (ColBERT).
- Post four: Re-ranking using cross-encoders.
Illustration of the ranking methods (figures a-d) from the ColBERT paper.
This sample application demonstrates:
- Simple single-stage sparse retrieval accelerated by the WAND dynamic pruning algorithm with BM25 ranking.
- Dense (vector) retrieval for efficient candidate retrieval using Vespa's support for approximate nearest neighbor search. This method is illustrated in figure a.
- Re-ranking using the Contextualized Late Interaction over BERT (ColBERT) model. This method is illustrated in figure d.
- Re-ranking using a cross-encoder with cross attention between the query and document terms. This method is illustrated in figure c.
- Multiphase retrieval and ranking combining efficient retrieval (WAND or ANN) with re-ranking stages.
- Using Vespa embedder functionality.
- Hybrid ranking functionality.
There are several ranking profiles defined in the passage document schema. See the Vespa ranking documentation for an overview of how to represent ranking in Vespa.
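To give a concrete idea of what such a profile looks like, below is a minimal sketch of a BM25 rank profile in Vespa schema syntax. It is not a verbatim copy of the profiles in this app's passage schema, and the field name text is an assumption:

# Minimal sketch of a rank profile, not the exact profiles shipped with this app.
# Assumes the passage document type indexes its body text in a field named "text".
rank-profile bm25 {
    first-phase {
        expression: bm25(text)
    }
}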
Make sure to read and agree to the terms and conditions of MS Marco before downloading the dataset. The following is a quick start recipe using a tiny slice of the MS Marco passage ranking dataset.
Requirements:
- Docker Desktop installed and running, with at least 6 GB of memory available for Docker. Refer to the Docker memory documentation for details and troubleshooting.
- Alternatively, deploy using Vespa Cloud
- Operating system: Linux, macOS, or Windows 10 Pro (Docker requirement)
- Architecture: x86_64 or arm64
- Homebrew to install Vespa CLI, or download a vespa-cli release from GitHub releases.
- Python 3 with the requests, tqdm, and ir_datasets packages.
Validate Docker resource settings, which should be a minimum of 6 GB:
$ docker info | grep "Total Memory"
or
$ podman info | grep "memTotal"
Install Vespa CLI:
$ brew install vespa-cli
Install Python dependencies for exporting the passage dataset:
$ pip3 install ir_datasets
For local deployment using the Docker image:
$ vespa config set target local
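If you deploy with Vespa Cloud instead (see the requirements above), the CLI target is set to cloud rather than local. This is only a sketch of the first steps; the full Vespa Cloud flow also requires a tenant and application to be configured, which is not covered in this quick start:

$ vespa config set target cloud
$ vespa auth login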
Pull and start the Vespa container image:
$ docker pull vespaengine/vespa
$ docker run --detach --name vespa --hostname vespa-container \
  --publish 127.0.0.1:8080:8080 --publish 127.0.0.1:19071:19071 \
  vespaengine/vespa
Verify that the configuration service (deploy API) is ready:
$ vespa status deploy --wait 300
Download this sample application:
$ vespa clone msmarco-ranking myapp && cd myapp
Export the cross-encoder ranker model to ONNX format using the Optimum library from Hugging Face, or download an exported ONNX version of the model (as in this example):
$ mkdir -p models
$ curl -L https://huggingface.co/Xenova/ms-marco-MiniLM-L-6-v2/resolve/main/onnx/model.onnx -o models/model.onnx
$ curl -L https://huggingface.co/Xenova/ms-marco-MiniLM-L-6-v2/raw/main/tokenizer.json -o models/tokenizer.json
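If you would rather export the model yourself with the Optimum library, the sketch below uses the Optimum CLI. It assumes the upstream model cross-encoder/ms-marco-MiniLM-L-6-v2 and the exporters extra of optimum; the exported file names may differ from those used above:

$ pip3 install "optimum[exporters]"
$ optimum-cli export onnx --model cross-encoder/ms-marco-MiniLM-L-6-v2 models/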
Deploy the application:
$ vespa deploy --wait 300
Feed a small sample of data:
$ vespa feed ext/docs.jsonl
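Each line in ext/docs.jsonl is a Vespa document operation in JSON. A hypothetical example line is shown below; the actual field names are defined by this app's passage schema and may differ:

{"put": "id:msmarco:passage::0", "fields": {"id": 0, "text": "..."}}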
For example, run a query for "what was the Manhattan Project":
Note that the @query parameter substitution syntax requires Vespa 8.299 or above.
$ vespa query 'query=what was the manhattan project' \
  'yql=select * from passage where {targetHits: 100}nearestNeighbor(e5, q)' \
  'input.query(q)=embed(e5, @query)' \
  'input.query(qt)=embed(colbert, @query)' \
  'ranking=e5-colbert'
$ vespa query 'query=what was the manhattan project' \
  'yql=select * from passage where userQuery() or ({targetHits: 100}nearestNeighbor(e5, q))' \
  'input.query(q)=embed(e5, @query)' \
  'input.query(qt)=embed(colbert, @query)' \
  'input.query(query_token_ids)=embed(tokenizer, @query)' \
  'ranking=e5-colbert-cross-encoder-rrf'
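The single-stage sparse retrieval method (WAND with BM25) listed at the top can also be queried directly. A sketch, assuming the passage schema defines a rank profile named bm25:

$ vespa query 'query=what was the manhattan project' \
  'yql=select * from passage where userQuery()' \
  'ranking=bm25'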
Remove the container after use:
$ docker rm -f vespa
With the evaluate_passage_run.py script, we can run retrieval and ranking using the methods demonstrated above.
To do so, we need to index the entire dataset as follows:
$ ir_datasets export msmarco-passage docs --format jsonl | python3 python/to-vespa-feed.py | vespa feed -
Note that the ir_datasets utility will download MS Marco query evaluation data, so the first run will take some time to complete.
BM25 (WAND) single-phase sparse retrieval:
$ ./python/evaluate_passage_run.py --query_split dev --model bm25 \
  --endpoint http://localhost:8080/search/
To evaluate ranking effectiveness, download the official MS Marco evaluation script:
$ curl -L -o msmarco_eval.py https://raw.githubusercontent.com/spacemanidol/MSMARCO/master/Ranking/Baselines/msmarco_eval.py
Generate the dev qrels (query relevance labels) file using ir_datasets:
$ ./python/dump_passage_dev_qrels.py
The above will write a qrels.dev.small.tsv file to the current directory. Now we can evaluate using the run.dev.txt file created by any of the evaluate_passage_run.py runs listed above:
$ python3 msmarco_eval.py qrels.dev.small.tsv run.dev.txt
#####################
MRR @10: 0.xx
QueriesRanked: 6980
#####################