
Vespa sample applications - Simple semantic search

A minimal semantic search application.

Requires Vespa 8.311.28 or later (minimum-required-vespa-version="8.311.28").

To try this application

Follow the Vespa getting started guide through the vespa deploy step, cloning simple-semantic-search instead of album-recommendation.

Feed documents (feeding also runs embedding inference in Vespa):

vespa feed ext/*.json

Example queries using the E5-small-v2 embedding model, which maps text to a 384-dimensional vector representation:

vespa query 'yql=select * from doc where userQuery() or ({targetHits: 100}nearestNeighbor(embedding, e))' \
 'input.query(e)=embed(e5, @query)' \
 'query=space contains many suns'
vespa query 'yql=select * from doc where userQuery() or ({targetHits: 100}nearestNeighbor(embedding, e))' \
 'input.query(e)=embed(e5, @query)' \
 'query=shipping stuff over the sea'
vespa query 'yql=select * from doc where userQuery() or ({targetHits: 100}nearestNeighbor(embedding, e))' \
 'input.query(e)=embed(e5, @query)' \
 'query=exchanging information by sound' 
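
The three commands above differ only in the query text. As a minimal sketch, the shared parameters could be factored out in Python as shown here. It assumes the application is reachable through the Vespa HTTP query API at http://localhost:8080/search/; that endpoint and the helper names build_params and run_query are illustrative assumptions, not part of this sample application. Running the script only prints the parameters, so it works without a running Vespa instance.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local Vespa container exposing the query API on port 8080.
ENDPOINT = "http://localhost:8080/search/"

# Same YQL as in the vespa query examples: text match OR nearest-neighbor search.
YQL = ("select * from doc where userQuery() or "
       "({targetHits: 100}nearestNeighbor(embedding, e))")


def build_params(query_text):
    """Return the request parameters used by the vespa query examples."""
    return {
        "yql": YQL,
        # The query text is embedded by the e5 embedder at query time.
        "input.query(e)": "embed(e5, @query)",
        "query": query_text,
    }


def run_query(query_text, endpoint=ENDPOINT):
    """Send one query and return the parsed JSON response (needs a running app)."""
    url = endpoint + "?" + urllib.parse.urlencode(build_params(query_text))
    with urllib.request.urlopen(url) as response:
        return json.load(response)


if __name__ == "__main__":
    for text in ("space contains many suns",
                 "shipping stuff over the sea",
                 "exchanging information by sound"):
        print(json.dumps(build_params(text), indent=2))
```

Calling run_query("space contains many suns") against a deployed application should be roughly equivalent to the first vespa query command, provided the endpoint assumption holds.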

Remove the container after use:

docker rm -f vespa

Ready for production

The E5-small-v2 embedding model used in this sample application is suitable for production use and will produce good results in many domains without fine-tuning, especially when combined with text match features.

Model exporting

Transformer-based embedding models have named inputs and outputs. These names must match the input and output names that the Vespa bert-embedder or huggingface-embedder is configured to use.
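
One way to check this before deploying is to compare the names found in an exported ONNX file with the names the embedder is configured to use. The sketch below is illustrative only. The expected names are assumed defaults for the huggingface-embedder and should be verified against the Vespa embedder reference. The helper names read_onnx_names and missing_names are hypothetical, and read_onnx_names needs the onnxruntime package.

```python
# Assumed default names for the huggingface-embedder; verify against the
# Vespa embedder reference before relying on them.
EXPECTED_INPUTS = {"input_ids", "attention_mask", "token_type_ids"}
EXPECTED_OUTPUTS = {"last_hidden_state"}


def read_onnx_names(model_path):
    """Return (input_names, output_names) of an ONNX model via onnxruntime."""
    import onnxruntime  # imported lazily; only needed for this helper

    session = onnxruntime.InferenceSession(
        model_path, providers=["CPUExecutionProvider"])
    inputs = {node.name for node in session.get_inputs()}
    outputs = {node.name for node in session.get_outputs()}
    return inputs, outputs


def missing_names(inputs, outputs,
                  expected_inputs=EXPECTED_INPUTS,
                  expected_outputs=EXPECTED_OUTPUTS):
    """Return the expected names that the model does not provide."""
    return {
        "inputs": sorted(set(expected_inputs) - set(inputs)),
        "outputs": sorted(set(expected_outputs) - set(outputs)),
    }


if __name__ == "__main__":
    # Hypothetical names, standing in for the result of read_onnx_names(...).
    report = missing_names({"input_ids", "attention_mask"}, {"output_0"})
    print(report)
```

A non-empty list in the report suggests that either the model needs to be re-exported with different names or the embedder configuration needs to point at the names the model actually uses.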

Huggingface-embedder

See export_hf_model_from_hf.py for exporting a Hugging Face sentence-transformer model to ONNX format, using the default input and output names expected by the Vespa huggingface-embedder.

The following exports intfloat/e5-small-v2:

./export_hf_model_from_hf.py --hf_model intfloat/e5-small-v2 --output_dir model

The following exports intfloat/multilingual-e5-small using quantization:

./export_hf_model_from_hf.py --hf_model intfloat/multilingual-e5-small --output_dir model --quantize

The following exports intfloat/multilingual-e5-small using quantization and tokenizer patching, to work around this issue with compatibility problems when loading saved tokenizers:

./export_hf_model_from_hf.py --hf_model intfloat/multilingual-e5-small --output_dir model --quantize --patch_tokenizer

Bert-embedder

Prefer using the Vespa huggingface-embedder instead.

See export_model_from_hf.py for exporting a Hugging Face sentence-transformer model to ONNX format, using the default input and output names expected by the bert-embedder.

The following exports intfloat/e5-small-v2, saving the model parameters in an ONNX file and the vocabulary in a vocab.txt file, both in the format expected by the Vespa bert-embedder:

./export_model_from_hf.py --hf_model intfloat/e5-small-v2 --output_dir model