Let's split the RAG pipeline into five parts:
- Data loading from a URL
- Chunking the documents and converting the chunks into Matryoshka embeddings of different sizes
- LanceDB as the vector store for these embeddings
- Query engine
- Answer generation using the query engine
RAG is a technique that retrieves documents related to the user's question, combines them with the base prompt, and sends the result to an LLM such as GPT to produce a more factually accurate answer.
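The sketch below wires these five parts together. It is a minimal, illustrative version, not a definitive implementation: it assumes LlamaIndex as the framework (its "query engine" terminology matches the outline above), OpenAI's text-embedding-3-small as a Matryoshka-capable embedding model (its `dimensions` parameter truncates the output vector), and a placeholder URL. An OpenAI API key must be set for the embedding and answer-generation steps.

```python
# A minimal sketch of the five pipeline stages (assumptions: LlamaIndex,
# OpenAI embeddings with Matryoshka-style truncation, placeholder URL).
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.readers.web import SimpleWebPageReader
from llama_index.vector_stores.lancedb import LanceDBVectorStore

# 1. Data loading from a URL (placeholder address).
documents = SimpleWebPageReader(html_to_text=True).load_data(
    ["https://example.com/article"]
)

# 2. Chunking + Matryoshka embeddings: the index splits documents into
#    chunks; `dimensions=256` truncates each embedding from 1536 to 256.
embed_model = OpenAIEmbedding(model="text-embedding-3-small", dimensions=256)

# 3. LanceDB as the vector store.
vector_store = LanceDBVectorStore(uri="./lancedb", table_name="rag_demo")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, embed_model=embed_model
)

# 4. Query engine over the indexed chunks.
query_engine = index.as_query_engine(similarity_top_k=3)

# 5. Answer generation using the query engine.
print(query_engine.query("What is the main topic of the article?"))
```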
As research progressed, new state-of-the-art text embedding models began producing embeddings with ever higher output dimensions. While this improves performance, it also reduces the efficiency of downstream tasks such as search or classification, because more values are needed to represent each input text.
Matryoshka embedding models address this: they are trained so that even small, truncated embeddings remain useful. In short, a Matryoshka embedding model can generate effective embeddings of various dimensions.
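As a minimal sketch of that property, the snippet below encodes the same sentence at full size and truncated to 64 dimensions. It assumes sentence-transformers >= 2.7 (which added `truncate_dim`) and uses a public Matryoshka model, `tomaarsen/mpnet-base-nli-matryoshka`, purely as an example. Since cosine similarity is scale-invariant, the truncated vectors can be compared with each other directly.

```python
# Illustrative only: truncating a Matryoshka embedding to a smaller size.
from sentence_transformers import SentenceTransformer

sentence = "The weather is lovely today."

# Full-size embeddings (768 dimensions for this model).
full_model = SentenceTransformer("tomaarsen/mpnet-base-nli-matryoshka")
print(full_model.encode(sentence).shape)  # (768,)

# Same model, keeping only the first 64 dimensions of every embedding.
small_model = SentenceTransformer(
    "tomaarsen/mpnet-base-nli-matryoshka", truncate_dim=64
)
print(small_model.encode(sentence).shape)  # (64,)
```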
Both models compared below, a Matryoshka model and a standard model, were trained on the AllNLI dataset, a combination of the SNLI and MultiNLI datasets. I evaluated these models on the STSBenchmark test set at multiple embedding dimensions. The results are illustrated in the following figures:
Results:
- Top figure: The Matryoshka model consistently achieves a higher Spearman similarity than the standard model across all dimensions, indicating its superiority in this task.
- Second figure: The Matryoshka model's performance declines much less rapidly than the standard model's. Even at just 8.3% of the full embedding size, the Matryoshka model retains 98.37% of its performance, compared to 96.46% for the standard model.
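For reference, an evaluation of this shape can be reproduced along the following lines: encode the STSBenchmark test pairs once, truncate the embeddings to several sizes, and compute the Spearman correlation between the cosine similarities and the gold scores. The model and dataset identifiers here are stand-ins, not necessarily the exact ones behind the figures above.

```python
# Sketch of the evaluation loop: Spearman correlation between cosine
# similarities and gold STSBenchmark scores at several truncation sizes.
import numpy as np
from datasets import load_dataset
from scipy.stats import spearmanr
from sentence_transformers import SentenceTransformer

stsb = load_dataset("sentence-transformers/stsb", split="test")
model = SentenceTransformer("tomaarsen/mpnet-base-nli-matryoshka")

emb1 = model.encode(stsb["sentence1"])  # shape (n, 768)
emb2 = model.encode(stsb["sentence2"])

for dim in (768, 512, 256, 128, 64):  # 64 dims = 8.3% of 768
    a, b = emb1[:, :dim], emb2[:, :dim]  # keep only the leading dimensions
    cos = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    )
    print(dim, spearmanr(cos, stsb["score"]).correlation)
```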
These findings suggest that truncating embeddings with a Matryoshka model can significantly:
- Speed up downstream tasks such as retrieval.
- Save on storage space.
All of this is achieved without a notable performance loss.
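As a rough illustration of both points, the toy benchmark below runs a brute-force cosine search over synthetic vectors at full and truncated sizes. It is not LanceDB-specific, but the linear scaling of storage and per-query work with the embedding dimension is exactly why a real vector store benefits from truncation.

```python
# Toy illustration of the downstream savings: brute-force similarity search
# over synthetic unit vectors. Storage and per-query FLOPs both scale
# linearly with dimension, so 768 -> 64 is roughly a 12x reduction.
import time
import numpy as np

rng = np.random.default_rng(0)
n_docs = 100_000

for dim in (768, 64):
    docs = rng.standard_normal((n_docs, dim)).astype(np.float32)
    docs /= np.linalg.norm(docs, axis=1, keepdims=True)
    query = rng.standard_normal(dim).astype(np.float32)
    query /= np.linalg.norm(query)

    start = time.perf_counter()
    top10 = np.argsort(docs @ query)[-10:]  # cosine scores via dot product
    elapsed = time.perf_counter() - start

    print(f"dim={dim}: {docs.nbytes / 1e6:.0f} MB, query in {elapsed * 1000:.1f} ms")
```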