Skip to content

Commit

Permalink
docs[patch]: Expand embeddings docs (#22881)
Browse files Browse the repository at this point in the history
  • Loading branch information
jacoblee93 authored Jun 14, 2024
1 parent 73c76b9 commit 8e89178
Show file tree
Hide file tree
Showing 2 changed files with 10 additions and 2 deletions.
12 changes: 10 additions & 2 deletions docs/docs/concepts.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -425,9 +425,14 @@ For specifics on how to use text splitters, see the [relevant how-to guides here
### Embedding models
<span data-heading-keywords="embedding,embeddings"></span>

The Embeddings class is a class designed for interfacing with text embedding models. There are lots of embedding model providers (OpenAI, Cohere, Hugging Face, etc) - this class is designed to provide a standard interface for all of them.
Embedding models create a vector representation of a piece of text. You can think of a vector as an array of numbers that captures the semantic meaning of the text.
By representing the text in this way, you can perform mathematical operations that allow you to do things like search for other pieces of text that are most similar in meaning.
These natural language search capabilities underpin many types of [context retrieval](/docs/concepts/#retrieval),
where we provide an LLM with the relevant data it needs to effectively respond to a query.

Embeddings create a vector representation of a piece of text. This is useful because it means we can think about text in the vector space, and do things like semantic search where we look for pieces of text that are most similar in the vector space.
![](/img/embeddings.png)

The `Embeddings` class is a class designed for interfacing with text embedding models. There are many different embedding model providers (OpenAI, Cohere, Hugging Face, etc) and local models, and this class is designed to provide a standard interface for all of them.

The base Embeddings class in LangChain provides two methods: one for embedding documents and one for embedding a query. The former takes as input multiple texts, while the latter takes a single text. The reason for having these as two separate methods is that some embedding providers have different embedding methods for documents (to be searched over) vs queries (the search query itself).

Expand All @@ -440,6 +445,9 @@ One of the most common ways to store and search over unstructured data is to emb
and then at query time to embed the unstructured query and retrieve the embedding vectors that are 'most similar' to the embedded query.
A vector store takes care of storing embedded data and performing vector search for you.

Most vector stores can also store metadata about embedded vectors and support filtering on that metadata before
similarity search, allowing you more control over returned documents.

Vector stores can be converted to the retriever interface by doing:

```python
Expand Down
Binary file added docs/static/img/embeddings.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

0 comments on commit 8e89178

Please sign in to comment.