Commit

Updating embedding section
ajosh0504 committed Aug 1, 2024
1 parent eb5468d commit 5cc69d4
Showing 2 changed files with 9 additions and 5 deletions.
10 changes: 7 additions & 3 deletions docs/50-prepare-the-data/4-embed-data.mdx
@@ -12,7 +12,7 @@ The answers for code blocks in this section are as follows:
<summary>Answer</summary>
<div>
```python
-SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")
+SentenceTransformer("thenlper/gte-small")
```
</div>
</details>
@@ -35,9 +35,13 @@ return embedding.tolist()
<summary>Answer</summary>
<div>
```python
-for doc in split_docs:
+for doc in tqdm(split_docs):
     doc["embedding"] = get_embedding(doc["body"])
     embedded_docs.append(doc)
```
</div>
-</details>
+</details>
+
+:::caution
+If the embedding generation is taking too long (> 2-3 min), kill/interrupt the cell and move on to the next step with the documents that have been embedded up until that point.
+:::
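
The caution added above amounts to keeping whatever has been embedded when the loop stops early. A minimal sketch of that idea follows. The `embed_with_budget` helper, its `budget_seconds` parameter, and the stand-in `fake_get_embedding` function are all hypothetical and not part of the lab; the lab's real `get_embedding` calls the SentenceTransformer model.

```python
import time


def embed_with_budget(docs, get_embedding, budget_seconds=150.0):
    """Embed docs in order, stopping once the time budget is spent.

    Hypothetical helper, not part of the lab. Returns the documents
    embedded so far, so later steps can proceed with a partial set.
    """
    embedded_docs = []
    start = time.monotonic()
    try:
        for doc in docs:
            if time.monotonic() - start > budget_seconds:
                break  # budget spent: keep what has been embedded
            doc["embedding"] = get_embedding(doc["body"])
            embedded_docs.append(doc)
    except KeyboardInterrupt:
        # Interrupting the cell also keeps the partial results.
        pass
    return embedded_docs


def fake_get_embedding(text):
    """Stand-in for the lab's get_embedding; returns a dummy 384-value vector."""
    return [float(len(text))] * 384


split_docs = [{"body": "first chunk"}, {"body": "second chunk"}]
embedded_docs = embed_with_budget(split_docs, fake_get_embedding)
print(len(embedded_docs), len(embedded_docs[0]["embedding"]))
```

The budget is only checked between documents, so a single slow embedding call can still overrun it; the sketch trades precision for simplicity.
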
4 changes: 2 additions & 2 deletions docs/60-perform-semantic-search/2-create-vector-index.mdx
@@ -23,13 +23,13 @@ Select the `mongodb_rag_lab` database and the `knowledge` collection, change the
     {
       "type": "vector",
       "path": "embedding",
-      "numDimensions": 1024,
+      "numDimensions": 384,
       "similarity": "cosine"
     }
   ]
 }
```

:::info
-The number of dimensions in the index definition is 1024 since we are using Mixedbread AI's open-source [mxbai-embed-large-v1](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1) model to generate embeddings in this lab.
+The number of dimensions in the index definition is 384 since we are using the [gte-small](https://huggingface.co/thenlper/gte-small) model to generate embeddings in this lab.
:::
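
Since `numDimensions` has to match the length of the stored vectors, a quick check before creating the index can catch documents embedded with the previous model. A minimal sketch, assuming a hypothetical `find_dimension_mismatches` helper and an index definition wrapped in a top-level `fields` list (the hunk above shows only the inner object, so the wrapper is an assumption here):

```python
index_definition = {
    "fields": [
        {
            "type": "vector",
            "path": "embedding",
            "numDimensions": 384,
            "similarity": "cosine",
        }
    ]
}


def find_dimension_mismatches(docs, definition):
    """Return (doc position, actual length) for vectors that do not fit the index.

    Hypothetical helper for illustration; not part of the lab.
    """
    mismatches = []
    for field in definition["fields"]:
        if field["type"] != "vector":
            continue
        path, expected = field["path"], field["numDimensions"]
        for i, doc in enumerate(docs):
            actual = len(doc.get(path, []))
            if actual != expected:
                mismatches.append((i, actual))
    return mismatches


docs = [
    {"body": "ok", "embedding": [0.0] * 384},
    # Illustrative leftover from the earlier 1024-dimension model.
    {"body": "stale", "embedding": [0.0] * 1024},
]
print(find_dimension_mismatches(docs, index_definition))
```

An empty result means every checked vector has the length the index expects.
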
