Commit

Merge pull request #1 from mongodb-developer/auto_index_creation_updates
Auto index creation updates
ajosh0504 authored Sep 16, 2024
2 parents a434eb5 + e48aa7a commit 218a08c
Showing 21 changed files with 74 additions and 131 deletions.
12 changes: 5 additions & 7 deletions docs/20-mongodb-atlas/3-get-connection-string.mdx
@@ -9,20 +9,18 @@ In the Atlas UI, navigate to the **Overview** page. In the **Clusters section**,

<Screenshot url="https://cloud.mongodb.com" src="img/screenshots/20-mongodb-atlas/3-get-conn-string/1-connect-button.png" alt="Screenshot of the connect button" />

A modal will display several ways to connect to your database.
A modal will display several ways to connect to your database. Select **Drivers**.

<Screenshot url="https://cloud.mongodb.com" src="img/screenshots/20-mongodb-atlas/3-get-conn-string/2-connect-modal.png" alt="Screenshot of the connect modal" />

Select **Compass**. While we won't be using Compass to import the data, it's an easy way to see your connection string.

<Screenshot url="https://cloud.mongodb.com" src="img/screenshots/20-mongodb-atlas/3-get-conn-string/3-connect-compass.png" alt="Screenshot of the connection string" />

Look for your connection string. It should look something like:
Look for your connection string. It should look something like `mongodb+srv://<username>:<password>@<cluster-url>/`

```
mongodb+srv://<username>:<password>@<cluster-url>/
```

<Screenshot url="https://cloud.mongodb.com" src="img/screenshots/20-mongodb-atlas/3-get-conn-string/3-connect-compass.png" alt="Screenshot of the connection string" />

Click the copy button next to your connection string to copy it to your clipboard. Paste the connection string somewhere safe.
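
If the username or password contains reserved characters such as `@` or `:`, they generally need to be percent-encoded before they go into the connection string. A minimal sketch of that step, assuming a hypothetical helper `build_connection_string` and placeholder credentials and host (none of these names come from the lab):

```python
from urllib.parse import quote_plus


def build_connection_string(username: str, password: str, cluster_url: str) -> str:
    """Fill in the mongodb+srv://<username>:<password>@<cluster-url>/ template.

    Hypothetical helper for illustration only; the lab itself just asks you
    to copy the string from the Atlas UI.
    """
    # Percent-encode the credentials so characters like '@' or ':' are not
    # mistaken for URI delimiters.
    user = quote_plus(username)
    pwd = quote_plus(password)
    return f"mongodb+srv://{user}:{pwd}@{cluster_url}/"


# Placeholder values, not real credentials or a real cluster.
uri = build_connection_string("lab_user", "p@ss:word", "cluster0.example.mongodb.net")
print(uri)
```

The resulting string has the same shape as the one shown in the Atlas UI, with only the credentials escaped.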

:::tip
17 changes: 0 additions & 17 deletions docs/50-prepare-the-data/1-concepts.mdx

This file was deleted.

File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
6 changes: 3 additions & 3 deletions docs/60-perform-semantic-search/1-concepts.mdx
@@ -31,10 +31,10 @@ Vector search in MongoDB takes the form of an aggregation pipeline stage. It alw
{
"$vectorSearch": {
"index": "vector_index",
"path": "embedding",
"filter": {"symbol": "ABMD"},
"path": "embedding",
"queryVector": [0.02421053, -0.022372592,...],
"numCandidates": 150,
"numCandidates": 150,
"filter": {"symbol": "ABMD"},
"limit": 10
}
},
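
The `$vectorSearch` stage above is an ordinary dictionary, so it can be assembled programmatically. The sketch below is one way to do that, assuming a hypothetical helper named `build_vector_search_stage`; the field names match the stage shown in the diff, and the check that `limit` does not exceed `numCandidates` reflects a constraint of the stage, not code from the lab.

```python
from typing import Any, Optional


def build_vector_search_stage(
    index: str,
    path: str,
    query_vector: list[float],
    limit: int,
    num_candidates: int,
    pre_filter: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Return a $vectorSearch stage as a plain dict (hypothetical helper)."""
    # The stage cannot return more documents than it considers as candidates.
    if limit > num_candidates:
        raise ValueError("limit must not exceed numCandidates")
    stage: dict[str, Any] = {
        "index": index,
        "path": path,
        "queryVector": query_vector,
        "numCandidates": num_candidates,
        "limit": limit,
    }
    # The filter is optional; only add it when one was supplied.
    if pre_filter:
        stage["filter"] = pre_filter
    return {"$vectorSearch": stage}


# Values taken from the example stage above; the query vector is shortened.
stage = build_vector_search_stage(
    index="vector_index",
    path="embedding",
    query_vector=[0.02421053, -0.022372592],
    limit=10,
    num_candidates=150,
    pre_filter={"symbol": "ABMD"},
)
pipeline = [stage]
```

The resulting `pipeline` list is what would be handed to an aggregation call on the collection.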
37 changes: 10 additions & 27 deletions docs/60-perform-semantic-search/2-create-vector-index.mdx
@@ -1,35 +1,18 @@
# 👐 Create a vector search index

To retrieve documents from MongoDB using vector search, you must configure a vector search index on the collection into which you ingested your data.
To retrieve documents from MongoDB using vector search, you must configure a vector search index on the collection into which you ingested your data. The recommended way to do this is via the MongoDB drivers.

To do this, open the **Database Deployments** page in the Atlas UI and select **Create Index** in the lower right corner under Atlas Search.
Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Step 7: Create a vector search index** section in the notebook to create a vector search index.

<Screenshot url="https://cloud.mongodb.com" src="img/screenshots/60-perform-semantic-search/1-create-index.png" alt="Select create index" />
The answers for code blocks in this section are as follows:

Click the **Create Search Index** button.

<Screenshot url="https://cloud.mongodb.com" src="img/screenshots/60-perform-semantic-search/2-create-search-index.png" alt="Create search index" />

Click **JSON Editor** under Atlas Vector Search to create your index

<Screenshot url="https://cloud.mongodb.com" src="img/screenshots/60-perform-semantic-search/3-json-editor.png" alt="The 'Create Index' page with the 'JSON Editor' tab highlighted" />


Select the `mongodb_rag_lab` database and the `knowledge` collection, change the index name to `vector_index`, and add the following index definition in the JSON editor:
**CODE_BLOCK_8**

<details>
<summary>Answer</summary>
<div>
```python
{
"fields": [
{
"type": "vector",
"path": "embedding",
"numDimensions": 384,
"similarity": "cosine"
}
]
}
collection.create_search_index(model=model)
```

:::info
The number of dimensions in the index definition is 384 since we are using the [gte-small](https://huggingface.co/thenlper/gte-small) model to generate embeddings in this lab.
:::
</div>
</details>
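
The `model` passed to `collection.create_search_index` is not defined in this hunk. As a rough sketch, assuming it has the same `name` / `type` / `definition` shape as the pre-filtering answers elsewhere in this commit, it could be built like this (the helper name `vector_index_model` and its parameters are hypothetical):

```python
from typing import Any, Iterable


def vector_index_model(
    name: str,
    path: str = "embedding",
    num_dimensions: int = 384,
    similarity: str = "cosine",
    filter_paths: Iterable[str] = (),
) -> dict[str, Any]:
    """Build a vector search index model as a plain dict (hypothetical helper)."""
    # One "vector" field describes the embedding itself.
    fields: list[dict[str, Any]] = [
        {
            "type": "vector",
            "path": path,
            "numDimensions": num_dimensions,
            "similarity": similarity,
        }
    ]
    # Optional "filter" fields make additional paths usable for pre-filtering.
    fields.extend({"type": "filter", "path": p} for p in filter_paths)
    return {"name": name, "type": "vectorSearch", "definition": {"fields": fields}}


# "vector_index" is the index name used elsewhere in the lab docs.
model = vector_index_model("vector_index")
```

Passing `filter_paths=["metadata.contentType"]` would yield the shape used in the pre-filtering section.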
10 changes: 5 additions & 5 deletions docs/60-perform-semantic-search/3-vector-search.mdx
@@ -1,12 +1,12 @@
# 👐 Perform semantic search

Now let's run some vector search queries against our data present in MongoDB.
Now let's run some vector search queries against our data present in MongoDB.

Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Step 8: Perform semantic search on your data** section in the notebook to run vector search queries against your data.

The answers for code blocks in this section are as follows:

**CODE_BLOCK_8**
**CODE_BLOCK_9**

<details>
<summary>Answer</summary>
@@ -17,7 +17,7 @@ get_embedding(user_query)
</div>
</details>

**CODE_BLOCK_9**
**CODE_BLOCK_10**

<details>
<summary>Answer</summary>
@@ -30,7 +30,7 @@ get_embedding(user_query)
"queryVector": query_embedding,
"path": "embedding",
"numCandidates": 150,
"limit": 5,
"limit": 5
}
},
{
@@ -45,7 +45,7 @@ get_embedding(user_query)
</div>
</details>

**CODE_BLOCK_10**
**CODE_BLOCK_11**

<details>
<summary>Answer</summary>
88 changes: 38 additions & 50 deletions docs/60-perform-semantic-search/4-pre-filtering.mdx
@@ -3,43 +3,39 @@
Pre-filtering is a technique to optimize vector search by only considering documents that match certain criteria during vector search.

In this section, you will learn how to combine filters with vector search. This mainly involves:

* Updating the vector search index to include the appropriate filter fields
* Updating the `$vectorSearch` stage in the aggregation pipeline definition to include the filters
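
To make the idea of pre-filtering concrete, here is a toy, brute-force sketch in plain Python: documents are first narrowed by a metadata criterion, and only the survivors are ranked by cosine similarity. This is an illustration under simplifying assumptions, not how Atlas executes `$vectorSearch`; the function names and sample documents are made up for the example.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def prefiltered_search(docs, query_vector, content_type, limit):
    """Filter by content type first, then rank the remaining docs by similarity."""
    # Step 1: keep only documents that match the filter criterion.
    candidates = [d for d in docs if d["metadata"]["contentType"] == content_type]
    # Step 2: rank only those candidates against the query vector.
    ranked = sorted(
        candidates,
        key=lambda d: cosine(d["embedding"], query_vector),
        reverse=True,
    )
    return ranked[:limit]


# Made-up documents with 2-dimensional embeddings for readability.
docs = [
    {"title": "a", "metadata": {"contentType": "Video"}, "embedding": [1.0, 0.0]},
    {"title": "b", "metadata": {"contentType": "Tutorial"}, "embedding": [0.9, 0.1]},
    {"title": "c", "metadata": {"contentType": "Video"}, "embedding": [0.0, 1.0]},
]
results = prefiltered_search(docs, [1.0, 0.0], "Video", limit=2)
print([d["title"] for d in results])
```

Document `b` is close to the query vector but is never scored, because it fails the filter before ranking starts.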

## Filter for documents where the content type is `Video`
Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **🦹‍♀️ Combine pre-filtering with vector search** section in the notebook to experiment with combining pre-filters with your vector search queries.

To do this, you will first need to modify the vector search index you created previously.
The answers for code blocks in this section are as follows:

**Updated index definition**
**CODE_BLOCK_12**

<details>
<summary>Answer</summary>
<div>
```python
{
"fields": [
{
"type": "vector",
"path": "embedding",
"numDimensions": 384,
"similarity": "cosine"
},
{
"type":"filter",
"path":"metadata.contentType"
"name": ATLAS_VECTOR_SEARCH_INDEX_NAME,
"type": "vectorSearch",
"definition": {
"fields": [
{
"type": "vector",
"path": "embedding",
"numDimensions": 384,
"similarity": "cosine"
},
{"type": "filter", "path": "metadata.contentType"}
]
}
]
}
```
</div>
</details>

Once you have updated the vector search index, fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Filter for documents where the content type is Video** section in the notebook to see how the filter impacts the vector search results.

The answers for code blocks in this section are as follows:

**CODE_BLOCK_11**
**CODE_BLOCK_13**

<details>
<summary>Answer</summary>
@@ -48,7 +44,7 @@ The answers for code blocks in this section are as follows:
[
{
"$vectorSearch": {
"index": "vector_index",
"index": ATLAS_VECTOR_SEARCH_INDEX_NAME,
"path": "embedding",
"queryVector": query_embedding,
"numCandidates": 150,
@@ -68,44 +64,33 @@ The answers for code blocks in this section are as follows:
</div>
</details>


## Filter on documents which have been updated on or after `2024-05-19` and where the content type is `Tutorial`

Again, you will first need to modify the vector search index.

**Updated index definition**
**CODE_BLOCK_14**

<details>
<summary>Answer</summary>
<div>
```python
{
"fields": [
{
"type": "vector",
"path": "embedding",
"numDimensions": 384,
"similarity": "cosine"
},
{
"type":"filter",
"path":"metadata.contentType"
},
{
"type":"filter",
"path":"updated"
"name": ATLAS_VECTOR_SEARCH_INDEX_NAME,
"type": "vectorSearch",
"definition": {
"fields": [
{
"type": "vector",
"path": "embedding",
"numDimensions": 384,
"similarity": "cosine"
},
{"type": "filter", "path": "metadata.contentType"},
{"type": "filter", "path": "updated"}
]
}
]
}
```
</div>
</details>

Once you have updated the vector search index, fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Filter on documents which have been updated on or after 2024-05-19 and where the content type is Tutorial** section in the notebook to see how the filter impacts the vector search results.

The answers for code blocks in this section are as follows:

**CODE_BLOCK_12**
**CODE_BLOCK_15**

<details>
<summary>Answer</summary>
@@ -114,21 +99,24 @@ The answers for code blocks in this section are as follows:
[
{
"$vectorSearch": {
"index": "vector_index",
"index": ATLAS_VECTOR_SEARCH_INDEX_NAME,
"path": "embedding",
"queryVector": query_embedding,
"numCandidates": 150,
"limit": 5,
"filter": {
"metadata.contentType": "Tutorial",
"updated": {"$gte": "2024-05-19"}
"$and": [
{"metadata.contentType": "Tutorial"},
{"updated": {"$gte": "2024-05-19"}}
]
}
}
},
{
"$project": {
"_id": 0,
"body": 1,
"updated": 1,
"score": {"$meta": "vectorSearchScore"}
}
}
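
The `filter` above joins two conditions with `$and`. When the conditions are collected at runtime, a small helper can wrap them only when there is more than one. A minimal sketch, assuming a hypothetical function `combine_filters` that is not part of the lab notebook:

```python
from typing import Any


def combine_filters(*clauses: dict[str, Any]) -> dict[str, Any]:
    """Combine filter clauses; several clauses are wrapped in an explicit $and."""
    # Ignore empty clauses so callers can pass optional conditions freely.
    present = [c for c in clauses if c]
    if not present:
        return {}
    if len(present) == 1:
        return present[0]
    return {"$and": present}


# Same two conditions as in the answer above.
pre_filter = combine_filters(
    {"metadata.contentType": "Tutorial"},
    {"updated": {"$gte": "2024-05-19"}},
)
print(pre_filter)
```

Both paths used in the clauses still have to be declared as `filter` fields in the index definition for the stage to accept them.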
6 changes: 3 additions & 3 deletions docs/70-build-rag-app/1-build-rag-app.mdx
@@ -6,7 +6,7 @@ Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Step 9:

The answers for code blocks in this section are as follows:

**CODE_BLOCK_13**
**CODE_BLOCK_16**

<details>
<summary>Answer</summary>
@@ -17,7 +17,7 @@ vector_search(user_query)
</div>
</details>

**CODE_BLOCK_14**
**CODE_BLOCK_17**

<details>
<summary>Answer</summary>
@@ -28,7 +28,7 @@ create_prompt(user_query)
</div>
</details>

**CODE_BLOCK_15**
**CODE_BLOCK_18**

<details>
<summary>Answer</summary>
2 changes: 1 addition & 1 deletion docs/70-build-rag-app/2-add-reranking.mdx
@@ -6,7 +6,7 @@ Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **🦹‍

The answers for code blocks in this section are as follows:

**CODE_BLOCK_16**
**CODE_BLOCK_19**

<details>
<summary>Answer</summary>
6 changes: 3 additions & 3 deletions docs/70-build-rag-app/3-stream-responses.mdx
@@ -6,7 +6,7 @@ Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **🦹‍

The answers for code blocks in this section are as follows:

**CODE_BLOCK_17**
**CODE_BLOCK_20**

<details>
<summary>Answer</summary>
@@ -17,7 +17,7 @@ create_prompt(user_query)
</div>
</details>

**CODE_BLOCK_18**
**CODE_BLOCK_21**

<details>
<summary>Answer</summary>
@@ -32,7 +32,7 @@ fw_client.chat.completions.create(
</div>
</details>

**CODE_BLOCK_19**
**CODE_BLOCK_22**

<details>
<summary>Answer</summary>