Commit

Programmatic index creation and related updates
ajosh0504 committed Sep 16, 2024
1 parent a434eb5 commit b8af139
Showing 17 changed files with 69 additions and 124 deletions.
17 changes: 0 additions & 17 deletions docs/50-prepare-the-data/1-concepts.mdx

This file was deleted.

File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
6 changes: 3 additions & 3 deletions docs/60-perform-semantic-search/1-concepts.mdx
Expand Up @@ -31,10 +31,10 @@ Vector search in MongoDB takes the form of an aggregation pipeline stage. It alw
{
"$vectorSearch": {
"index": "vector_index",
"path": "embedding",
"filter": {"symbol": "ABMD"},
"path": "embedding",
"queryVector": [0.02421053, -0.022372592,...],
"numCandidates": 150,
"numCandidates": 150,
"filter": {"symbol": "ABMD"},
"limit": 10
}
},
Expand Down
37 changes: 10 additions & 27 deletions docs/60-perform-semantic-search/2-create-vector-index.mdx
@@ -1,35 +1,18 @@
# 👐 Create a vector search index

To retrieve documents from MongoDB using vector search, you must configure a vector search index on the collection into which you ingested your data.
To retrieve documents from MongoDB using vector search, you must configure a vector search index on the collection into which you ingested your data. The recommended way to do this is via the MongoDB drivers.

To do this, open the **Database Deployments** page in the Atlas UI and select **Create Index** in the lower right corner under Atlas Search.
Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Step 7: Create a vector search index** section in the notebook to create a vector search index.

<Screenshot url="https://cloud.mongodb.com" src="img/screenshots/60-perform-semantic-search/1-create-index.png" alt="Select create index" />
The answers for code blocks in this section are as follows:

Click the **Create Search Index** button.

<Screenshot url="https://cloud.mongodb.com" src="img/screenshots/60-perform-semantic-search/2-create-search-index.png" alt="Create search index" />

Click **JSON Editor** under Atlas Vector Search to create your index.

<Screenshot url="https://cloud.mongodb.com" src="img/screenshots/60-perform-semantic-search/3-json-editor.png" alt="The 'Create Index' page with the 'JSON Editor' tab highlighted" />


Select the `mongodb_rag_lab` database and the `knowledge` collection, change the index name to `vector_index`, and add the following index definition in the JSON editor:
**CODE_BLOCK_8**

<details>
<summary>Answer</summary>
<div>
```python
{
"fields": [
{
"type": "vector",
"path": "embedding",
"numDimensions": 384,
"similarity": "cosine"
}
]
}
collection.create_search_index(model=model)
```

:::info
The number of dimensions in the index definition is 384 since we are using the [gte-small](https://huggingface.co/thenlper/gte-small) model to generate embeddings in this lab.
:::
</div>
</details>
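
The `model` passed to `create_search_index` above is not shown in this diff. A minimal sketch of how such a model might be built with PyMongo follows, assuming PyMongo 4.7 or later (where `SearchIndexModel` accepts a `type` argument). The helper names `build_index_spec` and `create_vector_index` are hypothetical, and the field values mirror the index definition used elsewhere in this commit.

```python
def build_index_spec(name, num_dimensions=384, filter_paths=()):
    """Return a plain dict describing a vector search index (hypothetical helper)."""
    fields = [
        {
            "type": "vector",
            "path": "embedding",
            "numDimensions": num_dimensions,
            "similarity": "cosine",
        }
    ]
    # Each filter path becomes a separate "filter" field in the definition.
    fields.extend({"type": "filter", "path": path} for path in filter_paths)
    return {"name": name, "type": "vectorSearch", "definition": {"fields": fields}}


def create_vector_index(collection, spec):
    """Create the index on a PyMongo collection (needs a live Atlas cluster)."""
    # Imported lazily so the spec builder works without PyMongo installed.
    from pymongo.operations import SearchIndexModel

    model = SearchIndexModel(
        definition=spec["definition"], name=spec["name"], type=spec["type"]
    )
    return collection.create_search_index(model=model)


# Assumed index name; the lab uses its own constant for this.
spec = build_index_spec("vector_index")
```

Keeping the definition as a plain dict makes it easy to inspect or extend with filter fields before the index is created.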
10 changes: 5 additions & 5 deletions docs/60-perform-semantic-search/3-vector-search.mdx
@@ -1,12 +1,12 @@
# 👐 Perform semantic search

Now let's run some vector search queries against our data present in MongoDB.
Now let's run some vector search queries against our data present in MongoDB.

Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Step 8: Perform semantic search on your data** section in the notebook to run vector search queries against your data.

The answers for code blocks in this section are as follows:

**CODE_BLOCK_8**
**CODE_BLOCK_9**

<details>
<summary>Answer</summary>
Expand All @@ -17,7 +17,7 @@ get_embedding(user_query)
</div>
</details>

**CODE_BLOCK_9**
**CODE_BLOCK_10**

<details>
<summary>Answer</summary>
Expand All @@ -30,7 +30,7 @@ get_embedding(user_query)
"queryVector": query_embedding,
"path": "embedding",
"numCandidates": 150,
"limit": 5,
"limit": 5
}
},
{
Expand All @@ -45,7 +45,7 @@ get_embedding(user_query)
</div>
</details>

**CODE_BLOCK_10**
**CODE_BLOCK_11**

<details>
<summary>Answer</summary>
Expand Down
88 changes: 38 additions & 50 deletions docs/60-perform-semantic-search/4-pre-filtering.mdx
Expand Up @@ -3,43 +3,39 @@
Pre-filtering optimizes vector search by restricting the search to documents that match certain criteria.

In this section, you will learn how to combine filters with vector search. This mainly involves:

* Updating the vector search index to include the appropriate filter fields
* Updating the `$vectorSearch` stage in the aggregation pipeline definition to include the filters
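
To make the idea concrete, here is a small pure-Python sketch of pre-filtering: documents are first narrowed by a metadata condition, and only the survivors are ranked by cosine similarity. It illustrates the concept only and is not how Atlas Vector Search is implemented. The sample documents and the `prefiltered_search` helper are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def prefiltered_search(docs, query_vector, content_type, limit=5):
    # Step 1: keep only documents that match the filter criteria.
    candidates = [d for d in docs if d["metadata"]["contentType"] == content_type]
    # Step 2: rank the remaining documents by similarity to the query vector.
    ranked = sorted(
        candidates,
        key=lambda d: cosine_similarity(d["embedding"], query_vector),
        reverse=True,
    )
    return ranked[:limit]


# Hypothetical toy data with 2-dimensional embeddings.
docs = [
    {"title": "a", "metadata": {"contentType": "Video"}, "embedding": [1.0, 0.0]},
    {"title": "b", "metadata": {"contentType": "Tutorial"}, "embedding": [0.9, 0.1]},
    {"title": "c", "metadata": {"contentType": "Video"}, "embedding": [0.0, 1.0]},
]
results = prefiltered_search(docs, [1.0, 0.0], "Video")
titles = [d["title"] for d in results]
```

Document `b` is close to the query vector but never gets scored, because it is excluded before the similarity step runs.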

## Filter for documents where the content type is `Video`
Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **🦹‍♀️ Combine pre-filtering with vector search** section in the notebook to experiment with combining pre-filters with your vector search queries.

To do this, you will first need to modify the vector search index you created previously.
The answers for code blocks in this section are as follows:

**Updated index definition**
**CODE_BLOCK_12**

<details>
<summary>Answer</summary>
<div>
```python
{
"fields": [
{
"type": "vector",
"path": "embedding",
"numDimensions": 384,
"similarity": "cosine"
},
{
"type":"filter",
"path":"metadata.contentType"
"name": ATLAS_VECTOR_SEARCH_INDEX_NAME,
"type": "vectorSearch",
"definition": {
"fields": [
{
"type": "vector",
"path": "embedding",
"numDimensions": 384,
"similarity": "cosine"
},
{"type": "filter", "path": "metadata.contentType"}
]
}
]
}
```
</div>
</details>

Once you have updated the vector search index, fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Filter for documents where the content type is Video** section in the notebook to see how the filter impacts the vector search results.

The answers for code blocks in this section are as follows:

**CODE_BLOCK_11**
**CODE_BLOCK_13**

<details>
<summary>Answer</summary>
Expand All @@ -48,7 +44,7 @@ The answers for code blocks in this section are as follows:
[
{
"$vectorSearch": {
"index": "vector_index",
"index": ATLAS_VECTOR_SEARCH_INDEX_NAME,
"path": "embedding",
"queryVector": query_embedding,
"numCandidates": 150,
Expand All @@ -68,44 +64,33 @@ The answers for code blocks in this section are as follows:
</div>
</details>
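
The pipelines in this section differ mainly in their `filter`, so one way to avoid repeating the surrounding boilerplate is a small builder function. The sketch below is an assumption rather than part of the lab notebook: `build_vector_search_pipeline` is a hypothetical helper, and the stage parameters and projected fields follow those shown elsewhere in this diff.

```python
def build_vector_search_pipeline(
    index_name, query_embedding, pre_filter=None, limit=5, num_candidates=150
):
    """Build a $vectorSearch aggregation pipeline with an optional pre-filter."""
    stage = {
        "index": index_name,
        "path": "embedding",
        "queryVector": query_embedding,
        "numCandidates": num_candidates,
        "limit": limit,
    }
    # Only add the filter key when a filter was actually supplied.
    if pre_filter:
        stage["filter"] = pre_filter
    return [
        {"$vectorSearch": stage},
        {
            "$project": {
                "_id": 0,
                "body": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


# Placeholder embedding; a real query would use the 384-dimensional vector.
pipeline = build_vector_search_pipeline(
    "vector_index",
    [0.1, 0.2, 0.3],
    pre_filter={"metadata.contentType": "Video"},
)
```

The resulting list can be passed to `collection.aggregate(pipeline)`. Any field used in `pre_filter` must also be declared as a `filter` field in the index definition.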


## Filter on documents which have been updated on or after `2024-05-19` and where the content type is `Tutorial`

Again, you will first need to modify the vector search index.

**Updated index definition**
**CODE_BLOCK_14**

<details>
<summary>Answer</summary>
<div>
```python
{
"fields": [
{
"type": "vector",
"path": "embedding",
"numDimensions": 384,
"similarity": "cosine"
},
{
"type":"filter",
"path":"metadata.contentType"
},
{
"type":"filter",
"path":"updated"
"name": ATLAS_VECTOR_SEARCH_INDEX_NAME,
"type": "vectorSearch",
"definition": {
"fields": [
{
"type": "vector",
"path": "embedding",
"numDimensions": 384,
"similarity": "cosine"
},
{"type": "filter", "path": "metadata.contentType"},
{"type": "filter", "path": "updated"}
]
}
]
}
```
</div>
</details>

Once you have updated the vector search index, fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Filter on documents which have been updated on or after 2024-05-19 and where the content type is Tutorial** section in the notebook to see how the filter impacts the vector search results.

The answers for code blocks in this section are as follows:

**CODE_BLOCK_12**
**CODE_BLOCK_15**

<details>
<summary>Answer</summary>
Expand All @@ -114,21 +99,24 @@ The answers for code blocks in this section are as follows:
[
{
"$vectorSearch": {
"index": "vector_index",
"index": ATLAS_VECTOR_SEARCH_INDEX_NAME,
"path": "embedding",
"queryVector": query_embedding,
"numCandidates": 150,
"limit": 5,
"filter": {
"metadata.contentType": "Tutorial",
"updated": {"$gte": "2024-05-19"}
"$and": [
{"metadata.contentType": "Tutorial"},
{"updated": {"$gte": "2024-05-19"}}
]
}
}
},
{
"$project": {
"_id": 0,
"body": 1,
"updated": 1,
"score": {"$meta": "vectorSearchScore"}
}
}
Expand Down
6 changes: 3 additions & 3 deletions docs/70-build-rag-app/1-build-rag-app.mdx
Expand Up @@ -6,7 +6,7 @@ Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Step 9:

The answers for code blocks in this section are as follows:

**CODE_BLOCK_13**
**CODE_BLOCK_16**

<details>
<summary>Answer</summary>
Expand All @@ -17,7 +17,7 @@ vector_search(user_query)
</div>
</details>

**CODE_BLOCK_14**
**CODE_BLOCK_17**

<details>
<summary>Answer</summary>
Expand All @@ -28,7 +28,7 @@ create_prompt(user_query)
</div>
</details>

**CODE_BLOCK_15**
**CODE_BLOCK_18**

<details>
<summary>Answer</summary>
Expand Down
2 changes: 1 addition & 1 deletion docs/70-build-rag-app/2-add-reranking.mdx
Expand Up @@ -6,7 +6,7 @@ Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **🦹‍

The answers for code blocks in this section are as follows:

**CODE_BLOCK_16**
**CODE_BLOCK_19**

<details>
<summary>Answer</summary>
Expand Down
6 changes: 3 additions & 3 deletions docs/70-build-rag-app/3-stream-responses.mdx
Expand Up @@ -6,7 +6,7 @@ Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **🦹‍

The answers for code blocks in this section are as follows:

**CODE_BLOCK_17**
**CODE_BLOCK_20**

<details>
<summary>Answer</summary>
Expand All @@ -17,7 +17,7 @@ create_prompt(user_query)
</div>
</details>

**CODE_BLOCK_18**
**CODE_BLOCK_21**

<details>
<summary>Answer</summary>
Expand All @@ -32,7 +32,7 @@ fw_client.chat.completions.create(
</div>
</details>

**CODE_BLOCK_19**
**CODE_BLOCK_22**

<details>
<summary>Answer</summary>
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Step 10:

The answers for code blocks in this section are as follows:

**CODE_BLOCK_20**
**CODE_BLOCK_23**

<details>
<summary>Answer</summary>
Expand All @@ -17,7 +17,7 @@ history_collection.create_index("session_id")
</div>
</details>

**CODE_BLOCK_21**
**CODE_BLOCK_24**

<details>
<summary>Answer</summary>
Expand All @@ -28,7 +28,7 @@ history_collection.insert_one(message)
</div>
</details>

**CODE_BLOCK_22**
**CODE_BLOCK_25**

<details>
<summary>Answer</summary>
Expand All @@ -39,7 +39,7 @@ history_collection.find({"session_id": session_id}).sort("timestamp", 1)
</div>
</details>

**CODE_BLOCK_23**
**CODE_BLOCK_26**

<details>
<summary>Answer</summary>
Expand All @@ -50,7 +50,7 @@ retrieve_session_history(session_id)
</div>
</details>

**CODE_BLOCK_24**
**CODE_BLOCK_27**

<details>
<summary>Answer</summary>
Expand All @@ -61,7 +61,7 @@ retrieve_session_history(session_id)
</div>
</details>

**CODE_BLOCK_25**
**CODE_BLOCK_28**

<details>
<summary>Answer</summary>
Expand Down