Programmatic index creation and related updates

mongodb-developer · Sep 16, 2024 · b8af139
1 parent a434eb5

Showing 17 changed files with 69 additions and 124 deletions.
diff --git a/docs/50-prepare-the-data/1-concepts.mdx b/docs/50-prepare-the-data/1-concepts.mdx
diff --git a/docs/50-prepare-the-data/2-load-data.mdx b/docs/50-prepare-the-data/1-load-data.mdx
diff --git a/docs/50-prepare-the-data/3-chunk-data.mdx b/docs/50-prepare-the-data/2-chunk-data.mdx
diff --git a/docs/50-prepare-the-data/4-embed-data.mdx b/docs/50-prepare-the-data/3-embed-data.mdx
diff --git a/docs/50-prepare-the-data/5-ingest-data.mdx b/docs/50-prepare-the-data/4-ingest-data.mdx
diff --git a/docs/60-perform-semantic-search/1-concepts.mdx b/docs/60-perform-semantic-search/1-concepts.mdx
@@ -31,10 +31,10 @@ Vector search in MongoDB takes the form of an aggregation pipeline stage. It alw
   {
     "$vectorSearch": {
       "index": "vector_index", 
-      "path": "embedding", 
-      "filter": {"symbol": "ABMD"}, 
+      "path": "embedding",  
       "queryVector": [0.02421053, -0.022372592,...], 
-      "numCandidates": 150, 
+      "numCandidates": 150,
+      "filter": {"symbol": "ABMD"},
       "limit": 10
     }
   }, 
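
Note on the hunk above: it only reorders keys and trims whitespace inside the `$vectorSearch` stage. Because the stage is a document keyed by option name, the reordering should not change query behavior. As a minimal sketch, the same stage can be assembled in Python as a plain dict. The helper name `build_vector_search_stage` and its defaults are assumptions for illustration and are not part of the lab notebook.

```python
from typing import Any, Dict, Optional, Sequence


def build_vector_search_stage(
    query_vector: Sequence[float],
    index: str = "vector_index",
    path: str = "embedding",
    num_candidates: int = 150,
    limit: int = 10,
    pre_filter: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Return a $vectorSearch stage as a plain dict (hypothetical helper)."""
    # numCandidates must not be smaller than the number of results requested.
    if num_candidates < limit:
        raise ValueError("num_candidates must be >= limit")
    stage: Dict[str, Any] = {
        "index": index,
        "path": path,
        "queryVector": list(query_vector),
        "numCandidates": num_candidates,
        "limit": limit,
    }
    # The filter is optional; add it only when one is supplied.
    if pre_filter:
        stage["filter"] = dict(pre_filter)
    return {"$vectorSearch": stage}


stage = build_vector_search_stage(
    [0.02421053, -0.022372592], pre_filter={"symbol": "ABMD"}
)
```

The resulting dict can be used as the first element of the list passed to `collection.aggregate(...)`.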

diff --git a/docs/60-perform-semantic-search/2-create-vector-index.mdx b/docs/60-perform-semantic-search/2-create-vector-index.mdx
@@ -1,35 +1,18 @@
 # 👐 Create a vector search index
 
-To retrieve documents from MongoDB using vector search, you must configure a vector search index on the collection into which you ingested your data.
+To retrieve documents from MongoDB using vector search, you must configure a vector search index on the collection into which you ingested your data. The recommended way to do this is via the MongoDB drivers.
 
-To do this, open the **Database Deployments** page in the Atlas UI and select **Create Index** in the lower right corner under Atlas Search.
+Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Step 7: Create a vector search index** section in the notebook to create a vector search index.
 
-<Screenshot url="https://cloud.mongodb.com" src="img/screenshots/60-perform-semantic-search/1-create-index.png" alt="Select create index" />
+The answers for code blocks in this section are as follows:
 
-Click the **Create Search Index** button.
-
-<Screenshot url="https://cloud.mongodb.com" src="img/screenshots/60-perform-semantic-search/2-create-search-index.png" alt="Create search index" />
-
-Click **JSON Editor** under Atlas Vector Search to create your index
-
-<Screenshot url="https://cloud.mongodb.com" src="img/screenshots/60-perform-semantic-search/3-json-editor.png" alt="The 'Create Index' page with the 'JSON Editor' tab highlighted" />
-
-
-Select the `mongodb_rag_lab` database and the `knowledge` collection, change the index name to `vector_index`, and add the following index definition in the JSON editor:
+**CODE_BLOCK_8**
 
+<details>
+<summary>Answer</summary>
+<div>
 ```python
-{
-  "fields": [
-    {
-      "type": "vector",
-      "path": "embedding",
-      "numDimensions": 384,
-      "similarity": "cosine"
-    }
-  ]
-}
+collection.create_search_index(model=model)
 ```
-
-:::info
-The number of dimensions in the index definition is 384 since we are using the [gte-small](https://huggingface.co/thenlper/gte-small) model to generate embeddings in this lab.
-:::
+</div>
+</details>
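
Note on the hunk above: the answer `collection.create_search_index(model=model)` relies on a `model` built earlier in the notebook, which this diff does not show. The sketch below is one plausible shape for that model, reusing the 384-dimension cosine definition from the removed JSON. The helper name `vector_index_spec` and the literal index name `"vector_index"` are assumptions. The commented lines assume a recent PyMongo release in which `SearchIndexModel` accepts a `type` argument.

```python
from typing import Any, Dict, Iterable


def vector_index_spec(
    name: str,
    num_dimensions: int = 384,
    filter_paths: Iterable[str] = (),
) -> Dict[str, Any]:
    """Build a vector search index spec as a plain dict (hypothetical helper)."""
    fields = [
        {
            "type": "vector",
            "path": "embedding",
            # 384 matches the gte-small embeddings used in this lab.
            "numDimensions": num_dimensions,
            "similarity": "cosine",
        }
    ]
    # Each pre-filter field needs its own "filter" entry in the definition.
    fields.extend({"type": "filter", "path": p} for p in filter_paths)
    return {"name": name, "type": "vectorSearch", "definition": {"fields": fields}}


spec = vector_index_spec("vector_index")

# With a live Atlas collection, the spec could then be applied roughly like this:
# from pymongo.operations import SearchIndexModel
# model = SearchIndexModel(
#     definition=spec["definition"], name=spec["name"], type=spec["type"]
# )
# collection.create_search_index(model=model)
```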
diff --git a/docs/60-perform-semantic-search/3-vector-search.mdx b/docs/60-perform-semantic-search/3-vector-search.mdx
@@ -1,12 +1,12 @@
 # 👐 Perform semantic search
 
-Now let's run some vector search queries against our data present in MongoDB.
+Now let's run some vector search queries against our data present in MongoDB. 
 
 Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Step 8: Perform semantic search on your data** section in the notebook to run vector search queries against your data.
 
 The answers for code blocks in this section are as follows:
 
-**CODE_BLOCK_8**
+**CODE_BLOCK_9**
 
 <details>
 <summary>Answer</summary>
@@ -17,7 +17,7 @@ get_embedding(user_query)
 </div>
 </details>
 
-**CODE_BLOCK_9**
+**CODE_BLOCK_10**
 
 <details>
 <summary>Answer</summary>
@@ -30,7 +30,7 @@ get_embedding(user_query)
             "queryVector": query_embedding,
             "path": "embedding",
             "numCandidates": 150,
-            "limit": 5,
+            "limit": 5
         }
     },
     {
@@ -45,7 +45,7 @@ get_embedding(user_query)
 </div>
 </details>
 
-**CODE_BLOCK_10**
+**CODE_BLOCK_11**
 
 <details>
 <summary>Answer</summary>

diff --git a/docs/60-perform-semantic-search/4-pre-filtering.mdx b/docs/60-perform-semantic-search/4-pre-filtering.mdx
@@ -3,43 +3,39 @@
 Pre-filtering is a technique to optimize vector search by only considering documents that match certain criteria during vector search.
 
 In this section, you will learn how to combine filters with vector search. This mainly involves:
-
 * Updating the vector search index to include the appropriate filter fields
 * Updating the `$vectorSearch` stage in the aggregation pipeline definition to include the filters
 
-## Filter for documents where the content type is `Video`
+Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **🦹‍♀️ Combine pre-filtering with vector search** section in the notebook to experiment with combining pre-filters with your vector search queries.
 
-To do this, you will first need to modify the vector search index you created previously.
+The answers for code blocks in this section are as follows:
 
-**Updated index definition**
+**CODE_BLOCK_12**
 
 <details>
 <summary>Answer</summary>
 <div>
 ```python
 {
-  "fields": [
-    {
-      "type": "vector",
-      "path": "embedding",
-      "numDimensions": 384,
-      "similarity": "cosine"
-    },
-    {
-      "type":"filter",
-      "path":"metadata.contentType"
+    "name": ATLAS_VECTOR_SEARCH_INDEX_NAME,
+    "type": "vectorSearch",
+    "definition": {
+        "fields": [
+            {
+                "type": "vector",
+                "path": "embedding",
+                "numDimensions": 384,
+                "similarity": "cosine"
+            },
+            {"type": "filter", "path": "metadata.contentType"}
+        ]
     }
-  ]
 }
 ```
 </div>
 </details>
 
-Once you have updated the vector search index, fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Filter for documents where the content type is Video** section in the notebook to see how the filter impacts the vector search results.
-
-The answers for code blocks in this section are as follows:
-
-**CODE_BLOCK_11**
+**CODE_BLOCK_13**
 
 <details>
 <summary>Answer</summary>
@@ -48,7 +44,7 @@ The answers for code blocks in this section are as follows:
 [
     {
         "$vectorSearch": {
-            "index": "vector_index",
+            "index": ATLAS_VECTOR_SEARCH_INDEX_NAME,
             "path": "embedding",
             "queryVector": query_embedding,
             "numCandidates": 150,
@@ -68,44 +64,33 @@ The answers for code blocks in this section are as follows:
 </div>
 </details>
 
-
-## Filter on documents which have been updated on or after `2024-05-19` and where the content type is `Tutorial`
-
-Again, you will first need to modify the vector search index.
-
-**Updated index definition**
+**CODE_BLOCK_14**
 
 <details>
 <summary>Answer</summary>
 <div>
 ```python
 {
-  "fields": [
-    {
-      "type": "vector",
-      "path": "embedding",
-      "numDimensions": 384,
-      "similarity": "cosine"
-    },
-    {
-      "type":"filter",
-      "path":"metadata.contentType"
-    },
-    {
-      "type":"filter",
-      "path":"updated"
+    "name": ATLAS_VECTOR_SEARCH_INDEX_NAME,
+    "type": "vectorSearch",
+    "definition": {
+        "fields": [
+            {
+                "type": "vector",
+                "path": "embedding",
+                "numDimensions": 384,
+                "similarity": "cosine"
+            },
+            {"type": "filter", "path": "metadata.contentType"},
+            {"type": "filter", "path": "updated"}
+        ]
     }
-  ]
 }
 ```
 </div>
 </details>
 
-Once you have updated the vector search index, fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Filter on documents which have been updated on or after 2024-05-19 and where the content type is Tutorial** section in the notebook to see how the filter impacts the vector search results.
-
-The answers for code blocks in this section are as follows:
-
-**CODE_BLOCK_12**
+**CODE_BLOCK_15**
 
 <details>
 <summary>Answer</summary>
@@ -114,21 +99,24 @@ The answers for code blocks in this section are as follows:
 [
     {
         "$vectorSearch": {
-            "index": "vector_index",
+            "index": ATLAS_VECTOR_SEARCH_INDEX_NAME,
             "path": "embedding",
             "queryVector": query_embedding,
             "numCandidates": 150,
             "limit": 5,
             "filter": {
-                "metadata.contentType": "Tutorial",
-                "updated": {"$gte": "2024-05-19"}
+                "$and": [
+                    {"metadata.contentType": "Tutorial"},
+                    {"updated": {"$gte": "2024-05-19"}}
+                ]
             }
         }
     },
     {
         "$project": {
             "_id": 0,
             "body": 1,
+            "updated": 1,
             "score": {"$meta": "vectorSearchScore"}
         }
     }
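
Note on the hunk above: the filter now combines both conditions under an explicit `$and`, and each filtered path (`metadata.contentType`, `updated`) must also appear as a `filter` field in the index definition. The sketch below builds the same pipeline as plain dicts and adds a small check that lists the paths a filter touches, so they can be compared against the index. The names `filtered_search_pipeline` and `paths_in_filter`, and the literal index name, are assumptions for illustration.

```python
from typing import Any, Dict, List, Sequence, Set


def filtered_search_pipeline(
    query_embedding: Sequence[float],
    pre_filter: Dict[str, Any],
    index_name: str = "vector_index",
) -> List[Dict[str, Any]]:
    """Build the filtered vector search pipeline shown in the diff (sketch)."""
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": "embedding",
                "queryVector": list(query_embedding),
                "numCandidates": 150,
                "limit": 5,
                "filter": pre_filter,
            }
        },
        {
            "$project": {
                "_id": 0,
                "body": 1,
                "updated": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


def paths_in_filter(pre_filter: Dict[str, Any]) -> Set[str]:
    """Collect the field paths referenced by a filter, descending into $and/$or lists."""
    paths: Set[str] = set()
    for key, value in pre_filter.items():
        if key.startswith("$") and isinstance(value, list):
            for clause in value:
                paths |= paths_in_filter(clause)
        else:
            paths.add(key)
    return paths


pre_filter = {
    "$and": [
        {"metadata.contentType": "Tutorial"},
        {"updated": {"$gte": "2024-05-19"}},
    ]
}
pipeline = filtered_search_pipeline([0.1, 0.2, 0.3], pre_filter)
```

Comparing `paths_in_filter(pre_filter)` with the `filter` paths in the index definition is a cheap way to catch a filter on a field that was never indexed.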

diff --git a/docs/70-build-rag-app/1-build-rag-app.mdx b/docs/70-build-rag-app/1-build-rag-app.mdx
@@ -6,7 +6,7 @@ Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Step 9:
 
 The answers for code blocks in this section are as follows:
 
-**CODE_BLOCK_13**
+**CODE_BLOCK_16**
 
 <details>
 <summary>Answer</summary>
@@ -17,7 +17,7 @@ vector_search(user_query)
 </div>
 </details>
 
-**CODE_BLOCK_14**
+**CODE_BLOCK_17**
 
 <details>
 <summary>Answer</summary>
@@ -28,7 +28,7 @@ create_prompt(user_query)
 </div>
 </details>
 
-**CODE_BLOCK_15**
+**CODE_BLOCK_18**
 
 <details>
 <summary>Answer</summary>

diff --git a/docs/70-build-rag-app/2-add-reranking.mdx b/docs/70-build-rag-app/2-add-reranking.mdx
@@ -6,7 +6,7 @@ Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **🦹‍
 
 The answers for code blocks in this section are as follows:
 
-**CODE_BLOCK_16**
+**CODE_BLOCK_19**
 
 <details>
 <summary>Answer</summary>

diff --git a/docs/70-build-rag-app/3-stream-responses.mdx b/docs/70-build-rag-app/3-stream-responses.mdx
@@ -6,7 +6,7 @@ Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **🦹‍
 
 The answers for code blocks in this section are as follows:
 
-**CODE_BLOCK_17**
+**CODE_BLOCK_20**
 
 <details>
 <summary>Answer</summary>
@@ -17,7 +17,7 @@ create_prompt(user_query)
 </div>
 </details>
 
-**CODE_BLOCK_18**
+**CODE_BLOCK_21**
 
 <details>
 <summary>Answer</summary>
@@ -32,7 +32,7 @@ fw_client.chat.completions.create(
 </div>
 </details>
 
-**CODE_BLOCK_19**
+**CODE_BLOCK_22**
 
 <details>
 <summary>Answer</summary>

diff --git a/docs/80-add-memory/2-add-memory.mdx b/docs/80-add-memory/1-add-memory.mdx
@@ -6,7 +6,7 @@ Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Step 10:
 
 The answers for code blocks in this section are as follows:
 
-**CODE_BLOCK_20**
+**CODE_BLOCK_23**
 
 <details>
 <summary>Answer</summary>
@@ -17,7 +17,7 @@ history_collection.create_index("session_id")
 </div>
 </details>
 
-**CODE_BLOCK_21**
+**CODE_BLOCK_24**
 
 <details>
 <summary>Answer</summary>
@@ -28,7 +28,7 @@ history_collection.insert_one(message)
 </div>
 </details>
 
-**CODE_BLOCK_22**
+**CODE_BLOCK_25**
 
 <details>
 <summary>Answer</summary>
@@ -39,7 +39,7 @@ history_collection.find({"session_id": session_id}).sort("timestamp", 1)
 </div>
 </details>
 
-**CODE_BLOCK_23**
+**CODE_BLOCK_26**
 
 <details>
 <summary>Answer</summary>
@@ -50,7 +50,7 @@ retrieve_session_history(session_id)
 </div>
 </details>
 
-**CODE_BLOCK_24**
+**CODE_BLOCK_27**
 
 <details>
 <summary>Answer</summary>
@@ -61,7 +61,7 @@ retrieve_session_history(session_id)
 </div>
 </details>
 
-**CODE_BLOCK_25**
+**CODE_BLOCK_28**
 
 <details>
 <summary>Answer</summary>