Merge pull request #1 from mongodb-developer/auto_index_creation_updates

Auto index creation updates
mongodb-developer · Sep 16, 2024 · 218a08c
2 parents a434eb5 + e48aa7a

Showing 21 changed files with 74 additions and 131 deletions.
diff --git a/docs/20-mongodb-atlas/3-get-connection-string.mdx b/docs/20-mongodb-atlas/3-get-connection-string.mdx
@@ -9,20 +9,18 @@ In the Atlas UI, navigate to the **Overview** page. In the **Clusters section**,
 
 <Screenshot url="https://cloud.mongodb.com" src="img/screenshots/20-mongodb-atlas/3-get-conn-string/1-connect-button.png" alt="Screenshot of the connect button" />
 
-A modal will display several ways to connect to your database.
+A modal will display several ways to connect to your database. Select **Drivers**.
 
 <Screenshot url="https://cloud.mongodb.com" src="img/screenshots/20-mongodb-atlas/3-get-conn-string/2-connect-modal.png" alt="Screenshot of the connect modal" />
 
-Select **Compass**. While we won't be using Compass to import the data, it's an easy way to see your connection string.
-
-<Screenshot url="https://cloud.mongodb.com" src="img/screenshots/20-mongodb-atlas/3-get-conn-string/3-connect-compass.png" alt="Screenshot of the connection string" />
-
-Look for your connection string. It should look something like:
+Look for your connection string. It should look something like `mongodb+srv://<username>:<password>@<cluster-url>/`
 
 ```
-mongodb+srv://<username>:<password>@<cluster-url>/
+
 ```
 
+<Screenshot url="https://cloud.mongodb.com" src="img/screenshots/20-mongodb-atlas/3-get-conn-string/3-connect-compass.png" alt="Screenshot of the connection string" />
+
 Click the copy button next to your connection string to copy it to your clipboard. Paste the connection string somewhere safe.
 
 :::tip

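The copied connection string still contains `<username>` and `<password>` placeholders. Here is a minimal sketch, assuming you connect from Python and that your credentials may contain reserved characters such as `@` or `:`. It percent-encodes the credentials before substituting them into the URI. The helper name `build_atlas_uri` and the example cluster host are hypothetical.

```python
from urllib.parse import quote_plus


def build_atlas_uri(username: str, password: str, cluster_url: str) -> str:
    """Fill in the placeholders of a mongodb+srv:// connection string.

    Hypothetical helper: percent-encodes the credentials so that characters
    such as '@', ':' or '/' in a password do not break URI parsing.
    """
    return (
        f"mongodb+srv://{quote_plus(username)}:{quote_plus(password)}"
        f"@{cluster_url}/"
    )


if __name__ == "__main__":
    uri = build_atlas_uri("lab_user", "p@ss:word", "cluster0.example.mongodb.net")
    print(uri)
    # With PyMongo installed, the URI would then be passed to the client:
    # from pymongo import MongoClient
    # client = MongoClient(uri)
```

Encoding only the credentials, not the whole URI, keeps the `mongodb+srv://` scheme and the `@` separator intact.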
diff --git a/docs/50-prepare-the-data/1-concepts.mdx b/docs/50-prepare-the-data/1-concepts.mdx
diff --git a/docs/50-prepare-the-data/2-load-data.mdx b/docs/50-prepare-the-data/1-load-data.mdx
diff --git a/docs/50-prepare-the-data/3-chunk-data.mdx b/docs/50-prepare-the-data/2-chunk-data.mdx
diff --git a/docs/50-prepare-the-data/4-embed-data.mdx b/docs/50-prepare-the-data/3-embed-data.mdx
diff --git a/docs/50-prepare-the-data/5-ingest-data.mdx b/docs/50-prepare-the-data/4-ingest-data.mdx
diff --git a/docs/60-perform-semantic-search/1-concepts.mdx b/docs/60-perform-semantic-search/1-concepts.mdx
@@ -31,10 +31,10 @@ Vector search in MongoDB takes the form of an aggregation pipeline stage. It alw
   {
     "$vectorSearch": {
       "index": "vector_index", 
-      "path": "embedding", 
-      "filter": {"symbol": "ABMD"}, 
+      "path": "embedding",  
       "queryVector": [0.02421053, -0.022372592,...], 
-      "numCandidates": 150, 
+      "numCandidates": 150,
+      "filter": {"symbol": "ABMD"},
       "limit": 10
     }
   }, 

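The change above only moves `filter` within the `$vectorSearch` document. Because the stage's options are named fields, their order should not affect the query. The following is a minimal Python sketch, assuming a hypothetical `build_vector_search_stage` helper. It adds `filter` only when one is supplied and enforces that `numCandidates` is at least `limit`.

```python
from typing import Any, Optional, Sequence


def build_vector_search_stage(
    query_vector: Sequence[float],
    index: str = "vector_index",
    path: str = "embedding",
    num_candidates: int = 150,
    limit: int = 10,
    pre_filter: Optional[dict] = None,
) -> dict[str, Any]:
    """Build a $vectorSearch stage shaped like the example above (hypothetical helper)."""
    if limit > num_candidates:
        # numCandidates is the pool of nearest neighbours considered,
        # so it cannot be smaller than the number of results returned.
        raise ValueError("num_candidates must be >= limit")
    stage: dict[str, Any] = {
        "index": index,
        "path": path,
        "queryVector": list(query_vector),
        "numCandidates": num_candidates,
        "limit": limit,
    }
    if pre_filter is not None:
        # Optional pre-filter; its fields must be indexed as "filter" fields.
        stage["filter"] = pre_filter
    return {"$vectorSearch": stage}
```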
diff --git a/docs/60-perform-semantic-search/2-create-vector-index.mdx b/docs/60-perform-semantic-search/2-create-vector-index.mdx
@@ -1,35 +1,18 @@
 # 👐 Create a vector search index
 
-To retrieve documents from MongoDB using vector search, you must configure a vector search index on the collection into which you ingested your data.
+To retrieve documents from MongoDB using vector search, you must configure a vector search index on the collection into which you ingested your data. The recommended way to do this is via the MongoDB drivers.
 
-To do this, open the **Database Deployments** page in the Atlas UI and select **Create Index** in the lower right corner under Atlas Search.
+Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Step 7: Create a vector search index** section in the notebook to create a vector search index.
 
-<Screenshot url="https://cloud.mongodb.com" src="img/screenshots/60-perform-semantic-search/1-create-index.png" alt="Select create index" />
+The answers for code blocks in this section are as follows:
 
-Click the **Create Search Index** button.
-
-<Screenshot url="https://cloud.mongodb.com" src="img/screenshots/60-perform-semantic-search/2-create-search-index.png" alt="Create search index" />
-
-Click **JSON Editor** under Atlas Vector Search to create your index
-
-<Screenshot url="https://cloud.mongodb.com" src="img/screenshots/60-perform-semantic-search/3-json-editor.png" alt="The 'Create Index' page with the 'JSON Editor' tab highlighted" />
-
-
-Select the `mongodb_rag_lab` database and the `knowledge` collection, change the index name to `vector_index`, and add the following index definition in the JSON editor:
+**CODE_BLOCK_8**
 
+<details>
+<summary>Answer</summary>
+<div>
 ```python
-{
-  "fields": [
-    {
-      "type": "vector",
-      "path": "embedding",
-      "numDimensions": 384,
-      "similarity": "cosine"
-    }
-  ]
-}
+collection.create_search_index(model=model)
 ```
-
-:::info
-The number of dimensions in the index definition is 384 since we are using the [gte-small](https://huggingface.co/thenlper/gte-small) model to generate embeddings in this lab.
-:::
+</div>
+</details>
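The CODE_BLOCK_8 answer passes a `model` that the notebook defines elsewhere. The sketch below assumes PyMongo 4.7 or later, where a search index model can carry `"type": "vectorSearch"`, and the 384-dimension `gte-small` embeddings used in this lab. Under those assumptions, `model` can be a plain dictionary with `name`, `type`, and `definition` keys, the same shape used in the pre-filtering answers. Atlas builds the index asynchronously, so the sketch also polls `list_search_indexes` until the index reports `queryable`. Both helper names are hypothetical.

```python
import time
from typing import Any, Iterable


def build_vector_index_model(
    name: str = "vector_index",
    num_dimensions: int = 384,
    filter_paths: Iterable[str] = (),
) -> dict[str, Any]:
    """Return a vectorSearch index model for the `embedding` field."""
    fields: list[dict[str, Any]] = [
        {
            "type": "vector",
            "path": "embedding",
            "numDimensions": num_dimensions,  # must match the embedding size
            "similarity": "cosine",
        }
    ]
    # Each extra path becomes a pre-filter field.
    fields += [{"type": "filter", "path": p} for p in filter_paths]
    return {"name": name, "type": "vectorSearch", "definition": {"fields": fields}}


def create_and_wait(
    collection, model: dict[str, Any], timeout_s: float = 120.0, poll_s: float = 5.0
) -> str:
    """Create the index, then poll until Atlas reports it as queryable.

    `collection` is assumed to be a PyMongo Collection.
    """
    name = collection.create_search_index(model=model)
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        indexes = list(collection.list_search_indexes(name))
        if indexes and indexes[0].get("queryable"):
            return name
        time.sleep(poll_s)
    raise TimeoutError(f"search index {name!r} was not queryable after {timeout_s}s")
```

Polling avoids running queries against an index that is still building, which would otherwise return no results.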
diff --git a/docs/60-perform-semantic-search/3-vector-search.mdx b/docs/60-perform-semantic-search/3-vector-search.mdx
@@ -1,12 +1,12 @@
 # 👐 Perform semantic search
 
-Now let's run some vector search queries against our data present in MongoDB.
+Now let's run some vector search queries against our data present in MongoDB. 
 
 Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Step 8: Perform semantic search on your data** section in the notebook to run vector search queries against your data.
 
 The answers for code blocks in this section are as follows:
 
-**CODE_BLOCK_8**
+**CODE_BLOCK_9**
 
 <details>
 <summary>Answer</summary>
@@ -17,7 +17,7 @@ get_embedding(user_query)
 </div>
 </details>
 
-**CODE_BLOCK_9**
+**CODE_BLOCK_10**
 
 <details>
 <summary>Answer</summary>
@@ -30,7 +30,7 @@ get_embedding(user_query)
             "queryVector": query_embedding,
             "path": "embedding",
             "numCandidates": 150,
-            "limit": 5,
+            "limit": 5
         }
     },
     {
@@ -45,7 +45,7 @@ get_embedding(user_query)
 </div>
 </details>
 
-**CODE_BLOCK_10**
+**CODE_BLOCK_11**
 
 <details>
 <summary>Answer</summary>

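The CODE_BLOCK_10 answer is an aggregation pipeline. As a minimal sketch, assume a PyMongo `collection` and a query embedding produced by the notebook's `get_embedding` function. Running the pipeline then amounts to passing it to `aggregate` and reading the similarity score through `{"$meta": "vectorSearchScore"}`. The name `run_vector_search` is hypothetical and is not the notebook's own `vector_search` function.

```python
from typing import Any, Sequence


def run_vector_search(
    collection, query_embedding: Sequence[float], limit: int = 5
) -> list[dict[str, Any]]:
    """Run a $vectorSearch + $project pipeline (hypothetical helper)."""
    pipeline = [
        {
            "$vectorSearch": {
                "index": "vector_index",
                "queryVector": list(query_embedding),
                "path": "embedding",
                "numCandidates": 150,
                "limit": limit,
            }
        },
        {
            # Drop _id, keep the chunk text and expose the similarity score.
            "$project": {
                "_id": 0,
                "body": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]
    # aggregate() returns a cursor; materialise it so results can be reused.
    return list(collection.aggregate(pipeline))
```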
diff --git a/docs/60-perform-semantic-search/4-pre-filtering.mdx b/docs/60-perform-semantic-search/4-pre-filtering.mdx
@@ -3,43 +3,39 @@
 Pre-filtering is a technique to optimize vector search by only considering documents that match certain criteria during vector search.
 
 In this section, you will learn how to combine filters with vector search. This mainly involves:
-
 * Updating the vector search index to include the appropriate filter fields
 * Updating the `$vectorSearch` stage in the aggregation pipeline definition to include the filters
 
-## Filter for documents where the content type is `Video`
+Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **🦹‍♀️ Combine pre-filtering with vector search** section in the notebook to experiment with combining pre-filters with your vector search queries.
 
-To do this, you will first need to modify the vector search index you created previously.
+The answers for code blocks in this section are as follows:
 
-**Updated index definition**
+**CODE_BLOCK_12**
 
 <details>
 <summary>Answer</summary>
 <div>
 ```python
 {
-  "fields": [
-    {
-      "type": "vector",
-      "path": "embedding",
-      "numDimensions": 384,
-      "similarity": "cosine"
-    },
-    {
-      "type":"filter",
-      "path":"metadata.contentType"
+    "name": ATLAS_VECTOR_SEARCH_INDEX_NAME,
+    "type": "vectorSearch",
+    "definition": {
+        "fields": [
+            {
+                "type": "vector",
+                "path": "embedding",
+                "numDimensions": 384,
+                "similarity": "cosine"
+            },
+            {"type": "filter", "path": "metadata.contentType"}
+        ]
     }
-  ]
 }
 ```
 </div>
 </details>
 
-Once you have updated the vector search index, fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Filter for documents where the content type is Video** section in the notebook to see how the filter impacts the vector search results.
-
-The answers for code blocks in this section are as follows:
-
-**CODE_BLOCK_11**
+**CODE_BLOCK_13**
 
 <details>
 <summary>Answer</summary>
@@ -48,7 +44,7 @@ The answers for code blocks in this section are as follows:
 [
     {
         "$vectorSearch": {
-            "index": "vector_index",
+            "index": ATLAS_VECTOR_SEARCH_INDEX_NAME,
             "path": "embedding",
             "queryVector": query_embedding,
             "numCandidates": 150,
@@ -68,44 +64,33 @@ The answers for code blocks in this section are as follows:
 </div>
 </details>
 
-
-## Filter on documents which have been updated on or after `2024-05-19` and where the content type is `Tutorial`
-
-Again, you will first need to modify the vector search index.
-
-**Updated index definition**
+**CODE_BLOCK_14**
 
 <details>
 <summary>Answer</summary>
 <div>
 ```python
 {
-  "fields": [
-    {
-      "type": "vector",
-      "path": "embedding",
-      "numDimensions": 384,
-      "similarity": "cosine"
-    },
-    {
-      "type":"filter",
-      "path":"metadata.contentType"
-    },
-    {
-      "type":"filter",
-      "path":"updated"
+    "name": ATLAS_VECTOR_SEARCH_INDEX_NAME,
+    "type": "vectorSearch",
+    "definition": {
+        "fields": [
+            {
+                "type": "vector",
+                "path": "embedding",
+                "numDimensions": 384,
+                "similarity": "cosine"
+            },
+            {"type": "filter", "path": "metadata.contentType"},
+            {"type": "filter", "path": "updated"}
+        ]
     }
-  ]
 }
 ```
 </div>
 </details>
 
-Once you have updated the vector search index, fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Filter on documents which have been updated on or after 2024-05-19 and where the content type is Tutorial** section in the notebook to see how the filter impacts the vector search results.
-
-The answers for code blocks in this section are as follows:
-
-**CODE_BLOCK_12**
+**CODE_BLOCK_15**
 
 <details>
 <summary>Answer</summary>
@@ -114,21 +99,24 @@ The answers for code blocks in this section are as follows:
 [
     {
         "$vectorSearch": {
-            "index": "vector_index",
+            "index": ATLAS_VECTOR_SEARCH_INDEX_NAME,
             "path": "embedding",
             "queryVector": query_embedding,
             "numCandidates": 150,
             "limit": 5,
             "filter": {
-                "metadata.contentType": "Tutorial",
-                "updated": {"$gte": "2024-05-19"}
+                "$and": [
+                    {"metadata.contentType": "Tutorial"},
+                    {"updated": {"$gte": "2024-05-19"}}
+                ]
             }
         }
     },
     {
         "$project": {
             "_id": 0,
             "body": 1,
+            "updated": 1,
             "score": {"$meta": "vectorSearchScore"}
         }
     }

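Every field referenced in a `$vectorSearch` `filter` must be declared as a `"filter"` field in the index definition. Here those fields are `metadata.contentType` and `updated`, which is why the CODE_BLOCK_14 definition adds `updated`. The sketch below checks this before a query is run. It treats keys starting with `$` as operators and handles both the explicit `$and` form used above and the implicit multi-key form. The helper names are hypothetical.

```python
from typing import Any


def filter_paths(flt: Any) -> set[str]:
    """Collect the field paths referenced by a $vectorSearch filter."""
    paths: set[str] = set()
    if isinstance(flt, list):
        for item in flt:
            paths |= filter_paths(item)
    elif isinstance(flt, dict):
        for key, value in flt.items():
            if key.startswith("$"):
                # Logical operator such as $and / $or: recurse into operands.
                paths |= filter_paths(value)
            else:
                # A field path; its value is a literal or an operator document.
                paths.add(key)
    return paths


def missing_filter_fields(flt: dict, index_model: dict) -> set[str]:
    """Return filter paths that the index model does not declare as filters."""
    declared = {
        f["path"]
        for f in index_model["definition"]["fields"]
        if f["type"] == "filter"
    }
    return filter_paths(flt) - declared
```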
diff --git a/docs/70-build-rag-app/1-build-rag-app.mdx b/docs/70-build-rag-app/1-build-rag-app.mdx
@@ -6,7 +6,7 @@ Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Step 9:
 
 The answers for code blocks in this section are as follows:
 
-**CODE_BLOCK_13**
+**CODE_BLOCK_16**
 
 <details>
 <summary>Answer</summary>
@@ -17,7 +17,7 @@ vector_search(user_query)
 </div>
 </details>
 
-**CODE_BLOCK_14**
+**CODE_BLOCK_17**
 
 <details>
 <summary>Answer</summary>
@@ -28,7 +28,7 @@ create_prompt(user_query)
 </div>
 </details>
 
-**CODE_BLOCK_15**
+**CODE_BLOCK_18**
 
 <details>
 <summary>Answer</summary>

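CODE_BLOCK_16 to CODE_BLOCK_18 retrieve context with `vector_search`, build a prompt with `create_prompt`, and then send it to the LLM. The notebook's own `create_prompt` is not shown in this diff. As a hypothetical sketch of the idea, assume each retrieved document has a `body` field, as in the `$project` stages above. A prompt can then be assembled by placing the retrieved chunks ahead of the question. The function name and instruction wording are illustrative only.

```python
from typing import Any, Iterable


def build_rag_prompt(user_query: str, documents: Iterable[dict[str, Any]]) -> str:
    """Hypothetical prompt builder: stuff retrieved chunks into the prompt."""
    # Join the text of each retrieved chunk into one context block.
    context = "\n\n".join(doc.get("body", "") for doc in documents)
    return (
        "Answer the question based only on the following context.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {user_query}"
    )
```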
diff --git a/docs/70-build-rag-app/2-add-reranking.mdx b/docs/70-build-rag-app/2-add-reranking.mdx
@@ -6,7 +6,7 @@ Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **🦹‍
 
 The answers for code blocks in this section are as follows:
 
-**CODE_BLOCK_16**
+**CODE_BLOCK_19**
 
 <details>
 <summary>Answer</summary>

diff --git a/docs/70-build-rag-app/3-stream-responses.mdx b/docs/70-build-rag-app/3-stream-responses.mdx
@@ -6,7 +6,7 @@ Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **🦹‍
 
 The answers for code blocks in this section are as follows:
 
-**CODE_BLOCK_17**
+**CODE_BLOCK_20**
 
 <details>
 <summary>Answer</summary>
@@ -17,7 +17,7 @@ create_prompt(user_query)
 </div>
 </details>
 
-**CODE_BLOCK_18**
+**CODE_BLOCK_21**
 
 <details>
 <summary>Answer</summary>
@@ -32,7 +32,7 @@ fw_client.chat.completions.create(
 </div>
 </details>
 
-**CODE_BLOCK_19**
+**CODE_BLOCK_22**
 
 <details>
 <summary>Answer</summary>