diff --git a/docs/20-mongodb-atlas/3-get-connection-string.mdx b/docs/20-mongodb-atlas/3-get-connection-string.mdx index dcfafca..4d39f81 100644 --- a/docs/20-mongodb-atlas/3-get-connection-string.mdx +++ b/docs/20-mongodb-atlas/3-get-connection-string.mdx @@ -9,20 +9,18 @@ In the Atlas UI, navigate to the **Overview** page. In the **Clusters section**, -A modal will display several ways to connect to your database. +A modal will display several ways to connect to your database. Select **Drivers**. -Select **Compass**. While we won't be using Compass to import the data, it's an easy way to see your connection string. - - - -Look for your connection string. It should look something like: +Look for your connection string. It should look something like `mongodb+srv://:@/` ``` -mongodb+srv://:@/ + ``` + + Click the copy button next to your connection string to copy it to your clipboard. Paste the connection string somewhere safe. :::tip diff --git a/docs/50-prepare-the-data/1-concepts.mdx b/docs/50-prepare-the-data/1-concepts.mdx deleted file mode 100644 index 48798a4..0000000 --- a/docs/50-prepare-the-data/1-concepts.mdx +++ /dev/null @@ -1,17 +0,0 @@ -# 📘 Tools, libraries, and concepts - -## [datasets](https://huggingface.co/docs/datasets/en/index) - -Library used to download a dataset of MongoDB Developer center tutorials from Hugging Face. - -## [RecursiveCharacterTextSplitter](https://python.langchain.com/v0.1/docs/modules/data_connection/document_transformers/split_by_token/) - -A LangChain text splitter that first splits documents by a list of characters and then recursively merges characters into tokens until the specified chunk size is reached. - -## [Sentence Transformers](https://sbert.net/) - -Python library for accessing, using, and training open-source embedding models. - -## [PyMongo](https://pymongo.readthedocs.io/en/stable/) - -Python driver for MongoDB. Used to connect to MongoDB databases, delete and insert documents into a MongoDB collection. 
\ No newline at end of file diff --git a/docs/50-prepare-the-data/2-load-data.mdx b/docs/50-prepare-the-data/1-load-data.mdx similarity index 100% rename from docs/50-prepare-the-data/2-load-data.mdx rename to docs/50-prepare-the-data/1-load-data.mdx diff --git a/docs/50-prepare-the-data/3-chunk-data.mdx b/docs/50-prepare-the-data/2-chunk-data.mdx similarity index 100% rename from docs/50-prepare-the-data/3-chunk-data.mdx rename to docs/50-prepare-the-data/2-chunk-data.mdx diff --git a/docs/50-prepare-the-data/4-embed-data.mdx b/docs/50-prepare-the-data/3-embed-data.mdx similarity index 100% rename from docs/50-prepare-the-data/4-embed-data.mdx rename to docs/50-prepare-the-data/3-embed-data.mdx diff --git a/docs/50-prepare-the-data/5-ingest-data.mdx b/docs/50-prepare-the-data/4-ingest-data.mdx similarity index 100% rename from docs/50-prepare-the-data/5-ingest-data.mdx rename to docs/50-prepare-the-data/4-ingest-data.mdx diff --git a/docs/60-perform-semantic-search/1-concepts.mdx b/docs/60-perform-semantic-search/1-concepts.mdx index 579de54..c1a4071 100644 --- a/docs/60-perform-semantic-search/1-concepts.mdx +++ b/docs/60-perform-semantic-search/1-concepts.mdx @@ -31,10 +31,10 @@ Vector search in MongoDB takes the form of an aggregation pipeline stage. 
It alw { "$vectorSearch": { "index": "vector_index", - "path": "embedding", - "filter": {"symbol": "ABMD"}, + "path": "embedding", "queryVector": [0.02421053, -0.022372592,...], - "numCandidates": 150, + "numCandidates": 150, + "filter": {"symbol": "ABMD"}, "limit": 10 } }, diff --git a/docs/60-perform-semantic-search/2-create-vector-index.mdx b/docs/60-perform-semantic-search/2-create-vector-index.mdx index 530b4c9..6c013b5 100644 --- a/docs/60-perform-semantic-search/2-create-vector-index.mdx +++ b/docs/60-perform-semantic-search/2-create-vector-index.mdx @@ -1,35 +1,18 @@ # 👐 Create a vector search index -To retrieve documents from MongoDB using vector search, you must configure a vector search index on the collection into which you ingested your data. +To retrieve documents from MongoDB using vector search, you must configure a vector search index on the collection into which you ingested your data. The recommended way to do this is via the MongoDB drivers. -To do this, open the **Database Deployments** page in the Atlas UI and select **Create Index** in the lower right corner under Atlas Search. +Fill in any `` placeholders and run the cells under the **Step 7: Create a vector search index** section in the notebook to create a vector search index. - +The answers for code blocks in this section are as follows: -Click the **Create Search Index** button. - - - -Click **JSON Editor** under Atlas Vector Search to create your index - - - - -Select the `mongodb_rag_lab` database and the `knowledge` collection, change the index name to `vector_index`, and add the following index definition in the JSON editor: +**CODE_BLOCK_8** +
+Answer +
```python -{ - "fields": [ - { - "type": "vector", - "path": "embedding", - "numDimensions": 384, - "similarity": "cosine" - } - ] -} +collection.create_search_index(model=model) ``` - -:::info -The number of dimensions in the index definition is 384 since we are using the [gte-small](https://huggingface.co/thenlper/gte-small) model to generate embeddings in this lab. -::: \ No newline at end of file +
+
\ No newline at end of file diff --git a/docs/60-perform-semantic-search/3-vector-search.mdx b/docs/60-perform-semantic-search/3-vector-search.mdx index 805a360..0795b73 100644 --- a/docs/60-perform-semantic-search/3-vector-search.mdx +++ b/docs/60-perform-semantic-search/3-vector-search.mdx @@ -1,12 +1,12 @@ # 👐 Perform semantic search -Now let's run some vector search queries against our data present in MongoDB. +Now let's run some vector search queries against our data present in MongoDB. Fill in any `` placeholders and run the cells under the **Step 8: Perform semantic search on your data** section in the notebook to run vector search queries against your data. The answers for code blocks in this section are as follows: -**CODE_BLOCK_8** +**CODE_BLOCK_9**
Answer @@ -17,7 +17,7 @@ get_embedding(user_query)
-**CODE_BLOCK_9** +**CODE_BLOCK_10**
Answer @@ -30,7 +30,7 @@ get_embedding(user_query) "queryVector": query_embedding, "path": "embedding", "numCandidates": 150, - "limit": 5, + "limit": 5 } }, { @@ -45,7 +45,7 @@ get_embedding(user_query)
-**CODE_BLOCK_10** +**CODE_BLOCK_11**
Answer diff --git a/docs/60-perform-semantic-search/4-pre-filtering.mdx b/docs/60-perform-semantic-search/4-pre-filtering.mdx index 084b849..6463c7d 100644 --- a/docs/60-perform-semantic-search/4-pre-filtering.mdx +++ b/docs/60-perform-semantic-search/4-pre-filtering.mdx @@ -3,43 +3,39 @@ Pre-filtering is a technique to optimize vector search by only considering documents that match certain criteria during vector search. In this section, you will learn how to combine filters with vector search. This mainly involves: - * Updating the vector search index to include the appropriate filter fields * Updating the `$vectorSearch` stage in the aggregation pipeline definition to include the filters -## Filter for documents where the content type is `Video` +Fill in any `` placeholders and run the cells under the **🦹‍♀️ Combine pre-filtering with vector search** section in the notebook to experiment with combining pre-filters with your vector search queries. -To do this, you will first need to modify the vector search index you created previously. +The answers for code blocks in this section are as follows: -**Updated index definition** +**CODE_BLOCK_12**
Answer
```python { - "fields": [ - { - "type": "vector", - "path": "embedding", - "numDimensions": 384, - "similarity": "cosine" - }, - { - "type":"filter", - "path":"metadata.contentType" + "name": ATLAS_VECTOR_SEARCH_INDEX_NAME, + "type": "vectorSearch", + "definition": { + "fields": [ + { + "type": "vector", + "path": "embedding", + "numDimensions": 384, + "similarity": "cosine" + }, + {"type": "filter", "path": "metadata.contentType"} + ] } - ] } ```
-Once you have updated the vector search index, fill in any `` placeholders and run the cells under the **Filter for documents where the content type is Video** section in the notebook to see how the filter impacts the vector search results. - -The answers for code blocks in this section are as follows: - -**CODE_BLOCK_11** +**CODE_BLOCK_13**
Answer @@ -48,7 +44,7 @@ The answers for code blocks in this section are as follows: [ { "$vectorSearch": { - "index": "vector_index", + "index": ATLAS_VECTOR_SEARCH_INDEX_NAME, "path": "embedding", "queryVector": query_embedding, "numCandidates": 150, @@ -68,44 +64,33 @@ The answers for code blocks in this section are as follows:
- -## Filter on documents which have been updated on or after `2024-05-19` and where the content type is `Tutorial` - -Again, you will first need to modify the vector search index. - -**Updated index definition** +**CODE_BLOCK_14**
Answer
```python { - "fields": [ - { - "type": "vector", - "path": "embedding", - "numDimensions": 384, - "similarity": "cosine" - }, - { - "type":"filter", - "path":"metadata.contentType" - }, - { - "type":"filter", - "path":"updated" + "name": ATLAS_VECTOR_SEARCH_INDEX_NAME, + "type": "vectorSearch", + "definition": { + "fields": [ + { + "type": "vector", + "path": "embedding", + "numDimensions": 384, + "similarity": "cosine" + }, + {"type": "filter", "path": "metadata.contentType"}, + {"type": "filter", "path": "updated"} + ] } - ] } ```
-Once you have updated the vector search index, fill in any `` placeholders and run the cells under the **Filter on documents which have been updated on or after 2024-05-19 and where the content type is Tutorial** section in the notebook to see how the filter impacts the vector search results. - -The answers for code blocks in this section are as follows: - -**CODE_BLOCK_12** +**CODE_BLOCK_15**
Answer @@ -114,14 +99,16 @@ The answers for code blocks in this section are as follows: [ { "$vectorSearch": { - "index": "vector_index", + "index": ATLAS_VECTOR_SEARCH_INDEX_NAME, "path": "embedding", "queryVector": query_embedding, "numCandidates": 150, "limit": 5, "filter": { - "metadata.contentType": "Tutorial", - "updated": {"$gte": "2024-05-19"} + "$and": [ + {"metadata.contentType": "Tutorial"}, + {"updated": {"$gte": "2024-05-19"}} + ] } } }, @@ -129,6 +116,7 @@ The answers for code blocks in this section are as follows: "$project": { "_id": 0, "body": 1, + "updated": 1, "score": {"$meta": "vectorSearchScore"} } } diff --git a/docs/70-build-rag-app/1-build-rag-app.mdx b/docs/70-build-rag-app/1-build-rag-app.mdx index bc75df6..8ffe6dd 100644 --- a/docs/70-build-rag-app/1-build-rag-app.mdx +++ b/docs/70-build-rag-app/1-build-rag-app.mdx @@ -6,7 +6,7 @@ Fill in any `` placeholders and run the cells under the **Step 9: The answers for code blocks in this section are as follows: -**CODE_BLOCK_13** +**CODE_BLOCK_16**
Answer @@ -17,7 +17,7 @@ vector_search(user_query)
-**CODE_BLOCK_14** +**CODE_BLOCK_17**
Answer @@ -28,7 +28,7 @@ create_prompt(user_query)
-**CODE_BLOCK_15** +**CODE_BLOCK_18**
Answer diff --git a/docs/70-build-rag-app/2-add-reranking.mdx b/docs/70-build-rag-app/2-add-reranking.mdx index c75eaa2..97a6557 100644 --- a/docs/70-build-rag-app/2-add-reranking.mdx +++ b/docs/70-build-rag-app/2-add-reranking.mdx @@ -6,7 +6,7 @@ Fill in any `` placeholders and run the cells under the **🦹 The answers for code blocks in this section are as follows: -**CODE_BLOCK_16** +**CODE_BLOCK_19**
Answer diff --git a/docs/70-build-rag-app/3-stream-responses.mdx b/docs/70-build-rag-app/3-stream-responses.mdx index f62611e..762ef2e 100644 --- a/docs/70-build-rag-app/3-stream-responses.mdx +++ b/docs/70-build-rag-app/3-stream-responses.mdx @@ -6,7 +6,7 @@ Fill in any `` placeholders and run the cells under the **🦹 The answers for code blocks in this section are as follows: -**CODE_BLOCK_17** +**CODE_BLOCK_20**
Answer @@ -17,7 +17,7 @@ create_prompt(user_query)
-**CODE_BLOCK_18** +**CODE_BLOCK_21**
Answer @@ -32,7 +32,7 @@ fw_client.chat.completions.create(
-**CODE_BLOCK_19** +**CODE_BLOCK_22**
Answer diff --git a/docs/80-add-memory/2-add-memory.mdx b/docs/80-add-memory/1-add-memory.mdx similarity index 96% rename from docs/80-add-memory/2-add-memory.mdx rename to docs/80-add-memory/1-add-memory.mdx index 4ed04fa..2abb4a2 100644 --- a/docs/80-add-memory/2-add-memory.mdx +++ b/docs/80-add-memory/1-add-memory.mdx @@ -6,7 +6,7 @@ Fill in any `` placeholders and run the cells under the **Step 10: The answers for code blocks in this section are as follows: -**CODE_BLOCK_20** +**CODE_BLOCK_23**
Answer @@ -17,7 +17,7 @@ history_collection.create_index("session_id")
-**CODE_BLOCK_21** +**CODE_BLOCK_24**
Answer @@ -28,7 +28,7 @@ history_collection.insert_one(message)
-**CODE_BLOCK_22** +**CODE_BLOCK_25**
Answer @@ -39,7 +39,7 @@ history_collection.find({"session_id": session_id}).sort("timestamp", 1)
-**CODE_BLOCK_23** +**CODE_BLOCK_26**
Answer @@ -50,7 +50,7 @@ retrieve_session_history(session_id)
-**CODE_BLOCK_24** +**CODE_BLOCK_27**
Answer @@ -61,7 +61,7 @@ retrieve_session_history(session_id)
-**CODE_BLOCK_25** +**CODE_BLOCK_28**
Answer diff --git a/docs/80-add-memory/1-concepts.mdx b/docs/80-add-memory/1-concepts.mdx deleted file mode 100644 index 1bbf193..0000000 --- a/docs/80-add-memory/1-concepts.mdx +++ /dev/null @@ -1,9 +0,0 @@ -# 📘 Tools, libraries, and concepts - -Memory is important for the LLM to have multi-turn conversations with the user. - -In this lab, you will persist chat messages in a separate MongoDB collection, indexed by session ID. - -For each new user query, you will fetch previous messages for that session from the collection and pass them to the LLM. - -Then once the LLM has generated a response to the query, you will write the query and the LLM's answer to the collection as two separate entries but having the same session ID. \ No newline at end of file diff --git a/static/img/screenshots/20-mongodb-atlas/3-get-conn-string/1-connect-button.png b/static/img/screenshots/20-mongodb-atlas/3-get-conn-string/1-connect-button.png index 759756f..e73a4fa 100644 Binary files a/static/img/screenshots/20-mongodb-atlas/3-get-conn-string/1-connect-button.png and b/static/img/screenshots/20-mongodb-atlas/3-get-conn-string/1-connect-button.png differ diff --git a/static/img/screenshots/20-mongodb-atlas/3-get-conn-string/2-connect-modal.png b/static/img/screenshots/20-mongodb-atlas/3-get-conn-string/2-connect-modal.png index 1e0094f..09614e2 100644 Binary files a/static/img/screenshots/20-mongodb-atlas/3-get-conn-string/2-connect-modal.png and b/static/img/screenshots/20-mongodb-atlas/3-get-conn-string/2-connect-modal.png differ diff --git a/static/img/screenshots/20-mongodb-atlas/3-get-conn-string/3-connect-compass.png b/static/img/screenshots/20-mongodb-atlas/3-get-conn-string/3-connect-compass.png index c1e5299..d4ec577 100644 Binary files a/static/img/screenshots/20-mongodb-atlas/3-get-conn-string/3-connect-compass.png and b/static/img/screenshots/20-mongodb-atlas/3-get-conn-string/3-connect-compass.png differ diff --git 
a/static/img/screenshots/60-perform-semantic-search/1-create-index.png b/static/img/screenshots/60-perform-semantic-search/1-create-index.png deleted file mode 100644 index 915fcf3..0000000 Binary files a/static/img/screenshots/60-perform-semantic-search/1-create-index.png and /dev/null differ diff --git a/static/img/screenshots/60-perform-semantic-search/2-create-search-index.png b/static/img/screenshots/60-perform-semantic-search/2-create-search-index.png deleted file mode 100644 index 40e2b1e..0000000 Binary files a/static/img/screenshots/60-perform-semantic-search/2-create-search-index.png and /dev/null differ diff --git a/static/img/screenshots/60-perform-semantic-search/3-json-editor.png b/static/img/screenshots/60-perform-semantic-search/3-json-editor.png deleted file mode 100644 index 3132561..0000000 Binary files a/static/img/screenshots/60-perform-semantic-search/3-json-editor.png and /dev/null differ
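
Note on the pre-filtering changes above: the new answers wrap the index definition in a model with `name`, `type`, and `definition` keys, and combine multiple filters under `$and`. The sketch below shows one way those two shapes could be assembled in plain Python. It is illustrative only, not the notebook's code: `build_index_model` and `build_pipeline` are hypothetical helpers, and the value `"vector_index"` for `ATLAS_VECTOR_SEARCH_INDEX_NAME` is an assumption carried over from the removed lines, since the diff only shows the constant's name.

```python
from typing import Any, Dict, List

# Assumed value: the diff only shows the constant's name, not its definition.
ATLAS_VECTOR_SEARCH_INDEX_NAME = "vector_index"


def build_index_model(filter_paths: List[str]) -> Dict[str, Any]:
    """Hypothetical helper: one vector field plus one filter field per path."""
    fields: List[Dict[str, Any]] = [
        {
            "type": "vector",
            "path": "embedding",
            "numDimensions": 384,
            "similarity": "cosine",
        }
    ]
    fields += [{"type": "filter", "path": path} for path in filter_paths]
    return {
        "name": ATLAS_VECTOR_SEARCH_INDEX_NAME,
        "type": "vectorSearch",
        "definition": {"fields": fields},
    }


def build_pipeline(
    query_embedding: List[float],
    filters: List[Dict[str, Any]],
    limit: int = 5,
) -> List[Dict[str, Any]]:
    """Hypothetical helper: a $vectorSearch stage with optional pre-filters."""
    stage: Dict[str, Any] = {
        "index": ATLAS_VECTOR_SEARCH_INDEX_NAME,
        "path": "embedding",
        "queryVector": query_embedding,
        "numCandidates": 150,
        "limit": limit,
    }
    if len(filters) == 1:
        stage["filter"] = filters[0]
    elif len(filters) > 1:
        # Several conditions are combined explicitly, as in CODE_BLOCK_15.
        stage["filter"] = {"$and": filters}
    return [
        {"$vectorSearch": stage},
        {"$project": {"_id": 0, "body": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]
```

The index model dict has the same shape as the CODE_BLOCK_12 and CODE_BLOCK_14 answers, which the notebook passes to `collection.create_search_index(model=model)`. The pipeline list is the kind of value that would be passed to `collection.aggregate(...)`. Every path used in a filter must also appear as a `filter` field in the index definition.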