Commit

Updating code blocks
ajosh0504 committed Jul 16, 2024
1 parent f15c2a1 commit aa3dd72
Showing 9 changed files with 74 additions and 47 deletions.
4 changes: 2 additions & 2 deletions docs/50-prepare-the-data/2-load-data.mdx
@@ -1,5 +1,5 @@
# 👐 Load the dataset

First, let's download the dataset for our lab. We'll use four RAG-focused blogs from our Developer Center as the source data for our RAG application.
First, let's download the dataset for our lab. We'll use a subset of articles from the MongoDB Developer Center as the source data for our RAG application.

Run all the cells under the **Step 3: Load the dataset** section in the notebook to load the blog content as LangChain Document objects.
Run all the cells under the **Step 3: Load the dataset** section in the notebook to load the articles as a list of Python objects consisting of the content and relevant metadata.
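For orientation, each loaded article can be pictured as a plain dictionary along the following lines. This is an illustrative sketch only: the field names `body`, `metadata.contentType`, and `updated` are the ones used by later answers in this lab, while the `title` field and every value shown are invented placeholders.

```python
from typing import Any, Dict, List

# Hypothetical example of one loaded article; all values are placeholders.
sample_doc: Dict[str, Any] = {
    "title": "Example article",  # assumed field, not confirmed by the lab
    "body": "Full text of the article goes here.",
    "metadata": {"contentType": "Video"},
    "updated": "2024-05-20",
}

docs: List[Dict[str, Any]] = [sample_doc]

# Nested metadata is reached through ordinary dictionary access.
content_type = docs[0]["metadata"]["contentType"]
```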
35 changes: 31 additions & 4 deletions docs/50-prepare-the-data/3-chunk-data.mdx
@@ -2,7 +2,7 @@

Since we are working with large documents, we first need to break them up into smaller chunks before embedding and storing them in MongoDB.

Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Step 4: Chunk up the data** section in the notebook to chunk up the documents we loaded.
Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Step 4: Chunk up the data** section in the notebook to chunk up the articles we loaded.

The answers for code blocks in this section are as follows:

@@ -13,7 +13,7 @@ The answers for code blocks in this section are as follows:
<div>
```python
RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name="cl100k_base", chunk_size=200, chunk_overlap=30
    encoding_name="cl100k_base", separators=separators, chunk_size=200, chunk_overlap=30
)
```
</div>
@@ -25,7 +25,7 @@ RecursiveCharacterTextSplitter.from_tiktoken_encoder(
<summary>Answer</summary>
<div>
```python
text_splitter.split_documents(docs)
doc[text_field]
```
</div>
</details>
@@ -36,7 +36,34 @@ text_splitter.split_documents(docs)
<summary>Answer</summary>
<div>
```python
doc.dict() for doc in split_docs
text_splitter.split_text(text)
```
</div>
</details>

**CODE_BLOCK_6**

<details>
<summary>Answer</summary>
<div>
```python
for chunk in chunks:
    temp = doc.copy()
    temp[text_field] = chunk
    chunked_data.append(temp)
```
</div>
</details>

**CODE_BLOCK_7**

<details>
<summary>Answer</summary>
<div>
```python
for doc in docs:
    chunks = get_chunks(doc, "body")
    split_docs.extend(chunks)
```
</div>
</details>
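
Taken together, the answers above suggest a `get_chunks` helper shaped roughly like the one below. This is a minimal sketch rather than the notebook's actual code: `SimpleSplitter` is a hypothetical word-based stand-in for `RecursiveCharacterTextSplitter` (so the example runs without LangChain or `tiktoken`), and the sample `docs` list is invented.

```python
from typing import Any, Dict, List


class SimpleSplitter:
    """Hypothetical stand-in for the lab's splitter; counts words, not tokens."""

    def __init__(self, chunk_size: int = 200, chunk_overlap: int = 30) -> None:
        if chunk_overlap >= chunk_size:
            raise ValueError("chunk_overlap must be smaller than chunk_size")
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap

    def split_text(self, text: str) -> List[str]:
        words = text.split()
        step = self.chunk_size - self.chunk_overlap
        chunks = []
        for start in range(0, len(words), step):
            chunks.append(" ".join(words[start:start + self.chunk_size]))
            if start + self.chunk_size >= len(words):
                break  # the last window already reached the end of the text
        return chunks


# Tiny sizes so the overlap is visible in the example output.
text_splitter = SimpleSplitter(chunk_size=5, chunk_overlap=2)


def get_chunks(doc: Dict[str, Any], text_field: str) -> List[Dict[str, Any]]:
    text = doc[text_field]                   # pull out the field to split
    chunks = text_splitter.split_text(text)  # split it into overlapping pieces
    chunked_data = []
    for chunk in chunks:
        temp = doc.copy()                    # keep the other fields on every chunk
        temp[text_field] = chunk
        chunked_data.append(temp)
    return chunked_data


docs = [{"title": "t", "body": "a b c d e f g h"}]  # invented sample data
split_docs: List[Dict[str, Any]] = []
for doc in docs:
    split_docs.extend(get_chunks(doc, "body"))
```

Each chunk is a copy of its parent article with only the text field replaced, so metadata stays attached to every chunk and remains available for filtering later.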
10 changes: 5 additions & 5 deletions docs/50-prepare-the-data/4-embed-data.mdx
@@ -2,11 +2,11 @@

To perform vector search on our data, we need to embed it (i.e. generate embedding vectors) before ingesting it into MongoDB.
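
As a rough picture of this step, the sketch below attaches a vector to each chunk without modifying the input list. It rests on stated assumptions: `toy_embedding` is a made-up stand-in for the real model call so the example runs offline, and `embed_docs` is a hypothetical helper name, not one taken from the notebook.

```python
from typing import Any, Callable, Dict, List


def toy_embedding(text: str) -> List[float]:
    """Made-up stand-in for a real embedding model; NOT semantically meaningful."""
    # Three crude numeric features, just to produce a fixed-length vector.
    return [float(len(text)), float(len(text.split())), float(text.count("."))]


def embed_docs(
    split_docs: List[Dict[str, Any]],
    get_embedding: Callable[[str], List[float]],
    text_field: str = "body",
) -> List[Dict[str, Any]]:
    embedded_docs = []
    for doc in split_docs:
        temp = doc.copy()  # shallow copy, so the input dict gains no new key
        temp["embedding"] = get_embedding(temp[text_field])
        embedded_docs.append(temp)
    return embedded_docs


split_docs = [
    {"body": "MongoDB stores documents."},
    {"body": "Vector search finds neighbours."},
]
embedded_docs = embed_docs(split_docs, toy_embedding)
```

In the lab itself, the function passed as `get_embedding` would wrap the embedding model and return a plain Python list, since MongoDB stores the vector as an array of numbers.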

Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Step 5: Generate embeddings** section in the notebook to generate embeddings for the chunked documents.
Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Step 5: Generate embeddings** section in the notebook to embed the chunked articles.

The answers for code blocks in this section are as follows:

**CODE_BLOCK_6**
**CODE_BLOCK_8**

<details>
<summary>Answer</summary>
@@ -17,7 +17,7 @@ SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")
</div>
</details>

**CODE_BLOCK_7**
**CODE_BLOCK_9**

<details>
<summary>Answer</summary>
@@ -29,15 +29,15 @@ return embedding.tolist()
</div>
</details>

**CODE_BLOCK_8**
**CODE_BLOCK_10**

<details>
<summary>Answer</summary>
<div>
```python
for doc in split_docs:
    temp = doc.copy()
    temp["embedding"] = get_embedding(temp["page_content"])
    temp["embedding"] = get_embedding(temp["body"])
    embedded_docs.append(temp)
```
</div>
10 changes: 5 additions & 5 deletions docs/50-prepare-the-data/5-ingest-data.mdx
@@ -2,13 +2,13 @@ import Screenshot from "@site/src/components/Screenshot";

# 👐 Ingest data into MongoDB

The final step to build a MongoDB vector store for our RAG application is to ingest the embedded documents into MongoDB.
The final step to build a MongoDB vector store for our RAG application is to ingest the embedded article chunks into MongoDB.

Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Step 6: Ingest data into MongoDB** section in the notebook to ingest the embedded documents into MongoDB.

The answers for code blocks in this section are as follows:

**CODE_BLOCK_9**
**CODE_BLOCK_11**

<details>
<summary>Answer</summary>
@@ -19,7 +19,7 @@ MongoClient(MONGODB_URI)
</div>
</details>

**CODE_BLOCK_10**
**CODE_BLOCK_12**

<details>
<summary>Answer</summary>
@@ -30,7 +30,7 @@ mongo_client[DB_NAME][COLLECTION_NAME]
</div>
</details>

**CODE_BLOCK_11**
**CODE_BLOCK_13**

<details>
<summary>Answer</summary>
@@ -41,7 +41,7 @@ collection.delete_many({})
</div>
</details>

**CODE_BLOCK_12**
**CODE_BLOCK_14**

<details>
<summary>Answer</summary>
8 changes: 4 additions & 4 deletions docs/60-perform-semantic-search/3-vector-search.mdx
@@ -6,7 +6,7 @@ Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Step 8:

The answers for code blocks in this section are as follows:

**CODE_BLOCK_13**
**CODE_BLOCK_15**

<details>
<summary>Answer</summary>
@@ -17,7 +17,7 @@ get_embedding(user_query)
</div>
</details>

**CODE_BLOCK_14**
**CODE_BLOCK_16**

<details>
<summary>Answer</summary>
@@ -36,7 +36,7 @@ get_embedding(user_query)
    {
        "$project": {
            "_id": 0,
            "page_content": 1,
            "body": 1,
            "score": {"$meta": "vectorSearchScore"},
        }
    },
@@ -45,7 +45,7 @@ get_embedding(user_query)
</div>
</details>
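
For context, the `$project` stage above is the tail of an aggregation pipeline whose first stage is `$vectorSearch`. The sketch below assembles such a pipeline as plain Python data. It is an illustration under assumptions: `build_vector_search_pipeline` is a hypothetical helper name, the index name `vector_index` is a placeholder, and the `numCandidates` and `limit` defaults simply mirror the values used elsewhere in this lab.

```python
from typing import Any, Dict, List


def build_vector_search_pipeline(
    query_embedding: List[float],
    index_name: str = "vector_index",  # assumed name; use your own index's name
    num_candidates: int = 150,
    limit: int = 5,
) -> List[Dict[str, Any]]:
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "queryVector": query_embedding,
                "path": "embedding",  # field that holds the stored vectors
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        {
            "$project": {
                "_id": 0,
                "body": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


# A three-number vector is only for illustration; real embeddings are much longer.
pipeline = build_vector_search_pipeline([0.1, 0.2, 0.3])
```

A pipeline built this way would typically be passed to `collection.aggregate(pipeline)`, with the query vector produced by the same `get_embedding` function used at ingestion time.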

**CODE_BLOCK_15**
**CODE_BLOCK_17**

<details>
<summary>Answer</summary>
24 changes: 12 additions & 12 deletions docs/60-perform-semantic-search/4-pre-filtering.mdx
@@ -10,7 +10,7 @@ Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **🦹‍

The answers for code blocks in this section are as follows:

**CODE_BLOCK_16**
**CODE_BLOCK_18**

<details>
<summary>Answer</summary>
@@ -25,7 +25,7 @@ The answers for code blocks in this section are as follows:
            "type": "vector"
        },
        {
            "path": "metadata.language"
            "path": "metadata.contentType",
            "type": "filter"
        }
    ]
@@ -34,7 +34,7 @@ The answers for code blocks in this section are as follows:
</div>
</details>

**CODE_BLOCK_17**
**CODE_BLOCK_19**

<details>
<summary>Answer</summary>
@@ -48,13 +48,13 @@ The answers for code blocks in this section are as follows:
            "path": "embedding",
            "numCandidates": 150,
            "limit": 5,
            "filter": {"metadata.language": "en"}
            "filter": {"metadata.contentType": "Video"}
        }
    },
    {
        "$project": {
            "_id": 0,
            "page_content": 1,
            "body": 1,
            "score": {"$meta": "vectorSearchScore"}
        }
    }
@@ -63,7 +63,7 @@ The answers for code blocks in this section are as follows:
</div>
</details>
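
The `filter` value in a `$vectorSearch` stage is an ordinary match expression, and several conditions can be combined under `$and`. The helper below is a small sketch of that idea; `build_filter` is a hypothetical name introduced here for illustration and does not appear in the notebook.

```python
from typing import Any, Dict, List


def build_filter(conditions: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Return a single condition unchanged, or wrap several in $and."""
    if not conditions:
        raise ValueError("at least one filter condition is required")
    if len(conditions) == 1:
        return conditions[0]
    return {"$and": conditions}


single = build_filter([{"metadata.contentType": "Video"}])
combined = build_filter(
    [
        {"metadata.contentType": "Video"},
        {"updated": {"$gte": "2024-05-20"}},
    ]
)
```

Every field referenced in the filter also has to be declared with `"type": "filter"` in the vector search index definition, which is why the index is updated whenever a new filter field is introduced.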

**CODE_BLOCK_18**
**CODE_BLOCK_20**

<details>
<summary>Answer</summary>
@@ -78,11 +78,11 @@ The answers for code blocks in this section are as follows:
            "type": "vector"
        },
        {
            "path": "metadata.language"
            "path": "metadata.contentType",
            "type": "filter"
        },
        {
            "path": "type"
            "path": "updated",
            "type": "filter"
        }
    ]
@@ -91,7 +91,7 @@ The answers for code blocks in this section are as follows:
</div>
</details>

**CODE_BLOCK_19**
**CODE_BLOCK_21**

<details>
<summary>Answer</summary>
@@ -107,16 +107,16 @@ The answers for code blocks in this section are as follows:
            "limit": 5,
            "filter": {
                "$and": [
                    {"metadata.language": "en"},
                    {"type": "Document"}
                    {"metadata.contentType": "Video"},
                    {"updated": {"$gte": "2024-05-20"}}
                ]
            }
        }
    },
    {
        "$project": {
            "_id": 0,
            "page_content": 1,
            "body": 1,
            "score": {"$meta": "vectorSearchScore"}
        }
    }
8 changes: 4 additions & 4 deletions docs/70-build-rag-app/2-build-rag-app.mdx
@@ -6,7 +6,7 @@ Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Step 9:

The answers for code blocks in this section are as follows:

**CODE_BLOCK_20**
**CODE_BLOCK_22**

<details>
<summary>Answer</summary>
@@ -17,18 +17,18 @@ vector_search(user_query)
</div>
</details>

**CODE_BLOCK_21**
**CODE_BLOCK_23**

<details>
<summary>Answer</summary>
<div>
```python
"\n\n".join([d.get("page_content", "") for d in context])
"\n\n".join([d.get("body", "") for d in context])
```
</div>
</details>
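
To show where the joined context ends up, here is a minimal sketch of a prompt builder. It is not the notebook's implementation: `build_prompt` is a hypothetical name, the prompt wording is invented, and the `context` list stands in for the documents returned by `vector_search`.

```python
from typing import Any, Dict, List


def build_prompt(user_query: str, context: List[Dict[str, Any]]) -> str:
    # Join the retrieved chunks; a result without a "body" contributes an empty string.
    context_text = "\n\n".join([d.get("body", "") for d in context])
    # The instruction wording below is illustrative only.
    return (
        "Answer the question based only on the following context.\n\n"
        f"Context:\n{context_text}\n\n"
        f"Question:\n{user_query}"
    )


context = [{"body": "first", "score": 0.9}, {"body": "second", "score": 0.8}]
prompt = build_prompt("What is Atlas Vector Search?", context)
```

Using `d.get("body", "")` instead of `d["body"]` keeps the join from raising `KeyError` if a retrieved document happens to lack the field.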

**CODE_BLOCK_22**
**CODE_BLOCK_24**

<details>
<summary>Answer</summary>
4 changes: 2 additions & 2 deletions docs/70-build-rag-app/3-stream-responses.mdx
@@ -6,7 +6,7 @@ Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **🦹‍

The answers for code blocks in this section are as follows:

**CODE_BLOCK_23**
**CODE_BLOCK_25**

<details>
<summary>Answer</summary>
@@ -27,7 +27,7 @@ fw_client.chat.completions.create(
</div>
</details>

**CODE_BLOCK_24**
**CODE_BLOCK_26**

<details>
<summary>Answer</summary>