Commit

Updating to reflect changed code blocks and sections
ajosh0504 committed Aug 14, 2024
1 parent 93b2eb2 commit 605f46c
Showing 10 changed files with 96 additions and 187 deletions.
2 changes: 1 addition & 1 deletion docs/40-dev-env/2-setup-pre-reqs.mdx
Original file line number Diff line number Diff line change
@@ -1,3 +1,3 @@
# 👐 Setup prerequisites

Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Step 1: Install libraries** and **Step 2: Setup prerequisites** sections in the notebook.
Replace any placeholders and run the cells under the **Step 1: Install libraries** and **Step 2: Setup prerequisites** sections in the notebook.
35 changes: 4 additions & 31 deletions docs/50-prepare-the-data/3-chunk-data.mdx
@@ -6,7 +6,7 @@ Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Step 4:

The answers for code blocks in this section are as follows:

**CODE_BLOCK_3**
**CODE_BLOCK_1**

<details>
<summary>Answer</summary>
@@ -19,18 +19,7 @@ RecursiveCharacterTextSplitter.from_tiktoken_encoder(
</div>
</details>

**CODE_BLOCK_4**

<details>
<summary>Answer</summary>
<div>
```python
doc[text_field]
```
</div>
</details>

**CODE_BLOCK_5**
**CODE_BLOCK_2**

<details>
<summary>Answer</summary>
@@ -41,29 +30,13 @@ text_splitter.split_text(text)
</div>
</details>

**CODE_BLOCK_6**

<details>
<summary>Answer</summary>
<div>
```python
for chunk in chunks:
temp = doc.copy()
temp[text_field] = chunk
chunked_data.append(temp)
```
</div>
</details>

**CODE_BLOCK_7**
**CODE_BLOCK_3**

<details>
<summary>Answer</summary>
<div>
```python
for doc in docs:
chunks = get_chunks(doc, "body")
split_docs.extend(chunks)
get_chunks(doc, "body")
```
</div>
</details>
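
For reference, the chunking answers above fit together roughly as follows. This is a minimal sketch, not the notebook's code: `split_text` is a hypothetical fixed-size character splitter standing in for the `RecursiveCharacterTextSplitter` used in the lab, and the sample documents are invented for illustration.

```python
def split_text(text, chunk_size=200, chunk_overlap=20):
    """Hypothetical stand-in splitter: fixed-size character windows with overlap."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def get_chunks(doc, text_field):
    """Return one copy of `doc` per chunk, with `text_field` replaced by the chunk."""
    text = doc[text_field]
    chunks = split_text(text)
    chunked_data = []
    for chunk in chunks:
        temp = doc.copy()          # keep the other fields (title, metadata, ...)
        temp[text_field] = chunk   # overwrite only the text field
        chunked_data.append(temp)
    return chunked_data


# Invented sample documents, for illustration only.
docs = [
    {"title": "A", "body": "x" * 450},
    {"title": "B", "body": "short"},
]

split_docs = []
for doc in docs:
    split_docs.extend(get_chunks(doc, "body"))
```

Copying the document for each chunk means every chunk carries the original metadata, which is what later makes filtering on fields such as `metadata.contentType` possible.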
22 changes: 4 additions & 18 deletions docs/50-prepare-the-data/4-embed-data.mdx
@@ -6,38 +6,24 @@ Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Step 5:

The answers for code blocks in this section are as follows:

**CODE_BLOCK_8**
**CODE_BLOCK_4**

<details>
<summary>Answer</summary>
<div>
```python
SentenceTransformer("thenlper/gte-small")
embedding_model.encode(text)
```
</div>
</details>

**CODE_BLOCK_9**
**CODE_BLOCK_5**

<details>
<summary>Answer</summary>
<div>
```python
embedding = embedding_model.encode(text)
return embedding.tolist()
```
</div>
</details>

**CODE_BLOCK_10**

<details>
<summary>Answer</summary>
<div>
```python
for doc in tqdm(split_docs):
doc["embedding"] = get_embedding(doc["body"])
embedded_docs.append(doc)
doc["embedding"] = get_embedding(doc["body"])
```
</div>
</details>
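
The embedding answers above can be sketched end to end as shown below. To keep the example runnable without downloading a model, `ToyEmbeddingModel` is a hypothetical stand-in for `SentenceTransformer("thenlper/gte-small")`: it derives a deterministic 384-value vector from a hash, so the values carry no semantic meaning. The sample chunks are invented.

```python
import hashlib

EMBEDDING_DIMS = 384  # same dimensionality as the numDimensions value in this commit


class ToyEmbeddingModel:
    """Hypothetical stand-in for the real embedding model (no semantics)."""

    def encode(self, text):
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        # Repeat the 32 digest bytes to fill the vector, scaled into [0, 1].
        return [digest[i % len(digest)] / 255.0 for i in range(EMBEDDING_DIMS)]


embedding_model = ToyEmbeddingModel()


def get_embedding(text):
    """Return the embedding of `text` as a plain Python list."""
    embedding = embedding_model.encode(text)
    # The real model returns a NumPy array, which is why the lab calls .tolist().
    return list(embedding)


# Invented sample chunks, for illustration only.
split_docs = [{"body": "first chunk"}, {"body": "second chunk"}]

embedded_docs = []
for doc in split_docs:
    doc["embedding"] = get_embedding(doc["body"])
    embedded_docs.append(doc)
```

Storing the embedding as a plain list matters because the documents are later inserted into MongoDB, which cannot serialize NumPy arrays directly.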
15 changes: 2 additions & 13 deletions docs/50-prepare-the-data/5-ingest-data.mdx
@@ -8,7 +8,7 @@ Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Step 6:

The answers for code blocks in this section are as follows:

**CODE_BLOCK_11**
**CODE_BLOCK_6**

<details>
<summary>Answer</summary>
@@ -19,18 +19,7 @@ mongodb_client[DB_NAME][COLLECTION_NAME]
</div>
</details>

**CODE_BLOCK_12**

<details>
<summary>Answer</summary>
<div>
```python
collection.delete_many({})
```
</div>
</details>

**CODE_BLOCK_13**
**CODE_BLOCK_7**

<details>
<summary>Answer</summary>
10 changes: 5 additions & 5 deletions docs/60-perform-semantic-search/3-vector-search.mdx
@@ -6,7 +6,7 @@ Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Step 8:

The answers for code blocks in this section are as follows:

**CODE_BLOCK_14**
**CODE_BLOCK_8**

<details>
<summary>Answer</summary>
@@ -17,7 +17,7 @@ get_embedding(user_query)
</div>
</details>

**CODE_BLOCK_15**
**CODE_BLOCK_9**

<details>
<summary>Answer</summary>
@@ -37,15 +37,15 @@ get_embedding(user_query)
"$project": {
"_id": 0,
"body": 1,
"score": {"$meta": "vectorSearchScore"},
"score": {"$meta": "vectorSearchScore"}
}
},
}
]
```
</div>
</details>

**CODE_BLOCK_16**
**CODE_BLOCK_10**

<details>
<summary>Answer</summary>
97 changes: 50 additions & 47 deletions docs/60-perform-semantic-search/4-pre-filtering.mdx
@@ -2,39 +2,37 @@

Pre-filtering is a technique that optimizes vector search by considering only documents that match certain criteria.
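
To make the idea concrete, here is a brute-force sketch of pre-filtering in plain Python. It only illustrates the order of operations (filter first, then rank by similarity). It is not how Atlas Vector Search is implemented, and the documents, vectors, and function names are invented for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def prefiltered_search(docs, query_vector, content_type, limit=5):
    """Keep only documents of the given content type, then rank them by similarity."""
    candidates = [d for d in docs if d["metadata"]["contentType"] == content_type]
    ranked = sorted(
        candidates,
        key=lambda d: cosine_similarity(d["embedding"], query_vector),
        reverse=True,
    )
    return ranked[:limit]


# Invented documents with tiny 2-dimensional "embeddings".
docs = [
    {"body": "intro video", "metadata": {"contentType": "Video"}, "embedding": [1.0, 0.0]},
    {"body": "setup tutorial", "metadata": {"contentType": "Tutorial"}, "embedding": [0.9, 0.1]},
    {"body": "advanced video", "metadata": {"contentType": "Video"}, "embedding": [0.0, 1.0]},
]

results = prefiltered_search(docs, [1.0, 0.1], "Video")
```

The tutorial document is never scored, even though its vector is close to the query, because it is excluded before the similarity step.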

Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **🦹‍♀️ Combine pre-filtering with vector search** section in the notebook to get a sense of how to combine pre-filtering with MongoDB Atlas Vector Search.
## Filter for documents where the content type is `Video`

:::caution
**DO NOT** actually modify the existing vector index definitions in the Atlas UI, or the existing pipeline definitions in the code.
:::

The answers for code blocks in this section are as follows:

**CODE_BLOCK_17**
To do this, you will first need to modify the vector search index you created previously. The new index definition should look as follows:

<details>
<summary>Answer</summary>
<div>
```python
{
"fields": [
{
"numDimensions": 1024,
"path": "embedding",
"similarity": "cosine",
"type": "vector"
},
{
"path": "metadata.contentType",
"type": "filter"
}
]
"fields": [
{
"type": "vector",
"path": "embedding",
"numDimensions": 384,
"similarity": "cosine"
},
{
"type":"filter",
"path":"metadata.contentType"
}
]
}
```
</div>
</details>

**CODE_BLOCK_18**
Once you have updated the vector search index, fill in `<CODE_BLOCK_11>` and run the cells under the **Filter for documents where the content type is Video** section in the notebook to see how the filter impacts the vector search results.

The answer for this code block is as follows:

**CODE_BLOCK_11**

<details>
<summary>Answer</summary>
@@ -43,9 +41,9 @@ The answers for code blocks in this section are as follows:
[
{
"$vectorSearch": {
"index": ATLAS_VECTOR_SEARCH_INDEX_NAME,
"queryVector": query_embedding,
"index": "vector_index",
"path": "embedding",
"queryVector": query_embedding,
"numCandidates": 150,
"limit": 5,
"filter": {"metadata.contentType": "Video"}
@@ -63,35 +61,42 @@ The answers for code blocks in this section are as follows:
</div>
</details>
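
As a rough illustration of how the filtered and unfiltered pipelines relate, the helper below builds either one from the same parameters. `build_vector_search_pipeline` is a hypothetical name, not part of the lab. The `$project` stage is borrowed from the vector search answer earlier in this commit, so the collapsed lines of this hunk may differ.

```python
def build_vector_search_pipeline(query_embedding, filter_expr=None):
    """Build a $vectorSearch aggregation pipeline, optionally with a pre-filter."""
    vector_search = {
        "index": "vector_index",
        "path": "embedding",
        "queryVector": query_embedding,
        "numCandidates": 150,
        "limit": 5,
    }
    if filter_expr is not None:
        # Only documents matching this expression are considered during the search.
        vector_search["filter"] = filter_expr
    return [
        {"$vectorSearch": vector_search},
        {
            "$project": {
                "_id": 0,
                "body": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


# Placeholder query vector; in the lab this comes from get_embedding(user_query).
query_embedding = [0.1] * 384

unfiltered = build_vector_search_pipeline(query_embedding)
video_only = build_vector_search_pipeline(
    query_embedding, {"metadata.contentType": "Video"}
)
```

Either pipeline would then be passed to `collection.aggregate(...)`, assuming `collection` is the PyMongo collection created in the ingest step.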

**CODE_BLOCK_19**

## Filter on documents which have been updated on or after `2024-05-19` and where the content type is `Tutorial`

Again, you will need to modify the vector search index. The new index definition should look as follows:

<details>
<summary>Answer</summary>
<div>
```python
{
"fields": [
{
"numDimensions": 1024,
"path": "embedding",
"similarity": "cosine",
"type": "vector"
},
{
"path": "metadata.contentType",
"type": "filter"
},
{
"path": "updated",
"type": "filter"
}
]
"fields": [
{
"type": "vector",
"path": "embedding",
"numDimensions": 384,
"similarity": "cosine"
},
{
"type":"filter",
"path":"metadata.contentType"
},
{
"type":"filter",
"path":"updated"
}
]
}
```
</div>
</details>
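
Because every field used in a `filter` has to be declared in the index with type `filter`, a quick local check can catch a mismatch before you run the query. The sketch below is a hypothetical helper, not part of the lab or of any MongoDB library. It handles plain field conditions plus `$and` and `$or` clauses only.

```python
def filter_paths(filter_expr):
    """Collect the field paths referenced by a filter expression."""
    paths = set()
    for key, value in filter_expr.items():
        if key in ("$and", "$or"):
            # Logical operators hold a list of sub-expressions.
            for clause in value:
                paths |= filter_paths(clause)
        else:
            paths.add(key)
    return paths


def unindexed_filter_paths(index_definition, filter_expr):
    """Return the filter paths that are not indexed with type 'filter'."""
    indexed = {
        field["path"]
        for field in index_definition["fields"]
        if field["type"] == "filter"
    }
    return filter_paths(filter_expr) - indexed


# Index definition as shown above.
index_definition = {
    "fields": [
        {"type": "vector", "path": "embedding", "numDimensions": 384, "similarity": "cosine"},
        {"type": "filter", "path": "metadata.contentType"},
        {"type": "filter", "path": "updated"},
    ]
}

query_filter = {
    "metadata.contentType": "Tutorial",
    "updated": {"$gte": "2024-05-19"},
}

missing = unindexed_filter_paths(index_definition, query_filter)
```

With the two-field index from the first exercise, the same check would report `updated` as missing, which is why the index has to be modified again for this exercise.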

**CODE_BLOCK_20**
Once you have updated the vector search index, fill in `<CODE_BLOCK_12>` and run the cells under the **Filter on documents which have been updated on or after 2024-05-19 and where the content type is Tutorial** section in the notebook to see how the filter impacts the vector search results.

The answer for this code block is as follows:

**CODE_BLOCK_12**

<details>
<summary>Answer</summary>
@@ -100,16 +105,14 @@ The answers for code blocks in this section are as follows:
[
{
"$vectorSearch": {
"index": ATLAS_VECTOR_SEARCH_INDEX_NAME,
"queryVector": query_embedding,
"index": "vector_index",
"path": "embedding",
"queryVector": query_embedding,
"numCandidates": 150,
"limit": 5,
"filter": {
"$and": [
{"metadata.contentType": "Video"},
{"updated": {"$gte": "2024-05-20"}}
]
"metadata.contentType": "Tutorial",
"updated": {"$gte": "2024-05-19"}
}
}
},
19 changes: 6 additions & 13 deletions docs/70-build-rag-app/1-build-rag-app.mdx
@@ -6,7 +6,7 @@ Fill in any `<CODE_BLOCK_N>` placeholders and run the cells under the **Step 9:

The answers for code blocks in this section are as follows:

**CODE_BLOCK_21**
**CODE_BLOCK_13**

<details>
<summary>Answer</summary>
@@ -17,34 +17,27 @@ vector_search(user_query)
</div>
</details>

**CODE_BLOCK_22**
**CODE_BLOCK_14**

<details>
<summary>Answer</summary>
<div>
```python
"\n\n".join([d.get("body", "") for d in context])
create_prompt(user_query)
```
</div>
</details>

**CODE_BLOCK_23**
**CODE_BLOCK_15**

<details>
<summary>Answer</summary>
<div>
```python
response = fw_client.chat.completions.create(
fw_client.chat.completions.create(
model=model,
temperature=0,
messages=[
{
"role": "user",
"content": create_prompt(user_query),
}
],
messages=[{"role": "user", "content": prompt}]
)
print(response.choices[0].message.content)
```
</div>
</details>
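
The three answers above combine into a retrieve-then-prompt flow, sketched below without calling any LLM. Here `vector_search` is a hypothetical stub returning canned documents, and the prompt wording is an assumption rather than the notebook's actual text. Only the `"\n\n".join(...)` step and the message shape come from this commit.

```python
def vector_search(user_query):
    """Hypothetical stub: the lab version runs a $vectorSearch aggregation."""
    return [
        {"body": "MongoDB Atlas supports vector search.", "score": 0.92},
        {"body": "Backups can be scheduled in Atlas.", "score": 0.81},
    ]


def create_prompt(user_query):
    """Retrieve context for the query and embed it in a prompt string."""
    context = vector_search(user_query)
    # Join the retrieved chunks into one context string, skipping missing bodies.
    context = "\n\n".join([d.get("body", "") for d in context])
    # Prompt wording below is an assumption made for this sketch.
    return (
        "Answer the question based only on the following context.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {user_query}"
    )


user_query = "Does Atlas support vector search?"
prompt = create_prompt(user_query)

# Message list in the shape passed to fw_client.chat.completions.create(...).
messages = [{"role": "user", "content": prompt}]
```

Building the prompt once and passing it in as `prompt` keeps the retrieval step separate from the completion call, which matches the updated answer for `CODE_BLOCK_15`.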