From 7a2bab8f101f5076d788be2a3cae9d134f375f45 Mon Sep 17 00:00:00 2001 From: bracesproul Date: Fri, 2 Aug 2024 12:20:40 -0700 Subject: [PATCH] update template and add aws knowledge base retriever doc --- .../retrievers/bedrock-knowledge-bases.ipynb | 272 ++++++++++++++++++ libs/langchain-scripts/src/cli/docs/index.ts | 9 +- .../src/cli/docs/retrievers.ts | 93 ++++++ .../src/cli/docs/templates/retrievers.ipynb | 20 +- 4 files changed, 383 insertions(+), 11 deletions(-) create mode 100644 docs/core_docs/docs/integrations/retrievers/bedrock-knowledge-bases.ipynb create mode 100644 libs/langchain-scripts/src/cli/docs/retrievers.ts diff --git a/docs/core_docs/docs/integrations/retrievers/bedrock-knowledge-bases.ipynb b/docs/core_docs/docs/integrations/retrievers/bedrock-knowledge-bases.ipynb new file mode 100644 index 000000000000..7cd881d4d388 --- /dev/null +++ b/docs/core_docs/docs/integrations/retrievers/bedrock-knowledge-bases.ipynb @@ -0,0 +1,272 @@ +{ + "cells": [ + { + "cell_type": "raw", + "id": "afaf8039", + "metadata": { + "vscode": { + "languageId": "raw" + } + }, + "source": [ + "---\n", + "sidebar_label: Knowledge Bases for Amazon Bedrock\n", + "---" + ] + }, + { + "cell_type": "markdown", + "id": "e49f1e0d", + "metadata": {}, + "source": [ + "# Knowledge Bases for Amazon Bedrock\n", + "\n", + "## Overview\n", + "\n", + "This will help you get started with the [AmazonKnowledgeBaseRetriever](/docs/concepts/#retrievers). 
For detailed documentation of all AmazonKnowledgeBaseRetriever features and configurations head to the [API reference](https://api.js.langchain.com/classes/langchain_aws.AmazonKnowledgeBaseRetriever.html).\n", + "\n", + "Knowledge Bases for Amazon Bedrock is a fully managed capability provided by Amazon Web Services (AWS) for end-to-end RAG workflows.\n", + "It provides an entire ingestion workflow that converts your documents into embeddings (vectors) and stores them in a specialized vector database.\n", + "Knowledge Bases for Amazon Bedrock supports popular databases for vector storage, including the vector engine for Amazon OpenSearch Serverless, Pinecone, Redis Enterprise Cloud, Amazon Aurora (coming soon), and MongoDB (coming soon).\n", + "\n", + "### Integration details\n", + "\n", + "| Retriever | Self-host | Cloud offering | Package | [Py support](https://python.langchain.com/docs/integrations/retrievers/bedrock/) |\n", + "| :--- | :--- | :---: | :---: | :---: |\n", + "[AmazonKnowledgeBaseRetriever](https://api.js.langchain.com/classes/langchain_aws.AmazonKnowledgeBaseRetriever.html) | 🟠 (see details below) | ✅ | @langchain/aws | ✅ |\n", + "\n", + "> AWS Knowledge Base Retriever can be 'self hosted' in the sense that you can run it on your own AWS infrastructure. However, it is not possible to run it on another cloud provider or on-premises.\n", + "\n", + "## Setup\n", + "\n", + "In order to use the AmazonKnowledgeBaseRetriever, you need to have an AWS account, where you can manage your indexes and documents. 
Once you've set up your account, set the following environment variables:\n", + "\n", + "```bash\n", + "export AWS_KNOWLEDGE_BASE_ID=your-knowledge-base-id\n", + "export AWS_ACCESS_KEY_ID=your-access-key-id\n", + "export AWS_SECRET_ACCESS_KEY=your-secret-access-key\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "72ee0c4b-9764-423a-9dbf-95129e185210", + "metadata": {}, + "source": [ + "If you want to get automated tracing from individual queries, you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a15d341e-3e26-4ca3-830b-5aab30ed66de", + "metadata": {}, + "outputs": [], + "source": [ + "// process.env.LANGSMITH_API_KEY = \"\";\n", + "// process.env.LANGSMITH_TRACING = \"true\";" + ] + }, + { + "cell_type": "markdown", + "id": "0730d6a1-c893-4840-9817-5e5251676d5d", + "metadata": {}, + "source": [ + "### Installation\n", + "\n", + "This retriever lives in the `@langchain/aws` package:\n", + "\n", + "```{=mdx}\n", + "import IntegrationInstallTooltip from \"@mdx_components/integration_install_tooltip.mdx\";\n", + "import Npm2Yarn from \"@theme/Npm2Yarn\";\n", + "\n", + "<IntegrationInstallTooltip></IntegrationInstallTooltip>\n", + "\n", + "<Npm2Yarn>\n", + " @langchain/aws\n", + "</Npm2Yarn>\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "a38cde65-254d-4219-a441-068766c0d4b5", + "metadata": {}, + "source": [ + "## Instantiation\n", + "\n", + "Now we can instantiate our retriever:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "70cc8e65-2a02-408a-bbc6-8ef649057d82", + "metadata": {}, + "outputs": [], + "source": [ + "import { AmazonKnowledgeBaseRetriever } from \"@langchain/aws\";\n", + "\n", + "const retriever = new AmazonKnowledgeBaseRetriever({\n", + " topK: 10,\n", + " knowledgeBaseId: process.env.AWS_KNOWLEDGE_BASE_ID,\n", + " region: \"us-east-2\",\n", + " clientOptions: {\n", + " credentials: {\n", + " accessKeyId: process.env.AWS_ACCESS_KEY_ID,\n", + " 
secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY,\n", + " },\n", + " },\n", + "});" + ] + }, + { + "cell_type": "markdown", + "id": "5c5f2839-4020-424e-9fc9-07777eede442", + "metadata": {}, + "source": [ + "## Usage" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "51a60dbe-9f2e-4e04-bb62-23968f17164a", + "metadata": {}, + "outputs": [], + "source": [ + "const query = \"...\"\n", + "\n", + "await retriever.invoke(query);" + ] + }, + { + "cell_type": "markdown", + "id": "dfe8aad4-8626-4330-98a9-7ea1ca5d2e0e", + "metadata": {}, + "source": [ + "## Use within a chain\n", + "\n", + "Like other retrievers, AmazonKnowledgeBaseRetriever can be incorporated into LLM applications via [chains](/docs/how_to/sequence/).\n", + "\n", + "We will need a LLM or chat model:\n", + "\n", + "```{=mdx}\n", + "import ChatModelTabs from \"@theme/ChatModelTabs\";\n", + "\n", + "\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "25b647a3-f8f2-4541-a289-7a241e43f9df", + "metadata": {}, + "outputs": [], + "source": [ + "// @ls-docs-hide-cell\n", + "\n", + "import { ChatOpenAI } from \"@langchain/openai\";\n", + "\n", + "const llm = new ChatOpenAI({\n", + " model: \"gpt-4o-mini\",\n", + " temperature: 0,\n", + "});" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "23e11cc9-abd6-4855-a7eb-799f45ca01ae", + "metadata": {}, + "outputs": [], + "source": [ + "import { ChatPromptTemplate } from \"@langchain/core/prompts\";\n", + "import { RunnablePassthrough, RunnableSequence } from \"@langchain/core/runnables\";\n", + "import { StringOutputParser } from \"@langchain/core/output_parsers\";\n", + "\n", + "import type { Document } from \"@langchain/core/documents\";\n", + "\n", + "const prompt = ChatPromptTemplate.fromTemplate(`\n", + "Answer the question based only on the context provided.\n", + "\n", + "Context: {context}\n", + "\n", + "Question: {question}`);\n", + "\n", + "const formatDocs = (docs: Document[]) => 
{\n", + " return docs.map((doc) => doc.pageContent).join(\"\\n\\n\");\n", + "}\n", + "\n", + "// See https://js.langchain.com/v0.2/docs/tutorials/rag\n", + "const ragChain = RunnableSequence.from([\n", + " {\n", + " context: retriever.pipe(formatDocs),\n", + " question: new RunnablePassthrough(),\n", + " },\n", + " prompt,\n", + " llm,\n", + " new StringOutputParser(),\n", + "]);" + ] + }, + { + "cell_type": "markdown", + "id": "22b1d6f8", + "metadata": {}, + "source": [ + "```{=mdx}\n", + "\n", + ":::tip\n", + "\n", + "See [our RAG tutorial](/docs/tutorials/rag) for more information and examples of `RunnableSequence`s like the one above.\n", + "\n", + ":::\n", + "\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d47c37dd-5c11-416c-a3b6-bec413cd70e8", + "metadata": {}, + "outputs": [], + "source": [ + "await ragChain.invoke(\"...\")" + ] + }, + { + "cell_type": "markdown", + "id": "3a5bb5ca-c3ae-4a58-be67-2cd18574b9a3", + "metadata": {}, + "source": [ + "## API reference\n", + "\n", + "For detailed documentation of all AmazonKnowledgeBaseRetriever features and configurations head to the [API reference](https://api.js.langchain.com/classes/langchain_aws.AmazonKnowledgeBaseRetriever.html)." 
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "TypeScript", + "language": "typescript", + "name": "tslab" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "typescript", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.4" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/libs/langchain-scripts/src/cli/docs/index.ts b/libs/langchain-scripts/src/cli/docs/index.ts index d664a220a240..989d5ac62842 100644 --- a/libs/langchain-scripts/src/cli/docs/index.ts +++ b/libs/langchain-scripts/src/cli/docs/index.ts @@ -5,6 +5,7 @@ import { Command } from "commander"; import { fillChatIntegrationDocTemplate } from "./chat.js"; import { fillDocLoaderIntegrationDocTemplate } from "./document_loaders.js"; import { fillLLMIntegrationDocTemplate } from "./llms.js"; +import { fillRetrieverIntegrationDocTemplate } from "./retrievers.js"; type CLIInput = { package: string; @@ -57,9 +58,15 @@ async function main() { isCommunity, }); break; + case 'retriever': + await fillRetrieverIntegrationDocTemplate({ + packageName, + moduleName, + }); + break; default: console.error( - `Invalid type: ${type}.\nOnly 'chat', 'llm' and 'doc_loader' are supported at this time.` + `Invalid type: ${type}.\nOnly 'chat', 'llm', 'retriever' and 'doc_loader' are supported at this time.` ); process.exit(1); } diff --git a/libs/langchain-scripts/src/cli/docs/retrievers.ts b/libs/langchain-scripts/src/cli/docs/retrievers.ts new file mode 100644 index 000000000000..b22552c691c7 --- /dev/null +++ b/libs/langchain-scripts/src/cli/docs/retrievers.ts @@ -0,0 +1,93 @@ +import * as path from "node:path"; +import * as fs from "node:fs"; +import { + boldText, + getUserInput, + greenText, + redBackground, +} from "../utils/get-input.js"; + +const PACKAGE_NAME_PLACEHOLDER = "__package_name__"; +const MODULE_NAME_PLACEHOLDER = "__ModuleName__"; +const 
PY_SUPPORT_PLACEHOLDER = "__py_support__"; +const HAS_CLOUD_OFFERING_PLACEHOLDER = "__has_cloud_offering__"; +const CAN_SELF_HOST_PLACEHOLDER = "__can_self_host__"; + +const TEMPLATE_PATH = path.resolve("./src/cli/docs/templates/retrievers.ipynb"); +const INTEGRATIONS_DOCS_PATH = path.resolve( + "../../docs/core_docs/docs/integrations/retrievers" +); + + +type ExtraFields = { + hasCloudOffering: boolean; + canSelfHost: boolean; + pySupport: boolean; +}; + +async function promptExtraFields(): Promise<ExtraFields> { + const hasCloudOffering = await getUserInput( + "Does this retriever have a cloud offering? (y/n) ", + undefined, + true + ); + const canSelfHost = await getUserInput( + "Does this retriever support self hosting? (y/n) ", + undefined, + true + ); + const hasPySupport = await getUserInput( + "Does this integration have Python support? (y/n) ", + undefined, + true + ); + + return { + canSelfHost: canSelfHost.toLowerCase() === "y", + hasCloudOffering: hasCloudOffering.toLowerCase() === "y", + pySupport: hasPySupport.toLowerCase() === "y", + }; +} + +export async function fillRetrieverIntegrationDocTemplate(fields: { + packageName: string; + moduleName: string; +}) { + // Ask the user if they'd like to fill in extra fields; if so, prompt them. + let extraFields: ExtraFields | undefined; + const shouldPromptExtraFields = await getUserInput( + "Would you like to fill out optional fields? (y/n) ", + "white_background" + ); + if (shouldPromptExtraFields.toLowerCase() === "y") { + extraFields = await promptExtraFields(); + } + + const docTemplate = (await fs.promises.readFile(TEMPLATE_PATH, "utf-8")) + .replaceAll(PACKAGE_NAME_PLACEHOLDER, fields.packageName) + .replaceAll(MODULE_NAME_PLACEHOLDER, fields.moduleName) + .replace(HAS_CLOUD_OFFERING_PLACEHOLDER, extraFields?.hasCloudOffering ? "✅" : "❌") + .replace(CAN_SELF_HOST_PLACEHOLDER, extraFields?.canSelfHost ? "✅" : "❌") + .replace(PY_SUPPORT_PLACEHOLDER, extraFields?.pySupport ? 
"✅" : "❌"); + + const packageNameShortSnakeCase = fields.packageName.replace(/-/g, "_"); + const docPath = path.join( + INTEGRATIONS_DOCS_PATH, + `${packageNameShortSnakeCase}.ipynb` + ); + await fs.promises.writeFile(docPath, docTemplate); + const prettyDocPath = docPath.split("docs/core_docs/")[1]; + + const updatePythonDocUrlText = ` ${redBackground( + "- Update the Python documentation URL with the proper URL." + )}`; + const successText = `\nSuccessfully created new retriever integration doc at ${prettyDocPath}.`; + + console.log( + `${greenText(successText)}\n +${boldText("Next steps:")} +${extraFields?.pySupport ? updatePythonDocUrlText : ""} + - Run all code cells in the generated doc to record the outputs. + - Add extra sections on integration-specific features.\n` + ); +} diff --git a/libs/langchain-scripts/src/cli/docs/templates/retrievers.ipynb b/libs/langchain-scripts/src/cli/docs/templates/retrievers.ipynb index b325d266c340..99568ff9afb2 100644 --- a/libs/langchain-scripts/src/cli/docs/templates/retrievers.ipynb +++ b/libs/langchain-scripts/src/cli/docs/templates/retrievers.ipynb @@ -15,13 +15,13 @@ "id": "e49f1e0d", "metadata": {}, "source": [ - "# __ModuleName__Retriever\n", + "# __ModuleName__\n", "\n", "## Overview\n", "\n", "- TODO: Make sure API reference link is correct.\n", "\n", - "This will help you getting started with the __ModuleName__ [retriever](/docs/concepts/#retrievers). For detailed documentation of all __ModuleName__Retriever features and configurations head to the [API reference](https://api.js.langchain.com/classes/__package_name__.__ModuleName__Retriever.html).\n", + "This will help you get started with the [__ModuleName__](/docs/concepts/#retrievers). 
For detailed documentation of all __ModuleName__ features and configurations head to the [API reference](ADD_API_REF_URL_HERE).\n", "\n", "### Integration details\n", "\n", @@ -29,15 +29,15 @@ "\n", "1: Bring-your-own data (i.e., index and search a custom corpus of documents):\n", "\n", - "| Retriever | Self-host | Cloud offering | Package | [Py support](https://python.langchain.com/docs/integrations/chat/__package_name_short_snake_case__) |\n", - "| :--- | :--- | :---: | :---: |\n", - "[__ModuleName__Retriever](https://api.js.langchain.com/classes/__package_name__.__ModuleName__Retriever.html) | ❌ | ❌ | __package_name__ |\n", + "| Retriever | Self-host | Cloud offering | Package | [Py support](ADD_PY_DOC_URL_HERE) |\n", + "| :--- | :--- | :---: | :---: | :---: |\n", + "[__ModuleName__](ADD_API_REF_URL_HERE) | __can_self_host__ | __has_cloud_offering__ | __package_name__ | __py_support__ |\n", "\n", "2: External index (e.g., constructed from Internet data or similar):\n", "\n", "| Retriever | Source | Package |\n", "| :--- | :--- | :---: |\n", - "[__ModuleName__Retriever](https://api.js.langchain.com/classes/__package_name__.__ModuleName__Retriever.html) | Source description | __package_name__ |\n", + "[__ModuleName__](ADD_API_REF_URL_HERE) | Source description | __package_name__ |\n", "\n", "## Setup\n", "\n", @@ -103,9 +103,9 @@ "metadata": {}, "outputs": [], "source": [ - "import { __ModuleName__Retriever } from \"__module_name__\";\n", + "import { __ModuleName__ } from \"__package_name__\";\n", "\n", - "const retriever = new __ModuleName__Retriever(\n", + "const retriever = new __ModuleName__(\n", " // ...\n", ");" ] @@ -137,7 +137,7 @@ "source": [ "## Use within a chain\n", "\n", - "Like other retrievers, __ModuleName__Retriever can be incorporated into LLM applications via [chains](/docs/how_to/sequence/).\n", + "Like other retrievers, __ModuleName__ can be incorporated into LLM applications via [chains](/docs/how_to/sequence/).\n", "\n", "We will need a LLM 
or chat model:\n", "\n", @@ -228,7 +228,7 @@ "source": [ "## API reference\n", "\n", - "For detailed documentation of all __ModuleName__Retriever features and configurations head to the [API reference](https://api.js.langchain.com/classes/__package_name__.__ModuleName__Retriever.html)." + "For detailed documentation of all __ModuleName__ features and configurations head to the [API reference](ADD_API_REF_URL_HERE)." ] } ],