From 9b4c9a021e80757d17f9c4676dac5cafc141a8a3 Mon Sep 17 00:00:00 2001
From: jacoblee93
Date: Fri, 2 Aug 2024 11:23:05 -0700
Subject: [PATCH 1/6] Adds text embeddings template

---
 .../cli/docs/templates/text_embedding.ipynb   | 214 ++++++++++++++++++
 1 file changed, 214 insertions(+)
 create mode 100644 libs/langchain-scripts/src/cli/docs/templates/text_embedding.ipynb

diff --git a/libs/langchain-scripts/src/cli/docs/templates/text_embedding.ipynb b/libs/langchain-scripts/src/cli/docs/templates/text_embedding.ipynb
new file mode 100644
index 000000000000..355772a10b73
--- /dev/null
+++ b/libs/langchain-scripts/src/cli/docs/templates/text_embedding.ipynb
@@ -0,0 +1,214 @@
+{
+  "cells": [
+    {
+      "cell_type": "raw",
+      "id": "afaf8039",
+      "metadata": {},
+      "source": [
+        "---\n",
+        "sidebar_label: __ModuleName__\n",
+        "---"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "9a3d6f34",
+      "metadata": {},
+      "source": [
+        "# __ModuleName__Embeddings\n",
+        "\n",
+        "- [ ] TODO: Make sure API reference link is correct\n",
+        "\n",
+        "This will help you get started with __ModuleName__ [embedding models](/docs/concepts#embedding-models) using LangChain. For detailed documentation on `__ModuleName__Embeddings` features and configuration options, please refer to the [API reference](https://api.js.langchain.com/classes/__package_name__.__ModuleName__Embeddings.html).\n",
+        "\n",
+        "## Overview\n",
+        "### Integration details\n",
+        "\n",
+        "- TODO: Fill in table features.\n",
+        "- TODO: Remove JS support link if not relevant, otherwise ensure link is correct.\n",
+        "- TODO: Make sure API reference links are correct.\n",
+        "\n",
+        "| Class | Package | Local | [Py support](https://js.langchain.com/v0.2/docs/integrations/text_embedding/__package_name_short_snake__) | Package downloads | Package latest |\n",
+        "| :--- | :--- | :---: | :---: | :---: | :---: | :---: |\n",
+        "| [__ModuleName__Embeddings](https://api.js.langchain.com/classes/__package_name__.__ModuleName__Embeddings.html) | [__package_name_pretty__](https://api.js.langchain.com/modules/__package_name_snake_case__.html) | ✅/❌ | ✅/❌ | ![NPM - Downloads](https://img.shields.io/npm/dm/__package_name_pretty__?style=flat-square&label=%20&) | ![NPM - Version](https://img.shields.io/npm/v/__package_name_pretty__?style=flat-square&label=%20&) |\n",
+        "\n",
+        "## Setup\n",
+        "\n",
+        "- [ ] TODO: Update with relevant info.\n",
+        "\n",
+        "To access __ModuleName__ embedding models you'll need to create a/an __ModuleName__ account, get an API key, and install the `__package_name__` integration package.\n",
+        "\n",
+        "### Credentials\n",
+        "\n",
+        "- TODO: Update with relevant info.\n",
+        "\n",
+        "Head to (TODO: link) to sign up to `__ModuleName__` and generate an API key. Once you've done this set the `__MODULE_NAME_ALL_CAPS___API_KEY` environment variable:\n",
+        "\n",
+        "```bash\n",
+        "export __MODULE_NAME_ALL_CAPS___API_KEY=\"your-api-key\"\n",
+        "```\n",
+        "\n",
+        "If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:\n",
+        "\n",
+        "```bash\n",
+        "# export LANGCHAIN_TRACING_V2=\"true\"\n",
+        "# export LANGCHAIN_API_KEY=\"your-api-key\"\n",
+        "```\n",
+        "\n",
+        "### Installation\n",
+        "\n",
+        "The LangChain __ModuleName__ integration lives in the `__package_name_pretty__` package:\n",
+        "\n",
+        "```{=mdx}\n",
+        "import IntegrationInstallTooltip from \"@mdx_components/integration_install_tooltip.mdx\";\n",
+        "import Npm2Yarn from \"@theme/Npm2Yarn\";\n",
+        "\n",
+        "<IntegrationInstallTooltip></IntegrationInstallTooltip>\n",
+        "\n",
+        "<Npm2Yarn>\n",
+        "  __package_name_pretty__\n",
+        "</Npm2Yarn>\n",
+        "\n",
+        "```"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "45dd1724",
+      "metadata": {},
+      "source": [
+        "## Instantiation\n",
+        "\n",
+        "Now we can instantiate our model object and generate chat completions:\n",
+        "\n",
+        "- TODO: Update model instantiation with relevant params."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "id": "9ea7a09b",
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "import { __ModuleName__Embeddings } from \"__module_name__\";\n",
+        "\n",
+        "const embeddings = new __ModuleName__Embeddings({\n",
+        "  model: \"model-name\",\n",
+        "  // ...\n",
+        "});"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "77d271b6",
+      "metadata": {},
+      "source": [
+        "## Indexing and Retrieval\n",
+        "\n",
+        "Embedding models are often used in retrieval-augmented generation (RAG) flows, both as part of indexing data as well as later retrieving it. For more detailed instructions, please see our RAG tutorials under the [working with external knowledge tutorials](/docs/tutorials/#working-with-external-knowledge).\n",
+        "\n",
+        "Below, see how to index and retrieve data using the `embeddings` object we initialized above. In this example, we will index and retrieve a sample document in the `InMemoryVectorStore`."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "id": "d817716b",
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "// Create a vector store with a sample text\n",
+        "import { MemoryVectorStore } from \"langchain/vectorstores/memory\";\n",
+        "\n",
+        "const text = \"LangChain is the framework for building context-aware reasoning applications\";\n",
+        "\n",
+        "const vectorstore = await MemoryVectorStore.fromDocuments(\n",
+        "  [{ pageContent: text, metadata: {} }],\n",
+        "  embeddings,\n",
+        ");\n",
+        "\n",
+        "// Use the vector store as a retriever that returns a single document\n",
+        "const retriever = vectorstore.asRetriever(1);\n",
+        "\n",
+        "// Retrieve the most similar text\n",
+        "const retrievedDocument = await retriever.invoke(\"What is LangChain?\");\n",
+        "\n",
+        "retrievedDocument.pageContent;"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "e02b9855",
+      "metadata": {},
+      "source": [
+        "## Direct Usage\n",
+        "\n",
+        "Under the hood, the vectorstore and retriever implementations are calling `embeddings.embedDocument(...)` and `embeddings.embedQuery(...)` to create embeddings for the text(s) used in `fromDocuments` and the retriever's `invoke` operations, respectively.\n",
+        "\n",
+        "You can directly call these methods to get embeddings for your own use cases.\n",
+        "\n",
+        "### Embed single texts\n",
+        "\n",
+        "You can embed queries for search with `embedQuery`. This generates a vector representation specific to the query:"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "id": "0d2befcd",
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "const singleVector = await embeddings.embedQuery(text);\n",
+        "\n",
+        "console.log(singleVector.slice(0, 100));"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "1b5a7d03",
+      "metadata": {},
+      "source": [
+        "### Embed multiple texts\n",
+        "\n",
+        "You can embed multiple texts for indexing with `embedDocuments`. The internals used for this method may (but do not have to) differ from embedding queries:"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "id": "2f4d6e97",
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "const text2 = \"LangGraph is a library for building stateful, multi-actor applications with LLMs\";\n",
+        "\n",
+        "const vectors = await embeddings.embedDocuments([text, text2]);\n",
+        "\n",
+        "console.log(vectors[0].slice(0, 100));\n",
+        "console.log(vectors[1].slice(0, 100));"
+      ]
+    }
+  ],
+  "metadata": {
+    "kernelspec": {
+      "display_name": "TypeScript",
+      "language": "typescript",
+      "name": "tslab"
+    },
+    "language_info": {
+      "codemirror_mode": {
+        "name": "ipython",
+        "version": 3
+      },
+      "file_extension": ".py",
+      "mimetype": "text/x-python",
+      "name": "typescript",
+      "nbconvert_exporter": "python",
+      "pygments_lexer": "ipython3",
+      "version": "3.10.5"
+    }
+  },
+  "nbformat": 4,
+  "nbformat_minor": 5
+}

From b76fe081b54408caf5cfde7f10819700a1a20aac Mon Sep 17 00:00:00 2001
From: jacoblee93
Date: Fri, 2 Aug 2024 11:24:46 -0700
Subject: [PATCH 2/6] Copy

---
 .../src/cli/docs/templates/text_embedding.ipynb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libs/langchain-scripts/src/cli/docs/templates/text_embedding.ipynb b/libs/langchain-scripts/src/cli/docs/templates/text_embedding.ipynb
index 355772a10b73..e4cea848bd4a 100644
--- a/libs/langchain-scripts/src/cli/docs/templates/text_embedding.ipynb
+++ b/libs/langchain-scripts/src/cli/docs/templates/text_embedding.ipynb
@@ -107,7 +107,7 @@
         "\n",
         "Embedding models are often used in retrieval-augmented generation (RAG) flows, both as part of indexing data as well as later retrieving it. For more detailed instructions, please see our RAG tutorials under the [working with external knowledge tutorials](/docs/tutorials/#working-with-external-knowledge).\n",
         "\n",
-        "Below, see how to index and retrieve data using the `embeddings` object we initialized above. In this example, we will index and retrieve a sample document in the `InMemoryVectorStore`."
+        "Below, see how to index and retrieve data using the `embeddings` object we initialized above. In this example, we will index and retrieve a sample document using the demo [`MemoryVectorStore`](/docs/integrations/vectorstores/memory)."
       ]
     },
     {

From c95031b5ef37b813a09cd4f816d701b97bd7c02f Mon Sep 17 00:00:00 2001
From: bracesproul
Date: Fri, 2 Aug 2024 13:20:47 -0700
Subject: [PATCH 3/6] update cli templatea and add openai

---
 .../integrations/text_embedding/openai.ipynb  | 304 ++++++++++++++++++
 .../src/cli/docs/embeddings.ts                | 186 +++++++++++
 libs/langchain-scripts/src/cli/docs/index.ts  |  10 +-
 .../cli/docs/templates/text_embedding.ipynb   |  42 ++-
 4 files changed, 527 insertions(+), 15 deletions(-)
 create mode 100644 docs/core_docs/docs/integrations/text_embedding/openai.ipynb
 create mode 100644 libs/langchain-scripts/src/cli/docs/embeddings.ts

diff --git a/docs/core_docs/docs/integrations/text_embedding/openai.ipynb b/docs/core_docs/docs/integrations/text_embedding/openai.ipynb
new file mode 100644
index 000000000000..3450afb7bbc1
--- /dev/null
+++ b/docs/core_docs/docs/integrations/text_embedding/openai.ipynb
@@ -0,0 +1,304 @@
+{
+  "cells": [
+    {
+      "cell_type": "raw",
+      "id": "afaf8039",
+      "metadata": {
+        "vscode": {
+          "languageId": "raw"
+        }
+      },
+      "source": [
+        "---\n",
+        "sidebar_label: OpenAI\n",
+        "---"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "9a3d6f34",
+      "metadata": {},
+      "source": [
+        "# OpenAI\n",
+        "\n",
+        "This will help you get started with OpenAIEmbeddings [embedding models](/docs/concepts#embedding-models) using LangChain. For detailed documentation on `OpenAIEmbeddings` features and configuration options, please refer to the [API reference](https://api.js.langchain.com/classes/langchain_openai.OpenAIEmbeddings.html).\n",
+        "\n",
+        "## Overview\n",
+        "### Integration details\n",
+        "\n",
+        "| Class | Package | Local | [Py support](https://python.langchain.com/docs/integrations/text_embedding/openai/) | Package downloads | Package latest |\n",
+        "| :--- | :--- | :---: | :---: | :---: | :---: |\n",
+        "| [OpenAIEmbeddings](https://api.js.langchain.com/classes/langchain_openai.OpenAIEmbeddings.html) | [@langchain/openai](https://api.js.langchain.com/modules/langchain_openai.html) | ❌ | ✅ | ![NPM - Downloads](https://img.shields.io/npm/dm/@langchain/openai?style=flat-square&label=%20&) | ![NPM - Version](https://img.shields.io/npm/v/@langchain/openai?style=flat-square&label=%20&) |\n",
+        "\n",
+        "## Setup\n",
+        "\n",
+        "To access OpenAIEmbeddings embedding models you'll need to create an OpenAI account, get an API key, and install the `@langchain/openai` integration package.\n",
+        "\n",
+        "### Credentials\n",
+        "\n",
+        "Head to [platform.openai.com](https://platform.openai.com) to sign up to OpenAI and generate an API key. Once you've done this set the `OPENAI_API_KEY` environment variable:\n",
+        "\n",
+        "```bash\n",
+        "export OPENAI_API_KEY=\"your-api-key\"\n",
+        "```\n",
+        "\n",
+        "If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:\n",
+        "\n",
+        "```bash\n",
+        "# export LANGCHAIN_TRACING_V2=\"true\"\n",
+        "# export LANGCHAIN_API_KEY=\"your-api-key\"\n",
+        "```\n",
+        "\n",
+        "### Installation\n",
+        "\n",
+        "The LangChain OpenAIEmbeddings integration lives in the `@langchain/openai` package:\n",
+        "\n",
+        "```{=mdx}\n",
+        "import IntegrationInstallTooltip from \"@mdx_components/integration_install_tooltip.mdx\";\n",
+        "import Npm2Yarn from \"@theme/Npm2Yarn\";\n",
+        "\n",
+        "<IntegrationInstallTooltip></IntegrationInstallTooltip>\n",
+        "\n",
+        "<Npm2Yarn>\n",
+        "  @langchain/openai\n",
+        "</Npm2Yarn>\n",
+        "\n",
+        "```"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "45dd1724",
+      "metadata": {},
+      "source": [
+        "## Instantiation\n",
+        "\n",
+        "Now we can instantiate our model object and generate chat completions:"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 1,
+      "id": "9ea7a09b",
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "import { OpenAIEmbeddings } from \"@langchain/openai\";\n",
+        "\n",
+        "const embeddings = new OpenAIEmbeddings({\n",
+        "  apiKey: \"YOUR-API-KEY\", // In Node.js defaults to process.env.OPENAI_API_KEY\n",
+        "  batchSize: 512, // Default value if omitted is 512. Max is 2048\n",
+        "  model: \"text-embedding-3-large\",\n",
+        "});"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "77d271b6",
+      "metadata": {},
+      "source": [
+        "## Indexing and Retrieval\n",
+        "\n",
+        "Embedding models are often used in retrieval-augmented generation (RAG) flows, both as part of indexing data as well as later retrieving it. For more detailed instructions, please see our RAG tutorials under the [working with external knowledge tutorials](/docs/tutorials/#working-with-external-knowledge).\n",
+        "\n",
+        "Below, see how to index and retrieve data using the `embeddings` object we initialized above. In this example, we will index and retrieve a sample document using the demo [`MemoryVectorStore`](/docs/integrations/vectorstores/memory)."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 3,
+      "id": "d817716b",
+      "metadata": {},
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "LangChain is the framework for building context-aware reasoning applications\n"
+          ]
+        }
+      ],
+      "source": [
+        "// Create a vector store with a sample text\n",
+        "import { MemoryVectorStore } from \"langchain/vectorstores/memory\";\n",
+        "\n",
+        "const text = \"LangChain is the framework for building context-aware reasoning applications\";\n",
+        "\n",
+        "const vectorstore = await MemoryVectorStore.fromDocuments(\n",
+        "  [{ pageContent: text, metadata: {} }],\n",
+        "  embeddings,\n",
+        ");\n",
+        "\n",
+        "// Use the vector store as a retriever that returns a single document\n",
+        "const retriever = vectorstore.asRetriever(1);\n",
+        "\n",
+        "// Retrieve the most similar text\n",
+        "const retrievedDocuments = await retriever.invoke(\"What is LangChain?\");\n",
+        "\n",
+        "retrievedDocuments[0].pageContent;"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "e02b9855",
+      "metadata": {},
+      "source": [
+        "## Direct Usage\n",
+        "\n",
+        "Under the hood, the vectorstore and retriever implementations are calling `embeddings.embedDocument(...)` and `embeddings.embedQuery(...)` to create embeddings for the text(s) used in `fromDocuments` and the retriever's `invoke` operations, respectively.\n",
+        "\n",
+        "You can directly call these methods to get embeddings for your own use cases.\n",
+        "\n",
+        "### Embed single texts\n",
+        "\n",
+        "You can embed queries for search with `embedQuery`. This generates a vector representation specific to the query:"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 4,
+      "id": "0d2befcd",
+      "metadata": {},
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "[\n",
+            "  -0.01927683, 0.0037708976, -0.032942563, 0.0037671267, 0.008175306,\n",
+            "  -0.012511838, -0.009713832, 0.021403614, -0.015377721, 0.0018684798,\n",
+            "  0.020574018, 0.022399133, -0.02322873, -0.01524951, -0.00504169,\n",
+            "  -0.007375876, -0.03448109, 0.00015130726, 0.021388533, -0.012564631,\n",
+            "  -0.020031009, 0.027406884, -0.039217334, 0.03036327, 0.030393435,\n",
+            "  -0.021750538, 0.032610722, -0.021162277, -0.025898525, 0.018869571,\n",
+            "  0.034179416, -0.013371604, 0.0037652412, -0.02146395, 0.0012641934,\n",
+            "  -0.055688616, 0.05104287, 0.0024982197, -0.019095825, 0.0037369595,\n",
+            "  0.00088757504, 0.025189597, -0.018779071, 0.024978427, 0.016833287,\n",
+            "  -0.0025868358, -0.011727491, -0.0021154736, -0.017738303, 0.0013839195,\n",
+            "  -0.0131151825, -0.05405959, 0.029729757, -0.003393808, 0.019774588,\n",
+            "  0.028885076, 0.004355387, 0.026094612, 0.06479911, 0.038040817,\n",
+            "  -0.03478276, -0.012594799, -0.024767255, -0.0031430433, 0.017874055,\n",
+            "  -0.015294761, 0.005709139, 0.025355516, 0.044798266, 0.02549127,\n",
+            "  -0.02524993, 0.00014553308, -0.019427665, -0.023545485, 0.008748483,\n",
+            "  0.019850006, -0.028417485, -0.001860938, -0.02318348, -0.010799851,\n",
+            "  0.04793565, -0.0048983963, 0.02193154, -0.026411368, 0.026426451,\n",
+            "  -0.012149832, 0.035355937, -0.047814984, -0.027165547, -0.008228099,\n",
+            "  -0.007737882, 0.023726488, -0.046487626, -0.007783133, -0.019638835,\n",
+            "  0.01793439, -0.018024892, 0.0030336871, -0.019578502, 0.0042837397\n",
+            "]\n"
+          ]
+        }
+      ],
+      "source": [
+        "const singleVector = await embeddings.embedQuery(text);\n",
+        "\n",
+        "console.log(singleVector.slice(0, 100));"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "1b5a7d03",
+      "metadata": {},
+      "source": [
+        "### Embed multiple texts\n",
+        "\n",
+        "You can embed multiple texts for indexing with `embedDocuments`. The internals used for this method may (but do not have to) differ from embedding queries:"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 5,
+      "id": "2f4d6e97",
+      "metadata": {},
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "[\n",
+            "  -0.01927683, 0.0037708976, -0.032942563, 0.0037671267, 0.008175306,\n",
+            "  -0.012511838, -0.009713832, 0.021403614, -0.015377721, 0.0018684798,\n",
+            "  0.020574018, 0.022399133, -0.02322873, -0.01524951, -0.00504169,\n",
+            "  -0.007375876, -0.03448109, 0.00015130726, 0.021388533, -0.012564631,\n",
+            "  -0.020031009, 0.027406884, -0.039217334, 0.03036327, 0.030393435,\n",
+            "  -0.021750538, 0.032610722, -0.021162277, -0.025898525, 0.018869571,\n",
+            "  0.034179416, -0.013371604, 0.0037652412, -0.02146395, 0.0012641934,\n",
+            "  -0.055688616, 0.05104287, 0.0024982197, -0.019095825, 0.0037369595,\n",
+            "  0.00088757504, 0.025189597, -0.018779071, 0.024978427, 0.016833287,\n",
+            "  -0.0025868358, -0.011727491, -0.0021154736, -0.017738303, 0.0013839195,\n",
+            "  -0.0131151825, -0.05405959, 0.029729757, -0.003393808, 0.019774588,\n",
+            "  0.028885076, 0.004355387, 0.026094612, 0.06479911, 0.038040817,\n",
+            "  -0.03478276, -0.012594799, -0.024767255, -0.0031430433, 0.017874055,\n",
+            "  -0.015294761, 0.005709139, 0.025355516, 0.044798266, 0.02549127,\n",
+            "  -0.02524993, 0.00014553308, -0.019427665, -0.023545485, 0.008748483,\n",
+            "  0.019850006, -0.028417485, -0.001860938, -0.02318348, -0.010799851,\n",
+            "  0.04793565, -0.0048983963, 0.02193154, -0.026411368, 0.026426451,\n",
+            "  -0.012149832, 0.035355937, -0.047814984, -0.027165547, -0.008228099,\n",
+            "  -0.007737882, 0.023726488, -0.046487626, -0.007783133, -0.019638835,\n",
+            "  0.01793439, -0.018024892, 0.0030336871, -0.019578502, 0.0042837397\n",
+            "]\n",
+            "[\n",
+            "  -0.010181213, 0.023419594, -0.04215527, -0.0015320902, -0.023573855,\n",
+            "  -0.0091644935, -0.014893179, 0.019016149, -0.023475688, 0.0010219777,\n",
+            "  0.009255648, 0.03996757, -0.04366983, -0.01640774, -0.020194141,\n",
+            "  0.019408813, -0.027977299, -0.022017224, 0.013539891, -0.007769135,\n",
+            "  0.032647192, -0.015089511, -0.022900717, 0.023798235, 0.026084099,\n",
+            "  -0.024625633, 0.035003178, -0.017978394, -0.049615882, 0.013364594,\n",
+            "  0.031132633, 0.019142363, 0.023195215, -0.038396914, 0.005584942,\n",
+            "  -0.031946007, 0.053682756, -0.0036356465, 0.011240003, 0.0056690844,\n",
+            "  -0.0062791156, 0.044146635, -0.037387207, 0.01300699, 0.018946031,\n",
+            "  0.0050415234, 0.029618073, -0.021750772, -0.000649473, 0.00026951815,\n",
+            "  -0.014710871, -0.029814405, 0.04204308, -0.014710871, 0.0039616977,\n",
+            "  -0.021512369, 0.054608323, 0.021484323, 0.02790718, -0.010573876,\n",
+            "  -0.023952495, -0.035143413, -0.048802506, -0.0075798146, 0.023279356,\n",
+            "  -0.022690361, -0.016590048, 0.0060477243, 0.014100839, 0.005476258,\n",
+            "  -0.017221114, -0.0100059165, -0.017922299, -0.021989176, 0.01830094,\n",
+            "  0.05516927, 0.001033372, 0.0017310516, -0.00960624, -0.037864015,\n",
+            "  0.013063084, 0.006591143, -0.010160177, 0.0011394264, 0.04953174,\n",
+            "  0.004806626, 0.029421741, -0.037751824, 0.003618117, 0.007162609,\n",
+            "  0.027696826, -0.0021070621, -0.024485396, -0.0042141243, -0.02801937,\n",
+            "  -0.019605145, 0.016281527, -0.035143413, 0.01640774, 0.042323552\n",
+            "]\n"
+          ]
+        }
+      ],
+      "source": [
+        "const text2 = \"LangGraph is a library for building stateful, multi-actor applications with LLMs\";\n",
+        "\n",
+        "const vectors = await embeddings.embedDocuments([text, text2]);\n",
+        "\n",
+        "console.log(vectors[0].slice(0, 100));\n",
+        "console.log(vectors[1].slice(0, 100));"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "8938e581",
+      "metadata": {},
+      "source": [
+        "## API reference\n",
+        "\n",
+        "For detailed documentation of all OpenAIEmbeddings features and configurations head to the API reference: https://api.js.langchain.com/classes/langchain_openai.OpenAIEmbeddings.html"
+      ]
+    }
+  ],
+  "metadata": {
+    "kernelspec": {
+      "display_name": "TypeScript",
+      "language": "typescript",
+      "name": "tslab"
+    },
+    "language_info": {
+      "codemirror_mode": {
+        "mode": "typescript",
+        "name": "javascript",
+        "typescript": true
+      },
+      "file_extension": ".ts",
+      "mimetype": "text/typescript",
+      "name": "typescript",
+      "version": "3.7.2"
+    }
+  },
+  "nbformat": 4,
+  "nbformat_minor": 5
+}
diff --git a/libs/langchain-scripts/src/cli/docs/embeddings.ts b/libs/langchain-scripts/src/cli/docs/embeddings.ts
new file mode 100644
index 000000000000..a43632d6c854
--- /dev/null
+++ b/libs/langchain-scripts/src/cli/docs/embeddings.ts
@@ -0,0 +1,186 @@
+import * as path from "node:path";
+import * as fs from "node:fs";
+import {
+  boldText,
+  getUserInput,
+  greenText,
+  redBackground,
+} from "../utils/get-input.js";
+
+const PACKAGE_NAME_PLACEHOLDER = "__package_name__";
+const MODULE_NAME_PLACEHOLDER = "__ModuleName__";
+const SIDEBAR_LABEL_PLACEHOLDER = "__sidebar_label__";
+const FULL_IMPORT_PATH_PLACEHOLDER = "__full_import_path__";
+const LOCAL_PLACEHOLDER = "__local__";
+const PY_SUPPORT_PLACEHOLDER = "__py_support__";
+const ENV_VAR_NAME_PLACEHOLDER = "__env_var_name__";
+const API_REF_MODULE_PLACEHOLDER = "__api_ref_module__";
+const API_REF_PACKAGE_PLACEHOLDER = "__api_ref_package__";
+const PYTHON_DOC_URL_PLACEHOLDER = "__python_doc_url__";
+
+const TEMPLATE_PATH = path.resolve(
+  "./src/cli/docs/templates/text_embedding.ipynb"
+);
+const INTEGRATIONS_DOCS_PATH = path.resolve(
+  "../../docs/core_docs/docs/integrations/text_embedding"
+);
+
+const fetchAPIRefUrl = async (url: string): Promise<boolean> => {
+  try {
+    const res = await fetch(url);
+    if (res.status !== 200) {
+      throw new Error(`API Reference URL ${url} not found.`);
+    }
+    return true;
+  } catch (_) {
+    return false;
+  }
+};
+
+type ExtraFields = {
+  local: boolean;
+  pySupport: boolean;
+  packageName: string;
+  fullImportPath?: string;
+  envVarName: string;
+};
+
+async function promptExtraFields(fields: {
+  envVarGuess: string;
+  packageNameGuess: string;
+  isCommunity: boolean;
+}): Promise<ExtraFields> {
+  const { envVarGuess, packageNameGuess, isCommunity } = fields;
+  const canRunLocally = await getUserInput(
+    "Does this embeddings model support local usage? (y/n) ",
+    undefined,
+    true
+  );
+  const hasPySupport = await getUserInput(
+    "Does this integration have Python support? (y/n) ",
+    undefined,
+    true
+  );
+
+  let packageName = packageNameGuess;
+  if (!isCommunity) {
+    // If it's not community, get the package name.
+
+    const isOtherPackageName = await getUserInput(
+      `Is this integration part of the ${packageNameGuess} package? (y/n) `
+    );
+    if (isOtherPackageName.toLowerCase() === "n") {
+      packageName = await getUserInput(
+        "What is the name of the package this integration is located in? (e.g @langchain/openai) ",
+        undefined,
+        true
+      );
+      if (
+        !packageName.startsWith("@langchain/") &&
+        !packageName.startsWith("langchain/")
+      ) {
+        packageName = await getUserInput(
+          "Packages must start with either '@langchain/' or 'langchain/'. Please enter a valid package name: ",
+          undefined,
+          true
+        );
+      }
+    }
+  }
+
+  // If it's community or langchain, ask for the full import path
+  let fullImportPath: string | undefined;
+  if (
+    packageName.startsWith("@langchain/community") ||
+    packageName.startsWith("langchain/")
+  ) {
+    fullImportPath = await getUserInput(
+      "What is the full import path of the package? (e.g '@langchain/community/embeddings/togetherai') ",
+      undefined,
+      true
+    );
+  }
+
+  const envVarName = await getUserInput(
+    `Is the environment variable for the API key named ${envVarGuess}? If it is, reply with 'y', else reply with the correct name: `,
+    undefined,
+    true
+  );
+
+  return {
+    local: canRunLocally.toLowerCase() === "y",
+    pySupport: hasPySupport.toLowerCase() === "y",
+    packageName,
+    fullImportPath,
+    envVarName:
+      envVarName.toLowerCase() === "y" ? envVarGuess : envVarName.toUpperCase(),
+  };
+}
+
+export async function fillEmbeddingsIntegrationDocTemplate(fields: {
+  packageName: string;
+  moduleName: string;
+  isCommunity: boolean;
+}) {
+  const sidebarLabel = fields.moduleName.replace("Embeddings", "");
+  const pyDocUrl = `https://python.langchain.com/docs/integrations/text_embedding/${sidebarLabel.toLowerCase()}/`;
+  let envVarName = `${sidebarLabel.toUpperCase()}_API_KEY`;
+  const extraFields = await promptExtraFields({
+    packageNameGuess: `@langchain/${fields.packageName}`,
+    envVarGuess: envVarName,
+    isCommunity: fields.isCommunity,
+  });
+  envVarName = extraFields.envVarName;
+  const pySupport = extraFields.pySupport;
+  const localSupport = extraFields.local;
+  const packageName = extraFields.packageName;
+  const fullImportPath = extraFields.fullImportPath ?? extraFields.packageName;
+
+  const apiRefModuleUrl = `https://api.js.langchain.com/classes/${fullImportPath
+    .replace("@", "")
+    .replaceAll("/", "_")
+    .replaceAll("-", "_")}.${fields.moduleName}.html`;
+  const apiRefPackageUrl = apiRefModuleUrl
+    .replace("/classes/", "/modules/")
+    .replace(`.${fields.moduleName}.html`, ".html");
+
+  const apiRefUrlSuccesses = await Promise.all([
+    fetchAPIRefUrl(apiRefModuleUrl),
+    fetchAPIRefUrl(apiRefPackageUrl),
+  ]);
+  if (apiRefUrlSuccesses.find((s) => !s)) {
+    console.warn(
+      "API ref URLs invalid. Please manually ensure they are correct."
+    );
+  }
+
+  const docTemplate = (await fs.promises.readFile(TEMPLATE_PATH, "utf-8"))
+    .replaceAll(PACKAGE_NAME_PLACEHOLDER, packageName)
+    .replaceAll(MODULE_NAME_PLACEHOLDER, fields.moduleName)
+    .replaceAll(SIDEBAR_LABEL_PLACEHOLDER, sidebarLabel)
+    .replaceAll(FULL_IMPORT_PATH_PLACEHOLDER, fullImportPath)
+    .replaceAll(LOCAL_PLACEHOLDER, localSupport ? "✅" : "❌")
+    .replaceAll(PY_SUPPORT_PLACEHOLDER, pySupport ? "✅" : "❌")
+    .replaceAll(ENV_VAR_NAME_PLACEHOLDER, envVarName)
+    .replaceAll(API_REF_MODULE_PLACEHOLDER, apiRefModuleUrl)
+    .replaceAll(API_REF_PACKAGE_PLACEHOLDER, apiRefPackageUrl)
+    .replaceAll(PYTHON_DOC_URL_PLACEHOLDER, pyDocUrl);
+
+  const docFileName = fullImportPath.split("/").pop();
+  const docPath = path.join(INTEGRATIONS_DOCS_PATH, `${docFileName}.ipynb`);
+  await fs.promises.writeFile(docPath, docTemplate);
+  const prettyDocPath = docPath.split("docs/core_docs/")[1];
+
+  const updatePythonDocUrlText = `  ${redBackground(
+    "- Update the Python documentation URL with the proper URL."
+  )}`;
+  const successText = `\nSuccessfully created new chat model integration doc at ${prettyDocPath}.`;
+
+  console.log(
+    `${greenText(successText)}\n
+${boldText("Next steps:")}
+${extraFields?.pySupport ? updatePythonDocUrlText : ""}
+  - Run all code cells in the generated doc to record the outputs.
+  - Add extra sections on integration specific features.\n`
+  );
+}
diff --git a/libs/langchain-scripts/src/cli/docs/index.ts b/libs/langchain-scripts/src/cli/docs/index.ts
index d664a220a240..1baba8b900bf 100644
--- a/libs/langchain-scripts/src/cli/docs/index.ts
+++ b/libs/langchain-scripts/src/cli/docs/index.ts
@@ -5,6 +5,7 @@ import { Command } from "commander";
 import { fillChatIntegrationDocTemplate } from "./chat.js";
 import { fillDocLoaderIntegrationDocTemplate } from "./document_loaders.js";
 import { fillLLMIntegrationDocTemplate } from "./llms.js";
+import { fillEmbeddingsIntegrationDocTemplate } from "./embeddings.js";
 
 type CLIInput = {
   package: string;
@@ -57,9 +58,16 @@ async function main() {
         isCommunity,
       });
       break;
+    case "embeddings":
+      await fillEmbeddingsIntegrationDocTemplate({
+        packageName,
+        moduleName,
+        isCommunity,
+      });
+      break;
     default:
       console.error(
-        `Invalid type: ${type}.\nOnly 'chat', 'llm' and 'doc_loader' are supported at this time.`
+        `Invalid type: ${type}.\nOnly 'chat', 'llm', 'embeddings' and 'doc_loader' are supported at this time.`
       );
       process.exit(1);
   }
diff --git a/libs/langchain-scripts/src/cli/docs/templates/text_embedding.ipynb b/libs/langchain-scripts/src/cli/docs/templates/text_embedding.ipynb
index e4cea848bd4a..48d85f8a3afc 100644
--- a/libs/langchain-scripts/src/cli/docs/templates/text_embedding.ipynb
+++ b/libs/langchain-scripts/src/cli/docs/templates/text_embedding.ipynb
@@ -3,10 +3,14 @@
     {
       "cell_type": "raw",
       "id": "afaf8039",
-      "metadata": {},
+      "metadata": {
+        "vscode": {
+          "languageId": "raw"
+        }
+      },
       "source": [
         "---\n",
-        "sidebar_label: __ModuleName__\n",
+        "sidebar_label: __sidebar_label__\n",
         "---"
       ]
     },
@@ -15,11 +19,11 @@
       "id": "9a3d6f34",
       "metadata": {},
       "source": [
-        "# __ModuleName__Embeddings\n",
+        "# __ModuleName__\n",
         "\n",
         "- [ ] TODO: Make sure API reference link is correct\n",
         "\n",
-        "This will help you get started with __ModuleName__ [embedding models](/docs/concepts#embedding-models) using LangChain. For detailed documentation on `__ModuleName__Embeddings` features and configuration options, please refer to the [API reference](https://api.js.langchain.com/classes/__package_name__.__ModuleName__Embeddings.html).\n",
+        "This will help you get started with __ModuleName__ [embedding models](/docs/concepts#embedding-models) using LangChain. For detailed documentation on `__ModuleName__` features and configuration options, please refer to the [API reference](__api_ref_module__).\n",
         "\n",
         "## Overview\n",
         "### Integration details\n",
@@ -28,24 +32,24 @@
         "- TODO: Remove JS support link if not relevant, otherwise ensure link is correct.\n",
         "- TODO: Make sure API reference links are correct.\n",
         "\n",
-        "| Class | Package | Local | [Py support](https://js.langchain.com/v0.2/docs/integrations/text_embedding/__package_name_short_snake__) | Package downloads | Package latest |\n",
-        "| :--- | :--- | :---: | :---: | :---: | :---: | :---: |\n",
-        "| [__ModuleName__Embeddings](https://api.js.langchain.com/classes/__package_name__.__ModuleName__Embeddings.html) | [__package_name_pretty__](https://api.js.langchain.com/modules/__package_name_snake_case__.html) | ✅/❌ | ✅/❌ | ![NPM - Downloads](https://img.shields.io/npm/dm/__package_name_pretty__?style=flat-square&label=%20&) | ![NPM - Version](https://img.shields.io/npm/v/__package_name_pretty__?style=flat-square&label=%20&) |\n",
+        "| Class | Package | Local | [Py support](__python_doc_url__) | Package downloads | Package latest |\n",
+        "| :--- | :--- | :---: | :---: | :---: | :---: |\n",
+        "| [__ModuleName__](__api_ref_module__) | [__package_name__](__api_ref_package__) | __local__ | __py_support__ | ![NPM - Downloads](https://img.shields.io/npm/dm/__package_name__?style=flat-square&label=%20&) | ![NPM - Version](https://img.shields.io/npm/v/__package_name__?style=flat-square&label=%20&) |\n",
         "\n",
         "## Setup\n",
         "\n",
         "- [ ] TODO: Update with relevant info.\n",
         "\n",
-        "To access __ModuleName__ embedding models you'll need to create a/an __ModuleName__ account, get an API key, and install the `__package_name__` integration package.\n",
+        "To access __sidebar_label__ embedding models you'll need to create a/an __ModuleName__ account, get an API key, and install the `__package_name__` integration package.\n",
         "\n",
         "### Credentials\n",
         "\n",
         "- TODO: Update with relevant info.\n",
         "\n",
-        "Head to (TODO: link) to sign up to `__ModuleName__` and generate an API key. Once you've done this set the `__MODULE_NAME_ALL_CAPS___API_KEY` environment variable:\n",
+        "Head to (TODO: link) to sign up to `__sidebar_label__` and generate an API key. Once you've done this set the `__env_var_name__` environment variable:\n",
         "\n",
         "```bash\n",
-        "export __MODULE_NAME_ALL_CAPS___API_KEY=\"your-api-key\"\n",
+        "export __env_var_name__=\"your-api-key\"\n",
         "```\n",
         "\n",
         "If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:\n",
@@ -57,7 +61,7 @@
         "\n",
         "### Installation\n",
         "\n",
-        "The LangChain __ModuleName__ integration lives in the `__package_name_pretty__` package:\n",
+        "The LangChain __ModuleName__ integration lives in the `__package_name__` package:\n",
         "\n",
         "```{=mdx}\n",
         "import IntegrationInstallTooltip from \"@mdx_components/integration_install_tooltip.mdx\";\n",
@@ -66,7 +70,7 @@
         "<IntegrationInstallTooltip></IntegrationInstallTooltip>\n",
         "\n",
         "<Npm2Yarn>\n",
-        "  __package_name_pretty__\n",
+        "  __package_name__\n",
         "</Npm2Yarn>\n",
         "\n",
         "```"
@@ -90,9 +94,9 @@
       "metadata": {},
       "outputs": [],
       "source": [
-        "import { __ModuleName__Embeddings } from \"__module_name__\";\n",
+        "import { __ModuleName__ } from \"__full_import_path__\";\n",
         "\n",
-        "const embeddings = new __ModuleName__Embeddings({\n",
+        "const embeddings = new __ModuleName__({\n",
         "  model: \"model-name\",\n",
         "  // ...\n",
         "});"
@@ -188,6 +192,16 @@
         "console.log(vectors[0].slice(0, 100));\n",
         "console.log(vectors[1].slice(0, 100));"
       ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "8938e581",
+      "metadata": {},
+      "source": [
+        "## API reference\n",
+        "\n",
+        "For detailed documentation of all __ModuleName__ features and configurations head to the API reference: __api_ref_module__"
+      ]
     }
   ],
   "metadata": {

From 660960a68ec805d088224993ce0704dea282663a Mon Sep 17 00:00:00 2001
From: bracesproul
Date: Fri, 2 Aug 2024 13:23:05 -0700
Subject: [PATCH 4/6]
additional cells and drop old doc --- .../integrations/text_embedding/openai.ipynb | 114 ++++++++++++++++++ .../integrations/text_embedding/openai.mdx | 77 ------------ 2 files changed, 114 insertions(+), 77 deletions(-) delete mode 100644 docs/core_docs/docs/integrations/text_embedding/openai.mdx diff --git a/docs/core_docs/docs/integrations/text_embedding/openai.ipynb b/docs/core_docs/docs/integrations/text_embedding/openai.ipynb index 3450afb7bbc1..3c820eec3d55 100644 --- a/docs/core_docs/docs/integrations/text_embedding/openai.ipynb +++ b/docs/core_docs/docs/integrations/text_embedding/openai.ipynb @@ -91,6 +91,15 @@ "});" ] }, + { + "cell_type": "markdown", + "id": "fb4153d3", + "metadata": {}, + "source": [ + "If you're part of an organization, you can set `process.env.OPENAI_ORGANIZATION` to your OpenAI organization id, or pass it in as `organization` when\n", + "initializing the model." + ] + }, { "cell_type": "markdown", "id": "77d271b6", @@ -270,6 +279,111 @@ "console.log(vectors[1].slice(0, 100));" ] }, + { + "cell_type": "markdown", + "id": "2b1a3527", + "metadata": {}, + "source": [ + "## Specifying dimensions\n", + "\n", + "With the `text-embedding-3` class of models, you can specify the size of the embeddings you want returned. 
For example by default `text-embedding-3-large` returns embeddings of dimension 3072:\n" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "a611fe1a", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "3072\n" + ] + } + ], + "source": [ + "import { OpenAIEmbeddings } from \"@langchain/openai\";\n", + "\n", + "const embeddingsDefaultDimensions = new OpenAIEmbeddings({\n", + " model: \"text-embedding-3-large\",\n", + "});\n", + "\n", + "const vectorsDefaultDimensions = await embeddingsDefaultDimensions.embedDocuments([\"some text\"]);\n", + "console.log(vectorsDefaultDimensions[0].length);" + ] + }, + { + "cell_type": "markdown", + "id": "08efe771", + "metadata": {}, + "source": [ + "But by passing in `dimensions: 1024` we can reduce the size of our embeddings to 1024:" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "19667fdb", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1024\n" + ] + } + ], + "source": [ + "import { OpenAIEmbeddings } from \"@langchain/openai\";\n", + "\n", + "const embeddings1024 = new OpenAIEmbeddings({\n", + " model: \"text-embedding-3-large\",\n", + " dimensions: 1024,\n", + "});\n", + "\n", + "const vectors1024 = await embeddings1024.embedDocuments([\"some text\"]);\n", + "console.log(vectors1024[0].length);" + ] + }, + { + "cell_type": "markdown", + "id": "6b84c0df", + "metadata": {}, + "source": [ + "## Custom URLs\n", + "\n", + "You can customize the base URL the SDK sends requests to by passing a `configuration` parameter like this:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3bfa20a6", + "metadata": {}, + "outputs": [], + "source": [ + "import { OpenAIEmbeddings } from \"@langchain/openai\";\n", + "\n", + "const model = new OpenAIEmbeddings({\n", + " configuration: {\n", + " baseURL: \"https://your_custom_url.com\",\n", + " },\n", + "});" + ] + }, + { + "cell_type": 
"markdown", + "id": "ac3cac9b", + "metadata": {}, + "source": [ + "You can also pass other `ClientOptions` parameters accepted by the official SDK.\n", + "\n", + "If you are hosting on Azure OpenAI, see the [dedicated page instead](/docs/integrations/text_embedding/azure_openai)." + ] + }, { "cell_type": "markdown", "id": "8938e581", diff --git a/docs/core_docs/docs/integrations/text_embedding/openai.mdx b/docs/core_docs/docs/integrations/text_embedding/openai.mdx deleted file mode 100644 index bba94b0777ee..000000000000 --- a/docs/core_docs/docs/integrations/text_embedding/openai.mdx +++ /dev/null @@ -1,77 +0,0 @@ ---- -keywords: [openaiembeddings] ---- - -# OpenAI - -The `OpenAIEmbeddings` class uses the OpenAI API to generate embeddings for a given text. By default it strips new line characters from the text, as recommended by OpenAI, but you can disable this by passing `stripNewLines: false` to the constructor. - -import IntegrationInstallTooltip from "@mdx_components/integration_install_tooltip.mdx"; - - - -```bash npm2yarn -npm install @langchain/openai -``` - -```typescript -import { OpenAIEmbeddings } from "@langchain/openai"; - -const embeddings = new OpenAIEmbeddings({ - apiKey: "YOUR-API-KEY", // In Node.js defaults to process.env.OPENAI_API_KEY - batchSize: 512, // Default value if omitted is 512. Max is 2048 - model: "text-embedding-3-large", -}); -``` - -If you're part of an organization, you can set `process.env.OPENAI_ORGANIZATION` to your OpenAI organization id, or pass it in as `organization` when -initializing the model. - -## Specifying dimensions - -With the `text-embedding-3` class of models, you can specify the size of the embeddings you want returned. 
For example by default `text-embedding-3-large` returns embeddings of dimension 3072: - -```typescript -const embeddings = new OpenAIEmbeddings({ - model: "text-embedding-3-large", -}); - -const vectors = await embeddings.embedDocuments(["some text"]); -console.log(vectors[0].length); -``` - -``` -3072 -``` - -But by passing in `dimensions: 1024` we can reduce the size of our embeddings to 1024: - -```typescript -const embeddings1024 = new OpenAIEmbeddings({ - model: "text-embedding-3-large", - dimensions: 1024, -}); - -const vectors2 = await embeddings1024.embedDocuments(["some text"]); -console.log(vectors2[0].length); -``` - -``` -1024 -``` - -## Custom URLs - -You can customize the base URL the SDK sends requests to by passing a `configuration` parameter like this: - -```typescript -const model = new OpenAIEmbeddings({ - configuration: { - baseURL: "https://your_custom_url.com", - }, -}); -``` - -You can also pass other `ClientOptions` parameters accepted by the official SDK. - -If you are hosting on Azure OpenAI, see the [dedicated page instead](/docs/integrations/text_embedding/azure_openai). 
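The OpenAI notebook added in the patch above demonstrates shrinking embeddings with `dimensions: 1024`. A natural follow-up when comparing the resulting vectors is cosine similarity; the helper below is a minimal, self-contained sketch (it is not part of the LangChain API — `cosineSimilarity` is a name introduced here for illustration) that works on any two equal-length number arrays, such as those returned by `embedDocuments`:

```typescript
// Cosine similarity between two embedding vectors of equal length:
// dot(a, b) / (|a| * |b|). Ranges from -1 (opposite) to 1 (identical direction).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i += 1) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors stand in for real embeddings here:
console.log(cosineSimilarity([1, 0, 1], [1, 0, 1])); // ≈ 1 for identical vectors
console.log(cosineSimilarity([1, 0], [0, 1])); // 0 for orthogonal vectors
```

In practice you would feed this the vectors produced by `embeddings.embedQuery(...)` and `embeddings.embedDocuments(...)`; note that vectors embedded with different `dimensions` settings are not directly comparable.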
From d372ae66c7c28f7b70afe6fc895bccf0f70cb4f8 Mon Sep 17 00:00:00 2001 From: bracesproul Date: Fri, 2 Aug 2024 13:41:52 -0700 Subject: [PATCH 5/6] chore: lint files --- libs/langchain-scripts/src/cli/docs/embeddings.ts | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/libs/langchain-scripts/src/cli/docs/embeddings.ts b/libs/langchain-scripts/src/cli/docs/embeddings.ts index a43632d6c854..140b41363bd7 100644 --- a/libs/langchain-scripts/src/cli/docs/embeddings.ts +++ b/libs/langchain-scripts/src/cli/docs/embeddings.ts @@ -131,9 +131,9 @@ export async function fillEmbeddingsIntegrationDocTemplate(fields: { isCommunity: fields.isCommunity, }); envVarName = extraFields.envVarName; - const pySupport = extraFields.pySupport; + const {pySupport} = extraFields; const localSupport = extraFields.local; - const packageName = extraFields.packageName; + const {packageName} = extraFields; const fullImportPath = extraFields.fullImportPath ?? extraFields.packageName; const apiRefModuleUrl = `https://api.js.langchain.com/classes/${fullImportPath From d6b95d86414ca9dc579a92c46e2b9dde9463f823 Mon Sep 17 00:00:00 2001 From: bracesproul Date: Fri, 2 Aug 2024 13:44:31 -0700 Subject: [PATCH 6/6] cr --- libs/langchain-scripts/src/cli/docs/embeddings.ts | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/libs/langchain-scripts/src/cli/docs/embeddings.ts b/libs/langchain-scripts/src/cli/docs/embeddings.ts index 140b41363bd7..03e1cc96f99e 100644 --- a/libs/langchain-scripts/src/cli/docs/embeddings.ts +++ b/libs/langchain-scripts/src/cli/docs/embeddings.ts @@ -131,9 +131,9 @@ export async function fillEmbeddingsIntegrationDocTemplate(fields: { isCommunity: fields.isCommunity, }); envVarName = extraFields.envVarName; - const {pySupport} = extraFields; + const { pySupport } = extraFields; const localSupport = extraFields.local; - const {packageName} = extraFields; + const { packageName } = extraFields; const fullImportPath = extraFields.fullImportPath ?? 
extraFields.packageName; const apiRefModuleUrl = `https://api.js.langchain.com/classes/${fullImportPath