diff --git a/docs/core_docs/docs/integrations/text_embedding/pinecone.ipynb b/docs/core_docs/docs/integrations/text_embedding/pinecone.ipynb
new file mode 100644
index 000000000000..e3ab07565f27
--- /dev/null
+++ b/docs/core_docs/docs/integrations/text_embedding/pinecone.ipynb
@@ -0,0 +1,344 @@
+{
+ "cells": [
+  {
+   "cell_type": "raw",
+   "id": "afaf8039",
+   "metadata": {
+    "vscode": {
+     "languageId": "raw"
+    }
+   },
+   "source": [
+    "---\n",
+    "sidebar_label: Pinecone\n",
+    "---"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9a3d6f34",
+   "metadata": {},
+   "source": [
+    "# PineconeEmbeddings\n",
+    "\n",
+    "This will help you get started with PineconeEmbeddings [embedding models](/docs/concepts/embedding_models) using LangChain. For detailed documentation on `PineconeEmbeddings` features and configuration options, please refer to the [API reference](https://api.js.langchain.com/classes/_langchain_pinecone.PineconeEmbeddings.html).\n",
+    "\n",
+    "## Overview\n",
+    "### Integration details\n",
+    "\n",
+    "| Class | Package | Local | [Py support](https://python.langchain.com/docs/integrations/text_embedding/pinecone/) | Package downloads | Package latest |\n",
+    "| :--- | :--- | :---: | :---: | :---: | :---: |\n",
+    "| [PineconeEmbeddings](https://api.js.langchain.com/classes/_langchain_pinecone.PineconeEmbeddings.html) | [@langchain/pinecone](https://npmjs.com/@langchain/pinecone) | ❌ | ✅ | ![NPM - Downloads](https://img.shields.io/npm/dm/@langchain/pinecone?style=flat-square&label=%20&) | ![NPM - Version](https://img.shields.io/npm/v/@langchain/pinecone?style=flat-square&label=%20&) |\n",
+    "\n",
+    "## Setup\n",
+    "\n",
+    "To access Pinecone embedding models you'll need to create a Pinecone account, get an API key, and install the `@langchain/pinecone` integration package.\n",
+    "\n",
+    "### Credentials\n",
+    "\n",
+    "Sign up for a [Pinecone](https://www.pinecone.io/) account, retrieve your API key, and set it as an environment variable named `PINECONE_API_KEY`:\n",
+    "\n",
+    "```typescript\n",
+    "process.env.PINECONE_API_KEY = \"your-pinecone-api-key\";\n",
+    "```\n",
+    "\n",
+    "If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:\n",
+    "\n",
+    "```bash\n",
+    "# export LANGCHAIN_TRACING_V2=\"true\"\n",
+    "# export LANGCHAIN_API_KEY=\"your-api-key\"\n",
+    "```\n",
+    "\n",
+    "### Installation\n",
+    "\n",
+    "The LangChain PineconeEmbeddings integration lives in the `@langchain/pinecone` package:\n",
+    "\n",
+    "```{=mdx}\n",
+    "import IntegrationInstallTooltip from \"@mdx_components/integration_install_tooltip.mdx\";\n",
+    "import Npm2Yarn from \"@theme/Npm2Yarn\";\n",
+    "\n",
+    "<IntegrationInstallTooltip></IntegrationInstallTooltip>\n",
+    "\n",
+    "<Npm2Yarn>\n",
+    "  @langchain/pinecone @langchain/core\n",
+    "</Npm2Yarn>\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "45dd1724",
+   "metadata": {},
+   "source": [
+    "## Instantiation\n",
+    "\n",
+    "Now we can instantiate our model object and embed text:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "9ea7a09b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import { PineconeEmbeddings } from \"@langchain/pinecone\";\n",
+    "\n",
+    "const embeddings = new PineconeEmbeddings({\n",
+    "  model: \"multilingual-e5-large\",\n",
+    "});"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "77d271b6",
+   "metadata": {},
+   "source": [
+    "## Indexing and Retrieval\n",
+    "\n",
+    "Embedding models are often used in retrieval-augmented generation 
(RAG) flows, both as part of indexing data as well as later retrieving it. For more detailed instructions, please see our RAG tutorials under the [working with external knowledge tutorials](/docs/tutorials/#working-with-external-knowledge).\n", + "\n", + "Below, see how to index and retrieve data using the `embeddings` object we initialized above. In this example, we will index and retrieve a sample document using the demo [`MemoryVectorStore`](/docs/integrations/vectorstores/memory)." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "d817716b", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "LangChain is the framework for building context-aware reasoning applications\n" + ] + } + ], + "source": [ + "// Create a vector store with a sample text\n", + "import { MemoryVectorStore } from \"langchain/vectorstores/memory\";\n", + "\n", + "const text = \"LangChain is the framework for building context-aware reasoning applications\";\n", + "\n", + "const vectorstore = await MemoryVectorStore.fromDocuments(\n", + " [{ pageContent: text, metadata: {} }],\n", + " embeddings,\n", + ");\n", + "\n", + "// Use the vector store as a retriever that returns a single document\n", + "const retriever = vectorstore.asRetriever(1);\n", + "\n", + "// Retrieve the most similar text\n", + "const retrievedDocuments = await retriever.invoke(\"What is LangChain?\");\n", + "\n", + "retrievedDocuments[0].pageContent;" + ] + }, + { + "cell_type": "markdown", + "id": "e02b9855", + "metadata": {}, + "source": [ + "## Direct Usage\n", + "\n", + "Under the hood, the vectorstore and retriever implementations are calling `embeddings.embedDocument(...)` and `embeddings.embedQuery(...)` to create embeddings for the text(s) used in `fromDocuments` and the retriever's `invoke` operations, respectively.\n", + "\n", + "You can directly call these methods to get embeddings for your own use cases.\n", + "\n", + "### Embed single texts\n", + "\n", + "You can embed queries for search with `embedQuery`. 
This generates a vector representation specific to the query:" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "0d2befcd", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[\n", + " 0.0191650390625, 0.004924774169921875, -0.015838623046875,\n", + " -0.04248046875, 0.040191650390625, -0.02679443359375,\n", + " -0.0240936279296875, 0.058624267578125, 0.027069091796875,\n", + " -0.0435791015625, 0.01934814453125, 0.040191650390625,\n", + " -0.0194244384765625, 0.01386260986328125, -0.0216827392578125,\n", + " -0.01073455810546875, -0.0166168212890625, 0.01073455810546875,\n", + " -0.0228271484375, 0.0062255859375, 0.035064697265625,\n", + " -0.0114593505859375, -0.0257110595703125, -0.0285797119140625,\n", + " 0.01190185546875, -0.022186279296875, -0.01500701904296875,\n", + " -0.03240966796875, 0.0019063949584960938, -0.039337158203125,\n", + " -0.0047454833984375, -0.03125, -0.0123291015625,\n", + " -0.00899505615234375, -0.02880859375, 0.014678955078125,\n", + " 0.0452880859375, 0.05035400390625, -0.053436279296875,\n", + " 0.0265960693359375, -0.0206756591796875, 0.06658935546875,\n", + " -0.032989501953125, -0.00724029541015625, 0.0024967193603515625,\n", + " 0.0282135009765625, 0.047088623046875, -0.0255126953125,\n", + " -0.008453369140625, -0.0039215087890625, 0.0282135009765625,\n", + " 0.0270843505859375, -0.0133056640625, -0.0296173095703125,\n", + " -0.0455322265625, 0.0225982666015625, -0.04803466796875,\n", + " -0.00891876220703125, -0.04669189453125, 0.022064208984375,\n", + " -0.0266571044921875, -0.01480865478515625, 0.0295257568359375,\n", + " -0.01561737060546875, -0.0411376953125, 0.01345062255859375,\n", + " 0.0219879150390625, -0.012786865234375, -0.051727294921875,\n", + " -0.0002830028533935547, 0.00690460205078125, -0.01303863525390625,\n", + " -0.0457763671875, -0.026763916015625, -0.0181121826171875,\n", + " 0.00946807861328125, 0.0250244140625, -0.01458740234375,\n", + " 0.0394287109375, -0.0162200927734375, 0.05169677734375,\n", + " 0.01126861572265625, 0.01265716552734375, -0.009307861328125,\n", + " 0.052490234375, 0.0135345458984375, 0.01332855224609375,\n", + " 0.040130615234375, 0.0638427734375, 0.0181121826171875,\n", + " 0.004207611083984375, 0.0771484375, 0.024078369140625,\n", + " 0.012420654296875, -0.030517578125, -0.0019245147705078125,\n", + " 0.0243682861328125, 0.0254974365234375, 0.0036334991455078125,\n", + " -0.004550933837890625\n", + "]\n" + ] + } + ], + "source": [ + "const singleVector = await embeddings.embedQuery(text);\n", + "\n", + "console.log(singleVector.slice(0, 100));" + ] + }, + { + "cell_type": "markdown", + "id": "1b5a7d03", + "metadata": {}, + "source": [ + "### Embed multiple texts\n", + "\n", + "You can embed multiple texts for indexing with `embedDocuments`. 
The internals used for this method may (but do not have to) differ from embedding queries:" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "2f4d6e97", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[\n", + " 0.0190887451171875, 0.00482940673828125, -0.0158233642578125,\n", + " -0.04254150390625, 0.040130615234375, -0.0268096923828125,\n", + " -0.02392578125, 0.058624267578125, 0.0269927978515625,\n", + " -0.04345703125, 0.0193328857421875, 0.040374755859375,\n", + " -0.0196075439453125, 0.01384735107421875, -0.021881103515625,\n", + " -0.01068878173828125, -0.016510009765625, 0.01079559326171875,\n", + " -0.0227813720703125, 0.00634765625, 0.035064697265625,\n", + " -0.0113983154296875, -0.0257720947265625, -0.0285491943359375,\n", + " 0.011749267578125, -0.0222625732421875, -0.0148468017578125,\n", + " -0.0325927734375, 0.00203704833984375, -0.0394287109375,\n", + " -0.004878997802734375, -0.0311126708984375, -0.01248931884765625,\n", + " -0.00897979736328125, -0.0286407470703125, 0.0146331787109375,\n", + " 0.04522705078125, 0.050201416015625, -0.053314208984375,\n", + " 0.0265960693359375, -0.0207366943359375, 0.06658935546875,\n", + " -0.03302001953125, -0.0073699951171875, 0.0024261474609375,\n", + " 0.028228759765625, 0.04705810546875, -0.0255279541015625,\n", + " -0.0084075927734375, -0.003814697265625, 0.0281524658203125,\n", + " 0.0272064208984375, -0.01322174072265625, -0.0295257568359375,\n", + " -0.045623779296875, 0.022735595703125, -0.0478515625,\n", + " -0.00885772705078125, -0.046844482421875, 0.022003173828125,\n", + " -0.026458740234375, -0.0148468017578125, 0.0295562744140625,\n", + " -0.01555633544921875, -0.041229248046875, 0.01336669921875,\n", + " 0.022125244140625, -0.01276397705078125, -0.051666259765625,\n", + " -0.0002474784851074219, 0.006740570068359375, -0.01306915283203125,\n", + " -0.04583740234375, -0.026611328125, -0.0182342529296875,\n", + " 0.00946044921875, 0.0250701904296875, -0.0146942138671875,\n", + " 0.039459228515625, -0.016265869140625, 0.051788330078125,\n", + " 0.01110076904296875, 0.0126953125, -0.00925445556640625,\n", + " 0.052581787109375, 0.01363372802734375, 0.01332855224609375,\n", + " 0.04010009765625, 0.0638427734375, 0.018157958984375,\n", + " 0.0040740966796875, 0.07720947265625, 0.0240325927734375,\n", + " 0.0123443603515625, -0.0302886962890625, -0.001865386962890625,\n", + " 0.024383544921875, 0.025604248046875, 0.00353240966796875,\n", + " -0.004474639892578125\n", + "]\n", + "[\n", + " 0.0053253173828125, 0.01305389404296875, -0.0253448486328125,\n", + " -0.04241943359375, 0.034942626953125, -0.017425537109375,\n", + " -0.02783203125, 0.064208984375, 0.0244903564453125,\n", + " -0.0467529296875, 0.021209716796875, 0.02191162109375,\n", + " -0.03131103515625, -0.019073486328125, -0.01413726806640625,\n", + " -0.008636474609375, -0.011627197265625, 0.0229339599609375,\n", + " -0.00762939453125, 0.00594329833984375, 0.0201263427734375,\n", + " -0.0247802734375, -0.05047607421875, -0.03765869140625,\n", + " 0.0034332275390625, -0.014617919921875, -0.043548583984375,\n", + " -0.03594970703125, 0.0002884864807128906, -0.03656005859375,\n", + " -0.0102691650390625, 0.0121307373046875, -0.0284271240234375,\n", + " -0.0113525390625, -0.01195526123046875, 0.01143646240234375,\n", + " 0.051727294921875, 0.0230712890625, -0.046417236328125,\n", + " 0.0198211669921875, -0.02337646484375, 0.040985107421875,\n", + " -0.03314208984375, -0.025909423828125, 
-0.00809478759765625,\n", + " 0.0291595458984375, 0.04296875, -0.016143798828125,\n", + " 0.005706787109375, 0.00860595703125, -0.0035343170166015625,\n", + " 0.0118560791015625, -0.0135650634765625, -0.0294036865234375,\n", + " -0.029876708984375, 0.03515625, -0.0545654296875,\n", + " 0.006862640380859375, -0.041839599609375, 0.021148681640625,\n", + " -0.0279998779296875, -0.00949859619140625, 0.03314208984375,\n", + " -0.002727508544921875, -0.0400390625, 0.01311492919921875,\n", + " 0.01177215576171875, -0.0010013580322265625, -0.052001953125,\n", + " 0.00112152099609375, -0.00815582275390625, 0.0321044921875,\n", + " -0.0496826171875, -0.0151519775390625, -0.0262908935546875,\n", + " -0.005207061767578125, 0.0207977294921875, -0.022705078125,\n", + " 0.009735107421875, 0.000682830810546875, 0.05792236328125,\n", + " -0.0145263671875, 0.03643798828125, 0.0018339157104492188,\n", + " 0.047210693359375, 0.0017986297607421875, 0.0300140380859375,\n", + " 0.027923583984375, 0.044708251953125, 0.027618408203125,\n", + " 0.00104522705078125, 0.05987548828125, 0.06304931640625,\n", + " -0.039703369140625, -0.0386962890625, 0.00797271728515625,\n", + " 0.0254974365234375, 0.0245819091796875, 0.010467529296875,\n", + " -0.0080413818359375\n", + "]\n" + ] + } + ], + "source": [ + "const text2 = \"LangGraph is a library for building stateful, multi-actor applications with LLMs\";\n", + "\n", + "const vectors = await embeddings.embedDocuments([text, text2]);\n", + "\n", + "console.log(vectors[0].slice(0, 100));\n", + "console.log(vectors[1].slice(0, 100));" + ] + }, + { + "cell_type": "markdown", + "id": "8938e581", + "metadata": {}, + "source": [ + "## API reference\n", + "\n", + "For detailed documentation of all PineconeEmbeddings features and configurations head to the API reference: https://api.js.langchain.com/classes/_langchain_pinecone.PineconeEmbeddings.html" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "TypeScript", + "language": "typescript", + "name": "tslab" + }, + "language_info": { + "codemirror_mode": { + "mode": "typescript", + "name": "javascript", + "typescript": true + }, + "file_extension": ".ts", + "mimetype": "text/typescript", + "name": "typescript", + "version": "3.7.2" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/examples/src/embeddings/pinecone.ts b/examples/src/embeddings/pinecone.ts new file mode 100644 index 000000000000..e7613ea81065 --- /dev/null +++ b/examples/src/embeddings/pinecone.ts @@ -0,0 +1,12 @@ +import { PineconeEmbeddings } from "@langchain/pinecone"; + +export const run = async () => { + const model = new PineconeEmbeddings(); + console.log({ model }); // Prints out model metadata + const res = await model.embedQuery( + "What would be a good company name a company that makes colorful socks?" 
+ ); + console.log({ res }); +}; + +await run(); diff --git a/examples/src/indexes/vector_stores/pinecone/delete_docs.ts b/examples/src/indexes/vector_stores/pinecone/delete_docs.ts index 33d325e996a2..48ce230ffa93 100644 --- a/examples/src/indexes/vector_stores/pinecone/delete_docs.ts +++ b/examples/src/indexes/vector_stores/pinecone/delete_docs.ts @@ -5,7 +5,7 @@ import { OpenAIEmbeddings } from "@langchain/openai"; import { PineconeStore } from "@langchain/pinecone"; // Instantiate a new Pinecone client, which will automatically read the -// env vars: PINECONE_API_KEY and PINECONE_ENVIRONMENT which come from +// env vars: PINECONE_API_KEY which comes from // the Pinecone dashboard at https://app.pinecone.io const pinecone = new Pinecone(); diff --git a/examples/src/indexes/vector_stores/pinecone/index_docs.ts b/examples/src/indexes/vector_stores/pinecone/index_docs.ts index 84bbccf934c5..31d2a584b2d0 100644 --- a/examples/src/indexes/vector_stores/pinecone/index_docs.ts +++ b/examples/src/indexes/vector_stores/pinecone/index_docs.ts @@ -3,9 +3,10 @@ import { Pinecone } from "@pinecone-database/pinecone"; import { Document } from "@langchain/core/documents"; import { OpenAIEmbeddings } from "@langchain/openai"; import { PineconeStore } from "@langchain/pinecone"; +// import { Index } from "@upstash/vector"; // Instantiate a new Pinecone client, which will automatically read the -// env vars: PINECONE_API_KEY and PINECONE_ENVIRONMENT which come from +// env vars: PINECONE_API_KEY which comes from // the Pinecone dashboard at https://app.pinecone.io const pinecone = new Pinecone(); diff --git a/examples/src/indexes/vector_stores/pinecone/mmr.ts b/examples/src/indexes/vector_stores/pinecone/mmr.ts index 10a85253b3ea..82aafc797b20 100644 --- a/examples/src/indexes/vector_stores/pinecone/mmr.ts +++ b/examples/src/indexes/vector_stores/pinecone/mmr.ts @@ -4,7 +4,7 @@ import { OpenAIEmbeddings } from "@langchain/openai"; import { PineconeStore } from "@langchain/pinecone"; // Instantiate a new Pinecone client, which will automatically read the -// env vars: PINECONE_API_KEY and PINECONE_ENVIRONMENT which come from +// env vars: PINECONE_API_KEY which comes from // the Pinecone dashboard at https://app.pinecone.io const pinecone = new Pinecone(); diff --git a/examples/src/indexes/vector_stores/pinecone/query_docs.ts b/examples/src/indexes/vector_stores/pinecone/query_docs.ts index 6294067724ec..1df62db7f1ef 100644 --- a/examples/src/indexes/vector_stores/pinecone/query_docs.ts +++ b/examples/src/indexes/vector_stores/pinecone/query_docs.ts @@ -4,7 +4,7 @@ import { OpenAIEmbeddings } from "@langchain/openai"; import { PineconeStore } from "@langchain/pinecone"; // Instantiate a new Pinecone client, which will automatically read the -// env vars: PINECONE_API_KEY and PINECONE_ENVIRONMENT which come from +// env vars: PINECONE_API_KEY which comes from // the Pinecone dashboard at https://app.pinecone.io const pinecone = new Pinecone(); diff --git a/examples/src/retrievers/pinecone_self_query.ts b/examples/src/retrievers/pinecone_self_query.ts index 2ae7116c44f1..66d7dfafdc96 100644 --- a/examples/src/retrievers/pinecone_self_query.ts +++ b/examples/src/retrievers/pinecone_self_query.ts @@ -83,14 +83,8 @@ const attributeInfo: AttributeInfo[] = [ * Next, we instantiate a vector store. This is where we store the embeddings of the documents. * We also need to provide an embeddings object. This is used to embed the documents. 
*/ -if ( - !process.env.PINECONE_API_KEY || - !process.env.PINECONE_ENVIRONMENT || - !process.env.PINECONE_INDEX -) { - throw new Error( - "PINECONE_ENVIRONMENT and PINECONE_API_KEY and PINECONE_INDEX must be set" - ); +if (!process.env.PINECONE_API_KEY || !process.env.PINECONE_INDEX) { + throw new Error("PINECONE_API_KEY and PINECONE_INDEX must be set"); } const pinecone = new Pinecone(); diff --git a/libs/langchain-community/package.json b/libs/langchain-community/package.json index 61ee6c6410f9..8de0a386cb34 100644 --- a/libs/langchain-community/package.json +++ b/libs/langchain-community/package.json @@ -91,7 +91,6 @@ "@neondatabase/serverless": "^0.9.1", "@notionhq/client": "^2.2.10", "@opensearch-project/opensearch": "^2.2.0", - "@pinecone-database/pinecone": "^1.1.0", "@planetscale/database": "^1.8.0", "@premai/prem-sdk": "^0.3.25", "@qdrant/js-client-rest": "^1.8.2", diff --git a/libs/langchain-pinecone/jest.config.cjs b/libs/langchain-pinecone/jest.config.cjs index a49a8832a349..daf6c5865cbe 100644 --- a/libs/langchain-pinecone/jest.config.cjs +++ b/libs/langchain-pinecone/jest.config.cjs @@ -17,5 +17,5 @@ module.exports = { setupFiles: ["dotenv/config"], testTimeout: 20_000, passWithNoTests: true, - collectCoverageFrom: ["src/**/*.ts"] -}; + collectCoverageFrom: ["src/**/*.ts"], + }; diff --git a/libs/langchain-pinecone/package.json b/libs/langchain-pinecone/package.json index 2f6e4cef205f..106855d9ffd2 100644 --- a/libs/langchain-pinecone/package.json +++ b/libs/langchain-pinecone/package.json @@ -32,7 +32,7 @@ "author": "Pinecone, Inc", "license": "MIT", "dependencies": { - "@pinecone-database/pinecone": "^3.0.0 || ^4.0.0", + "@pinecone-database/pinecone": "^4.0.0", "flat": "^5.0.2", "uuid": "^10.0.0" }, diff --git a/libs/langchain-pinecone/src/client.ts b/libs/langchain-pinecone/src/client.ts new file mode 100644 index 000000000000..f1eabb825db5 --- /dev/null +++ b/libs/langchain-pinecone/src/client.ts @@ -0,0 +1,16 @@ +import { Pinecone, PineconeConfiguration } from "@pinecone-database/pinecone"; +import { getEnvironmentVariable } from "@langchain/core/utils/env"; + +export function getPineconeClient(config?: PineconeConfiguration): Pinecone { + if ( + getEnvironmentVariable("PINECONE_API_KEY") === undefined || + getEnvironmentVariable("PINECONE_API_KEY") === "" + ) { + throw new Error("PINECONE_API_KEY must be set in environment"); + } + if (!config) { + return new Pinecone(); + } else { + return new Pinecone(config); + } +} diff --git a/libs/langchain-pinecone/src/embeddings.ts b/libs/langchain-pinecone/src/embeddings.ts new file mode 100644 index 000000000000..6d0d04226ed8 --- /dev/null +++ b/libs/langchain-pinecone/src/embeddings.ts @@ -0,0 +1,139 @@ +/* eslint-disable arrow-body-style */ + +import { Embeddings, type EmbeddingsParams } from "@langchain/core/embeddings"; +import { + EmbeddingsList, + Pinecone, + PineconeConfiguration, +} from "@pinecone-database/pinecone"; +import { getPineconeClient } from "./client.js"; + +/* PineconeEmbeddingsParams holds the optional fields a user can pass to a Pinecone embedding model. + * @param model - Model to use to generate embeddings. Default is "multilingual-e5-large". + * @param params - Additional parameters to pass to the embedding model. Note: parameters are model-specific. Read + * more about model-specific parameters in the [Pinecone + * documentation](https://docs.pinecone.io/guides/inference/understanding-inference#model-specific-parameters). 
+ * */
+export interface PineconeEmbeddingsParams extends EmbeddingsParams {
+  model?: string; // Model to use to generate embeddings
+  params?: Record<string, string>; // Additional parameters to pass to the embedding model
+}
+
+/* PineconeEmbeddings generates embeddings using the Pinecone Inference API. */
+export class PineconeEmbeddings
+  extends Embeddings
+  implements PineconeEmbeddingsParams
+{
+  client: Pinecone;
+
+  model: string;
+
+  params: Record<string, string>;
+
+  constructor(
+    fields?: Partial<PineconeEmbeddingsParams> & Partial<PineconeConfiguration>
+  ) {
+    const defaultFields = { maxRetries: 3, ...fields };
+    super(defaultFields);
+
+    if (defaultFields.apiKey) {
+      const config = {
+        apiKey: defaultFields.apiKey,
+        controllerHostUrl: defaultFields.controllerHostUrl,
+        fetchApi: defaultFields.fetchApi,
+        additionalHeaders: defaultFields.additionalHeaders,
+        sourceTag: defaultFields.sourceTag,
+      } as PineconeConfiguration;
+      this.client = getPineconeClient(config);
+    } else {
+      this.client = getPineconeClient();
+    }
+
+    if (!defaultFields.model) {
+      this.model = "multilingual-e5-large";
+    } else {
+      this.model = defaultFields.model;
+    }
+
+    const defaultParams = { inputType: "passage" };
+
+    if (defaultFields.params) {
+      this.params = { ...defaultFields.params, ...defaultParams };
+    } else {
+      this.params = defaultParams;
+    }
+  }
+
+  /* Generate embeddings for a list of input strings using a specified embedding model.
+   *
+   * @param texts - List of input strings for which to generate embeddings.
+   * */
+  async embedDocuments(texts: string[]): Promise<number[][]> {
+    if (texts.length === 0) {
+      throw new Error(
+        "At least one document is required to generate embeddings"
+      );
+    }
+
+    let embeddings;
+    if (this.params) {
+      embeddings = await this.caller.call(async () => {
+        const result: EmbeddingsList = await this.client.inference.embed(
+          this.model,
+          texts,
+          this.params
+        );
+        return result;
+      });
+    } else {
+      embeddings = await this.caller.call(async () => {
+        const result: EmbeddingsList = await this.client.inference.embed(
+          this.model,
+          texts,
+          {}
+        );
+        return result;
+      });
+    }
+
+    const embeddingsList: number[][] = [];
+
+    for (let i = 0; i < embeddings.length; i += 1) {
+      if (embeddings[i].values) {
+        embeddingsList.push(embeddings[i].values as number[]);
+      }
+    }
+    return embeddingsList;
+  }
+
+  /* Generate embeddings for a given query string using a specified embedding model.
+   * @param text - Query string for which to generate embeddings.
+   * */
+  async embedQuery(text: string): Promise<number[]> {
+    // Change inputType to query-specific param for multilingual-e5-large embedding model
+    this.params.inputType = "query";
+
+    if (!text) {
+      throw new Error("No query passed for which to generate embeddings");
+    }
+    let embeddings: EmbeddingsList;
+    if (this.params) {
+      embeddings = await this.caller.call(async () => {
+        return await this.client.inference.embed(
+          this.model,
+          [text],
+          this.params
+        );
+      });
+    } else {
+      embeddings = await this.caller.call(async () => {
+        return await this.client.inference.embed(this.model, [text], {});
+      });
+    }
+    if (embeddings[0].values) {
+      return embeddings[0].values as number[];
+    } else {
+      return [];
+    }
+  }
+}
diff --git a/libs/langchain-pinecone/src/index.ts b/libs/langchain-pinecone/src/index.ts
index d464092ede5e..f3a0f54563c6 100644
--- a/libs/langchain-pinecone/src/index.ts
+++ b/libs/langchain-pinecone/src/index.ts
@@ -1,2 +1,3 @@
 export * from "./vectorstores.js";
 export * from "./translator.js";
+export * from "./embeddings.js";
diff --git a/libs/langchain-pinecone/src/tests/client.int.test.ts b/libs/langchain-pinecone/src/tests/client.int.test.ts
new file mode 100644
index 000000000000..b52beb2cf2b8
--- /dev/null
+++ b/libs/langchain-pinecone/src/tests/client.int.test.ts
@@ -0,0 +1,39 @@
+import { Pinecone } from "@pinecone-database/pinecone";
+import { getPineconeClient } from "../client.js";
+
+describe("Tests for getPineconeClient", () => {
+  test("Happy path for getPineconeClient with and without `config` obj passed", async () => {
+    const client = getPineconeClient();
+    expect(client).toBeInstanceOf(Pinecone);
+    expect(client).toHaveProperty("config"); // Config is always set to *at least* the user's api key
+
+    const clientWithConfig = getPineconeClient({
+      // eslint-disable-next-line no-process-env
+      apiKey: process.env.PINECONE_API_KEY!,
+      additionalHeaders: { header: "value" },
+    });
+    expect(clientWithConfig).toBeInstanceOf(Pinecone);
+    expect(client).toHaveProperty("config"); // Unfortunately cannot assert on contents of config b/c it's a private
+    // attribute of the Pinecone class
+  });
+
+  test("Unhappy path: expect getPineconeClient to throw error if reset PINECONE_API_KEY to empty string", async () => {
+    // eslint-disable-next-line no-process-env
+    const originalApiKey = process.env.PINECONE_API_KEY;
+    try {
+      // eslint-disable-next-line no-process-env
+      process.env.PINECONE_API_KEY = "";
+      const errorThrown = async () => {
+        getPineconeClient();
+      };
+      await expect(errorThrown).rejects.toThrow(Error);
+      await expect(errorThrown).rejects.toThrow(
+        "PINECONE_API_KEY must be set in environment"
+      );
+    } finally {
+      // Restore the original value of PINECONE_API_KEY
+      // eslint-disable-next-line no-process-env
+      process.env.PINECONE_API_KEY = originalApiKey;
+    }
+  });
+});
diff --git a/libs/langchain-pinecone/src/tests/client.test.ts b/libs/langchain-pinecone/src/tests/client.test.ts
new file mode 100644
index 000000000000..60b7f6dcb348
--- /dev/null
+++ b/libs/langchain-pinecone/src/tests/client.test.ts
@@ -0,0 +1,15 @@
+import { getPineconeClient } from "../client.js";
+
+describe("Tests for getPineconeClient", () => {
+  test("Confirm getPineconeClient throws error when PINECONE_API_KEY is not set", async () => {
+    /* eslint-disable-next-line no-process-env */
+    process.env.PINECONE_API_KEY = "";
+    const errorThrown = async () => {
+      getPineconeClient();
+    };
+    await expect(errorThrown).rejects.toThrow(Error);
+    await expect(errorThrown).rejects.toThrow(
+
"PINECONE_API_KEY must be set in environment" + ); + }); +}); diff --git a/libs/langchain-pinecone/src/tests/embeddings.int.test.ts b/libs/langchain-pinecone/src/tests/embeddings.int.test.ts new file mode 100644 index 000000000000..980e32775bca --- /dev/null +++ b/libs/langchain-pinecone/src/tests/embeddings.int.test.ts @@ -0,0 +1,59 @@ +import { PineconeEmbeddings } from "../embeddings.js"; + +describe("Integration tests for Pinecone embeddings", () => { + test("Happy path: defaults for both embedDocuments and embedQuery", async () => { + const model = new PineconeEmbeddings(); + expect(model.model).toBe("multilingual-e5-large"); + expect(model.params).toEqual({ inputType: "passage" }); + + const docs = ["hello", "world"]; + const embeddings = await model.embedDocuments(docs); + expect(embeddings.length).toBe(docs.length); + + const query = "hello"; + const queryEmbedding = await model.embedQuery(query); + expect(queryEmbedding.length).toBeGreaterThan(0); + }); + + test("Happy path: custom `params` obj passed to embedDocuments and embedQuery", async () => { + const model = new PineconeEmbeddings({ + params: { customParam: "value" }, + }); + expect(model.model).toBe("multilingual-e5-large"); + expect(model.params).toEqual({ + inputType: "passage", + customParam: "value", + }); + + const docs = ["hello", "world"]; + const embeddings = await model.embedDocuments(docs); + expect(embeddings.length).toBe(docs.length); + expect(embeddings[0].length).toBe(1024); // Assert correct dims on random doc + expect(model.model).toBe("multilingual-e5-large"); + expect(model.params).toEqual({ + inputType: "passage", // Maintain default inputType for docs + customParam: "value", + }); + + const query = "hello"; + const queryEmbedding = await model.embedQuery(query); + expect(model.model).toBe("multilingual-e5-large"); + expect(queryEmbedding.length).toBe(1024); + expect(model.params).toEqual({ + inputType: "query", // Change inputType for query + customParam: "value", + }); + }); + + test("Unhappy path: embedDocuments and embedQuery throw when empty objs are passed", async () => { + const model = new PineconeEmbeddings(); + await expect(model.embedDocuments([])).rejects.toThrow(); + await expect(model.embedQuery("")).rejects.toThrow(); + }); + + test("Unhappy path: PineconeEmbeddings throws when invalid model is passed", async () => { + const model = new PineconeEmbeddings({ model: "invalid-model" }); + await expect(model.embedDocuments([])).rejects.toThrow(); + await expect(model.embedQuery("")).rejects.toThrow(); + }); +}); diff --git a/libs/langchain-pinecone/src/tests/embeddings.test.ts b/libs/langchain-pinecone/src/tests/embeddings.test.ts new file mode 100644 index 000000000000..5b203e758a5b --- /dev/null +++ b/libs/langchain-pinecone/src/tests/embeddings.test.ts @@ -0,0 +1,48 @@ +import { PineconeEmbeddings } from "../embeddings.js"; + +beforeAll(() => { + // eslint-disable-next-line no-process-env + process.env.PINECONE_API_KEY = "test-api-key"; +}); + +describe("Tests for the PineconeEmbeddings class", () => { + test("Confirm embedDocuments method throws error when an empty array is passed", async () => { + const model = new PineconeEmbeddings(); + const errorThrown = async () => { + await model.embedDocuments([]); + }; + await expect(errorThrown).rejects.toThrow(Error); + await expect(errorThrown).rejects.toThrowError( + "At least one document is required to generate embeddings" + ); + }); + + test("Confirm embedQuery method throws error when an empty string is passed", async () => { + const model 
= new PineconeEmbeddings(); + const errorThrown = async () => { + await model.embedQuery(""); + }; + await expect(errorThrown).rejects.toThrow(Error); + await expect(errorThrown).rejects.toThrowError( + "No query passed for which to generate embeddings" + ); + }); + + test("Confirm instance defaults are set when no args are passed", async () => { + const model = new PineconeEmbeddings(); + expect(model.model).toBe("multilingual-e5-large"); + expect(model.params).toEqual({ inputType: "passage" }); + }); + + test("Confirm instance sets custom model and params when provided", () => { + const customModel = new PineconeEmbeddings({ + model: "custom-model", + params: { customParam: "value" }, + }); + expect(customModel.model).toBe("custom-model"); + expect(customModel.params).toEqual({ + inputType: "passage", + customParam: "value", + }); + }); +}); diff --git a/libs/langchain-pinecone/src/tests/translator.int.test.ts b/libs/langchain-pinecone/src/tests/translator.int.test.ts index 69a2b8fca663..efae21d505e0 100644 --- a/libs/langchain-pinecone/src/tests/translator.int.test.ts +++ b/libs/langchain-pinecone/src/tests/translator.int.test.ts @@ -109,14 +109,8 @@ describe("Pinecone self query", () => { }, ]; - if ( - !process.env.PINECONE_API_KEY || - !process.env.PINECONE_ENVIRONMENT || - !testIndexName - ) { - throw new Error( - "PINECONE_ENVIRONMENT and PINECONE_API_KEY and PINECONE_INDEX must be set" - ); + if (!process.env.PINECONE_API_KEY || !testIndexName) { + throw new Error("PINECONE_API_KEY and PINECONE_INDEX must be set"); } const embeddings = new OpenAIEmbeddings(); @@ -268,14 +262,8 @@ describe("Pinecone self query", () => { }, ]; - if ( - !process.env.PINECONE_API_KEY || - !process.env.PINECONE_ENVIRONMENT || - !testIndexName - ) { - throw new Error( - "PINECONE_ENVIRONMENT and PINECONE_API_KEY and PINECONE_INDEX must be set" - ); + if (!process.env.PINECONE_API_KEY || !testIndexName) { + throw new Error("PINECONE_API_KEY and PINECONE_INDEX must be set"); } const embeddings = new OpenAIEmbeddings(); diff --git a/libs/langchain-pinecone/src/vectorstores.ts b/libs/langchain-pinecone/src/vectorstores.ts index 3e4db8bbc30c..8601c1570e8f 100644 --- a/libs/langchain-pinecone/src/vectorstores.ts +++ b/libs/langchain-pinecone/src/vectorstores.ts @@ -88,7 +88,7 @@ export type PineconeDeleteParams = { * * const pinecone = new PineconeClient(); * - * // Will automatically read the PINECONE_API_KEY and PINECONE_ENVIRONMENT env vars + * // Will automatically read the PINECONE_API_KEY env var * const pineconeIndex = pinecone.Index(process.env.PINECONE_INDEX!); * * const embeddings = new OpenAIEmbeddings({ diff --git a/yarn.lock b/yarn.lock index 42dd445ba3fe..ee400067548a 100644 --- a/yarn.lock +++ b/yarn.lock @@ -9377,22 +9377,6 @@ __metadata: languageName: node linkType: hard -"@edge-runtime/primitives@npm:4.0.2": - version: 4.0.2 - resolution: "@edge-runtime/primitives@npm:4.0.2" - checksum: 66ccdadb8bcb02bf7d75d75bc6a611c2d515b09818b8cc4f524944aba58b735aff8a76ba52981a04f89fe616f801d1d947a9b1637a201a3b32f6eca74a6f7001 - languageName: node - linkType: hard - -"@edge-runtime/types@npm:^2.2.3": - version: 2.2.4 - resolution: "@edge-runtime/types@npm:2.2.4" - dependencies: - "@edge-runtime/primitives": 4.0.2 - checksum: 51b5b1cf8f462dd77038357e1b72737e0e8f37c6e7aa47e0b9cd3aebc81004ad5e41c697d14a66171e5acfddf8bf5a5fc6e76c01dde60daccb5703310a4c4446 - languageName: node - linkType: hard - "@elastic/elasticsearch@npm:^8.4.0": version: 8.8.1 resolution: "@elastic/elasticsearch@npm:8.8.1" @@ -11530,7 
+11514,6 @@ __metadata: "@neondatabase/serverless": ^0.9.1 "@notionhq/client": ^2.2.10 "@opensearch-project/opensearch": ^2.2.0 - "@pinecone-database/pinecone": ^1.1.0 "@planetscale/database": ^1.8.0 "@premai/prem-sdk": ^0.3.25 "@qdrant/js-client-rest": ^1.8.2 @@ -12608,7 +12591,7 @@ __metadata: "@langchain/core": "workspace:*" "@langchain/openai": "workspace:*" "@langchain/scripts": ">=0.1.0 <0.2.0" - "@pinecone-database/pinecone": ^3.0.0 || ^4.0.0 + "@pinecone-database/pinecone": ^4.0.0 "@swc/core": ^1.3.90 "@swc/jest": ^0.2.29 "@tsconfig/recommended": ^1.0.3 @@ -14130,22 +14113,7 @@ __metadata: languageName: node linkType: hard -"@pinecone-database/pinecone@npm:^1.1.0": - version: 1.1.0 - resolution: "@pinecone-database/pinecone@npm:1.1.0" - dependencies: - "@edge-runtime/types": ^2.2.3 - "@sinclair/typebox": ^0.29.0 - "@types/node": ^18.11.17 - ajv: ^8.12.0 - cross-fetch: ^3.1.5 - encoding: ^0.1.13 - typescript: ^4.9.4 - checksum: 5ff066c23006c6ad8eb3ae5add5aa2633adacf48c539a498f6d97798fb9640254a2ede99f7e47aaff3d885e3df5168ed193990d846acd9bebded7140f18c8eb1 - languageName: node - linkType: hard - -"@pinecone-database/pinecone@npm:^3.0.0 || ^4.0.0, @pinecone-database/pinecone@npm:^4.0.0": +"@pinecone-database/pinecone@npm:^4.0.0": version: 4.0.0 resolution: "@pinecone-database/pinecone@npm:4.0.0" dependencies: @@ -14822,13 +14790,6 @@ __metadata: languageName: node linkType: hard -"@sinclair/typebox@npm:^0.29.0": - version: 0.29.6 - resolution: "@sinclair/typebox@npm:0.29.6" - checksum: 02c2ee9e8bcb4e03a2497fc208a5a2ae4f4991347e6f2b2a217027b33eb8ca2bc74e80bfb2654da537453e9e5c25d59dbad2d32d4fe47ffec357330bf343bf7a - languageName: node - linkType: hard - "@sindresorhus/is@npm:^0.14.0": version: 0.14.0 resolution: "@sindresorhus/is@npm:0.14.0" @@ -19363,13 +19324,6 @@ __metadata: languageName: node linkType: hard -"@types/node@npm:^18.11.17": - version: 18.18.4 - resolution: "@types/node@npm:18.18.4" - checksum: 4901e91c4cc479bb58acbcd79236a97a0ad6db4a53cb1f4ba4cf32af15324c61b16faa6e31c1b09bf538a20feb5f5274239157ce5237f5741db0b9ab71e69c52 - languageName: node - linkType: hard - "@types/node@npm:^18.11.18": version: 18.16.19 resolution: "@types/node@npm:18.16.19" @@ -20958,7 +20912,7 @@ __metadata: languageName: node linkType: hard -"ajv@npm:^8.0.0, ajv@npm:^8.12.0, ajv@npm:^8.6.3, ajv@npm:^8.9.0": +"ajv@npm:^8.0.0, ajv@npm:^8.6.3, ajv@npm:^8.9.0": version: 8.12.0 resolution: "ajv@npm:8.12.0" dependencies:
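As a quick end-to-end check of the new class, here is a minimal usage sketch that pairs `PineconeEmbeddings` with the existing `PineconeStore` export from `@langchain/pinecone`. It assumes `PINECONE_API_KEY` is set (required by `getPineconeClient`) and that a 1024-dimension index (matching the `multilingual-e5-large` output size asserted in the integration tests) already exists and is referenced by a `PINECONE_INDEX` environment variable; the index and the variable name are illustrative assumptions, not part of this change.

```typescript
import { Pinecone } from "@pinecone-database/pinecone";
import { PineconeEmbeddings, PineconeStore } from "@langchain/pinecone";

// The Pinecone client reads PINECONE_API_KEY from the environment,
// as in the updated examples above.
const pinecone = new Pinecone();
const pineconeIndex = pinecone.Index(process.env.PINECONE_INDEX!);

// Defaults to the "multilingual-e5-large" model with inputType "passage".
const embeddings = new PineconeEmbeddings();

// Embed documents and a query directly via the Inference API...
const text =
  "LangChain is the framework for building context-aware reasoning applications";
const docVectors = await embeddings.embedDocuments([text]);
const queryVector = await embeddings.embedQuery("What is LangChain?");
console.log(docVectors[0].length, queryVector.length); // 1024 and 1024 for multilingual-e5-large

// ...or hand the embeddings to a PineconeStore backed by an existing index.
const store = await PineconeStore.fromExistingIndex(embeddings, {
  pineconeIndex,
});
await store.addDocuments([{ pageContent: text, metadata: {} }]);
const results = await store.similaritySearch("What is LangChain?", 1);
console.log(results[0]?.pageContent);
```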