From 6352edf77fe2ef5f412201562426fbbdf5c6bfeb Mon Sep 17 00:00:00 2001
From: Andreas Motl
Date: Mon, 23 Dec 2024 16:55:44 +0100
Subject: [PATCH] docs: CrateDB: Register package `langchain-cratedb`, and add minimal "provider" documentation (#28877)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Hi Erick.

Coming back from a previous attempt, we now made a separate package for the CrateDB adapter, called `langchain-cratedb`, as advised. Other than registering the package within `libs/packages.yml`, this patch includes a minimal amount of documentation to accompany the advent of this new package. Let us know about any mistakes we made, or changes you would like to see.

Thanks,
Andreas.

## About

- **Description:** Register a new database adapter package, `langchain-cratedb`, providing traditional vector store, document loader, and chat message history features for a start.
- **Addressed to:** @efriis, @eyurtsev
- **References:** GH-27710
- **Preview:** [Providers » More » CrateDB](https://langchain-git-fork-crate-workbench-register-la-4bf945-langchain.vercel.app/docs/integrations/providers/cratedb/)

## Status

- **PyPI:** https://pypi.org/project/langchain-cratedb/
- **GitHub:** https://github.com/crate/langchain-cratedb
- **Documentation (CrateDB):** https://cratedb.com/docs/guide/integrate/langchain/
- **Documentation (LangChain):** _This PR._

## Backlog?

Is this applicable for this kind of patch?

> - [ ] **Add tests and docs**: If you're adding a new integration, please include
>   1. a test for the integration, preferably unit tests that do not rely on network access,
>   2. an example notebook showing its use. It lives in `docs/docs/integrations` directory.

## Q&A

1. Notebooks that use the LangChain CrateDB adapter are currently at [CrateDB LangChain Examples](https://github.com/crate/cratedb-examples/tree/main/topic/machine-learning/llm-langchain), and the documentation refers to them.
   Because they are derived from very old blueprints dating back to LangChain 0.0.x times, we guess they need a refresh before adding them to `docs/docs/integrations`. Is it acceptable to merge this minimal package registration + documentation patch, which already includes valid code snippets in `cratedb.mdx`, and add corresponding notebooks in a subsequent patch?
2. How does a package get into the tabular list of _Integration Packages_ enumerated on the [documentation entrypoint page about Providers](https://python.langchain.com/docs/integrations/providers/)?

/cc Please also review, @ckurze, @wierdvanderhaar, @kneth, @simonprickett, if you can find the time. Thanks!
---
 docs/docs/integrations/providers/cratedb.mdx | 132 +++++++++++++++++++
 libs/packages.yml                            |   3 +
 2 files changed, 135 insertions(+)
 create mode 100644 docs/docs/integrations/providers/cratedb.mdx

diff --git a/docs/docs/integrations/providers/cratedb.mdx b/docs/docs/integrations/providers/cratedb.mdx
new file mode 100644
index 0000000000000..24e47930407c0
--- /dev/null
+++ b/docs/docs/integrations/providers/cratedb.mdx
@@ -0,0 +1,132 @@
+# CrateDB
+
+> [CrateDB] is a distributed and scalable SQL database for storing and
+> analyzing massive amounts of data in near real-time, even with complex
+> queries. It is PostgreSQL-compatible, based on Lucene, and inheriting
+> from Elasticsearch.
+
+
+## Installation and Setup
+
+### Set up CrateDB
+There are two ways to get started with CrateDB quickly. Alternatively,
+choose one of the other [CrateDB installation options].
+
+#### Start CrateDB on your local machine
+Example: Run a single-node CrateDB instance with security disabled,
+using Docker or Podman. This is not recommended for production use.
+
+```bash
+docker run --name=cratedb --rm \
+  --publish=4200:4200 --publish=5432:5432 --env=CRATE_HEAP_SIZE=2g \
+  crate:latest -Cdiscovery.type=single-node
+```
+
+#### Deploy cluster on CrateDB Cloud
+[CrateDB Cloud] is a managed CrateDB service.
+Sign up for a [free trial][CrateDB Cloud Console].
+
+### Install Client
+Install the most recent version of the `langchain-cratedb` package
+and a few others that are needed for this tutorial.
+```bash
+pip install --upgrade langchain-cratedb langchain-openai unstructured
+```
+
+
+## Documentation
+For a more detailed walkthrough of the CrateDB wrapper, see
+[using LangChain with CrateDB]. See also [all features of CrateDB]
+to learn about other functionality provided by CrateDB.
+
+
+## Features
+The CrateDB adapter for LangChain provides APIs to use CrateDB as a vector store,
+document loader, and storage for chat messages.
+
+### Vector Store
+Use the CrateDB vector store functionality around `FLOAT_VECTOR` and `KNN_MATCH`
+for similarity search and other purposes. See also [CrateDBVectorStore Tutorial].
+
+Make sure you've configured a valid OpenAI API key.
+```bash
+export OPENAI_API_KEY=sk-XJZ...
+```
+```python
+from langchain_community.document_loaders import UnstructuredURLLoader
+from langchain_cratedb import CrateDBVectorStore
+from langchain_openai import OpenAIEmbeddings
+from langchain.text_splitter import CharacterTextSplitter
+
+loader = UnstructuredURLLoader(urls=["https://github.com/langchain-ai/langchain/raw/refs/tags/langchain-core==0.3.28/docs/docs/how_to/state_of_the_union.txt"])
+documents = loader.load()
+text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
+docs = text_splitter.split_documents(documents)
+
+embeddings = OpenAIEmbeddings()
+
+# Connect to a self-managed CrateDB instance on localhost.
+CONNECTION_STRING = "crate://?schema=testdrive"
+
+store = CrateDBVectorStore.from_documents(
+    documents=docs,
+    embedding=embeddings,
+    collection_name="state_of_the_union",
+    connection=CONNECTION_STRING,
+)
+
+query = "What did the president say about Ketanji Brown Jackson"
+docs_with_score = store.similarity_search_with_score(query)
+```
+
+### Document Loader
+Load documents from a CrateDB database table, using the document loader
+`CrateDBLoader`, which is based on SQLAlchemy. See also [CrateDBLoader Tutorial].
+
+To use the document loader in your applications:
+```python
+import sqlalchemy as sa
+from langchain_community.utilities import SQLDatabase
+from langchain_cratedb import CrateDBLoader
+
+# Connect to a self-managed CrateDB instance on localhost.
+CONNECTION_STRING = "crate://?schema=testdrive"
+
+db = SQLDatabase(engine=sa.create_engine(CONNECTION_STRING))
+
+loader = CrateDBLoader(
+    'SELECT * FROM sys.summits LIMIT 42',
+    db=db,
+)
+documents = loader.load()
+```
+
+### Chat Message History
+Use CrateDB as the storage for your chat messages.
+See also [CrateDBChatMessageHistory Tutorial].
+
+To use the chat message history in your applications:
+```python
+from langchain_cratedb import CrateDBChatMessageHistory
+
+# Connect to a self-managed CrateDB instance on localhost.
+CONNECTION_STRING = "crate://?schema=testdrive"
+
+message_history = CrateDBChatMessageHistory(
+    session_id="test-session",
+    connection=CONNECTION_STRING,
+)
+
+message_history.add_user_message("hi!")
+```
+
+
+[all features of CrateDB]: https://cratedb.com/docs/guide/feature/
+[CrateDB]: https://cratedb.com/database
+[CrateDB Cloud]: https://cratedb.com/database/cloud
+[CrateDB Cloud Console]: https://console.cratedb.cloud/?utm_source=langchain&utm_content=documentation
+[CrateDB installation options]: https://cratedb.com/docs/guide/install/
+[CrateDBChatMessageHistory Tutorial]: https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/conversational_memory.ipynb
+[CrateDBLoader Tutorial]: https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/document_loader.ipynb
+[CrateDBVectorStore Tutorial]: https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/vector_search.ipynb
+[using LangChain with CrateDB]: https://cratedb.com/docs/guide/integrate/langchain/
diff --git a/libs/packages.yml b/libs/packages.yml
index da26ed6f0cfb8..e9f64be5a5eaa 100644
--- a/libs/packages.yml
+++ b/libs/packages.yml
@@ -143,6 +143,9 @@ packages:
   - name: langchain-couchbase
     repo: langchain-ai/langchain
     path: libs/partners/couchbase
+  - name: langchain-cratedb
+    repo: crate/langchain-cratedb
+    path: .
   - name: langchain-ollama
     repo: langchain-ai/langchain
     path: libs/partners/ollama