docs: CrateDB: Register package langchain-cratedb, and add minimal "provider" documentation (#28877)

Hi Erick. Following up on a previous attempt, we have now created a separate
package for the CrateDB adapter, called `langchain-cratedb`, as advised.
Besides registering the package in `libs/packages.yml`, this
patch includes a minimal amount of documentation to accompany the
new package. Let us know about any mistakes we made, or changes
you would like to see. Thanks, Andreas.

## About
- **Description:** Register a new database adapter package,
`langchain-cratedb`, providing traditional vector store, document
loader, and chat message history features for a start.
- **Addressed to:** @efriis, @eyurtsev
- **References:** GH-27710
- **Preview:** [Providers » More »
CrateDB](https://langchain-git-fork-crate-workbench-register-la-4bf945-langchain.vercel.app/docs/integrations/providers/cratedb/)

## Status
- **PyPI:** https://pypi.org/project/langchain-cratedb/
- **GitHub:** https://github.com/crate/langchain-cratedb
- **Documentation (CrateDB):**
https://cratedb.com/docs/guide/integrate/langchain/
- **Documentation (LangChain):** _This PR._

## Backlog?
Does this apply to this kind of patch?
> - [ ] **Add tests and docs**: If you're adding a new integration, please include
>   1. a test for the integration, preferably unit tests that do not rely on network access,
>   2. an example notebook showing its use. It lives in `docs/docs/integrations` directory.

## Q&A
1. Notebooks that use the LangChain CrateDB adapter are currently at
[CrateDB LangChain
Examples](https://github.com/crate/cratedb-examples/tree/main/topic/machine-learning/llm-langchain),
and the documentation refers to them. Because they are derived from very
old blueprints dating back to LangChain 0.0.x, we expect they need a
refresh before being added to `docs/docs/integrations`. Would it be
acceptable to merge this minimal package registration + documentation
patch, which already includes valid code snippets in `cratedb.mdx`, and
add corresponding notebooks in a subsequent patch?

2. How can the package be added to the table of _Integration
Packages_ listed on the [documentation entry point page about
Providers](https://python.langchain.com/docs/integrations/providers/)?

/cc Please also review, @ckurze, @wierdvanderhaar, @kneth,
@simonprickett, if you can find the time. Thanks!
amotl authored Dec 23, 2024
1 parent e5c9da3 commit 6352edf
Showing 2 changed files with 135 additions and 0 deletions.
132 changes: 132 additions & 0 deletions docs/docs/integrations/providers/cratedb.mdx
@@ -0,0 +1,132 @@
# CrateDB

> [CrateDB] is a distributed and scalable SQL database for storing and
> analyzing massive amounts of data in near real-time, even with complex
> queries. It is PostgreSQL-compatible, based on Lucene, and inherits
> from Elasticsearch.

## Installation and Setup

### Set up CrateDB
There are two quick ways to get started with CrateDB, described below.
Alternatively, choose one of the other [CrateDB installation options].

#### Start CrateDB on your local machine
Example: Run a single-node CrateDB instance with security disabled,
using Docker or Podman. This is not recommended for production use.

```bash
docker run --name=cratedb --rm \
--publish=4200:4200 --publish=5432:5432 --env=CRATE_HEAP_SIZE=2g \
crate:latest -Cdiscovery.type=single-node
```
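Once the container is up, one way to confirm that it answers on port 4200 is to post a statement to CrateDB's HTTP `_sql` endpoint, which accepts a JSON body with a `stmt` key and optional `args`. The following is a minimal standard-library sketch, assuming the default `localhost:4200` address published by the command above and no authentication (security is disabled in this setup); the helper names `build_sql_request` and `run_sql` are hypothetical.

```python
import json
import urllib.request


def build_sql_request(stmt, host="localhost", port=4200, args=None):
    """Build a POST request for CrateDB's HTTP `_sql` endpoint (hypothetical helper)."""
    payload = {"stmt": stmt}
    if args is not None:
        # Positional parameters bind to `?` placeholders in the statement.
        payload["args"] = list(args)
    return urllib.request.Request(
        url=f"http://{host}:{port}/_sql",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sql(stmt, **kwargs):
    """Send the statement and decode the JSON response (requires a running server)."""
    request = build_sql_request(stmt, **kwargs)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

With the container running, `run_sql("SELECT name FROM sys.cluster")` is expected to return a JSON document whose `cols` and `rows` keys hold the result.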

#### Deploy cluster on CrateDB Cloud
[CrateDB Cloud] is a managed CrateDB service. Sign up for a
[free trial][CrateDB Cloud Console].
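Connecting to a CrateDB Cloud cluster instead of localhost mainly changes the `CONNECTION_STRING` used in the examples below: it carries credentials and a hostname, and enables TLS. A minimal sketch, assuming the `crate://user:password@host:port/?ssl=true&schema=...` URL form accepted by the CrateDB SQLAlchemy dialect; the helper name `cratedb_url` and the example hostname are hypothetical, so take the real host and credentials from the CrateDB Cloud Console.

```python
from urllib.parse import quote, urlencode


def cratedb_url(host="localhost", port=4200, user=None, password=None,
                schema="testdrive", ssl=False):
    """Assemble a `crate://` SQLAlchemy URL (hypothetical helper)."""
    auth = ""
    if user:
        # Percent-encode credentials so characters like "@" or ":" stay unambiguous.
        auth = quote(user, safe="")
        if password:
            auth += ":" + quote(password, safe="")
        auth += "@"
    params = {"schema": schema}
    if ssl:
        params["ssl"] = "true"
    return f"crate://{auth}{host}:{port}/?{urlencode(params)}"


# Hypothetical cloud hostname and credentials, for illustration only.
print(cratedb_url(host="example.cratedb.net", user="admin", password="secret", ssl=True))
```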

### Install Client
Install the most recent version of the `langchain-cratedb` package,
along with the other packages needed for the examples below.
```bash
pip install --upgrade langchain-cratedb langchain-openai unstructured
```
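To confirm which versions ended up in the active environment, the Python standard library can report installed distribution versions. A small sketch; the helper name `installed_versions` is hypothetical, and a value of `None` means the package is not installed.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


print(installed_versions(["langchain-cratedb", "langchain-openai", "unstructured"]))
```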


## Documentation
For a more detailed walkthrough of the CrateDB wrapper, see
[using LangChain with CrateDB]. See also [all features of CrateDB]
to learn about other functionality provided by CrateDB.


## Features
The CrateDB adapter for LangChain provides APIs to use CrateDB as a vector store,
a document loader, and storage for chat messages.

### Vector Store
Use CrateDB's vector store functionality, built around the `FLOAT_VECTOR` data type
and the `KNN_MATCH` function, for similarity search and other purposes.
See also [CrateDBVectorStore Tutorial].
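Conceptually, a `KNN_MATCH` query asks for the `k` stored vectors nearest to a query vector. The pure-Python sketch below performs an exact, brute-force nearest-neighbour search to illustrate the idea, assuming Euclidean distance as the metric; it is not how CrateDB executes the query, since CrateDB answers `KNN_MATCH` from its Lucene-based index rather than by scanning every row. The function name `knn` and the toy vectors are hypothetical.

```python
import math


def knn(vectors, query, k):
    """Return the k (key, distance) pairs closest to `query`, nearest first.

    Exact brute-force search over a dict of key -> vector, using Euclidean distance.
    """
    scored = []
    for key, vec in vectors.items():
        if len(vec) != len(query):
            raise ValueError(f"dimension mismatch for {key!r}")
        scored.append((key, math.dist(vec, query)))
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Toy 3-dimensional "embeddings"; real OpenAI embeddings have far more dimensions.
vectors = {
    "judge": [0.9, 0.1, 0.0],
    "court": [0.8, 0.2, 0.1],
    "weather": [0.0, 0.1, 0.9],
}
print(knn(vectors, [1.0, 0.0, 0.0], k=2))
```

In CrateDB, the corresponding query is written in SQL, roughly `SELECT ... WHERE KNN_MATCH(embedding, [1.0, 0.0, 0.0], 2) ORDER BY _score DESC`, where `embedding` stands for a hypothetical `FLOAT_VECTOR(3)` column.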

Make sure you've configured a valid OpenAI API key.
```bash
export OPENAI_API_KEY=sk-XJZ...
```
```python
from langchain_community.document_loaders import UnstructuredURLLoader
from langchain_cratedb import CrateDBVectorStore
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter

# Load a sample text file and split it into chunks of roughly 1000 characters at most.
loader = UnstructuredURLLoader(urls=["https://github.com/langchain-ai/langchain/raw/refs/tags/langchain-core==0.3.28/docs/docs/how_to/state_of_the_union.txt"])
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

# Embeddings are computed by OpenAI, using the OPENAI_API_KEY configured above.
embeddings = OpenAIEmbeddings()

# Connect to a self-managed CrateDB instance on localhost.
CONNECTION_STRING = "crate://?schema=testdrive"

# Store the chunks and their embeddings in CrateDB.
store = CrateDBVectorStore.from_documents(
    documents=docs,
    embedding=embeddings,
    collection_name="state_of_the_union",
    connection=CONNECTION_STRING,
)

# Retrieve the chunks most similar to the query, together with their scores.
query = "What did the president say about Ketanji Brown Jackson"
docs_with_score = store.similarity_search_with_score(query)
for doc, score in docs_with_score:
    print(f"{score:.4f}", doc.page_content[:80])
```
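The `CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)` step splits the loaded text on a separator (blank lines by default) and packs the pieces greedily into chunks of at most `chunk_size` characters. The sketch below is a simplified version of that packing without overlap, for intuition only; LangChain's actual implementation also handles overlap, whitespace stripping, and other edge cases, and the function name `pack_chunks` is hypothetical.

```python
def pack_chunks(text, chunk_size, separator="\n\n"):
    """Greedily merge separator-delimited pieces into chunks of <= chunk_size chars.

    A piece longer than chunk_size on its own becomes an oversized chunk.
    """
    pieces = [p for p in text.split(separator) if p]
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "First paragraph.\n\nSecond one.\n\nA third, somewhat longer paragraph."
for chunk in pack_chunks(sample, chunk_size=32):
    print(repr(chunk))
```

Here the first two paragraphs fit together within 32 characters, while the third paragraph exceeds the limit on its own and is kept as a single oversized chunk.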

### Document Loader
Load documents from a CrateDB database table using the document loader
`CrateDBLoader`, which is based on SQLAlchemy. See also [CrateDBLoader Tutorial].

To use the document loader in your applications:
```python
import sqlalchemy as sa
from langchain_community.utilities import SQLDatabase
from langchain_cratedb import CrateDBLoader

# Connect to a self-managed CrateDB instance on localhost.
CONNECTION_STRING = "crate://?schema=testdrive"

db = SQLDatabase(engine=sa.create_engine(CONNECTION_STRING))

# Each row returned by the query becomes one document.
loader = CrateDBLoader(
    "SELECT * FROM sys.summits LIMIT 42",
    db=db,
)
documents = loader.load()
```
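To show the idea of turning query rows into documents, the standard-library sketch below imitates that mapping with an in-memory SQLite database standing in for CrateDB, so it runs without a server. The `column: value` line format is an assumption modeled on the default of LangChain's generic SQL database loader; the table, the sample rows, and the `rows_to_documents` helper are hypothetical, and plain dicts stand in for LangChain `Document` objects.

```python
import sqlite3


def rows_to_documents(conn, query, metadata_columns=()):
    """Map each result row to a dict with `page_content` and `metadata` keys."""
    cursor = conn.execute(query)
    columns = [desc[0] for desc in cursor.description]
    documents = []
    for row in cursor:
        record = dict(zip(columns, row))
        # One "column: value" line per column, mirroring a generic SQL loader.
        page_content = "\n".join(f"{col}: {val}" for col, val in record.items())
        metadata = {col: record[col] for col in metadata_columns}
        documents.append({"page_content": page_content, "metadata": metadata})
    return documents


# In-memory SQLite stands in for CrateDB's sys.summits table here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE summits (mountain TEXT, height INTEGER)")
conn.executemany(
    "INSERT INTO summits VALUES (?, ?)",
    [("Mont Blanc", 4808), ("Monte Rosa", 4634)],
)
documents = rows_to_documents(
    conn,
    "SELECT * FROM summits ORDER BY height DESC LIMIT 42",
    metadata_columns=("mountain",),
)
print(documents[0]["page_content"])
```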

### Chat Message History
Use CrateDB as the storage for your chat messages.
See also [CrateDBChatMessageHistory Tutorial].

To use the chat message history in your applications:
```python
from langchain_cratedb import CrateDBChatMessageHistory

# Connect to a self-managed CrateDB instance on localhost.
CONNECTION_STRING = "crate://?schema=testdrive"

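message_history = CrateDBChatMessageHistory(
    session_id="test-session",
    connection=CONNECTION_STRING,
)

# Append messages to the session, then read the conversation back.
message_history.add_user_message("hi!")
message_history.add_ai_message("Hello! How can I help?")
print(message_history.messages)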
```
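A SQL-backed chat history typically keeps each message as a row keyed by the session identifier and reads the rows back in insertion order. The sketch below illustrates that pattern with an in-memory SQLite database standing in for CrateDB; the table layout and the `SQLChatHistory` class are hypothetical and do not reflect the schema that `CrateDBChatMessageHistory` actually creates.

```python
import json
import sqlite3


class SQLChatHistory:
    """Minimal session-scoped message store (hypothetical, for illustration)."""

    def __init__(self, conn, session_id):
        self.conn = conn
        self.session_id = session_id
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS message_store ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, session_id TEXT, message TEXT)"
        )

    def add_message(self, role, content):
        # Store each message as a JSON document so further fields can be added later.
        self.conn.execute(
            "INSERT INTO message_store (session_id, message) VALUES (?, ?)",
            (self.session_id, json.dumps({"type": role, "content": content})),
        )

    @property
    def messages(self):
        # Only this session's rows, in insertion order.
        rows = self.conn.execute(
            "SELECT message FROM message_store WHERE session_id = ? ORDER BY id",
            (self.session_id,),
        )
        return [json.loads(message) for (message,) in rows]


conn = sqlite3.connect(":memory:")
history = SQLChatHistory(conn, "test-session")
history.add_message("human", "hi!")
history.add_message("ai", "Hello! How can I help?")
other = SQLChatHistory(conn, "another-session")
print(history.messages)
```

Keying rows by `session_id` is what lets several conversations share one table while each history object sees only its own messages.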


[all features of CrateDB]: https://cratedb.com/docs/guide/feature/
[CrateDB]: https://cratedb.com/database
[CrateDB Cloud]: https://cratedb.com/database/cloud
[CrateDB Cloud Console]: https://console.cratedb.cloud/?utm_source=langchain&utm_content=documentation
[CrateDB installation options]: https://cratedb.com/docs/guide/install/
[CrateDBChatMessageHistory Tutorial]: https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/conversational_memory.ipynb
[CrateDBLoader Tutorial]: https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/document_loader.ipynb
[CrateDBVectorStore Tutorial]: https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/vector_search.ipynb
[using LangChain with CrateDB]: https://cratedb.com/docs/guide/integrate/langchain/
3 changes: 3 additions & 0 deletions libs/packages.yml
@@ -143,6 +143,9 @@ packages:
- name: langchain-couchbase
repo: langchain-ai/langchain
path: libs/partners/couchbase
- name: langchain-cratedb
repo: crate/langchain-cratedb
path: .
- name: langchain-ollama
repo: langchain-ai/langchain
path: libs/partners/ollama