Skip to content

Commit

Permalink
Merge branch 'master' into patch-1
Browse files Browse the repository at this point in the history
  • Loading branch information
AhmedTammaa authored Dec 23, 2024
2 parents 5637dc7 + 6352edf commit 4aaa912
Show file tree
Hide file tree
Showing 12 changed files with 379 additions and 37 deletions.
46 changes: 43 additions & 3 deletions docs/docs/integrations/chat/mlx.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -155,8 +155,48 @@
"tools = load_tools([\"serpapi\", \"llm-math\"], llm=llm)\n",
"\n",
"# setup ReAct style prompt\n",
"prompt = hub.pull(\"hwchase17/react-json\")\n",
"prompt = prompt.partial(\n",
"# Based on 'hwchase17/react' prompt modification, cause mlx does not support the `System` role\n",
"human_prompt = \"\"\"\n",
"Answer the following questions as best you can. You have access to the following tools:\n",
"\n",
"{tools}\n",
"\n",
"The way you use the tools is by specifying a json blob.\n",
"Specifically, this json should have a `action` key (with the name of the tool to use) and a `action_input` key (with the input to the tool going here).\n",
"\n",
"The only values that should be in the \"action\" field are: {tool_names}\n",
"\n",
"The $JSON_BLOB should only contain a SINGLE action, do NOT return a list of multiple actions. Here is an example of a valid $JSON_BLOB:\n",
"\n",
"```\n",
"{{\n",
" \"action\": $TOOL_NAME,\n",
" \"action_input\": $INPUT\n",
"}}\n",
"```\n",
"\n",
"ALWAYS use the following format:\n",
"\n",
"Question: the input question you must answer\n",
"Thought: you should always think about what to do\n",
"Action:\n",
"```\n",
"$JSON_BLOB\n",
"```\n",
"Observation: the result of the action\n",
"... (this Thought/Action/Observation can repeat N times)\n",
"Thought: I now know the final answer\n",
"Final Answer: the final answer to the original input question\n",
"\n",
"Begin! Reminder to always use the exact characters `Final Answer` when responding.\n",
"\n",
"{input}\n",
"\n",
"{agent_scratchpad}\n",
"\n",
"\"\"\"\n",
"\n",
"prompt = human_prompt.partial(\n",
" tools=render_text_description(tools),\n",
" tool_names=\", \".join([t.name for t in tools]),\n",
")\n",
Expand Down Expand Up @@ -207,7 +247,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.18"
"version": "3.12.7"
}
},
"nbformat": 4,
Expand Down
132 changes: 132 additions & 0 deletions docs/docs/integrations/providers/cratedb.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,132 @@
# CrateDB

> [CrateDB] is a distributed and scalable SQL database for storing and
> analyzing massive amounts of data in near real-time, even with complex
> queries. It is PostgreSQL-compatible, based on Lucene, and inheriting
> from Elasticsearch.

## Installation and Setup

### Setup CrateDB
There are two ways to get started with CrateDB quickly. Alternatively,
choose other [CrateDB installation options].

#### Start CrateDB on your local machine
Example: Run a single-node CrateDB instance with security disabled,
using Docker or Podman. This is not recommended for production use.

```bash
docker run --name=cratedb --rm \
--publish=4200:4200 --publish=5432:5432 --env=CRATE_HEAP_SIZE=2g \
crate:latest -Cdiscovery.type=single-node
```

#### Deploy cluster on CrateDB Cloud
[CrateDB Cloud] is a managed CrateDB service. Sign up for a
[free trial][CrateDB Cloud Console].

### Install Client
Install the most recent version of the `langchain-cratedb` package
and a few others that are needed for this tutorial.
```bash
pip install --upgrade langchain-cratedb langchain-openai unstructured
```


## Documentation
For a more detailed walkthrough of the CrateDB wrapper, see
[using LangChain with CrateDB]. See also [all features of CrateDB]
to learn about other functionality provided by CrateDB.


## Features
The CrateDB adapter for LangChain provides APIs to use CrateDB as vector store,
document loader, and storage for chat messages.

### Vector Store
Use the CrateDB vector store functionality around `FLOAT_VECTOR` and `KNN_MATCH`
for similarity search and other purposes. See also [CrateDBVectorStore Tutorial].

Make sure you've configured a valid OpenAI API key.
```bash
export OPENAI_API_KEY=sk-XJZ...
```
```python
from langchain_community.document_loaders import UnstructuredURLLoader
from langchain_cratedb import CrateDBVectorStore
from langchain_openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter

loader = UnstructuredURLLoader(urls=["https://github.com/langchain-ai/langchain/raw/refs/tags/langchain-core==0.3.28/docs/docs/how_to/state_of_the_union.txt"])
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings()

# Connect to a self-managed CrateDB instance on localhost.
CONNECTION_STRING = "crate://?schema=testdrive"

store = CrateDBVectorStore.from_documents(
documents=docs,
embedding=embeddings,
collection_name="state_of_the_union",
connection=CONNECTION_STRING,
)

query = "What did the president say about Ketanji Brown Jackson"
docs_with_score = store.similarity_search_with_score(query)
```

### Document Loader
Load load documents from a CrateDB database table, using the document loader
`CrateDBLoader`, which is based on SQLAlchemy. See also [CrateDBLoader Tutorial].

To use the document loader in your applications:
```python
import sqlalchemy as sa
from langchain_community.utilities import SQLDatabase
from langchain_cratedb import CrateDBLoader

# Connect to a self-managed CrateDB instance on localhost.
CONNECTION_STRING = "crate://?schema=testdrive"

db = SQLDatabase(engine=sa.create_engine(CONNECTION_STRING))

loader = CrateDBLoader(
'SELECT * FROM sys.summits LIMIT 42',
db=db,
)
documents = loader.load()
```

### Chat Message History
Use CrateDB as the storage for your chat messages.
See also [CrateDBChatMessageHistory Tutorial].

To use the chat message history in your applications:
```python
from langchain_cratedb import CrateDBChatMessageHistory

# Connect to a self-managed CrateDB instance on localhost.
CONNECTION_STRING = "crate://?schema=testdrive"

message_history = CrateDBChatMessageHistory(
session_id="test-session",
connection=CONNECTION_STRING,
)

message_history.add_user_message("hi!")
```


[all features of CrateDB]: https://cratedb.com/docs/guide/feature/
[CrateDB]: https://cratedb.com/database
[CrateDB Cloud]: https://cratedb.com/database/cloud
[CrateDB Cloud Console]: https://console.cratedb.cloud/?utm_source=langchain&utm_content=documentation
[CrateDB installation options]: https://cratedb.com/docs/guide/install/
[CrateDBChatMessageHistory Tutorial]: https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/conversational_memory.ipynb
[CrateDBLoader Tutorial]: https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/document_loader.ipynb
[CrateDBVectorStore Tutorial]: https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/vector_search.ipynb
[using LangChain with CrateDB]: https://cratedb.com/docs/guide/integrate/langchain/
22 changes: 14 additions & 8 deletions docs/scripts/notebook_convert.py
Original file line number Diff line number Diff line change
Expand Up @@ -143,16 +143,22 @@ def _modify_frontmatter(
edit_url = (
f"https://github.com/langchain-ai/langchain/edit/master/docs/docs/{rel_path}"
)
frontmatter = {
"custom_edit_url": edit_url,
}
if re.match(r"^[\s\n]*---\n", body):
# if custom_edit_url already exists, leave it
if re.match(r"custom_edit_url: ", body):
return body
else:
return re.sub(
r"^[\s\n]*---\n", f"---\ncustom_edit_url: {edit_url}\n", body, count=1
)
# frontmatter already present

for k, v in frontmatter.items():
# if key already exists, leave it
if re.match(f"{k}: ", body):
continue
else:
body = re.sub(r"^[\s\n]*---\n", f"---\n{k}: {v}\n", body, count=1)
return body
else:
return f"---\ncustom_edit_url: {edit_url}\n---\n{body}"
insert = "\n".join([f"{k}: {v}" for k, v in frontmatter.items()])
return f"---\n{insert}\n---\n{body}"


def _convert_notebook(
Expand Down
85 changes: 85 additions & 0 deletions docs/src/theme/DocItem/Layout/index.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,85 @@
import React from 'react';
import clsx from 'clsx';
import {useWindowSize} from '@docusaurus/theme-common';
import {useDoc} from '@docusaurus/plugin-content-docs/client';
import DocItemPaginator from '@theme/DocItem/Paginator';
import DocVersionBanner from '@theme/DocVersionBanner';
import DocVersionBadge from '@theme/DocVersionBadge';
import DocItemFooter from '@theme/DocItem/Footer';
import DocItemTOCMobile from '@theme/DocItem/TOC/Mobile';
import DocItemTOCDesktop from '@theme/DocItem/TOC/Desktop';
import DocItemContent from '@theme/DocItem/Content';
import DocBreadcrumbs from '@theme/DocBreadcrumbs';
import ContentVisibility from '@theme/ContentVisibility';
import styles from './styles.module.css';
/**
* Decide if the toc should be rendered, on mobile or desktop viewports
*/
function useDocTOC() {
const {frontMatter, toc} = useDoc();
const windowSize = useWindowSize();
const hidden = frontMatter.hide_table_of_contents;
const canRender = !hidden && toc.length > 0;
const mobile = canRender ? <DocItemTOCMobile /> : undefined;
const desktop =
canRender && (windowSize === 'desktop' || windowSize === 'ssr') ? (
<DocItemTOCDesktop />
) : undefined;
return {
hidden,
mobile,
desktop,
};
}
export default function DocItemLayout({children}) {
const docTOC = useDocTOC();
const {metadata, frontMatter} = useDoc();

"https://github.com/langchain-ai/langchain/blob/master/docs/docs/introduction.ipynb"
"https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/docs/introduction.ipynb"

console.log({metadata, frontMatter})

const linkColab = frontMatter.link_colab || (
metadata.editUrl?.endsWith(".ipynb")
? metadata.editUrl?.replace("https://github.com/langchain-ai/langchain/edit/", "https://colab.research.google.com/github/langchain-ai/langchain/blob/")
: null
);
const linkGithub = frontMatter.link_github || metadata.editUrl?.replace("/edit/", "/blob/");

console.log({linkColab, linkGithub})

return (
<div className="row">
<div className={clsx('col', !docTOC.hidden && styles.docItemCol)}>
<ContentVisibility metadata={metadata} />
<DocVersionBanner />
<div className={styles.docItemContainer}>
<article>
<DocBreadcrumbs />
<DocVersionBadge />
{docTOC.mobile}
<div style={{
display: "flex",
flexDirection: "column",
alignItems: "flex-end",
float: "right",
}}>
{linkColab && (<a target="_blank" href={linkColab}>
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>)}
{linkGithub && (<a href={linkGithub} target="_blank">
<img src="https://img.shields.io/badge/Open%20on%20GitHub-grey?logo=github&logoColor=white"
alt="Open on GitHub" />
</a>)}
</div>
<DocItemContent>{children}</DocItemContent>
<DocItemFooter />
</article>
<DocItemPaginator />
</div>
</div>
{docTOC.desktop && <div className="col col--3">{docTOC.desktop}</div>}
</div>
);
}
10 changes: 10 additions & 0 deletions docs/src/theme/DocItem/Layout/styles.module.css
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
.docItemContainer header + *,
.docItemContainer article > *:first-child {
margin-top: 0;
}

@media (min-width: 997px) {
.docItemCol {
max-width: 75% !important;
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -21,9 +21,8 @@ def lazy_load(self) -> Iterator[Document]:
"""Lazy load records from dataframe."""

for _, row in self.data_frame.iterrows():
text = row[self.page_content_column]
metadata = row.to_dict()
metadata.pop(self.page_content_column)
text = metadata.pop(self.page_content_column)
yield Document(page_content=text, metadata=metadata)


Expand Down
Loading

0 comments on commit 4aaa912

Please sign in to comment.