
Commit 65fb936

Merge branch 'rlm/testing' of https://github.com/langchain-ai/data-enrichment into rlm/testing

rlancemartin committed Sep 17, 2024 · 2 parents b0e80c4 + 8c62cf7
Showing 6 changed files with 1,548 additions and 106 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -161,3 +161,4 @@ cython_debug/
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
.DS_Store
uv.lock
95 changes: 9 additions & 86 deletions README.md
@@ -2,6 +2,7 @@

[![CI](https://github.com/langchain-ai/data-enrichment/actions/workflows/unit-tests.yml/badge.svg)](https://github.com/langchain-ai/data-enrichment/actions/workflows/unit-tests.yml)
[![Integration Tests](https://github.com/langchain-ai/data-enrichment/actions/workflows/integration-tests.yml/badge.svg)](https://github.com/langchain-ai/data-enrichment/actions/workflows/integration-tests.yml)
[![Open in - LangGraph Studio](https://img.shields.io/badge/Open_in-LangGraph_Studio-00324d.svg?logo=data:image/svg%2bxml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHdpZHRoPSI4NS4zMzMiIGhlaWdodD0iODUuMzMzIiB2ZXJzaW9uPSIxLjAiIHZpZXdCb3g9IjAgMCA2NCA2NCI+PHBhdGggZD0iTTEzIDcuOGMtNi4zIDMuMS03LjEgNi4zLTYuOCAyNS43LjQgMjQuNi4zIDI0LjUgMjUuOSAyNC41QzU3LjUgNTggNTggNTcuNSA1OCAzMi4zIDU4IDcuMyA1Ni43IDYgMzIgNmMtMTIuOCAwLTE2LjEuMy0xOSAxLjhtMzcuNiAxNi42YzIuOCAyLjggMy40IDQuMiAzLjQgNy42cy0uNiA0LjgtMy40IDcuNkw0Ny4yIDQzSDE2LjhsLTMuNC0zLjRjLTQuOC00LjgtNC44LTEwLjQgMC0xNS4ybDMuNC0zLjRoMzAuNHoiLz48cGF0aCBkPSJNMTguOSAyNS42Yy0xLjEgMS4zLTEgMS43LjQgMi41LjkuNiAxLjcgMS44IDEuNyAyLjcgMCAxIC43IDIuOCAxLjYgNC4xIDEuNCAxLjkgMS40IDIuNS4zIDMuMi0xIC42LS42LjkgMS40LjkgMS41IDAgMi43LS41IDIuNy0xIDAtLjYgMS4xLS44IDIuNi0uNGwyLjYuNy0xLjgtMi45Yy01LjktOS4zLTkuNC0xMi4zLTExLjUtOS44TTM5IDI2YzAgMS4xLS45IDIuNS0yIDMuMi0yLjQgMS41LTIuNiAzLjQtLjUgNC4yLjguMyAyIDEuNyAyLjUgMy4xLjYgMS41IDEuNCAyLjMgMiAyIDEuNS0uOSAxLjItMy41LS40LTMuNS0yLjEgMC0yLjgtMi44LS44LTMuMyAxLjYtLjQgMS42LS41IDAtLjYtMS4xLS4xLTEuNS0uNi0xLjItMS42LjctMS43IDMuMy0yLjEgMy41LS41LjEuNS4yIDEuNi4zIDIuMiAwIC43LjkgMS40IDEuOSAxLjYgMi4xLjQgMi4zLTIuMy4yLTMuMi0uOC0uMy0yLTEuNy0yLjUtMy4xLTEuMS0zLTMtMy4zLTMtLjUiLz48L3N2Zz4=)](https://langgraph-studio.vercel.app/templates/open?githubUrl=https://github.com/langchain-ai/data-enrichment)

Producing structured results (e.g., to populate a database or spreadsheet) from open-ended research (e.g., web research) is a common use case that LLM-powered agents are well-suited to handle. Here, we provide a general template for this kind of "data enrichment" agent using [LangGraph](https://github.com/langchain-ai/langgraph) in [LangGraph Studio](https://github.com/langchain-ai/langgraph-studio). It contains an example graph exported from `src/enrichment_agent/graph.py` that implements a research assistant capable of automatically gathering information on various topics from the web and structuring the results into a user-defined JSON format.
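
For orientation (not part of this commit), here is a minimal invocation sketch. It assumes `graph.py` exposes a compiled graph named `graph`, that the input keys match the README (`topic`, `extraction_schema`), and that results land under an `info` key — all assumptions, not confirmed by this diff:

```python
import asyncio

# Hypothetical usage sketch; `graph`, the input keys, and the `info`
# result key are assumptions based on the README, not this diff.
from enrichment_agent.graph import graph


async def main() -> None:
    result = await graph.ainvoke(
        {
            "topic": "Founding year and headquarters of LangChain",
            "extraction_schema": {
                "type": "object",
                "properties": {
                    "founded": {"type": "integer"},
                    "headquarters": {"type": "string"},
                },
                "required": ["founded", "headquarters"],
            },
        }
    )
    # Structured output matching the user-defined schema (assumed key).
    print(result["info"])


asyncio.run(main())
```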

@@ -11,8 +12,8 @@ Producing structured results (e.g., to populate a database or spreadsheet) from

The enrichment agent defined in `src/enrichment_agent/graph.py` performs the following steps:

1. Takes a research **topic** and requested **extraction_schema** as user input.
2. The `call_agent_model` graph node uses an LLM with bound tools (defined in `tools.py`) to perform web search (using [Tavily](https://tavily.com/)) or web scraping.
1. Takes a research **topic** and requested **extraction_schema** as input.
2. Searches the web for relevant information
3. Reads and extracts key details from websites
4. Organizes the findings into the requested structured format
5. Validates the gathered information for completeness and accuracy
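
The docstrings in `graph.py` further below describe the mechanism behind steps 1 and 4: the user's extraction schema is bound to the LLM as a tool named `Info`, so on each turn the model either calls a research tool (continue) or calls `Info` (submit the final structured answer). A hedged sketch of that pattern — the helper and its exact shape are illustrative, not the repo's implementation:

```python
from typing import Any

from langchain_core.language_models import BaseChatModel
from langchain_core.runnables import Runnable


def bind_info_tool(
    llm: BaseChatModel, extraction_schema: dict[str, Any], tools: list[Any]
) -> Runnable:
    """Bind research tools plus an 'Info' tool built from the user's schema.

    Sketch only; call_agent_model in graph.py adds its own prompt and
    configuration handling around this idea.
    """
    info_tool = {
        "type": "function",
        "function": {
            "name": "Info",
            "description": "Call this to submit the final structured answer.",
            "parameters": extraction_schema,  # the user-defined JSON schema
        },
    }
    # The model now chooses per turn: a research tool call, or a final 'Info' call.
    return llm.bind_tools([*tools, info_tool])
```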
@@ -37,17 +38,17 @@ The primary [search tool](./src/enrichment_agent/tools.py) [^1] used is [Tavily
Setup instruction auto-generated by `langgraph template lock`. DO NOT EDIT MANUALLY.
-->

<details>
<summary>Setup for `model_name`</summary>
The `llm` configuration defaults are shown below:
### Setup Model

The default values for `model_name` are shown below:

```yaml
model_name: anthropic/claude-3-5-sonnet-20240620
```
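
The fully specified name takes the form `provider/model`. The repo's `utils.init_model` is not shown in this diff, so the following resolution logic is an assumption: it plausibly splits on the first slash and defers to LangChain's generic initializer.

```python
from langchain.chat_models import init_chat_model


# Assumed resolution logic; the actual utils.init_model may differ.
def load_chat_model(fully_specified_name: str):
    provider, model = fully_specified_name.split("/", maxsplit=1)
    return init_chat_model(model, model_provider=provider)


llm = load_chat_model("anthropic/claude-3-5-sonnet-20240620")
```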
Follow the instructions below to get set up, or pick one of the additional options.
### Anthropic Chat Models
#### Anthropic
To use Anthropic's chat models:
@@ -57,17 +58,7 @@ To use Anthropic's chat models:
```
ANTHROPIC_API_KEY=your-api-key
```
### Fireworks Chat Models
To use Fireworks AI's chat models:
1. Sign up for a [Fireworks AI account](https://app.fireworks.ai/signup) and obtain an API key.
2. Add your Fireworks AI API key to your `.env` file:
```
FIREWORKS_API_KEY=your-api-key
```
#### OpenAI Chat Models
#### OpenAI
To use OpenAI's chat models:
@@ -77,7 +68,7 @@ To use OpenAI's chat models:
OPENAI_API_KEY=your-api-key
```
</details>
@@ -200,74 +191,6 @@ Configuration auto-generated by `langgraph template lock`. DO NOT EDIT MANUALLY.
"value": "anthropic/claude-instant-1.2",
"variables": "ANTHROPIC_API_KEY"
},
{
"value": "fireworks/gemma2-9b-it",
"variables": "FIREWORKS_API_KEY"
},
{
"value": "fireworks/llama-v3-70b-instruct",
"variables": "FIREWORKS_API_KEY"
},
{
"value": "fireworks/llama-v3-70b-instruct-hf",
"variables": "FIREWORKS_API_KEY"
},
{
"value": "fireworks/llama-v3-8b-instruct",
"variables": "FIREWORKS_API_KEY"
},
{
"value": "fireworks/llama-v3-8b-instruct-hf",
"variables": "FIREWORKS_API_KEY"
},
{
"value": "fireworks/llama-v3p1-405b-instruct",
"variables": "FIREWORKS_API_KEY"
},
{
"value": "fireworks/llama-v3p1-405b-instruct-long",
"variables": "FIREWORKS_API_KEY"
},
{
"value": "fireworks/llama-v3p1-70b-instruct",
"variables": "FIREWORKS_API_KEY"
},
{
"value": "fireworks/llama-v3p1-8b-instruct",
"variables": "FIREWORKS_API_KEY"
},
{
"value": "fireworks/mixtral-8x22b-instruct",
"variables": "FIREWORKS_API_KEY"
},
{
"value": "fireworks/mixtral-8x7b-instruct",
"variables": "FIREWORKS_API_KEY"
},
{
"value": "fireworks/mixtral-8x7b-instruct-hf",
"variables": "FIREWORKS_API_KEY"
},
{
"value": "fireworks/mythomax-l2-13b",
"variables": "FIREWORKS_API_KEY"
},
{
"value": "fireworks/phi-3-vision-128k-instruct",
"variables": "FIREWORKS_API_KEY"
},
{
"value": "fireworks/phi-3p5-vision-instruct",
"variables": "FIREWORKS_API_KEY"
},
{
"value": "fireworks/starcoder-16b",
"variables": "FIREWORKS_API_KEY"
},
{
"value": "fireworks/yi-large",
"variables": "FIREWORKS_API_KEY"
},
{
"value": "openai/gpt-3.5-turbo",
"variables": "OPENAI_API_KEY"
25 changes: 12 additions & 13 deletions src/enrichment_agent/graph.py
@@ -18,12 +18,12 @@
from enrichment_agent.tools import scrape_website, search
from enrichment_agent.utils import init_model


# Define the nodes
async def call_agent_model(
state: State, *, config: Optional[RunnableConfig] = None
) -> Dict[str, Any]:
"""
Call the primary Language Model (LLM) to decide on the next research action.
"""Call the primary Language Model (LLM) to decide on the next research action.
This asynchronous function performs the following steps:
1. Initializes configuration and sets up the 'Info' tool, which is the user-defined extraction schema.
@@ -47,7 +47,6 @@ async def call_agent_model(
- If the LLM calls the 'Info' tool, it's considered as submitting the final answer.
- If the LLM doesn't call any tool, a prompt to use a tool is appended to the messages.
"""

# Load configuration from the provided RunnableConfig
configuration = Configuration.from_runnable_config(config)

@@ -82,7 +81,7 @@ async def call_agent_model(
break
if info is not None:
# The agent is submitting their answer;
# ensure it isnt' erroneously attempting to simultaneously perform research
# ensure it isn't erroneously attempting to simultaneously perform research
response.tool_calls = [
next(tc for tc in response.tool_calls if tc["name"] == "Info")
]
@@ -98,6 +97,7 @@ async def call_agent_model(
"loop_step": 1,
}


class InfoIsSatisfactory(BaseModel):
"""Validate whether the current extracted info is satisfactory and complete."""

@@ -112,8 +112,7 @@ class InfoIsSatisfactory(BaseModel):
async def reflect(
state: State, *, config: Optional[RunnableConfig] = None
) -> Dict[str, Any]:
"""
Validate the quality of the data enrichment agent's output.
"""Validate the quality of the data enrichment agent's output.
This asynchronous function performs the following steps:
1. Prepares the initial prompt using the main prompt template.
@@ -195,8 +194,7 @@ async def reflect(
def route_after_agent(
state: State,
) -> Literal["reflect", "tools", "call_agent_model", "__end__"]:
"""
Schedule the next node after the agent's action.
"""Schedule the next node after the agent's action.
This function determines the next step in the research process based on the
last message in the state. It handles three main scenarios:
@@ -210,7 +208,7 @@ def route_after_agent(
the message history.
Returns:
Literal["reflect", "tools", "call_agent_model", "__end__"]:
Literal["reflect", "tools", "call_agent_model", "__end__"]:
- "reflect": If the agent has submitted info for review.
- "tools": If the agent has called a tool other than "Info".
- "call_agent_model": If an unexpected message type is encountered.
@@ -225,7 +223,7 @@ def route_after_agent(
"""
last_message = state.messages[-1]

# "If for some reason the last message is not an AIMessage (due to a bug or unexpected behavior elsewhere in the code),
# "If for some reason the last message is not an AIMessage (due to a bug or unexpected behavior elsewhere in the code),
# it ensures the system doesn't crash but instead tries to recover by calling the agent model again.
if not isinstance(last_message, AIMessage):
return "call_agent_model"
@@ -236,11 +234,11 @@ def route_after_agent(
else:
return "tools"


def route_after_checker(
state: State, config: RunnableConfig
) -> Literal["__end__", "call_agent_model"]:
"""
Schedule the next node after the checker's evaluation.
"""Schedule the next node after the checker's evaluation.
This function determines whether to continue the research process or end it
based on the checker's evaluation and the current state of the research.
@@ -252,7 +250,7 @@ def route_after_checker(
the maximum number of allowed loops.
Returns:
Literal["__end__", "call_agent_model"]:
Literal["__end__", "call_agent_model"]:
- "__end__": If the research process should terminate.
- "call_agent_model": If further research is needed.
@@ -282,6 +280,7 @@ def route_after_checker(
else:
return "__end__"


# Create the graph
workflow = StateGraph(
State, input=InputState, output=OutputState, config_schema=Configuration
6 changes: 2 additions & 4 deletions src/enrichment_agent/tools.py
@@ -25,8 +25,7 @@ async def search(
*,
config: Annotated[RunnableConfig, InjectedToolArg]
) -> Optional[list[dict[str, Any]]]:
"""
Perform a general web search using the Tavily search engine.
"""Perform a general web search using the Tavily search engine.
This asynchronous function executes the following steps:
1. Extracts configuration from the provided RunnableConfig.
@@ -61,8 +60,7 @@ async def scrape_website(
state: Annotated[State, InjectedState],
config: Annotated[RunnableConfig, InjectedToolArg],
) -> str:
"""
Scrape and summarize content from a given URL.
"""Scrape and summarize content from a given URL.
This asynchronous function performs the following steps:
1. Fetches the content of the specified URL.
