Update README, add ntbk w/ API usage (#2)
---------

Co-authored-by: William Fu-Hinthorn <[email protected]>
rlancemartin and hinthornw authored Sep 18, 2024
1 parent 2ae63b6 commit 566c384
Showing 10 changed files with 388 additions and 64 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -161,3 +161,4 @@ cython_debug/
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
.DS_Store
uv.lock
67 changes: 55 additions & 12 deletions README.md
@@ -4,23 +4,21 @@
[![Integration Tests](https://github.com/langchain-ai/data-enrichment/actions/workflows/integration-tests.yml/badge.svg)](https://github.com/langchain-ai/data-enrichment/actions/workflows/integration-tests.yml)
[![Open in - LangGraph Studio](https://img.shields.io/badge/Open_in-LangGraph_Studio-00324d.svg?logo=data:image/svg%2bxml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHdpZHRoPSI4NS4zMzMiIGhlaWdodD0iODUuMzMzIiB2ZXJzaW9uPSIxLjAiIHZpZXdCb3g9IjAgMCA2NCA2NCI+PHBhdGggZD0iTTEzIDcuOGMtNi4zIDMuMS03LjEgNi4zLTYuOCAyNS43LjQgMjQuNi4zIDI0LjUgMjUuOSAyNC41QzU3LjUgNTggNTggNTcuNSA1OCAzMi4zIDU4IDcuMyA1Ni43IDYgMzIgNmMtMTIuOCAwLTE2LjEuMy0xOSAxLjhtMzcuNiAxNi42YzIuOCAyLjggMy40IDQuMiAzLjQgNy42cy0uNiA0LjgtMy40IDcuNkw0Ny4yIDQzSDE2LjhsLTMuNC0zLjRjLTQuOC00LjgtNC44LTEwLjQgMC0xNS4ybDMuNC0zLjRoMzAuNHoiLz48cGF0aCBkPSJNMTguOSAyNS42Yy0xLjEgMS4zLTEgMS43LjQgMi41LjkuNiAxLjcgMS44IDEuNyAyLjcgMCAxIC43IDIuOCAxLjYgNC4xIDEuNCAxLjkgMS40IDIuNS4zIDMuMi0xIC42LS42LjkgMS40LjkgMS41IDAgMi43LS41IDIuNy0xIDAtLjYgMS4xLS44IDIuNi0uNGwyLjYuNy0xLjgtMi45Yy01LjktOS4zLTkuNC0xMi4zLTExLjUtOS44TTM5IDI2YzAgMS4xLS45IDIuNS0yIDMuMi0yLjQgMS41LTIuNiAzLjQtLjUgNC4yLjguMyAyIDEuNyAyLjUgMy4xLjYgMS41IDEuNCAyLjMgMiAyIDEuNS0uOSAxLjItMy41LS40LTMuNS0yLjEgMC0yLjgtMi44LS44LTMuMyAxLjYtLjQgMS42LS41IDAtLjYtMS4xLS4xLTEuNS0uNi0xLjItMS42LjctMS43IDMuMy0yLjEgMy41LS41LjEuNS4yIDEuNi4zIDIuMiAwIC43LjkgMS40IDEuOSAxLjYgMi4xLjQgMi4zLTIuMy4yLTMuMi0uOC0uMy0yLTEuNy0yLjUtMy4xLTEuMS0zLTMtMy4zLTMtLjUiLz48L3N2Zz4=)](https://langgraph-studio.vercel.app/templates/open?githubUrl=https://github.com/langchain-ai/data-enrichment)

This is a starter project to help you get started with developing a data enrichment agent using [LangGraph](https://github.com/langchain-ai/langgraph) in [LangGraph Studio](https://github.com/langchain-ai/langgraph-studio).
Producing structured results (e.g., to populate a database or spreadsheet) from open-ended research (e.g., web research) is a common use case that LLM-powered agents are well-suited to handle. Here, we provide a general template for this kind of "data enrichment" agent using [LangGraph](https://github.com/langchain-ai/langgraph) in [LangGraph Studio](https://github.com/langchain-ai/langgraph-studio). It contains an example graph exported from `src/enrichment_agent/graph.py` that implements a research assistant capable of automatically gathering information on various topics from the web and structuring the results into a user-defined JSON format.

![Graph view in LangGraph studio UI](./static/studio.png)

It contains an example graph exported from `src/enrichment_agent/graph.py` that implements a research assistant capable of automatically gathering information on various topics from the web.
![Overview of agent](./static/overview.png)

## What it does

The enrichment agent:
The enrichment agent defined in `src/enrichment_agent/graph.py` performs the following steps:

1. Takes a research **topic** and requested **extraction_schema** as input
2. Searches the web for relevant information
3. Reads and extracts key details from websites
4. Organizes the findings into the requested structured format
5. Validates the gathered information for completeness and accuracy

By default, it's set up to gather information based on the user-provided schema passed through the `extraction_schema` key in the state.
![Graph view in LangGraph studio UI](./static/studio.png)
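
Under the hood, this loop is wired up as a LangGraph state graph in `src/enrichment_agent/graph.py`. As a rough, self-contained sketch of that shape (stub nodes stand in for the real LLM and tool calls, and the routing is simplified), it looks something like this:

```python
from typing import Any, Dict, Literal

from typing_extensions import TypedDict

from langgraph.graph import END, START, StateGraph


class State(TypedDict, total=False):
    topic: str
    extraction_schema: Dict[str, Any]
    info: Dict[str, Any]


def call_agent_model(state: State) -> Dict[str, Any]:
    # Stub: the real node asks an LLM to search, scrape, or submit "Info".
    return {"info": {"companies": "NVIDIA, AMD, ..."}}


def reflect(state: State) -> Dict[str, Any]:
    # Stub: the real node grades the submitted info for completeness.
    return {}


def route_after_agent(state: State) -> Literal["reflect", "__end__"]:
    # Stub: the real router also dispatches tool calls to a tools node.
    return "reflect" if state.get("info") else "__end__"


builder = StateGraph(State)
builder.add_node("call_agent_model", call_agent_model)
builder.add_node("reflect", reflect)
builder.add_edge(START, "call_agent_model")
builder.add_conditional_edges("call_agent_model", route_after_agent)
builder.add_edge("reflect", END)
graph = builder.compile()

print(graph.invoke({"topic": "Top 5 chip providers for LLM Training"}))
```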

## Getting Started

@@ -78,19 +76,56 @@ OPENAI_API_KEY=your-api-key
End setup instructions
-->
3. Customize whatever you'd like in the code.
4. Open the folder in LangGraph Studio!
3. Consider a research topic and desired extraction schema.
As an example, here is a research topic we can consider.
```
"Top 5 chip providers for LLM Training"
```
And here is a desired extraction schema.
```json
"extraction_schema": {
"type": "object",
"properties": {
"companies": {
"type": "string",
"description": "Names of top chip providers for LLM training"
},
"technologies": {
"type": "string",
"description": "Brief summary of key chip technologies used for LLM training"
},
"market_share": {
"type": "string",
"description": "Overview of market share distribution among top providers"
},
"future_outlook": {
"type": "string",
"description": "Brief summary of future prospects and developments in the field"
}
},
"required": ["companies", "technologies", "market_share", "future_outlook"]
}
```
4. Open the folder in LangGraph Studio and input the `topic` and `extraction_schema`.

![Results In Studio](./static/studio_example.png)
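
You can also drive the graph directly from Python rather than through Studio. A minimal sketch (assuming the package and its dependencies are installed locally, your API keys are exported, and the compiled graph is exported from `src/enrichment_agent/graph.py` under the name `graph`):

```python
import asyncio

from enrichment_agent.graph import graph  # assumption: compiled graph exported under this name

extraction_schema = {
    "type": "object",
    "properties": {
        "companies": {
            "type": "string",
            "description": "Names of top chip providers for LLM training",
        },
    },
    "required": ["companies"],
}

# The graph's input state carries the research `topic` and the JSON
# `extraction_schema`; the structured result lands in the `info` key.
result = asyncio.run(
    graph.ainvoke(
        {
            "topic": "Top 5 chip providers for LLM Training",
            "extraction_schema": extraction_schema,
        }
    )
)
print(result["info"])
```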

## How to customize

1. **Customize research targets**: Provide a custom `extraction_schema` when calling the graph to gather different types of information.
2. **Select a different model**: We default to Anthropic (claude-3-5-sonnet). You can select a compatible chat model using `provider/model-name` via configuration. Example: `anthropic/claude-3-haiku-20240307`.
3. **Customize the prompt**: We provide a default prompt in [prompts.py](./src/enrichment_agent/prompts.py). You can easily update this via configuration in the studio.
1. **Customize research targets**: Provide a custom JSON `extraction_schema` when calling the graph to gather different types of information.
2. **Select a different model**: We default to Anthropic (claude-3-5-sonnet). You can select a compatible chat model using `provider/model-name` via configuration. Example: `openai/gpt-4o-mini`.
3. **Customize the prompt**: We provide a default prompt in [prompts.py](./src/enrichment_agent/prompts.py). You can easily update this via configuration.

For quick prototyping, these configurations can be set in the studio UI.

![Config In Studio](./static/config.png)
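
Outside the studio UI, the same values can be supplied programmatically through the `configurable` dict. A minimal sketch (the `model` and `max_loops` field names are assumptions based on this template's `Configuration` class, and the exported `graph` name is assumed as well):

```python
import asyncio

from enrichment_agent.graph import graph  # assumption: compiled graph exported under this name

config = {
    "configurable": {
        "model": "openai/gpt-4o-mini",  # provider/model-name format (assumed field name)
        "max_loops": 3,  # assumption: caps the research/reflection cycles
    }
}

result = asyncio.run(
    graph.ainvoke(
        {
            "topic": "Top 5 chip providers for LLM Training",
            "extraction_schema": {
                "type": "object",
                "properties": {"companies": {"type": "string"}},
            },
        },
        config=config,
    )
)
print(result.get("info"))
```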

You can also quickly extend this template by:

- Adding new tools and API connections in [tools.py](./src/enrichment_agent/tools.py). These can be any plain Python functions (see the sketch after this list).
- Adding additional steps in [graph.py](./src/enrichment_agent/graph.py).
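
For example, a new tool can be a plain (async) Python function defined in `tools.py` and bound in `call_agent_model` next to `search` and `scrape_website`. A hypothetical sketch (the function name, signature, and data are illustrative only):

```python
from typing import Any, Optional


async def lookup_ticker(company_name: str) -> Optional[dict[str, Any]]:
    """Look up basic stock info for a chip company (illustrative stub)."""
    # In practice this would call a real API (e.g., via aiohttp); a static
    # table keeps the sketch self-contained and runnable.
    known = {
        "NVIDIA": {"ticker": "NVDA", "exchange": "NASDAQ"},
        "AMD": {"ticker": "AMD", "exchange": "NASDAQ"},
    }
    return known.get(company_name)
```

To expose it to the agent, add it to the `bind_tools([...])` list in `call_agent_model`.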

## Development

@@ -104,6 +139,14 @@ LangGraph Studio also integrates with [LangSmith](https://smith.langchain.com/)

[^1]: https://python.langchain.com/docs/concepts/#tools

## LangGraph API

We can also interact with the graph using the LangGraph API.

See `ntbk/testing.ipynb` for an example of how to do this.

LangGraph Cloud (see [here](https://langchain-ai.github.io/langgraph/cloud/#overview)) makes it possible to deploy the agent.
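
As a rough illustration (not the notebook's exact contents), calling a locally running LangGraph API server with the Python SDK might look like this; the server URL and the `"agent"` graph name are assumptions that depend on your `langgraph.json` and how the server was started:

```python
import asyncio

from langgraph_sdk import get_client


async def main() -> None:
    client = get_client(url="http://localhost:2024")  # assumption: local dev server address
    thread = await client.threads.create()
    async for part in client.runs.stream(
        thread["thread_id"],
        "agent",  # assumption: graph name registered in langgraph.json
        input={
            "topic": "Top 5 chip providers for LLM Training",
            "extraction_schema": {"type": "object", "properties": {}},
        },
        stream_mode="values",
    ):
        print(part.event)


asyncio.run(main())
```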

<!--
Configuration auto-generated by `langgraph template lock`. DO NOT EDIT MANUALLY.
{
227 changes: 227 additions & 0 deletions ntbk/testing.ipynb


2 changes: 2 additions & 0 deletions pyproject.toml
@@ -45,8 +45,10 @@ lint.select = [
"T201",
"UP",
]
include = ["*.py", "*.pyi", "*.ipynb"]
lint.ignore = ["UP006", "UP007", "UP035", "D417", "E501"]
[tool.ruff.lint.per-file-ignores]
"tests/*" = ["D", "UP"]
"ntbk/*" = ["D", "UP", "T201"]
[tool.ruff.lint.pydocstyle]
convention = "google"
95 changes: 70 additions & 25 deletions src/enrichment_agent/graph.py
@@ -18,30 +18,47 @@
from enrichment_agent.tools import scrape_website, search
from enrichment_agent.utils import init_model

# Define the nodes


# Define the nodes
async def call_agent_model(
state: State, *, config: Optional[RunnableConfig] = None
) -> Dict[str, Any]:
"""Call the primary LLM to decide whether and how to continue researching."""
"""Call the primary Language Model (LLM) to decide on the next research action.
This asynchronous function performs the following steps:
1. Initializes configuration and sets up the 'Info' tool, which is the user-defined extraction schema.
2. Prepares the prompt and message history for the LLM.
3. Initializes and configures the LLM with available tools.
4. Invokes the LLM and processes its response.
5. Handles the LLM's decision to either continue research or submit final info.
"""
# Load configuration from the provided RunnableConfig
configuration = Configuration.from_runnable_config(config)

# Define the 'Info' tool, which is the user-defined extraction schema
info_tool = {
"name": "Info",
"description": "Call this when you have gathered all the relevant info",
"parameters": state.extraction_schema,
}

# Format the prompt defined in prompts.py with the extraction schema and topic
p = configuration.prompt.format(
info=json.dumps(state.extraction_schema, indent=2), topic=state.topic
)

# Create the messages list with the formatted prompt and the previous messages
messages = [HumanMessage(content=p)] + state.messages
raw_model = init_model(config)

# Initialize the raw model with the provided configuration and bind the tools
raw_model = init_model(config)
model = raw_model.bind_tools([scrape_website, search, info_tool], tool_choice="any")
response = cast(AIMessage, await model.ainvoke(messages))

# Initialize info to None
info = None

# Check if the response has tool calls
if response.tool_calls:
for tool_call in response.tool_calls:
if tool_call["name"] == "Info":
@@ -80,7 +97,16 @@ class InfoIsSatisfactory(BaseModel):
async def reflect(
state: State, *, config: Optional[RunnableConfig] = None
) -> Dict[str, Any]:
"""Validate the quality of the data enrichment agent's calls."""
"""Validate the quality of the data enrichment agent's output.
This asynchronous function performs the following steps:
1. Prepares the initial prompt using the main prompt template.
2. Constructs a message history for the model.
3. Prepares a checker prompt to evaluate the presumed info.
4. Initializes and configures a language model with structured output.
5. Invokes the model to assess the quality of the gathered information.
6. Processes the model's response and determines if the info is satisfactory.
"""
p = prompts.MAIN_PROMPT.format(
info=json.dumps(state.extraction_schema, indent=2), topic=state.topic
)
@@ -104,28 +130,27 @@ async def reflect(
f" Got: {type(last_message)}"
)

if response.is_satisfactory:
try:
return {"info": presumed_info}
except Exception as e:
return {
"messages": [
ToolMessage(
tool_call_id=last_message.tool_calls[0]["id"],
content=f"Invalid response: {e}",
name="Info",
status="error",
)
]
}
if response.is_satisfactory and presumed_info:
return {
"info": presumed_info,
"messages": [
ToolMessage(
tool_call_id=last_message.tool_calls[0]["id"],
content="\n".join(response.reason),
name="Info",
additional_kwargs={"artifact": response.model_dump()},
status="success",
)
],
}
else:
return {
"messages": [
ToolMessage(
tool_call_id=last_message.tool_calls[0]["id"],
content=str(response),
content="Unsatisfactory response:\n" + "\n".join(response.reason),
name="Info",
additional_kwargs={"artifact": response.dict()},
additional_kwargs={"artifact": response.model_dump()},
status="error",
)
]
@@ -135,28 +160,48 @@ async def reflect(
def route_after_agent(
state: State,
) -> Literal["reflect", "tools", "call_agent_model", "__end__"]:
"""Schedule the next node after the agent."""
"""Schedule the next node after the agent's action.
This function determines the next step in the research process based on the
last message in the state. It handles three main scenarios:
1. Error recovery: If the last message is unexpectedly not an AIMessage.
2. Info submission: If the agent has called the "Info" tool to submit findings.
3. Continued research: If the agent has called any other tool.
"""
last_message = state.messages[-1]

# "If for some reason the last message is not an AIMessage (due to a bug or unexpected behavior elsewhere in the code),
# it ensures the system doesn't crash but instead tries to recover by calling the agent model again.
if not isinstance(last_message, AIMessage):
return "call_agent_model"
# If the "Info" tool was called, then the model provided its extraction output. Reflect on the result.
if last_message.tool_calls and last_message.tool_calls[0]["name"] == "Info":
return "reflect"
# The last message is a tool call that is not "Info" (extraction output)
else:
return "tools"


def route_after_checker(
state: State, config: RunnableConfig
) -> Literal["__end__", "call_agent_model"]:
"""Schedule the next node after the checker."""
"""Schedule the next node after the checker's evaluation.
This function determines whether to continue the research process or end it
based on the checker's evaluation and the current state of the research.
"""
configurable = Configuration.from_runnable_config(config)
last_message = state.messages
last_message = state.messages[-1]

if state.loop_step < configurable.max_loops:
if not state.info:
return "call_agent_model"
if isinstance(last_message, ToolMessage) and last_message.status == "error":
if not isinstance(last_message, ToolMessage):
raise ValueError(
f"{route_after_checker.__name__} expected a tool messages. Received: {type(last_message)}."
)
if last_message.status == "error":
# Research deemed unsatisfactory
return "call_agent_model"
# It's great!
13 changes: 8 additions & 5 deletions src/enrichment_agent/tools.py
@@ -23,11 +23,10 @@
async def search(
query: str, *, config: Annotated[RunnableConfig, InjectedToolArg]
) -> Optional[list[dict[str, Any]]]:
"""Search for general results.
"""Query a search engine.
This function performs a search using the Tavily search engine, which is designed
to provide comprehensive, accurate, and trusted results. It's particularly useful
for answering questions about current events.
This function queries the web to fetch comprehensive, accurate, and trusted results. It's particularly useful
for answering questions about current events. Provide as much context in the query as needed to ensure high recall.
"""
configuration = Configuration.from_runnable_config(config)
wrapped = TavilySearchResults(max_results=configuration.max_search_results)
@@ -56,7 +55,11 @@ async def scrape_website(
state: Annotated[State, InjectedState],
config: Annotated[RunnableConfig, InjectedToolArg],
) -> str:
"""Scrape and summarize content from a given URL."""
"""Scrape and summarize content from a given URL.
Returns:
str: A summary of the scraped content, tailored to the extraction schema.
"""
async with aiohttp.ClientSession() as session:
async with session.get(url) as response:
content = await response.text()
Binary file added static/config.png
Binary file added static/overview.png
Binary file added static/studio_example.png