
Commit 65fb936

Merge branch 'rlm/testing' of https://github.com/langchain-ai/data-enrichment into rlm/testing

rlancemartin committed Sep 17, 2024 · 2 parents b0e80c4 + 8c62cf7
Showing 6 changed files with 1,548 additions and 106 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -161,3 +161,4 @@ cython_debug/
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
.DS_Store
uv.lock
95 changes: 9 additions & 86 deletions README.md
@@ -2,6 +2,7 @@

[![CI](https://github.com/langchain-ai/data-enrichment/actions/workflows/unit-tests.yml/badge.svg)](https://github.com/langchain-ai/data-enrichment/actions/workflows/unit-tests.yml)
[![Integration Tests](https://github.com/langchain-ai/data-enrichment/actions/workflows/integration-tests.yml/badge.svg)](https://github.com/langchain-ai/data-enrichment/actions/workflows/integration-tests.yml)
[![Open in - LangGraph Studio](https://img.shields.io/badge/Open_in-LangGraph_Studio-00324d.svg?logo=data:image/svg%2bxml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHdpZHRoPSI4NS4zMzMiIGhlaWdodD0iODUuMzMzIiB2ZXJzaW9uPSIxLjAiIHZpZXdCb3g9IjAgMCA2NCA2NCI+PHBhdGggZD0iTTEzIDcuOGMtNi4zIDMuMS03LjEgNi4zLTYuOCAyNS43LjQgMjQuNi4zIDI0LjUgMjUuOSAyNC41QzU3LjUgNTggNTggNTcuNSA1OCAzMi4zIDU4IDcuMyA1Ni43IDYgMzIgNmMtMTIuOCAwLTE2LjEuMy0xOSAxLjhtMzcuNiAxNi42YzIuOCAyLjggMy40IDQuMiAzLjQgNy42cy0uNiA0LjgtMy40IDcuNkw0Ny4yIDQzSDE2LjhsLTMuNC0zLjRjLTQuOC00LjgtNC44LTEwLjQgMC0xNS4ybDMuNC0zLjRoMzAuNHoiLz48cGF0aCBkPSJNMTguOSAyNS42Yy0xLjEgMS4zLTEgMS43LjQgMi41LjkuNiAxLjcgMS44IDEuNyAyLjcgMCAxIC43IDIuOCAxLjYgNC4xIDEuNCAxLjkgMS40IDIuNS4zIDMuMi0xIC42LS42LjkgMS40LjkgMS41IDAgMi43LS41IDIuNy0xIDAtLjYgMS4xLS44IDIuNi0uNGwyLjYuNy0xLjgtMi45Yy01LjktOS4zLTkuNC0xMi4zLTExLjUtOS44TTM5IDI2YzAgMS4xLS45IDIuNS0yIDMuMi0yLjQgMS41LTIuNiAzLjQtLjUgNC4yLjguMyAyIDEuNyAyLjUgMy4xLjYgMS41IDEuNCAyLjMgMiAyIDEuNS0uOSAxLjItMy41LS40LTMuNS0yLjEgMC0yLjgtMi44LS44LTMuMyAxLjYtLjQgMS42LS41IDAtLjYtMS4xLS4xLTEuNS0uNi0xLjItMS42LjctMS43IDMuMy0yLjEgMy41LS41LjEuNS4yIDEuNi4zIDIuMiAwIC43LjkgMS40IDEuOSAxLjYgMi4xLjQgMi4zLTIuMy4yLTMuMi0uOC0uMy0yLTEuNy0yLjUtMy4xLTEuMS0zLTMtMy4zLTMtLjUiLz48L3N2Zz4=)](https://langgraph-studio.vercel.app/templates/open?githubUrl=https://github.com/langchain-ai/data-enrichment)

Producing structured results (e.g., to populate a database or spreadsheet) from open-ended research (e.g., web research) is a common use case that LLM-powered agents are well-suited to handle. Here, we provide a general template for this kind of "data enrichment" agent using [LangGraph](https://github.com/langchain-ai/langgraph) in [LangGraph Studio](https://github.com/langchain-ai/langgraph-studio). It contains an example graph exported from `src/enrichment_agent/graph.py` that implements a research assistant capable of automatically gathering information on various topics from the web and structuring the results into a user-defined JSON format.
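
For orientation (not part of this commit), here is a minimal invocation sketch. It assumes `graph.py` exposes a compiled graph named `graph`, that the input keys match the README (`topic`, `extraction_schema`), and that results land under an `info` key — all assumptions, not confirmed by this diff:

```python
import asyncio

# Hypothetical usage sketch; `graph`, the input keys, and the `info`
# result key are assumptions based on the README, not this diff.
from enrichment_agent.graph import graph


async def main() -> None:
    result = await graph.ainvoke(
        {
            "topic": "Founding year and headquarters of LangChain",
            "extraction_schema": {
                "type": "object",
                "properties": {
                    "founded": {"type": "integer"},
                    "headquarters": {"type": "string"},
                },
                "required": ["founded", "headquarters"],
            },
        }
    )
    # Structured output matching the user-defined schema (assumed key).
    print(result["info"])


asyncio.run(main())
```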

@@ -11,8 +12,8 @@ Producing structured results (e.g., to populate a database or spreadsheet) from

The enrichment agent defined in `src/enrichment_agent/graph.py` performs the following steps:

1. Takes a research **topic** and requested **extraction_schema** as user input.
2. The `call_agent_model` graph node uses an LLM with bound tools (defined in `tools.py`) to perform web search (using [Tavily](https://tavily.com/)) or web scraping.
1. Takes a research **topic** and requested **extraction_schema** as input.
2. Searches the web for relevant information
3. Reads and extracts key details from websites
4. Organizes the findings into the requested structured format
5. Validates the gathered information for completeness and accuracy
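
The docstrings in `graph.py` further below describe the mechanism behind steps 1 and 4: the user's extraction schema is bound to the LLM as a tool named `Info`, so on each turn the model either calls a research tool (continue) or calls `Info` (submit the final structured answer). A hedged sketch of that pattern — the helper and its exact shape are illustrative, not the repo's implementation:

```python
from typing import Any

from langchain_core.language_models import BaseChatModel
from langchain_core.runnables import Runnable


def bind_info_tool(
    llm: BaseChatModel, extraction_schema: dict[str, Any], tools: list[Any]
) -> Runnable:
    """Bind research tools plus an 'Info' tool built from the user's schema.

    Sketch only; call_agent_model in graph.py adds its own prompt and
    configuration handling around this idea.
    """
    info_tool = {
        "type": "function",
        "function": {
            "name": "Info",
            "description": "Call this to submit the final structured answer.",
            "parameters": extraction_schema,  # the user-defined JSON schema
        },
    }
    # The model now chooses per turn: a research tool call, or a final 'Info' call.
    return llm.bind_tools([*tools, info_tool])
```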
@@ -37,17 +38,17 @@ The primary [search tool](./src/enrichment_agent/tools.py) [^1] used is [Tavily
Setup instruction auto-generated by `langgraph template lock`. DO NOT EDIT MANUALLY.
-->

<details>
<summary>Setup for `model_name`</summary>
The `llm` configuration defaults are shown below:
### Setup Model

The default values for `model_name` are shown below:

```yaml
model_name: anthropic/claude-3-5-sonnet-20240620
```
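
The fully specified name takes the form `provider/model`. The repo's `utils.init_model` is not shown in this diff, so the following resolution logic is an assumption: it plausibly splits on the first slash and defers to LangChain's generic initializer.

```python
from langchain.chat_models import init_chat_model


# Assumed resolution logic; the actual utils.init_model may differ.
def load_chat_model(fully_specified_name: str):
    provider, model = fully_specified_name.split("/", maxsplit=1)
    return init_chat_model(model, model_provider=provider)


llm = load_chat_model("anthropic/claude-3-5-sonnet-20240620")
```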
Follow the instructions below to get set up, or pick one of the additional options.
### Anthropic Chat Models
#### Anthropic
To use Anthropic's chat models:
@@ -57,17 +58,7 @@ To use Anthropic's chat models:
```
ANTHROPIC_API_KEY=your-api-key
```
### Fireworks Chat Models
To use Fireworks AI's chat models:
1. Sign up for a [Fireworks AI account](https://app.fireworks.ai/signup) and obtain an API key.
2. Add your Fireworks AI API key to your `.env` file:
```
FIREWORKS_API_KEY=your-api-key
```
#### OpenAI Chat Models
#### OpenAI
To use OpenAI's chat models:
@@ -77,7 +68,7 @@ To use OpenAI's chat models:
OPENAI_API_KEY=your-api-key
```
</details>
@@ -200,74 +191,6 @@ Configuration auto-generated by `langgraph template lock`. DO NOT EDIT MANUALLY.
"value": "anthropic/claude-instant-1.2",
"variables": "ANTHROPIC_API_KEY"
},
{
"value": "fireworks/gemma2-9b-it",
"variables": "FIREWORKS_API_KEY"
},
{
"value": "fireworks/llama-v3-70b-instruct",
"variables": "FIREWORKS_API_KEY"
},
{
"value": "fireworks/llama-v3-70b-instruct-hf",
"variables": "FIREWORKS_API_KEY"
},
{
"value": "fireworks/llama-v3-8b-instruct",
"variables": "FIREWORKS_API_KEY"
},
{
"value": "fireworks/llama-v3-8b-instruct-hf",
"variables": "FIREWORKS_API_KEY"
},
{
"value": "fireworks/llama-v3p1-405b-instruct",
"variables": "FIREWORKS_API_KEY"
},
{
"value": "fireworks/llama-v3p1-405b-instruct-long",
"variables": "FIREWORKS_API_KEY"
},
{
"value": "fireworks/llama-v3p1-70b-instruct",
"variables": "FIREWORKS_API_KEY"
},
{
"value": "fireworks/llama-v3p1-8b-instruct",
"variables": "FIREWORKS_API_KEY"
},
{
"value": "fireworks/mixtral-8x22b-instruct",
"variables": "FIREWORKS_API_KEY"
},
{
"value": "fireworks/mixtral-8x7b-instruct",
"variables": "FIREWORKS_API_KEY"
},
{
"value": "fireworks/mixtral-8x7b-instruct-hf",
"variables": "FIREWORKS_API_KEY"
},
{
"value": "fireworks/mythomax-l2-13b",
"variables": "FIREWORKS_API_KEY"
},
{
"value": "fireworks/phi-3-vision-128k-instruct",
"variables": "FIREWORKS_API_KEY"
},
{
"value": "fireworks/phi-3p5-vision-instruct",
"variables": "FIREWORKS_API_KEY"
},
{
"value": "fireworks/starcoder-16b",
"variables": "FIREWORKS_API_KEY"
},
{
"value": "fireworks/yi-large",
"variables": "FIREWORKS_API_KEY"
},
{
"value": "openai/gpt-3.5-turbo",
"variables": "OPENAI_API_KEY"
25 changes: 12 additions & 13 deletions src/enrichment_agent/graph.py
@@ -18,12 +18,12 @@
from enrichment_agent.tools import scrape_website, search
from enrichment_agent.utils import init_model


# Define the nodes
async def call_agent_model(
state: State, *, config: Optional[RunnableConfig] = None
) -> Dict[str, Any]:
"""
Call the primary Language Model (LLM) to decide on the next research action.
"""Call the primary Language Model (LLM) to decide on the next research action.
This asynchronous function performs the following steps:
1. Initializes configuration and sets up the 'Info' tool, which is the user-defined extraction schema.
@@ -47,7 +47,6 @@ async def call_agent_model(
- If the LLM calls the 'Info' tool, it's considered as submitting the final answer.
- If the LLM doesn't call any tool, a prompt to use a tool is appended to the messages.
"""

# Load configuration from the provided RunnableConfig
configuration = Configuration.from_runnable_config(config)

@@ -82,7 +81,7 @@ async def call_agent_model(
break
if info is not None:
# The agent is submitting their answer;
# ensure it isnt' erroneously attempting to simultaneously perform research
# ensure it isn't erroneously attempting to simultaneously perform research
response.tool_calls = [
next(tc for tc in response.tool_calls if tc["name"] == "Info")
]
@@ -98,6 +97,7 @@ async def call_agent_model(
"loop_step": 1,
}


class InfoIsSatisfactory(BaseModel):
"""Validate whether the current extracted info is satisfactory and complete."""

@@ -112,8 +112,7 @@ class InfoIsSatisfactory(BaseModel):
async def reflect(
state: State, *, config: Optional[RunnableConfig] = None
) -> Dict[str, Any]:
"""
Validate the quality of the data enrichment agent's output.
"""Validate the quality of the data enrichment agent's output.
This asynchronous function performs the following steps:
1. Prepares the initial prompt using the main prompt template.
@@ -195,8 +194,7 @@ async def reflect(
def route_after_agent(
state: State,
) -> Literal["reflect", "tools", "call_agent_model", "__end__"]:
"""
Schedule the next node after the agent's action.
"""Schedule the next node after the agent's action.
This function determines the next step in the research process based on the
last message in the state. It handles three main scenarios:
@@ -210,7 +208,7 @@ def route_after_agent(
the message history.
Returns:
Literal["reflect", "tools", "call_agent_model", "__end__"]:
Literal["reflect", "tools", "call_agent_model", "__end__"]:
- "reflect": If the agent has submitted info for review.
- "tools": If the agent has called a tool other than "Info".
- "call_agent_model": If an unexpected message type is encountered.
@@ -225,7 +223,7 @@ def route_after_agent(
"""
last_message = state.messages[-1]

# "If for some reason the last message is not an AIMessage (due to a bug or unexpected behavior elsewhere in the code),
# "If for some reason the last message is not an AIMessage (due to a bug or unexpected behavior elsewhere in the code),
# it ensures the system doesn't crash but instead tries to recover by calling the agent model again.
if not isinstance(last_message, AIMessage):
return "call_agent_model"
@@ -236,11 +234,11 @@ def route_after_agent(
else:
return "tools"


def route_after_checker(
state: State, config: RunnableConfig
) -> Literal["__end__", "call_agent_model"]:
"""
Schedule the next node after the checker's evaluation.
"""Schedule the next node after the checker's evaluation.
This function determines whether to continue the research process or end it
based on the checker's evaluation and the current state of the research.
@@ -252,7 +250,7 @@ def route_after_checker(
the maximum number of allowed loops.
Returns:
Literal["__end__", "call_agent_model"]:
Literal["__end__", "call_agent_model"]:
- "__end__": If the research process should terminate.
- "call_agent_model": If further research is needed.
@@ -282,6 +280,7 @@ def route_after_checker(
else:
return "__end__"


# Create the graph
workflow = StateGraph(
State, input=InputState, output=OutputState, config_schema=Configuration
6 changes: 2 additions & 4 deletions src/enrichment_agent/tools.py
@@ -25,8 +25,7 @@ async def search(
*,
config: Annotated[RunnableConfig, InjectedToolArg]
) -> Optional[list[dict[str, Any]]]:
"""
Perform a general web search using the Tavily search engine.
"""Perform a general web search using the Tavily search engine.
This asynchronous function executes the following steps:
1. Extracts configuration from the provided RunnableConfig.
@@ -61,8 +60,7 @@ async def scrape_website(
state: Annotated[State, InjectedState],
config: Annotated[RunnableConfig, InjectedToolArg],
) -> str:
"""
Scrape and summarize content from a given URL.
"""Scrape and summarize content from a given URL.
This asynchronous function performs the following steps:
1. Fetches the content of the specified URL.
