Update README, add ntbk w/ API usage (#2)
---------

Co-authored-by: William Fu-Hinthorn <[email protected]>
rlancemartin and hinthornw authored Sep 18, 2024
1 parent 2ae63b6 commit 566c384
Showing 10 changed files with 388 additions and 64 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -161,3 +161,4 @@ cython_debug/
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
.DS_Store
uv.lock
67 changes: 55 additions & 12 deletions README.md
@@ -4,23 +4,21 @@
[![Integration Tests](https://github.com/langchain-ai/data-enrichment/actions/workflows/integration-tests.yml/badge.svg)](https://github.com/langchain-ai/data-enrichment/actions/workflows/integration-tests.yml)
[![Open in - LangGraph Studio](https://img.shields.io/badge/Open_in-LangGraph_Studio-00324d.svg?logo=data:image/svg%2bxml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHdpZHRoPSI4NS4zMzMiIGhlaWdodD0iODUuMzMzIiB2ZXJzaW9uPSIxLjAiIHZpZXdCb3g9IjAgMCA2NCA2NCI+PHBhdGggZD0iTTEzIDcuOGMtNi4zIDMuMS03LjEgNi4zLTYuOCAyNS43LjQgMjQuNi4zIDI0LjUgMjUuOSAyNC41QzU3LjUgNTggNTggNTcuNSA1OCAzMi4zIDU4IDcuMyA1Ni43IDYgMzIgNmMtMTIuOCAwLTE2LjEuMy0xOSAxLjhtMzcuNiAxNi42YzIuOCAyLjggMy40IDQuMiAzLjQgNy42cy0uNiA0LjgtMy40IDcuNkw0Ny4yIDQzSDE2LjhsLTMuNC0zLjRjLTQuOC00LjgtNC44LTEwLjQgMC0xNS4ybDMuNC0zLjRoMzAuNHoiLz48cGF0aCBkPSJNMTguOSAyNS42Yy0xLjEgMS4zLTEgMS43LjQgMi41LjkuNiAxLjcgMS44IDEuNyAyLjcgMCAxIC43IDIuOCAxLjYgNC4xIDEuNCAxLjkgMS40IDIuNS4zIDMuMi0xIC42LS42LjkgMS40LjkgMS41IDAgMi43LS41IDIuNy0xIDAtLjYgMS4xLS44IDIuNi0uNGwyLjYuNy0xLjgtMi45Yy01LjktOS4zLTkuNC0xMi4zLTExLjUtOS44TTM5IDI2YzAgMS4xLS45IDIuNS0yIDMuMi0yLjQgMS41LTIuNiAzLjQtLjUgNC4yLjguMyAyIDEuNyAyLjUgMy4xLjYgMS41IDEuNCAyLjMgMiAyIDEuNS0uOSAxLjItMy41LS40LTMuNS0yLjEgMC0yLjgtMi44LS44LTMuMyAxLjYtLjQgMS42LS41IDAtLjYtMS4xLS4xLTEuNS0uNi0xLjItMS42LjctMS43IDMuMy0yLjEgMy41LS41LjEuNS4yIDEuNi4zIDIuMiAwIC43LjkgMS40IDEuOSAxLjYgMi4xLjQgMi4zLTIuMy4yLTMuMi0uOC0uMy0yLTEuNy0yLjUtMy4xLTEuMS0zLTMtMy4zLTMtLjUiLz48L3N2Zz4=)](https://langgraph-studio.vercel.app/templates/open?githubUrl=https://github.com/langchain-ai/data-enrichment)

This is a starter project to help you get started with developing a data enrichment agent using [LangGraph](https://github.com/langchain-ai/langgraph) in [LangGraph Studio](https://github.com/langchain-ai/langgraph-studio).
Producing structured results (e.g., to populate a database or spreadsheet) from open-ended research (e.g., web research) is a common use case that LLM-powered agents are well-suited to handle. Here, we provide a general template for this kind of "data enrichment" agent using [LangGraph](https://github.com/langchain-ai/langgraph) in [LangGraph Studio](https://github.com/langchain-ai/langgraph-studio). It contains an example graph exported from `src/enrichment_agent/graph.py` that implements a research assistant capable of automatically gathering information on various topics from the web and structuring the results into a user-defined JSON format.

![Graph view in LangGraph studio UI](./static/studio.png)

It contains an example graph exported from `src/enrichment_agent/graph.py` that implements a research assistant capable of automatically gathering information on various topics from the web.
![Overview of agent](./static/overview.png)

## What it does

The enrichment agent:
The enrichment agent defined in `src/enrichment_agent/graph.py` performs the following steps:

1. Takes a research **topic** and requested **extraction_schema** as input
2. Searches the web for relevant information
3. Reads and extracts key details from websites
4. Organizes the findings into the requested structured format
5. Validates the gathered information for completeness and accuracy

By default, it's set up to gather information based on the user-provided schema passed through the `extraction_schema` key in the state.
![Graph view in LangGraph studio UI](./static/studio.png)
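
Under the hood, this loop is wired up as a LangGraph state graph in `src/enrichment_agent/graph.py`. As a rough, self-contained sketch of that shape (stub nodes stand in for the real LLM and tool calls, and the routing is simplified), it looks something like this:

```python
from typing import Any, Dict, Literal

from typing_extensions import TypedDict

from langgraph.graph import END, START, StateGraph


class State(TypedDict, total=False):
    topic: str
    extraction_schema: Dict[str, Any]
    info: Dict[str, Any]


def call_agent_model(state: State) -> Dict[str, Any]:
    # Stub: the real node asks an LLM to search, scrape, or submit "Info".
    return {"info": {"companies": "NVIDIA, AMD, ..."}}


def reflect(state: State) -> Dict[str, Any]:
    # Stub: the real node grades the submitted info for completeness.
    return {}


def route_after_agent(state: State) -> Literal["reflect", "__end__"]:
    # Stub: the real router also dispatches tool calls to a tools node.
    return "reflect" if state.get("info") else "__end__"


builder = StateGraph(State)
builder.add_node("call_agent_model", call_agent_model)
builder.add_node("reflect", reflect)
builder.add_edge(START, "call_agent_model")
builder.add_conditional_edges("call_agent_model", route_after_agent)
builder.add_edge("reflect", END)
graph = builder.compile()

print(graph.invoke({"topic": "Top 5 chip providers for LLM Training"}))
```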

## Getting Started

@@ -78,19 +76,56 @@ OPENAI_API_KEY=your-api-key
End setup instructions
-->
3. Customize whatever you'd like in the code.
4. Open the folder in LangGraph Studio!
3. Consider a research topic and desired extraction schema.
As an example, here is a research topic we can consider.
```
"Top 5 chip providers for LLM Training"
```
And here is a desired extraction schema.
```json
"extraction_schema": {
"type": "object",
"properties": {
"companies": {
"type": "string",
"description": "Names of top chip providers for LLM training"
},
"technologies": {
"type": "string",
"description": "Brief summary of key chip technologies used for LLM training"
},
"market_share": {
"type": "string",
"description": "Overview of market share distribution among top providers"
},
"future_outlook": {
"type": "string",
"description": "Brief summary of future prospects and developments in the field"
}
},
"required": ["companies", "technologies", "market_share", "future_outlook"]
}
```
4. Open the folder in LangGraph Studio and input the `topic` and `extraction_schema`.

![Results In Studio](./static/studio_example.png)
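
You can also drive the graph directly from Python rather than through Studio. A minimal sketch (assuming the package and its dependencies are installed locally, your API keys are exported, and the compiled graph is exported from `src/enrichment_agent/graph.py` under the name `graph`):

```python
import asyncio

from enrichment_agent.graph import graph  # assumption: compiled graph exported under this name

extraction_schema = {
    "type": "object",
    "properties": {
        "companies": {
            "type": "string",
            "description": "Names of top chip providers for LLM training",
        },
    },
    "required": ["companies"],
}

# The graph's input state carries the research `topic` and the JSON
# `extraction_schema`; the structured result lands in the `info` key.
result = asyncio.run(
    graph.ainvoke(
        {
            "topic": "Top 5 chip providers for LLM Training",
            "extraction_schema": extraction_schema,
        }
    )
)
print(result["info"])
```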

## How to customize

1. **Customize research targets**: Provide a custom `extraction_schema` when calling the graph to gather different types of information.
2. **Select a different model**: We default to Anthropic (claude-3-5-sonnet). You can select a compatible chat model using `provider/model-name` via configuration. Example: `anthropic/claude-3-haiku-20240307`.
3. **Customize the prompt**: We provide a default prompt in [prompts.py](./src/enrichment_agent/prompts.py). You can easily update this via configuration in the studio.
1. **Customize research targets**: Provide a custom JSON `extraction_schema` when calling the graph to gather different types of information.
2. **Select a different model**: We default to Anthropic (claude-3-5-sonnet). You can select a compatible chat model using `provider/model-name` via configuration. Example: `openai/gpt-4o-mini`.
3. **Customize the prompt**: We provide a default prompt in [prompts.py](./src/enrichment_agent/prompts.py). You can easily update this via configuration.

For quick prototyping, these configurations can be set in the studio UI.

![Config In Studio](./static/config.png)
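
Outside the studio UI, the same values can be supplied programmatically through the `configurable` dict. A minimal sketch (the `model` and `max_loops` field names are assumptions based on this template's `Configuration` class, and the exported `graph` name is assumed as well):

```python
import asyncio

from enrichment_agent.graph import graph  # assumption: compiled graph exported under this name

config = {
    "configurable": {
        "model": "openai/gpt-4o-mini",  # provider/model-name format (assumed field name)
        "max_loops": 3,  # assumption: caps the research/reflection cycles
    }
}

result = asyncio.run(
    graph.ainvoke(
        {
            "topic": "Top 5 chip providers for LLM Training",
            "extraction_schema": {
                "type": "object",
                "properties": {"companies": {"type": "string"}},
            },
        },
        config=config,
    )
)
print(result.get("info"))
```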

You can also quickly extend this template by:

- Adding new tools and API connections in [tools.py](./src/enrichment_agent/tools.py). These can be any plain Python functions (see the sketch after this list).
- Adding additional steps in [graph.py](./src/enrichment_agent/graph.py).
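
For example, a new tool can be a plain (async) Python function defined in `tools.py` and bound in `call_agent_model` next to `search` and `scrape_website`. A hypothetical sketch (the function name, signature, and data are illustrative only):

```python
from typing import Any, Optional


async def lookup_ticker(company_name: str) -> Optional[dict[str, Any]]:
    """Look up basic stock info for a chip company (illustrative stub)."""
    # In practice this would call a real API (e.g., via aiohttp); a static
    # table keeps the sketch self-contained and runnable.
    known = {
        "NVIDIA": {"ticker": "NVDA", "exchange": "NASDAQ"},
        "AMD": {"ticker": "AMD", "exchange": "NASDAQ"},
    }
    return known.get(company_name)
```

To expose it to the agent, add it to the `bind_tools([...])` list in `call_agent_model`.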

## Development

@@ -104,6 +139,14 @@ LangGraph Studio also integrates with [LangSmith](https://smith.langchain.com/)

[^1]: https://python.langchain.com/docs/concepts/#tools

## LangGraph API

We can also interact with the graph using the LangGraph API.

See `ntbk/testing.ipynb` for an example of how to do this.

LangGraph Cloud (see [here](https://langchain-ai.github.io/langgraph/cloud/#overview)) makes it possible to deploy the agent.
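
As a rough illustration (not the notebook's exact contents), calling a locally running LangGraph API server with the Python SDK might look like this; the server URL and the `"agent"` graph name are assumptions that depend on your `langgraph.json` and how the server was started:

```python
import asyncio

from langgraph_sdk import get_client


async def main() -> None:
    client = get_client(url="http://localhost:2024")  # assumption: local dev server address
    thread = await client.threads.create()
    async for part in client.runs.stream(
        thread["thread_id"],
        "agent",  # assumption: graph name registered in langgraph.json
        input={
            "topic": "Top 5 chip providers for LLM Training",
            "extraction_schema": {"type": "object", "properties": {}},
        },
        stream_mode="values",
    ):
        print(part.event)


asyncio.run(main())
```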

<!--
Configuration auto-generated by `langgraph template lock`. DO NOT EDIT MANUALLY.
{
227 changes: 227 additions & 0 deletions ntbk/testing.ipynb


2 changes: 2 additions & 0 deletions pyproject.toml
@@ -45,8 +45,10 @@ lint.select = [
"T201",
"UP",
]
include = ["*.py", "*.pyi", "*.ipynb"]
lint.ignore = ["UP006", "UP007", "UP035", "D417", "E501"]
[tool.ruff.lint.per-file-ignores]
"tests/*" = ["D", "UP"]
"ntbk/*" = ["D", "UP", "T201"]
[tool.ruff.lint.pydocstyle]
convention = "google"
95 changes: 70 additions & 25 deletions src/enrichment_agent/graph.py
@@ -18,30 +18,47 @@
from enrichment_agent.tools import scrape_website, search
from enrichment_agent.utils import init_model

# Define the nodes


# Define the nodes
async def call_agent_model(
state: State, *, config: Optional[RunnableConfig] = None
) -> Dict[str, Any]:
"""Call the primary LLM to decide whether and how to continue researching."""
"""Call the primary Language Model (LLM) to decide on the next research action.
This asynchronous function performs the following steps:
1. Initializes configuration and sets up the 'Info' tool, which is the user-defined extraction schema.
2. Prepares the prompt and message history for the LLM.
3. Initializes and configures the LLM with available tools.
4. Invokes the LLM and processes its response.
5. Handles the LLM's decision to either continue research or submit final info.
"""
# Load configuration from the provided RunnableConfig
configuration = Configuration.from_runnable_config(config)

# Define the 'Info' tool, which is the user-defined extraction schema
info_tool = {
"name": "Info",
"description": "Call this when you have gathered all the relevant info",
"parameters": state.extraction_schema,
}

# Format the prompt defined in prompts.py with the extraction schema and topic
p = configuration.prompt.format(
info=json.dumps(state.extraction_schema, indent=2), topic=state.topic
)

# Create the messages list with the formatted prompt and the previous messages
messages = [HumanMessage(content=p)] + state.messages
raw_model = init_model(config)

# Initialize the raw model with the provided configuration and bind the tools
raw_model = init_model(config)
model = raw_model.bind_tools([scrape_website, search, info_tool], tool_choice="any")
response = cast(AIMessage, await model.ainvoke(messages))

# Initialize info to None
info = None

# Check if the response has tool calls
if response.tool_calls:
for tool_call in response.tool_calls:
if tool_call["name"] == "Info":
@@ -80,7 +97,16 @@ class InfoIsSatisfactory(BaseModel):
async def reflect(
state: State, *, config: Optional[RunnableConfig] = None
) -> Dict[str, Any]:
"""Validate the quality of the data enrichment agent's calls."""
"""Validate the quality of the data enrichment agent's output.
This asynchronous function performs the following steps:
1. Prepares the initial prompt using the main prompt template.
2. Constructs a message history for the model.
3. Prepares a checker prompt to evaluate the presumed info.
4. Initializes and configures a language model with structured output.
5. Invokes the model to assess the quality of the gathered information.
6. Processes the model's response and determines if the info is satisfactory.
"""
p = prompts.MAIN_PROMPT.format(
info=json.dumps(state.extraction_schema, indent=2), topic=state.topic
)
@@ -104,28 +130,27 @@ async def reflect(
f" Got: {type(last_message)}"
)

if response.is_satisfactory:
try:
return {"info": presumed_info}
except Exception as e:
return {
"messages": [
ToolMessage(
tool_call_id=last_message.tool_calls[0]["id"],
content=f"Invalid response: {e}",
name="Info",
status="error",
)
]
}
if response.is_satisfactory and presumed_info:
return {
"info": presumed_info,
"messages": [
ToolMessage(
tool_call_id=last_message.tool_calls[0]["id"],
content="\n".join(response.reason),
name="Info",
additional_kwargs={"artifact": response.model_dump()},
status="success",
)
],
}
else:
return {
"messages": [
ToolMessage(
tool_call_id=last_message.tool_calls[0]["id"],
content=str(response),
content="Unsatisfactory response:\n" + "\n".join(response.reason),
name="Info",
additional_kwargs={"artifact": response.dict()},
additional_kwargs={"artifact": response.model_dump()},
status="error",
)
]
@@ -135,28 +160,48 @@ async def reflect(
def route_after_agent(
state: State,
) -> Literal["reflect", "tools", "call_agent_model", "__end__"]:
"""Schedule the next node after the agent."""
"""Schedule the next node after the agent's action.
This function determines the next step in the research process based on the
last message in the state. It handles three main scenarios:
1. Error recovery: If the last message is unexpectedly not an AIMessage.
2. Info submission: If the agent has called the "Info" tool to submit findings.
3. Continued research: If the agent has called any other tool.
"""
last_message = state.messages[-1]

# "If for some reason the last message is not an AIMessage (due to a bug or unexpected behavior elsewhere in the code),
# it ensures the system doesn't crash but instead tries to recover by calling the agent model again.
if not isinstance(last_message, AIMessage):
return "call_agent_model"
# If the "Info" tool was called, then the model provided its extraction output. Reflect on the result.
if last_message.tool_calls and last_message.tool_calls[0]["name"] == "Info":
return "reflect"
# The last message is a tool call that is not "Info" (extraction output)
else:
return "tools"


def route_after_checker(
state: State, config: RunnableConfig
) -> Literal["__end__", "call_agent_model"]:
"""Schedule the next node after the checker."""
"""Schedule the next node after the checker's evaluation.
This function determines whether to continue the research process or end it
based on the checker's evaluation and the current state of the research.
"""
configurable = Configuration.from_runnable_config(config)
last_message = state.messages
last_message = state.messages[-1]

if state.loop_step < configurable.max_loops:
if not state.info:
return "call_agent_model"
if isinstance(last_message, ToolMessage) and last_message.status == "error":
if not isinstance(last_message, ToolMessage):
raise ValueError(
f"{route_after_checker.__name__} expected a tool messages. Received: {type(last_message)}."
)
if last_message.status == "error":
# Research deemed unsatisfactory
return "call_agent_model"
# It's great!
13 changes: 8 additions & 5 deletions src/enrichment_agent/tools.py
@@ -23,11 +23,10 @@
async def search(
query: str, *, config: Annotated[RunnableConfig, InjectedToolArg]
) -> Optional[list[dict[str, Any]]]:
"""Search for general results.
"""Query a search engine.
This function performs a search using the Tavily search engine, which is designed
to provide comprehensive, accurate, and trusted results. It's particularly useful
for answering questions about current events.
This function queries the web to fetch comprehensive, accurate, and trusted results. It's particularly useful
for answering questions about current events. Provide as much context in the query as needed to ensure high recall.
"""
configuration = Configuration.from_runnable_config(config)
wrapped = TavilySearchResults(max_results=configuration.max_search_results)
@@ -56,7 +55,11 @@ async def scrape_website(
state: Annotated[State, InjectedState],
config: Annotated[RunnableConfig, InjectedToolArg],
) -> str:
"""Scrape and summarize content from a given URL."""
"""Scrape and summarize content from a given URL.
Returns:
str: A summary of the scraped content, tailored to the extraction schema.
"""
async with aiohttp.ClientSession() as session:
async with session.get(url) as response:
content = await response.text()
Binary file added static/config.png
Binary file added static/overview.png
Binary file added static/studio_example.png