Merge with moatless-tree-search (#36)
aorwall authored Nov 17, 2024
1 parent a50e3ef commit b826a3c
Showing 124 changed files with 167,821 additions and 17,956 deletions.
142 changes: 110 additions & 32 deletions README.md
@@ -1,49 +1,127 @@
# Moatless Tools
Moatless Tools is a hobby project where I experiment with some ideas I have about how LLMs can be used to edit code in large existing codebases. I believe that rather than relying on an agent to reason its way to a solution, it is crucial to build good tools to insert the right context into the prompt and handle the response.

_Right now I'm focusing on moatless-tree-search, an extended version of moatless-tools. The code in moatless-tools is now a simplified version of that code base_.

## SWE-Bench
I use the [SWE-bench benchmark](https://www.swebench.com/) as a way to verify my ideas and currently share sixth place on the SWE-Bench Lite Leaderboard.

### GPT-4o
Moatless Tools 0.0.1 has a solve rate of 24%, with each benchmark instance costing an average of $0.13 to solve with GPT-4o. Running the SWE Bench Lite dataset with 300 instances costs approx 40 dollars.
### Version 0.0.3: Claude 3.5 Sonnet v20241022

[Try it out in Google Colab](https://colab.research.google.com/drive/15RpSjdprf9lcaP0oqKsuYfZl1c3kVB_t?usp=sharing)

### Claude 3.5 Sonnet
### Version 0.0.2: Claude 3.5 Sonnet
With version 0.0.2 I get a 26.7% solve rate with Claude 3.5 Sonnet, at a slightly higher cost of $0.17 per instance.

[Try the Claude 3.5 evaluation set up on Google Colab](https://colab.research.google.com/drive/1pKecc3pumsrOGzTOOCEqjRKzeCWLWQpj?usp=sharing)

## Try it out
I have focused on testing my ideas, and the project is currently a bit messy. My plan is to organize it in the coming period. However, feel free to clone the repo and try running this notebook:

1. [Run Moatless Tools on any repository](notebooks/00_index_and_run.ipynb)


## How it works
The solution is based on an agentic loop that functions as a finite state machine, transitioning between states. Each state can have its own prompts and response handling.

The standard workflow uses the following states.

### Search
The Search Loop uses function calling to find relevant code using the following parameters:

* `query` - A query using natural language to describe the desired code.
* `code_snippet` - A specific code snippet that should be exactly matched.
* `class_name` - A specific class name to include in the search.
* `function_name` - A specific function name to include in the search.
* `file_pattern` - A glob pattern to filter search results to specific file types or directories.
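
The parameters above could be modelled as a small request object. This is only a sketch: the `SearchRequest` name and the `has_search_term` helper are hypothetical and are not taken from the Moatless code base. It also assumes that `file_pattern` only narrows results and is not a search term by itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchRequest:
    """Hypothetical container for the search parameters listed above."""

    query: Optional[str] = None          # natural-language description of the code
    code_snippet: Optional[str] = None   # snippet that should be matched exactly
    class_name: Optional[str] = None     # class name to include in the search
    function_name: Optional[str] = None  # function name to include in the search
    file_pattern: Optional[str] = None   # glob filter for file types or directories

    def has_search_term(self) -> bool:
        # Assumption: file_pattern alone is a filter, not something to search for.
        return any([self.query, self.code_snippet, self.class_name, self.function_name])


request = SearchRequest(function_name="has_key", file_pattern="django/core/cache/**/*.py")
```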
### Version 0.0.1: GPT-4o
Moatless Tools 0.0.1 has a solve rate of 24%, with each benchmark instance costing an average of $0.13 to solve with GPT-4o. Running the SWE Bench Lite dataset with 300 instances costs approx 40 dollars.

For semantic search, a vector index based on LlamaIndex is used. This is a classic RAG solution where all code in the repository is split into meaningful chunks (for example, at the method level), embedded, and indexed in a vector store. For class and function name search, a simple index is used where all function and class names are indexed.
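
The "simple index" of class and function names could look roughly like the sketch below. It uses Python's standard `ast` module and is not the project's implementation; the `build_name_index` function and the sample file contents are made up for illustration.

```python
import ast
from collections import defaultdict


def build_name_index(files: dict[str, str]) -> dict[str, list[tuple[str, int]]]:
    """Map each class/function name to the (file path, line number) pairs defining it."""
    index = defaultdict(list)
    for path, source in files.items():
        # Walk every node in the parsed module, including nested definitions.
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
                index[node.name].append((path, node.lineno))
    return dict(index)


# Hypothetical repository content, keyed by file path.
files = {
    "cache.py": "class FileBasedCache:\n    def has_key(self, key):\n        return False\n",
}
index = build_name_index(files)
```

A lookup by name is then a plain dictionary access, which is why no embedding is needed for this kind of search.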
[Try it out in Google Colab](https://colab.research.google.com/drive/15RpSjdprf9lcaP0oqKsuYfZl1c3kVB_t?usp=sharing)

### Identify
Identifies the code relevant to the task. If not all relevant code is found, it transitions back to Search. Once all relevant code is found, it transitions to PlanToCode.
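
As a rough illustration of the state-machine idea, the Search and Identify transitions described so far could be written as follows. The `State` enum and `next_state` function are hypothetical names, and the rule that Search always hands over to Identify is an assumption, since the text only spells out the transitions leaving Identify.

```python
from enum import Enum


class State(Enum):
    SEARCH = "Search"
    IDENTIFY = "Identify"
    PLAN_TO_CODE = "PlanToCode"


def next_state(state: State, all_relevant_code_found: bool = False) -> State:
    """Return the next state for the Search/Identify part of the loop."""
    if state is State.SEARCH:
        # Assumption: search results are always passed on to Identify.
        return State.IDENTIFY
    if state is State.IDENTIFY:
        # Go back to Search until all relevant code has been found.
        return State.PLAN_TO_CODE if all_relevant_code_found else State.SEARCH
    raise ValueError(f"No transition defined from {state.value}")
```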

### PlanToCode
Breaks down the request for code changes into smaller changes to specific parts (code spans) of the codebase.
# Try it out
I have focused on testing my ideas, and the project is currently a bit messy. My plan is to organize it in the coming period. However, feel free to clone the repo and try running this notebook:

### ClarifyChange
If the proposed changes affect too large a portion of the code, the change needs to be clarified to affect a smaller number of code lines.
1. [Run Moatless Tools on any repository](notebooks/00_index_and_run.ipynb)

### EditCode
Code is edited in search/replace blocks inspired by the edit block concept in [Aider](https://aider.chat/docs/benchmarks.html). In this concept, the LLM specifies the code to be changed in a search block and the code it will be changed to in a replace block. However, since the code to be changed is already known to the Code Loop, the search section is pre-filled, and the LLM only needs to respond with the replace section. The idea is that this reduces the risk of changing the wrong code by having an agreement on what to change before making the change.
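
A minimal sketch of the pre-filled search block idea: because the span to change is already agreed on as a line range, only the replacement text has to come from the LLM. The `apply_replace` helper and its 1-indexed, inclusive line range are assumptions made for this example, not the project's actual API.

```python
def apply_replace(source: str, start_line: int, end_line: int, replacement: str) -> str:
    """Replace the 1-indexed, inclusive line range with the replacement text."""
    lines = source.splitlines(keepends=True)
    if not (1 <= start_line <= end_line <= len(lines)):
        raise ValueError("Line range is outside the file")
    # Make sure the replacement ends with a newline so following lines stay separate.
    if replacement and not replacement.endswith("\n"):
        replacement += "\n"
    return "".join(lines[: start_line - 1]) + replacement + "".join(lines[end_line:])


source = "def add(a, b):\n    return a - b\n\nprint(add(1, 2))\n"
# The search section (line 2) is known in advance; only the replace section is supplied.
updated = apply_replace(source, 2, 2, "    return a + b")
```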
## Environment Setup

Before running the evaluation, you'll need:
1. At least one LLM provider API key (e.g., OpenAI, Anthropic, etc.)
2. A Voyage AI API key from [voyageai.com](https://voyageai.com) to use the pre-embedded vector stores for SWE-Bench instances.
3. (Optional) Access to a testbed environment - see [moatless-testbeds](https://github.com/aorwall/moatless-testbeds) for setup instructions

You can configure these settings in either of two ways:

1. Create a `.env` file in the project root (copy from `.env.example`):

```bash
cp .env.example .env
# Edit .env with your values
```

2. Or export the variables directly:

```bash
# Directory for storing vector index store files
export INDEX_STORE_DIR="/tmp/index_store"

# Directory for storing cloned repositories
export REPO_DIR="/tmp/repos"

# Required: At least one LLM provider API key
export OPENAI_API_KEY="<your-key>"
export ANTHROPIC_API_KEY="<your-key>"
export HUGGINGFACE_API_KEY="<your-key>"
export DEEPSEEK_API_KEY="<your-key>"

# ...or Base URL for custom LLM API service (optional)
export CUSTOM_LLM_API_BASE="<your-base-url>"
export CUSTOM_LLM_API_KEY="<your-key>"

# Required: API Key for Voyage Embeddings
export VOYAGE_API_KEY="<your-key>"

# Optional: Configuration for testbed environment (https://github.com/aorwall/moatless-testbeds)
export TESTBED_API_KEY="<your-key>"
export TESTBED_BASE_URL="<your-base-url>"
```
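
Before starting a run, the required settings can be checked with a few lines of Python. This is a sketch under stated assumptions: the `missing_settings` helper is hypothetical, and treating `CUSTOM_LLM_API_KEY` as satisfying the "at least one LLM provider key" requirement is a guess rather than documented behaviour.

```python
import os

# Variable names taken from the export list above.
LLM_KEY_NAMES = [
    "OPENAI_API_KEY",
    "ANTHROPIC_API_KEY",
    "HUGGINGFACE_API_KEY",
    "DEEPSEEK_API_KEY",
    "CUSTOM_LLM_API_KEY",  # assumption: a custom LLM key also counts
]


def missing_settings(env: dict[str, str]) -> list[str]:
    """Return a human-readable list of required settings that are not set."""
    problems = []
    if not any(env.get(name) for name in LLM_KEY_NAMES):
        problems.append("set at least one LLM provider API key")
    if not env.get("VOYAGE_API_KEY"):
        problems.append("set VOYAGE_API_KEY")
    return problems


for problem in missing_settings(dict(os.environ)):
    print(f"Missing configuration: {problem}")
```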

## Example

Basic setup using the `AgenticLoop` to solve a SWE-Bench instance.

```python
from moatless.agent import ActionAgent
from moatless.agent.code_prompts import SIMPLE_CODE_PROMPT
from moatless.benchmark.swebench import create_repository
from moatless.benchmark.utils import get_moatless_instance
from moatless.completion import CompletionModel
from moatless.file_context import FileContext
from moatless.index import CodeIndex
from moatless.loop import AgenticLoop
from moatless.actions import FindClass, FindFunction, FindCodeSnippet, SemanticSearch, RequestMoreContext, RequestCodeChange, Finish, Reject

index_store_dir = "/tmp/index_store"
repo_base_dir = "/tmp/repos"
persist_path = "trajectory.json"

instance = get_moatless_instance("django__django-16379")

completion_model = CompletionModel(model="gpt-4o", temperature=0.0)

repository = create_repository(instance)

code_index = CodeIndex.from_index_name(
    instance["instance_id"], index_store_dir=index_store_dir, file_repo=repository
)

actions = [
    FindClass(code_index=code_index, repository=repository),
    FindFunction(code_index=code_index, repository=repository),
    FindCodeSnippet(code_index=code_index, repository=repository),
    SemanticSearch(code_index=code_index, repository=repository),
    RequestMoreContext(repository=repository),
    RequestCodeChange(repository=repository, completion_model=completion_model),
    Finish(),
    Reject()
]

file_context = FileContext(repo=repository)
agent = ActionAgent(actions=actions, completion=completion_model, system_prompt=SIMPLE_CODE_PROMPT)

loop = AgenticLoop.create(
    message=instance["problem_statement"],
    agent=agent,
    file_context=file_context,
    repository=repository,
    persist_path=persist_path,
    max_iterations=50,
    max_cost=2.0  # Optional: Set maximum cost in dollars
)

final_node = loop.run()
if final_node:
    print(final_node.observation.message)
```
5 changes: 1 addition & 4 deletions moatless/__init__.py
@@ -1,4 +1 @@
from moatless.repository import FileRepository
from moatless.workspace import Workspace
from moatless.transition_rules import TransitionRules
from moatless.loop import AgenticLoop
# from moatless.loop import AgenticLoop, TransitionRules
9 changes: 9 additions & 0 deletions moatless/actions/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
from moatless.actions.code_change import RequestCodeChange
from moatless.actions.find_class import FindClass
from moatless.actions.find_code_snippet import FindCodeSnippet
from moatless.actions.find_function import FindFunction
from moatless.actions.finish import Finish
from moatless.actions.reject import Reject
from moatless.actions.run_tests import RunTests
from moatless.actions.semantic_search import SemanticSearch
from moatless.actions.view_code import ViewCode
143 changes: 143 additions & 0 deletions moatless/actions/action.py
@@ -0,0 +1,143 @@
import importlib
import logging
import pkgutil
from abc import ABC
from typing import List, Type, Tuple, Any, Dict, Optional, ClassVar

from pydantic import BaseModel, ConfigDict

from moatless.actions.model import (
    ActionArguments,
    Observation,
    FewShotExample,
)
from moatless.file_context import FileContext
from moatless.index import CodeIndex
from moatless.repository.repository import Repository

logger = logging.getLogger(__name__)

_actions: Dict[str, Type["Action"]] = {}


class Action(BaseModel, ABC):
    args_schema: ClassVar[Type[ActionArguments]]

    model_config = ConfigDict(arbitrary_types_allowed=True)

    def __init__(self, **data):
        super().__init__(**data)

    def execute(self, args: ActionArguments, file_context: FileContext) -> Observation:
        """
        Execute the action.
        """

        message = self._execute(file_context=file_context)
        return Observation.create(message)

    def _execute(self, file_context: FileContext) -> str | None:
        """
        Execute the action and return the updated FileContext.
        """
        raise NotImplementedError("Subclasses must implement this method.")

    @property
    def name(self) -> str:
        return self.__class__.__name__

    @classmethod
    def from_dict(
        cls,
        obj: dict,
        repository: Repository = None,
        runtime: Any = None,
        code_index: CodeIndex = None,
    ) -> "Action":
        obj = obj.copy()
        obj.pop("args_schema", None)
        action_class_path = obj.pop("action_class", None)

        if action_class_path:
            module_name, class_name = action_class_path.rsplit(".", 1)
            module = importlib.import_module(module_name)
            action_class = getattr(module, class_name)

            if repository and hasattr(action_class, "_repository"):
                obj["repository"] = repository

            if code_index and hasattr(action_class, "_code_index"):
                obj["code_index"] = code_index

            if runtime and hasattr(action_class, "_runtime"):
                obj["runtime"] = runtime

            return action_class.model_validate(obj)

        raise ValueError(f"Unknown action: {obj}")

    @classmethod
    def model_validate(cls, obj: Any) -> "Action":
        return cls(**obj)

    @classmethod
    def get_action_by_args_class(
        cls, args_class: Type[ActionArguments]
    ) -> Optional[Type["Action"]]:
        """
        Get the Action subclass corresponding to the given ActionArguments subclass.
        Args:
            args_class: The ActionArguments subclass to look up.
        Returns:
            The Action subclass if found, None otherwise.
        """

        def search_subclasses(current_class):
            if (
                hasattr(current_class, "args_schema")
                and current_class.args_schema == args_class
            ):
                return current_class
            for subclass in current_class.__subclasses__():
                result = search_subclasses(subclass)
                if result:
                    return result
            return None

        return search_subclasses(cls)

    @classmethod
    def get_action_by_name(cls, action_name: str) -> Type["Action"]:
        """
        Dynamically import and return the appropriate Action class for the given action name.
        """
        if not _actions:
            cls._load_actions()

        action = _actions.get(action_name)
        if action:
            return action

        raise ValueError(f"Unknown action: {action_name}")

    @classmethod
    def _load_actions(cls):
        actions_package = importlib.import_module("moatless.actions")

        for _, module_name, _ in pkgutil.iter_modules(actions_package.__path__):
            full_module_name = f"moatless.actions.{module_name}"
            module = importlib.import_module(full_module_name)
            for name, obj in module.__dict__.items():
                if isinstance(obj, type) and issubclass(obj, Action) and obj != Action:
                    _actions[name] = obj

    @classmethod
    def get_few_shot_examples(cls) -> List[FewShotExample]:
        """
        Returns a list of few-shot examples specific to this action.
        Override this method in subclasses to provide custom examples.
        """
        return []
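
The recursive subclass lookup in `get_action_by_args_class` can be tried in isolation. The sketch below reproduces the same depth-first search over `__subclasses__()` with plain classes instead of pydantic models, so it runs without `moatless` installed. All class names here are stand-ins invented for the example.

```python
from typing import Optional, Type


class Args:
    """Stand-in for ActionArguments."""


class FindClassArgs(Args):
    pass


class FinishArgs(Args):
    pass


class BaseAction:
    """Stand-in for Action; declares args_schema without assigning a value."""

    args_schema: Type[Args]


class SearchAction(BaseAction):
    pass  # intermediate class without its own args_schema


class FindClassAction(SearchAction):
    args_schema = FindClassArgs


class FinishAction(BaseAction):
    args_schema = FinishArgs


def find_action(root: type, args_class: Type[Args]) -> Optional[type]:
    """Depth-first search of the class tree for a matching args_schema."""
    if getattr(root, "args_schema", None) is args_class:
        return root
    for subclass in root.__subclasses__():
        found = find_action(subclass, args_class)
        if found:
            return found
    return None
```

Because the search starts at the class it is called on, calling it on an intermediate class such as `SearchAction` only considers that branch of the hierarchy.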