This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge with moatless-tree-search (#36)

Showing 124 changed files with 167,821 additions and 17,956 deletions.

@@ -1,49 +1,127 @@

# Moatless Tools
Moatless Tools is a hobby project where I experiment with some ideas I have about how LLMs can be used to edit code in large existing codebases. I believe that rather than relying on an agent to reason its way to a solution, it is crucial to build good tools to insert the right context into the prompt and handle the response.

_Right now I'm focusing on moatless-tree-search, an extended version of moatless-tools. The code in moatless-tools is now a simplified version of that code base_.

## SWE-Bench
I use the [SWE-bench benchmark](https://www.swebench.com/) as a way to verify my ideas and am currently tied for sixth place on the SWE-Bench Lite leaderboard.

### GPT-4o
Moatless Tools 0.0.1 has a solve rate of 24%, with each benchmark instance costing an average of $0.13 to solve with GPT-4o. Running the SWE-Bench Lite dataset with 300 instances costs approximately $40.
### Version 0.0.3: Claude 3.5 Sonnet v20241022

[Try it out in Google Colab](https://colab.research.google.com/drive/15RpSjdprf9lcaP0oqKsuYfZl1c3kVB_t?usp=sharing)

### Claude 3.5 Sonnet
### Version 0.0.2: Claude 3.5 Sonnet
With version 0.0.2 I get a 26.7% solve rate with Claude 3.5 Sonnet, at a slightly higher cost of $0.17 per instance.

[Try the Claude 3.5 evaluation setup on Google Colab](https://colab.research.google.com/drive/1pKecc3pumsrOGzTOOCEqjRKzeCWLWQpj?usp=sharing)

## Try it out
I have focused on testing my ideas, and the project is currently a bit messy. My plan is to organize it in the coming period. However, feel free to clone the repo and try running this notebook:

1. [Run Moatless Tools on any repository](notebooks/00_index_and_run.ipynb)

## How it works
The solution is based on an agentic loop that functions as a finite state machine, transitioning between states. Each state can have its own prompts and response handling.

The following states are used in the usual workflow and code flow.

### Search
The Search Loop uses function calling to find relevant code using the following parameters:

* `query` - A query using natural language to describe the desired code.
* `code_snippet` - A specific code snippet that should be exactly matched.
* `class_name` - A specific class name to include in the search.
* `function_name` - A specific function name to include in the search.
* `file_pattern` - A glob pattern to filter search results to specific file types or directories.
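
The parameter list above maps naturally onto a small request object. The sketch below is a hypothetical stand-in: the `SearchRequest` name and the rule that `file_pattern` alone is not a valid search are assumptions, not the project's actual schema. It uses Python's `fnmatch` for the glob filter, whose `*` also matches `/`, unlike shell globbing.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import Optional


@dataclass
class SearchRequest:
    """Hypothetical container for the search parameters listed above."""

    query: Optional[str] = None          # natural-language description
    code_snippet: Optional[str] = None   # exact snippet to match
    class_name: Optional[str] = None
    function_name: Optional[str] = None
    file_pattern: Optional[str] = None   # glob that narrows the results

    def __post_init__(self) -> None:
        # Assumption: file_pattern only filters, so another criterion is required.
        criteria = (self.query, self.code_snippet, self.class_name, self.function_name)
        if not any(criteria):
            raise ValueError("give a query, code_snippet, class_name or function_name")

    def matches_file(self, path: str) -> bool:
        """Return True if a repository-relative path passes the glob filter."""
        # fnmatch's "*" also matches "/", so "django/*.py" covers nested modules.
        return self.file_pattern is None or fnmatch(path, self.file_pattern)


request = SearchRequest(class_name="QuerySet", file_pattern="django/*.py")
print(request.matches_file("django/db/models/query.py"))  # True
print(request.matches_file("tests/test_query.py"))        # False
```

Validating in `__post_init__` rejects an empty search before any index is queried.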
### Version 0.0.1: GPT-4o
Moatless Tools 0.0.1 has a solve rate of 24%, with each benchmark instance costing an average of $0.13 to solve with GPT-4o. Running the SWE-Bench Lite dataset with 300 instances costs approximately $40.

For semantic search, a vector index based on LlamaIndex is used. This is a classic RAG setup: all code in the repository is chunked into relevant parts (for example, at the method level), embedded, and indexed in a vector store. For class and function name search, a simple index of all function and class names is used.
[Try it out in Google Colab](https://colab.research.google.com/drive/15RpSjdprf9lcaP0oqKsuYfZl1c3kVB_t?usp=sharing)

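The simple index of class and function names mentioned above can be illustrated with Python's standard `ast` module. This is a sketch that assumes a Python repository held in memory as a `{path: source}` dict. It is not necessarily how Moatless builds its index, and `build_name_index` is a hypothetical helper.

```python
import ast
from collections import defaultdict


def build_name_index(files: dict[str, str]) -> dict[str, list[tuple[str, int]]]:
    """Map each class/function name to the (path, line) pairs where it is defined."""
    index = defaultdict(list)
    for path, source in files.items():
        tree = ast.parse(source, filename=path)
        # ast.walk visits every node, so methods and nested functions are included.
        for node in ast.walk(tree):
            if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
                index[node.name].append((path, node.lineno))
    return dict(index)


files = {
    "app/models.py": "class User:\n    def save(self):\n        pass\n",
    "app/views.py": "def save(request):\n    return None\n",
}
index = build_name_index(files)
print(index["User"])          # [('app/models.py', 1)]
print(sorted(index["save"]))  # [('app/models.py', 2), ('app/views.py', 1)]
```

An exact lookup such as `index["save"]` returns every definition site. This is a candidate list that the semantic vector search cannot guarantee.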
### Identify
Identifies the code relevant to the task. If not all relevant code is found, it transitions back to Search. Once all relevant code is found, it transitions to PlanToCode.

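Identify's rule (loop back to Search until the relevant code is found, then move on to PlanToCode) is one transition in the finite state machine described under "How it works". Below is a minimal sketch of the transitions named in this section. The `State` enum, the `next_state` function and its flags are hypothetical. The transition out of EditCode (back to PlanToCode while changes remain) is also an assumption, since the text does not say what follows an edit.

```python
from enum import Enum, auto


class State(Enum):
    SEARCH = auto()
    IDENTIFY = auto()
    PLAN_TO_CODE = auto()
    CLARIFY_CHANGE = auto()
    EDIT_CODE = auto()
    FINISHED = auto()


def next_state(
    state: State,
    *,
    all_code_found: bool = False,
    change_too_large: bool = False,
    changes_remaining: bool = False,
) -> State:
    """Return the state that follows `state`, given the outcome of its step."""
    if state is State.SEARCH:
        return State.IDENTIFY
    if state is State.IDENTIFY:
        # Keep searching until all relevant code has been identified.
        return State.PLAN_TO_CODE if all_code_found else State.SEARCH
    if state is State.PLAN_TO_CODE:
        # Changes that span too much code are narrowed down first.
        return State.CLARIFY_CHANGE if change_too_large else State.EDIT_CODE
    if state is State.CLARIFY_CHANGE:
        return State.EDIT_CODE
    if state is State.EDIT_CODE:
        # Assumption: plan the next change if any remain, otherwise stop.
        return State.PLAN_TO_CODE if changes_remaining else State.FINISHED
    return State.FINISHED
```

Keeping all transitions in one pure function makes the control flow testable separately from the prompts each state uses.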
### PlanToCode
Breaks down the request for code changes into smaller changes to specific parts (code spans) of the codebase.
# Try it out
I have focused on testing my ideas, and the project is currently a bit messy. My plan is to organize it in the coming period. However, feel free to clone the repo and try running this notebook:

### ClarifyChange
If the proposed changes affect too large a portion of the code, the change needs to be clarified to affect a smaller number of code lines.
1. [Run Moatless Tools on any repository](notebooks/00_index_and_run.ipynb)

### EditCode
Code is edited in search/replace blocks inspired by the edit block concept in [Aider](https://aider.chat/docs/benchmarks.html). In this concept, the LLM specifies the code to be changed in a search block and the code it will be changed to in a replace block. However, since the code to be changed is already known to the Code Loop, the search section is pre-filled, and the LLM only needs to respond with the replace section. The idea is that this reduces the risk of changing the wrong code by having an agreement on what to change before making the change.
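
Because the search block is pre-filled, the loop decides which text gets replaced, not the LLM. Here is a minimal sketch of applying such a block. It assumes the file content and the selected span are plain strings, and `apply_replace_block` is a hypothetical helper, not part of Moatless or Aider. It refuses to edit unless the search text occurs exactly once:

```python
def apply_replace_block(content: str, search: str, replacement: str) -> str:
    """Replace the pre-filled search block with the LLM's replace block.

    Raises ValueError unless `search` occurs exactly once in `content`, so an
    edit is never applied to code other than the span that was agreed on.
    """
    if not search:
        raise ValueError("search block must not be empty")
    occurrences = content.count(search)
    if occurrences != 1:
        raise ValueError(f"search block must match exactly once, found {occurrences}")
    return content.replace(search, replacement, 1)


source = "def add(a, b):\n    return a - b\n"
fixed = apply_replace_block(source, "    return a - b\n", "    return a + b\n")
```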
## Environment Setup

Before running the evaluation, you'll need:
1. At least one LLM provider API key (e.g., OpenAI or Anthropic)
2. A Voyage AI API key from [voyageai.com](https://voyageai.com) to use the pre-embedded vector stores for SWE-Bench instances
3. (Optional) Access to a testbed environment; see [moatless-testbeds](https://github.com/aorwall/moatless-testbeds) for setup instructions

You can configure these settings in either of two ways:

1. Create a `.env` file in the project root (copy from `.env.example`):

```bash
cp .env.example .env
# Edit .env with your values
```

2. Export the variables directly:

```bash
# Directory for storing vector index store files
export INDEX_STORE_DIR="/tmp/index_store"

# Directory for storing cloned repositories
export REPO_DIR="/tmp/repos"

# Required: At least one LLM provider API key
export OPENAI_API_KEY="<your-key>"
export ANTHROPIC_API_KEY="<your-key>"
export HUGGINGFACE_API_KEY="<your-key>"
export DEEPSEEK_API_KEY="<your-key>"

# ...or the base URL and key for a custom LLM API service (optional)
export CUSTOM_LLM_API_BASE="<your-base-url>"
export CUSTOM_LLM_API_KEY="<your-key>"

# Required: API Key for Voyage Embeddings
export VOYAGE_API_KEY="<your-key>"

# Optional: Configuration for testbed environment (https://github.com/aorwall/moatless-testbeds)
export TESTBED_API_KEY="<your-key>"
export TESTBED_BASE_URL="<your-base-url>"
```
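
Because the requirements above are "at least one of" rather than "all of", a quick pre-flight check can catch a misconfigured run early. The helper below is hypothetical and not part of Moatless. Two rules in it are assumptions read from the comments in the export list: `CUSTOM_LLM_API_KEY` counts as an alternative to the provider keys, and the two testbed variables must be set as a pair.

```python
import os
from collections.abc import Mapping

# Variable names come from the export list above. Treating CUSTOM_LLM_API_KEY
# as an alternative to the provider keys is an assumption.
LLM_KEY_VARS = (
    "OPENAI_API_KEY",
    "ANTHROPIC_API_KEY",
    "HUGGINGFACE_API_KEY",
    "DEEPSEEK_API_KEY",
    "CUSTOM_LLM_API_KEY",
)


def missing_settings(env: Mapping[str, str]) -> list[str]:
    """Return a list of configuration problems; an empty list means ready to run."""
    problems = []
    if not any(env.get(name) for name in LLM_KEY_VARS):
        problems.append("set at least one of: " + ", ".join(LLM_KEY_VARS))
    if not env.get("VOYAGE_API_KEY"):
        problems.append("VOYAGE_API_KEY is required for the pre-embedded vector stores")
    # Assumption: the optional testbed settings are only useful together.
    if bool(env.get("TESTBED_API_KEY")) != bool(env.get("TESTBED_BASE_URL")):
        problems.append("set both TESTBED_API_KEY and TESTBED_BASE_URL, or neither")
    return problems


if __name__ == "__main__":
    for problem in missing_settings(os.environ):
        print("config:", problem)
```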

## Example

Basic setup using the `AgenticLoop` to solve a SWE-Bench instance.

```python
from moatless.agent import ActionAgent
from moatless.agent.code_prompts import SIMPLE_CODE_PROMPT
from moatless.benchmark.swebench import create_repository
from moatless.benchmark.utils import get_moatless_instance
from moatless.completion import CompletionModel
from moatless.file_context import FileContext
from moatless.index import CodeIndex
from moatless.loop import AgenticLoop
from moatless.actions import (
    FindClass,
    FindFunction,
    FindCodeSnippet,
    SemanticSearch,
    RequestMoreContext,
    RequestCodeChange,
    Finish,
    Reject,
)

index_store_dir = "/tmp/index_store"
repo_base_dir = "/tmp/repos"
persist_path = "trajectory.json"

instance = get_moatless_instance("django__django-16379")

completion_model = CompletionModel(model="gpt-4o", temperature=0.0)

repository = create_repository(instance)

code_index = CodeIndex.from_index_name(
    instance["instance_id"], index_store_dir=index_store_dir, file_repo=repository
)

actions = [
    FindClass(code_index=code_index, repository=repository),
    FindFunction(code_index=code_index, repository=repository),
    FindCodeSnippet(code_index=code_index, repository=repository),
    SemanticSearch(code_index=code_index, repository=repository),
    RequestMoreContext(repository=repository),
    RequestCodeChange(repository=repository, completion_model=completion_model),
    Finish(),
    Reject(),
]

file_context = FileContext(repo=repository)
agent = ActionAgent(actions=actions, completion=completion_model, system_prompt=SIMPLE_CODE_PROMPT)

loop = AgenticLoop.create(
    message=instance["problem_statement"],
    agent=agent,
    file_context=file_context,
    repository=repository,
    persist_path=persist_path,
    max_iterations=50,
    max_cost=2.0,  # Optional: Set maximum cost in dollars
)

final_node = loop.run()
if final_node:
    print(final_node.observation.message)
```
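
`max_iterations` and `max_cost` bound the run in two independent ways. How `AgenticLoop` enforces them is not shown in this diff. The sketch below illustrates only one plausible reading, with a hypothetical `step` callable that reports each iteration's cost in dollars and whether the agent finished.

```python
from typing import Callable, Optional, Tuple


def run_with_budget(
    step: Callable[[], Tuple[float, bool]],
    max_iterations: int = 50,
    max_cost: Optional[float] = None,
) -> Tuple[int, float, str]:
    """Call step() until it finishes or a limit is reached.

    step() returns (cost of this iteration in dollars, finished flag).
    Returns (iterations run, total cost, reason for stopping).
    """
    total_cost = 0.0
    for iteration in range(1, max_iterations + 1):
        cost, finished = step()
        total_cost += cost
        if finished:
            return iteration, total_cost, "finished"
        # The cost check happens after a step, so the last step may overshoot.
        if max_cost is not None and total_cost >= max_cost:
            return iteration, total_cost, "max_cost"
    return max_iterations, total_cost, "max_iterations"


print(run_with_budget(lambda: (0.5, False), max_iterations=50, max_cost=2.0))  # (4, 2.0, 'max_cost')
```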

@@ -1,4 +1 @@
```diff
-from moatless.repository import FileRepository
-from moatless.workspace import Workspace
-from moatless.transition_rules import TransitionRules
-from moatless.loop import AgenticLoop
+# from moatless.loop import AgenticLoop, TransitionRules
```

@@ -0,0 +1,9 @@
```python
from moatless.actions.code_change import RequestCodeChange
from moatless.actions.find_class import FindClass
from moatless.actions.find_code_snippet import FindCodeSnippet
from moatless.actions.find_function import FindFunction
from moatless.actions.finish import Finish
from moatless.actions.reject import Reject
from moatless.actions.run_tests import RunTests
from moatless.actions.semantic_search import SemanticSearch
from moatless.actions.view_code import ViewCode
```

@@ -0,0 +1,143 @@
```python
import importlib
import logging
import pkgutil
from abc import ABC
from typing import List, Type, Tuple, Any, Dict, Optional, ClassVar

from pydantic import BaseModel, ConfigDict

from moatless.actions.model import (
    ActionArguments,
    Observation,
    FewShotExample,
)
from moatless.file_context import FileContext
from moatless.index import CodeIndex
from moatless.repository.repository import Repository

logger = logging.getLogger(__name__)

_actions: Dict[str, Type["Action"]] = {}


class Action(BaseModel, ABC):
    args_schema: ClassVar[Type[ActionArguments]]

    model_config = ConfigDict(arbitrary_types_allowed=True)

    def __init__(self, **data):
        super().__init__(**data)

    def execute(self, args: ActionArguments, file_context: FileContext) -> Observation:
        """
        Execute the action.
        """

        message = self._execute(file_context=file_context)
        return Observation.create(message)

    def _execute(self, file_context: FileContext) -> str | None:
        """
        Execute the action and return a message describing the outcome, or None.
        """
        raise NotImplementedError("Subclasses must implement this method.")

    @property
    def name(self) -> str:
        return self.__class__.__name__

    @classmethod
    def from_dict(
        cls,
        obj: dict,
        repository: Repository = None,
        runtime: Any = None,
        code_index: CodeIndex = None,
    ) -> "Action":
        obj = obj.copy()
        obj.pop("args_schema", None)
        action_class_path = obj.pop("action_class", None)

        if action_class_path:
            module_name, class_name = action_class_path.rsplit(".", 1)
            module = importlib.import_module(module_name)
            action_class = getattr(module, class_name)

            if repository and hasattr(action_class, "_repository"):
                obj["repository"] = repository

            if code_index and hasattr(action_class, "_code_index"):
                obj["code_index"] = code_index

            if runtime and hasattr(action_class, "_runtime"):
                obj["runtime"] = runtime

            return action_class.model_validate(obj)

        raise ValueError(f"Unknown action: {obj}")

    @classmethod
    def model_validate(cls, obj: Any) -> "Action":
        return cls(**obj)

    @classmethod
    def get_action_by_args_class(
        cls, args_class: Type[ActionArguments]
    ) -> Optional[Type["Action"]]:
        """
        Get the Action subclass corresponding to the given ActionArguments subclass.
        Args:
            args_class: The ActionArguments subclass to look up.
        Returns:
            The Action subclass if found, None otherwise.
        """

        def search_subclasses(current_class):
            if (
                hasattr(current_class, "args_schema")
                and current_class.args_schema == args_class
            ):
                return current_class
            for subclass in current_class.__subclasses__():
                result = search_subclasses(subclass)
                if result:
                    return result
            return None

        return search_subclasses(cls)

    @classmethod
    def get_action_by_name(cls, action_name: str) -> Type["Action"]:
        """
        Dynamically import and return the appropriate Action class for the given action name.
        """
        if not _actions:
            cls._load_actions()

        action = _actions.get(action_name)
        if action:
            return action

        raise ValueError(f"Unknown action: {action_name}")

    @classmethod
    def _load_actions(cls):
        actions_package = importlib.import_module("moatless.actions")

        for _, module_name, _ in pkgutil.iter_modules(actions_package.__path__):
            full_module_name = f"moatless.actions.{module_name}"
            module = importlib.import_module(full_module_name)
            for name, obj in module.__dict__.items():
                if isinstance(obj, type) and issubclass(obj, Action) and obj != Action:
                    _actions[name] = obj

    @classmethod
    def get_few_shot_examples(cls) -> List[FewShotExample]:
        """
        Returns a list of few-shot examples specific to this action.
        Override this method in subclasses to provide custom examples.
        """
        return []
```
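
`from_dict` and `get_action_by_args_class` rely on two standard-library mechanisms. The first resolves a dotted `"module.ClassName"` path with `importlib.import_module` plus `getattr`. The second is a depth-first walk over `type.__subclasses__()`. The stripped-down sketch below shows both without pydantic. The `Base`, `Search`, `FuzzySearch` and `Edit` classes are hypothetical stand-ins for `Action` subclasses.

```python
import importlib
from typing import Optional


def resolve_class(dotted_path: str) -> type:
    """Resolve "package.module.ClassName" the same way from_dict does."""
    module_name, class_name = dotted_path.rsplit(".", 1)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


class Args:
    """Stand-in for ActionArguments."""


class SearchArgs(Args):
    pass


class EditArgs(Args):
    pass


class Base:
    """Stand-in for Action; like Action, it assigns no args_schema value."""


class Search(Base):
    args_schema = SearchArgs


class FuzzySearch(Search):
    pass  # inherits Search.args_schema


class Edit(Base):
    args_schema = EditArgs


def find_by_args(root: type, args_class: type) -> Optional[type]:
    """Depth-first search of root's subclass tree for a matching args_schema."""
    if getattr(root, "args_schema", None) == args_class:
        return root
    for subclass in root.__subclasses__():
        found = find_by_args(subclass, args_class)
        if found:
            return found
    return None


print(resolve_class("collections.OrderedDict").__name__)  # OrderedDict
print(find_by_args(Base, SearchArgs).__name__)            # Search
```

The walk checks a class before its subclasses, so a subclass that only inherits its parent's `args_schema` (like `FuzzySearch` here) is never returned. The parent always wins.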