Merge pull request #139 from Mac0q/batch_mode
batch mode
vyokky authored Dec 13, 2024
2 parents d3cda0a + a128f12 commit 0a92c4c
Showing 9 changed files with 402 additions and 22 deletions.
5 changes: 4 additions & 1 deletion .gitignore
@@ -35,4 +35,7 @@ scripts/*
!vectordb/docs/example/
!vectordb/demonstration/example.yaml

.vscode
.vscode

# Ignore the record files
tasks_status.json
67 changes: 67 additions & 0 deletions documents/docs/advanced_usage/batch_mode.md
@@ -0,0 +1,67 @@
# Batch Mode

Batch mode is a UFO feature that lets the agent automate tasks in batches.

## Quick Start

### Step 1: Create a Plan file

Before starting batch mode, you need to create a plan file that contains the list of steps for the agent to follow. The plan file is a JSON file with the following fields:

| Field | Description | Type |
| ------ | -------------------------------------------------------------------------------------------- | ------- |
| task | The task description. | String |
| object | The application or file to interact with. | String |
| close | Determines whether to close the corresponding application or file after completing the task. | Boolean |

Below is an example of a plan file:

```json
{
    "task": "Type in a text of 'Test For Fun' with heading 1 level",
    "object": "draft.docx",
    "close": false
}
```
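
Because the plan file is read with a JSON parser, booleans must use JSON's lowercase `true`/`false` rather than Python's `True`/`False`. The following is a minimal sketch of a pre-flight check, assuming all three fields from the table are present; `validate_plan` and `REQUIRED_FIELDS` are hypothetical names for illustration and are not part of UFO.

```python
import json

# Field names and types follow the table above; this mapping is an assumption
# made for illustration, not a schema shipped with UFO.
REQUIRED_FIELDS = {"task": str, "object": str, "close": bool}


def validate_plan(text: str) -> dict:
    """Parse plan-file text and check each documented field's type."""
    plan = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    for name, expected in REQUIRED_FIELDS.items():
        if name not in plan:
            raise ValueError(f"missing field: {name}")
        if not isinstance(plan[name], expected):
            raise ValueError(f"field {name!r} should be {expected.__name__}")
    return plan


example = """
{
    "task": "Type in a text of 'Test For Fun' with heading 1 level",
    "object": "draft.docx",
    "close": false
}
"""
plan = validate_plan(example)
print(plan["object"], plan["close"])
```

Running a check like this before starting a batch run can surface a malformed plan file early instead of partway through a session.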

!!! note
    The `object` field is the application or file that the agent will interact with. The object **must be active** (it may be minimized) when batch mode starts.
    The structure of your files should be as follows, where `tasks` is the directory for your tasks and `files` is where your object files are stored:

    - Parent
        - tasks
        - files
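
To illustrate how this layout ties a plan file to its object file, here is a small sketch that maps `Parent/tasks/<plan>.json` plus an `object` name to `Parent/files/<object>`. It is an assumption-laden variant for illustration: `resolve_object_path` is a hypothetical helper, it uses POSIX-style paths for readability, and it swaps only the final `tasks` directory rather than reproducing UFO's own lookup.

```python
from pathlib import PurePosixPath


def resolve_object_path(plan_file: str, object_name: str) -> str:
    """Map a plan file under 'tasks' to the object file under the sibling 'files'."""
    tasks_dir = PurePosixPath(plan_file).parent
    if tasks_dir.name != "tasks":
        raise ValueError("plan file is expected to live in a 'tasks' directory")
    # Swap only the final 'tasks' component, so a parent folder whose name
    # happens to contain 'tasks' is left untouched.
    files_dir = tasks_dir.parent / "files"
    # Keep only the file name of the object, dropping any directory prefix.
    return str(files_dir / PurePosixPath(object_name).name)


print(resolve_object_path("Parent/tasks/plan.json", "draft.docx"))
```

On Windows, `PureWindowsPath` (or plain `Path`) would be the natural substitute.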


### Step 2: Start the Batch Mode
To start the Batch mode, run the following command:

```bash
# assume you are in the cloned UFO folder
python ufo.py --task_name {task_name} --mode batch_normal --plan {plan_file}
```

!!! tip
    Replace `{task_name}` with the name of the task and `{plan_file}` with the `Path_to_Parent/Plan_file`.
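
If you launch batch mode from a script, it can help to build the command as an argument list. The sketch below only assembles the command shown above; `build_batch_command`, the task name, and the plan path are hypothetical placeholders.

```python
from typing import List


def build_batch_command(task_name: str, plan_file: str) -> List[str]:
    """Return the argument list for one batch-mode run, using the flags shown above."""
    return [
        "python", "ufo.py",
        "--task_name", task_name,
        "--mode", "batch_normal",
        "--plan", plan_file,
    ]


# Placeholder values for illustration only.
cmd = build_batch_command("demo_task", "Parent/tasks/plan.json")
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from inside the cloned UFO folder.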



## Evaluation
You may want to evaluate whether the `task` was completed successfully by following the plan. UFO calls the `EvaluationAgent` to evaluate the task if `EVA_SESSION` is set to `True` in the `config_dev.yaml` file.

You can check the evaluation log in the `logs/{task_name}/evaluation.log` file.

# References
The batch mode employs a `PlanReader` to parse the plan file and create a `FromFileSession` to follow the plan.

## PlanReader
The `PlanReader` is located in the `ufo/module/sessions/plan_reader.py` file.

:::module.sessions.plan_reader.PlanReader

<br>
## FromFileSession

The `FromFileSession` is located in the `ufo/module/sessions/session.py` file.

:::module.sessions.session.FromFileSession
26 changes: 13 additions & 13 deletions documents/docs/agents/overview.md
@@ -2,12 +2,12 @@

In UFO, there are four types of agents: `HostAgent`, `AppAgent`, `FollowerAgent`, and `EvaluationAgent`. Each agent has a specific role in the UFO system and is responsible for different aspects of the user interaction process:

| Agent | Description |
| --- | --- |
| [`HostAgent`](../agents/host_agent.md) | Decomposes the user request into sub-tasks and selects the appropriate application to fulfill the request. |
| [`AppAgent`](../agents/app_agent.md) | Executes actions on the selected application. |
| [`FollowerAgent`](../agents/follower_agent.md) | Follows the user's instructions to complete the task. |
| [`EvaluationAgent`](../agents/evaluation_agent.md) | Evaluates the completeness of a session or a round. |
| Agent | Description |
| -------------------------------------------------- | ---------------------------------------------------------------------------------------------------------- |
| [`HostAgent`](../agents/host_agent.md) | Decomposes the user request into sub-tasks and selects the appropriate application to fulfill the request. |
| [`AppAgent`](../agents/app_agent.md) | Executes actions on the selected application. |
| [`FollowerAgent`](../agents/follower_agent.md) | Follows the user's instructions to complete the task. |
| [`EvaluationAgent`](../agents/evaluation_agent.md) | Evaluates the completeness of a session or a round. |

In the normal workflow, only the `HostAgent` and `AppAgent` are involved in the user interaction process. The `FollowerAgent` and `EvaluationAgent` are used for specific tasks.

@@ -21,13 +21,13 @@ Please see below the orchestration of the agents in UFO:

An agent in UFO is composed of the following main components to fulfill its role in the UFO system:

| Component | Description |
| --- | --- |
| [`State`](../agents/design/state.md) | Represents the current state of the agent and determines the next action and agent to handle the request. |
| [`Memory`](../agents/design/memory.md) | Stores information about the user request, application state, and other relevant data. |
| [`Blackboard`](../agents/design/blackboard.md) | Stores information shared between agents. |
| [`Prompter`](../agents/design/prompter.md) | Generates prompts for the language model based on the user request and application state. |
| [`Processor`](../agents/design/processor.md) | Processes the workflow of the agent, including handling user requests, executing actions, and memory management. |
| Component | Description |
| ---------------------------------------------- | ---------------------------------------------------------------------------------------------------------------- |
| [`State`](../agents/design/state.md) | Represents the current state of the agent and determines the next action and agent to handle the request. |
| [`Memory`](../agents/design/memory.md) | Stores information about the user request, application state, and other relevant data. |
| [`Blackboard`](../agents/design/blackboard.md) | Stores information shared between agents. |
| [`Prompter`](../agents/design/prompter.md) | Generates prompts for the language model based on the user request and application state. |
| [`Processor`](../agents/design/processor.md) | Processes the workflow of the agent, including handling user requests, executing actions, and memory management. |

## Reference

14 changes: 11 additions & 3 deletions ufo/agents/agent/host_agent.py
@@ -39,6 +39,8 @@ def create_agent(agent_type: str, *args, **kwargs) -> BasicAgent:
            return AppAgent(*args, **kwargs)
        elif agent_type == "follower":
            return FollowerAgent(*args, **kwargs)
        elif agent_type == "batch_normal":
            return AppAgent(*args, **kwargs)
        else:
            raise ValueError("Invalid agent type: {}".format(agent_type))

@@ -233,10 +235,16 @@ def create_app_agent(
        :return: The app agent.
        """

        if mode == "normal":
        # Membership test: `mode == "normal" or "batch_normal"` would always be truthy.
        if mode in ("normal", "batch_normal"):

            agent_name = "AppAgent/{root}/{process}".format(
                root=application_root_name, process=application_window_name
            agent_name = (
                "AppAgent/{root}/{process}".format(
                    root=application_root_name, process=application_window_name
                )
                if mode == "normal"
                else "BatchAgent/{root}/{process}".format(
                    root=application_root_name, process=application_window_name
                )
            )

            app_agent: AppAgent = self.create_subagent(
3 changes: 2 additions & 1 deletion ufo/agents/states/host_agent_state.py
@@ -198,14 +198,15 @@ def next_state(self, agent: "HostAgent") -> AppAgentState:
        :param agent: The current agent.
        :return: The state for the next step.
        """

        # Transition to the app agent state.
        # Lazy import to avoid circular dependency.

        from ufo.agents.states.app_agent_state import ContinueAppAgentState

        return ContinueAppAgentState()


    def next_agent(self, agent: "HostAgent") -> AppAgent:
        """
        Get the agent for the next step.
6 changes: 6 additions & 0 deletions ufo/config/config_dev.yaml
@@ -101,3 +101,9 @@ DEFAULT_PNG_COMPRESS_LEVEL: 9 # The compress level for the PNG image, 0-9, 0 is

# Save UI tree
SAVE_UI_TREE: False # Whether to save the UI tree


# Record the status of the tasks
TASK_STATUS: True # Whether to record the status of the tasks in batch execution mode.
# TASK_STATUS_FILE # The path for the task status file.

57 changes: 56 additions & 1 deletion ufo/module/sessions/plan_reader.py
@@ -2,6 +2,7 @@
# Licensed under the MIT License.

import json
import os
from typing import List, Optional

from ufo.config.config import Config
@@ -20,9 +21,19 @@ def __init__(self, plan_file: str):
        :param plan_file: The path of the plan file.
        """

        self.plan_file = plan_file
        with open(plan_file, "r") as f:
            self.plan = json.load(f)
        self.remaining_steps = self.get_steps()
        self.support_apps = ["word", "excel", "powerpoint"]

    def get_close(self) -> bool:
        """
        Check whether the application or file should be closed after the task.
        :return: True if it should be closed, False otherwise.
        """

        return self.plan.get("close", False)

    def get_task(self) -> str:
        """
Expand All @@ -46,7 +57,7 @@ def get_operation_object(self) -> str:
:return: The operation object.
"""

return self.plan.get("object", "")
return self.plan.get("object", None).lower()

def get_initial_request(self) -> str:
"""
Expand Down Expand Up @@ -76,6 +87,42 @@ def get_host_agent_request(self) -> str:

return request

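    def get_file_path(self) -> str:
        """
        Get the path of the object file, stored in the `files` directory next to `tasks`.
        :return: The path of the object file.
        """

        # Note: this replaces every occurrence of "tasks" in the absolute path.
        file_path = os.path.dirname(os.path.abspath(self.plan_file)).replace(
            "tasks", "files"
        )
        # Default to "" so a missing "object" field does not raise TypeError.
        file = os.path.basename(self.plan.get("object", ""))

        return os.path.join(file_path, file)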

    def get_support_apps(self) -> List[str]:
        """
        Get the support apps in the plan.
        :return: The support apps in the plan.
        """

        return self.support_apps

    def get_host_request(self) -> str:
        """
        Get the request for the host agent.
        :return: The request for the host agent.
        """

        task = self.get_task()
        object_name = self.get_operation_object()
        if object_name in self.support_apps:
            request = task
        else:
            request = f"Open the application of {task}. You must output the selected application with their control text and label even if it is already open."

        return request

    def next_step(self) -> Optional[str]:
        """
        Get the next step in the plan.
@@ -95,3 +142,11 @@ def task_finished(self) -> bool:
        """

        return not self.remaining_steps

    def get_root_path(self) -> str:
        """
        Get the root path of the plan.
        :return: The root path of the plan.
        """

        return os.path.dirname(os.path.abspath(self.plan_file))