Merge pull request #150 from microsoft/pre-release
New Release for v1.2.0
vyokky authored Dec 16, 2024
2 parents dd46a6a + b682821 commit 6c9fde9
Showing 127 changed files with 7,259 additions and 1,208 deletions.
6 changes: 5 additions & 1 deletion .gitignore
@@ -13,6 +13,7 @@ __pycache__/
*.pyc
*.ipynb
/.VSCodeCounter
/analysis/*

# Ignore the config file
ufo/config/config.yaml
@@ -34,4 +35,7 @@ scripts/*
!vectordb/docs/example/
!vectordb/demonstration/example.yaml

.vscode
.vscode

# Ignore the record files
tasks_status.json
41 changes: 35 additions & 6 deletions README.md
@@ -28,20 +28,24 @@
- <b>AppAgent 👾</b>, responsible for iteratively executing actions on the selected applications until the task is successfully concluded within a specific application.
- <b>Application Automator 🎮</b>, tasked with translating actions from HostAgent and AppAgent into interactions with the application through UI controls, native APIs, or AI tools. Check out more details [here](https://microsoft.github.io/UFO/automator/overview/).

Both agents leverage the multi-modal capabilities of GPT-Vision to comprehend the application UI and fulfill the user's request. For more details, please consult our [technical report](https://arxiv.org/abs/2402.07939) and [documentation](https://microsoft.github.io/UFO/).
Both agents leverage the multi-modal capabilities of GPT-4V(o) to comprehend the application UI and fulfill the user's request. For more details, please consult our [technical report](https://arxiv.org/abs/2402.07939) and [documentation](https://microsoft.github.io/UFO/).
<h1 align="center">
<img src="./assets/framework_v2.png"/>
</h1>


## 📢 News
- 📅 2024-12-13: We have a **New Release for v1.2.0!** Check out our new features and improvements:
1. **Large Action Model (LAM) Data Collection:** We have released the code and sample data for Large Action Model (LAM) data collection with UFO! Please check out our [new paper](https://arxiv.org/abs/2412.10047), [code](dataflow/README.md) and [documentation](https://microsoft.github.io/UFO/dataflow/overview/) for more details.
2. **Bash Command Support:** HostAgent now also supports bash commands!
3. **Bug Fixes:** We have fixed several bugs, improved error handling, and improved overall performance.
- 📅 2024-09-08: We have a **New Release for v1.1.0!** It allows UFO to click on any region of the application and reduces its latency by up to 1/3!
- 📅 2024-07-06: We have a **New Release for v1.0.0!** You can check out our [documentation](https://microsoft.github.io/UFO/). We welcome your contributions and feedback!
- 📅 2024-06-28: We are thrilled to announce that our official introduction video is now available on [YouTube](https://www.youtube.com/watch?v=QT_OhygMVXU)!
- 📅 2024-06-25: **New Release for v0.2.1!** We are excited to announce the release of version 0.2.1! This update includes several new features and improvements:
<!-- - 📅 2024-06-25: **New Release for v0.2.1!** We are excited to announce the release of version 0.2.1! This update includes several new features and improvements:
1. **HostAgent Refactor:** We've refactored the HostAgent to enhance its efficiency in managing AppAgents within UFO.
2. **Evaluation Agent:** Introducing an evaluation agent that assesses task completion and provides real-time feedback.
3. **Google Gemini Support:** UFO now supports Google Gemini as the inference engine. Refer to our detailed guide in [documentation](https://microsoft.github.io/UFO/supported_models/gemini/).
3. **Google Gemini && Claude Support:** UFO now supports Google Gemini and Claude as inference engines. Refer to our detailed guide in [Gemini documentation](https://microsoft.github.io/UFO/supported_models/gemini/) or [Claude documentation](https://microsoft.github.io/UFO/supported_models/claude/).
4. **Customized User Agents:** Users can now create customized agents by simply answering a few questions.
- 📅 2024-05-21: We have reached 5K stars!✨
- 📅 2024-05-08: **New Release for v0.1.1!** We've made some significant updates! Previously known as AppAgent and ActAgent, we've rebranded them to HostAgent and AppAgent to better align with their functionalities. Explore the latest enhancements:
@@ -53,7 +57,8 @@ Both agents leverage the multi-modal capabilities of GPT-Vision to comprehend th
1. We now support creating your help documents for each Windows application to become an app expert. Check the [documentation](https://microsoft.github.io/UFO/creating_app_agent/help_document_provision/) for more details!
2. UFO now supports RAG from offline documents and online Bing search.
3. You can save the task completion trajectory into its memory for UFO's reference, improving its future success rate!
4. You can customize different GPT models for HostAgent and AppAgent. Text-only models (e.g., GPT-4) are now supported!
4. You can customize different GPT models for HostAgent and AppAgent. Text-only models (e.g., GPT-4) are now supported! -->
- 📅 ...
- 📅 2024-02-14: Our [technical report](https://arxiv.org/abs/2402.07939) is online!
- 📅 2024-02-10: UFO is released on GitHub🎈. Happy Chinese New Year🐉!

@@ -239,6 +244,26 @@ You may use them to debug, replay, or analyze the agent output.
---


<!-- ## 🎬 Demo Examples
We present two demo videos in which UFO completes user requests on Windows. For more case studies, please consult our [technical report](https://arxiv.org/abs/2402.07939).
#### 1️⃣🗑️ Example 1: Deleting all notes on a PowerPoint presentation.
In this example, we will demonstrate how to efficiently use UFO to delete all notes on a PowerPoint presentation with just a few simple steps. Explore this functionality to enhance your productivity and work smarter, not harder!
https://github.com/microsoft/UFO/assets/11352048/cf60c643-04f7-4180-9a55-5fb240627834
#### 2️⃣📧 Example 2: Composing an email using text from multiple sources.
In this example, we will demonstrate how to utilize UFO to extract text from Word documents, describe an image, compose an email, and send it seamlessly. Enjoy the versatility and efficiency of cross-application experiences with UFO!
https://github.com/microsoft/UFO/assets/11352048/aa41ad47-fae7-4334-8e0b-ba71c4fc32e0 -->





## 📊 Evaluation
@@ -271,9 +296,13 @@ If you use UFO in your research, please cite our paper:



## 🎨 Related Project
You may also find [TaskWeaver](https://github.com/microsoft/TaskWeaver?tab=readme-ov-file) useful, a code-first LLM agent framework for seamlessly planning and executing data analytics tasks.
## 🎨 Related Projects
1. If you're interested in data analytics agent frameworks, check out [TaskWeaver](https://github.com/microsoft/TaskWeaver?tab=readme-ov-file), a code-first LLM agent framework designed for seamlessly planning and executing data analytics tasks.

2. For more information on GUI agents, refer to our survey paper: [Large Language Model-Brained GUI Agents: A Survey](https://arxiv.org/abs/2411.18279). You can also explore the survey through:
- [Paper](https://arxiv.org/abs/2411.18279)
- [GitHub Repository](https://github.com/vyokky/LLM-Brained-GUI-Agents-Survey)
- [Searchable Website](https://vyokky.github.io/LLM-Brained-GUI-Agents-Survey/)

## ⚠️ Disclaimer
By choosing to run the provided code, you acknowledge and agree to the following terms and conditions regarding the functionality and data handling practices in [DISCLAIMER.md](./DISCLAIMER.md)
16 changes: 0 additions & 16 deletions SUPPORT.md

This file was deleted.

Binary file added assets/dataflow/execution.png
Binary file added assets/dataflow/instantiation.png
Binary file added assets/dataflow/overview.png
Binary file added assets/dataflow/result_example.png
Binary file added assets/gui_agent.png
Binary file added assets/webpage.png
5 changes: 5 additions & 0 deletions dataflow/.gitignore
@@ -0,0 +1,5 @@
# Ignore files
cache/
controls_cache/
controller/utils/
config/config.yaml

0 comments on commit 6c9fde9
