diff --git a/.nojekyll b/.nojekyll new file mode 100644 index 00000000..e69de29b diff --git a/404.html b/404.html new file mode 100644 index 00000000..1388408e --- /dev/null +++ b/404.html @@ -0,0 +1,296 @@ + + + + + + + + UFO Documentation + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • +
  • +
  • +
+
+
+
+
+ + +

404

+ +

Page not found

+ + +
+
+ +
+
+ +
+ +
+ +
+ + + + + +
+ + + + + + + + + diff --git a/about/CODE_OF_CONDUCT/index.html b/about/CODE_OF_CONDUCT/index.html new file mode 100644 index 00000000..b99b5f22 --- /dev/null +++ b/about/CODE_OF_CONDUCT/index.html @@ -0,0 +1,318 @@ + + + + + + + + Code of Conduct - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

Microsoft Open Source Code of Conduct

+

This project has adopted the Microsoft Open Source Code of Conduct.

+

Resources:

+ + +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/about/CONTRIBUTING/index.html b/about/CONTRIBUTING/index.html new file mode 100644 index 00000000..45b5742b --- /dev/null +++ b/about/CONTRIBUTING/index.html @@ -0,0 +1,325 @@ + + + + + + + + Contributing - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

Contributing

+

This project welcomes contributions and suggestions. Most contributions require you to +agree to a Contributor License Agreement (CLA) declaring that you have the right to, +and actually do, grant us the rights to use your contribution. For details, visit +https://cla.microsoft.com.

+

When you submit a pull request, a CLA-bot will automatically determine whether you need +to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the +instructions provided by the bot. You will only need to do this once across all repositories using our CLA.

+
+

Note

+

You should submit your pull request to the pre-release branch, not the main branch.

+
+

This project has adopted the Microsoft Open Source Code of Conduct. +For more information see the Code of Conduct FAQ +or contact opencode@microsoft.com with any additional questions or comments.

+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/about/DISCLAIMER/index.html b/about/DISCLAIMER/index.html new file mode 100644 index 00000000..3d6608be --- /dev/null +++ b/about/DISCLAIMER/index.html @@ -0,0 +1,349 @@ + + + + + + + + Disclaimer - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

Disclaimer: Code Execution and Data Handling Notice

+

By choosing to run the provided code, you acknowledge and agree to the following terms and conditions regarding the functionality and data handling practices:

+

1. Code Functionality:

+

The code you are about to execute has the capability to capture screenshots of your working desktop environment and active applications. These screenshots will be processed and sent to the GPT model for inference.

+

2. Data Privacy and Storage:

+

It is crucial to note that Microsoft, the provider of this code, explicitly states that it does not collect or save any of the transmitted data. The captured screenshots are processed in real-time for the purpose of inference, and no permanent storage or record of this data is retained by Microsoft.

+

3. User Responsibility:

+

By running the code, you understand and accept the responsibility for the content and nature of the data present on your desktop during the execution period. It is your responsibility to ensure that no sensitive or confidential information is visible or captured during this process.

+

4. Security Measures:

+

Microsoft has implemented security measures to safeguard the action execution. However, it is recommended that you run the code in a secure and controlled environment to minimize potential risks. Ensure that you are running the latest security updates on your system.

+

5. Consent to Data Analysis:

+

You explicitly provide consent for the GPT model to analyze the captured screenshots for the purpose of generating relevant outputs. This consent is inherent in the act of executing the code.

+

6. No Guarantee of Accuracy:

+

The outputs generated by the GPT model are based on patterns learned during training and may not always be accurate or contextually relevant. Microsoft does not guarantee the accuracy or suitability of the inferences made by the model.

+

7. Indemnification:

+

Users agree to defend, indemnify, and hold Microsoft harmless from and against all damages, costs, and attorneys' fees in connection with any claims arising from the use of this Repo.

+

8. Reporting Infringements:

+

If anyone believes that this Repo infringes on their rights, please notify the project owner via the provided project owner email. Microsoft will investigate and take appropriate actions as necessary.

+

9. Modifications to the Disclaimer:

+

Microsoft reserves the right to update or modify this disclaimer at any time without prior notice. It is your responsibility to review the disclaimer periodically for any changes.

+

By proceeding to execute the code, you acknowledge that you have read, understood, and agreed to the terms outlined in this disclaimer. If you do not agree with these terms, refrain from running the provided code.

+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/about/LICENSE/index.html b/about/LICENSE/index.html new file mode 100644 index 00000000..d1079d94 --- /dev/null +++ b/about/LICENSE/index.html @@ -0,0 +1,327 @@ + + + + + + + + License - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

Copyright (c) Microsoft Corporation.

+

MIT License

+

Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions:

+

The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software.

+

THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE.

+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/about/SUPPORT/index.html b/about/SUPPORT/index.html new file mode 100644 index 00000000..bb44076c --- /dev/null +++ b/about/SUPPORT/index.html @@ -0,0 +1,323 @@ + + + + + + + + Support - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

Support

+

How to file issues and get help

+

This project uses GitHub Issues to track bugs and feature requests. Please search the existing +issues before filing new issues to avoid duplicates. For new issues, file your bug or +feature request as a new Issue.

+

You may use GitHub Issues to raise questions, bug reports, and feature requests.

+

For help and questions about using this project, please contact ufo-agent@microsoft.com.

+

Microsoft Support Policy

+

Support for this project is limited to the resources listed above.

+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/advanced_usage/control_filtering/icon_filtering/index.html b/advanced_usage/control_filtering/icon_filtering/index.html new file mode 100644 index 00000000..47a46dcf --- /dev/null +++ b/advanced_usage/control_filtering/icon_filtering/index.html @@ -0,0 +1,588 @@ + + + + + + + + Icon Filtering - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + + +
  • +
  • +
+
+
+
+
+ +

Icon Filter

+

The icon control filter is a method to filter the controls based on the similarity between the control icon image and the agent's plan using the image/text embeddings.

+

Configuration

+

To activate the icon control filtering, you need to add ICON to the CONTROL_FILTER list in the config_dev.yaml file. Below is the detailed icon control filter configuration in the config_dev.yaml file:

+
    +
  • CONTROL_FILTER: A list of filtering methods that you want to apply to the controls. To activate the icon control filtering, add ICON to the list.
  • +
  • CONTROL_FILTER_TOP_K_ICON: The number of controls to keep after filtering.
  • +
  • CONTROL_FILTER_MODEL_ICON_NAME: The control filter model name for icon similarity. By default, it is set to "clip-ViT-B-32".
  • +
+
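A minimal config_dev.yaml fragment that activates the icon filter might look like the following sketch; the top-k value here is illustrative, and only the model name is the documented default:

```yaml
CONTROL_FILTER: ["ICON"]                          # activate icon-based control filtering
CONTROL_FILTER_TOP_K_ICON: 15                     # illustrative: keep the top-15 controls
CONTROL_FILTER_MODEL_ICON_NAME: "clip-ViT-B-32"   # default icon similarity model
```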

Reference

+ + +
+ + + + +
+

+ Bases: BasicControlFilter

+ + +

A class that represents an icon model for control filtering.

+ + + + + + + + + +
+ + + + + + + + + +
+ + +

+ control_filter(control_dicts, cropped_icons_dict, plans, top_k) + +

+ + +
+ +

Filters control items based on their scores and returns the top-k items.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + control_dicts + – +
    +

    The dictionary of all control items.

    +
    +
  • +
  • + cropped_icons_dict + – +
    +

    The dictionary of the cropped icons.

    +
    +
  • +
  • + plans + – +
    +

    The plans to compare the control icons against.

    +
    +
  • +
  • + top_k + – +
    +

    The number of top items to return.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + – +
    +

    The list of top-k control items based on their scores.

    +
    +
  • +
+
+
+ Source code in automator/ui_control/control_filter.py +
252
+253
+254
+255
+256
+257
+258
+259
+260
+261
+262
+263
+264
+265
+266
+267
+268
+269
+270
+271
+272
+273
+274
def control_filter(self, control_dicts, cropped_icons_dict, plans, top_k):
+    """
+    Filters control items based on their scores and returns the top-k items.
+    :param control_dicts: The dictionary of all control items.
+    :param cropped_icons_dict: The dictionary of the cropped icons.
+    :param plans: The plans to compare the control icons against.
+    :param top_k: The number of top items to return.
+    :return: The list of top-k control items based on their scores.
+    """
+
+    scores_items = []
+    filtered_control_dict = {}
+
+    for label, cropped_icon in cropped_icons_dict.items():
+        score = self.control_filter_score(cropped_icon, plans)
+        scores_items.append((score, label))
+    topk_scores_items = heapq.nlargest(top_k, scores_items, key=lambda x: x[0])
+    topk_labels = [scores_item[1] for scores_item in topk_scores_items]
+
+    for label, control_item in control_dicts.items():
+        if label in topk_labels:
+            filtered_control_dict[label] = control_item
+    return filtered_control_dict
+
+
+
+ +
+ +
+ + +

+ control_filter_score(control_icon, plans) + +

+ + +
+ +

Calculates the score of a control icon based on its similarity to the given keywords.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + control_icon + – +
    +

    The control icon image.

    +
    +
  • +
  • + plans + – +
    +

    The plan to compare the control icon against.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + – +
    +

    The maximum similarity score between the control icon and the keywords.

    +
    +
  • +
+
+
+ Source code in automator/ui_control/control_filter.py +
240
+241
+242
+243
+244
+245
+246
+247
+248
+249
+250
def control_filter_score(self, control_icon, plans):
+    """
+    Calculates the score of a control icon based on its similarity to the given keywords.
+    :param control_icon: The control icon image.
+    :param plans: The plan to compare the control icon against.
+    :return: The maximum similarity score between the control icon and the keywords.
+    """
+
+    plans_embedding = self.get_embedding(plans)
+    control_icon_embedding = self.get_embedding(control_icon)
+    return max(self.cos_sim(control_icon_embedding, plans_embedding).tolist()[0])
+
+
+
+ +
+ + + +
+ +
+ +
+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/advanced_usage/control_filtering/overview/index.html b/advanced_usage/control_filtering/overview/index.html new file mode 100644 index 00000000..db0360bd --- /dev/null +++ b/advanced_usage/control_filtering/overview/index.html @@ -0,0 +1,1044 @@ + + + + + + + + Overview - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + + +
  • +
  • +
+
+
+
+
+ +

Control Filtering

+

There may be many control items in the application, many of which are not relevant to the task. UFO can filter out the irrelevant controls and focus only on the relevant ones. This filtering process reduces the complexity of the task.

+

Except for configuring the control types for selection in CONTROL_LIST in config_dev.yaml, UFO also supports filtering the controls based on semantic similarity or keyword matching between the agent's plan and the control's information. We currently support the following filtering methods:

+ + + + + + + + + + + + + + + + + + + + + +
Filtering MethodDescription
TextFilter the controls based on the control text.
SemanticFilter the controls based on the semantic similarity.
IconFilter the controls based on the control icon image.
+
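All three methods share the same overall shape: score each control against the agent's plan, then keep the top-k highest-scoring controls. Below is a self-contained sketch of that ranking step; the toy labels and scores are illustrative, standing in for the embedding or keyword similarities computed by the real filters:

```python
import heapq

def top_k_controls(control_scores, top_k):
    """Keep the top_k control labels by score, mirroring how UFO's
    filters rank controls against the agent's plan."""
    # control_scores maps a control label to its relevance score.
    items = [(score, label) for label, score in control_scores.items()]
    # heapq.nlargest returns the top_k pairs sorted by descending score.
    top = heapq.nlargest(top_k, items, key=lambda x: x[0])
    return [label for _, label in top]

scores = {"btn_save": 0.91, "btn_close": 0.12, "menu_file": 0.78}
print(top_k_controls(scores, 2))  # ['btn_save', 'menu_file']
```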

Configuration

+

You can activate the control filtering by setting the CONTROL_FILTER in the config_dev.yaml file. The CONTROL_FILTER is a list of filtering methods that you want to apply to the controls, which can be TEXT, SEMANTIC, or ICON.

+

You can configure multiple filtering methods in the CONTROL_FILTER list.

+
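For example, a config_dev.yaml entry that enables all three methods at once might look like:

```yaml
CONTROL_FILTER: ["TEXT", "SEMANTIC", "ICON"]   # apply text, semantic, and icon filtering
```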

Reference

+

The implementation of the control filtering is based on the BasicControlFilter class located in the ufo/automator/ui_control/control_filter.py file. Concrete filtering classes inherit from the BasicControlFilter class and implement the control_filter method to filter the controls based on the specific filtering method.

+ + +
+ + + + +
+ + +

BasicControlFilter represents a model for filtering control items.

+ + + + + + + + + +
+ + + + + + + + + +
+ + +

+ __new__(model_path) + +

+ + +
+ +

Creates a new instance of BasicControlFilter.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + model_path + – +
    +

    The path to the model.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + – +
    +

    The BasicControlFilter instance.

    +
    +
  • +
+
+
+ Source code in automator/ui_control/control_filter.py +
72
+73
+74
+75
+76
+77
+78
+79
+80
+81
+82
def __new__(cls, model_path):
+    """
+    Creates a new instance of BasicControlFilter.
+    :param model_path: The path to the model.
+    :return: The BasicControlFilter instance.
+    """
+    if model_path not in cls._instances:
+        instance = super(BasicControlFilter, cls).__new__(cls)
+        instance.model = cls.load_model(model_path)
+        cls._instances[model_path] = instance
+    return cls._instances[model_path]
+
+
+
+ +
+ +
+ + +

+ control_filter(control_dicts, plans, **kwargs) + + + abstractmethod + + +

+ + +
+ +

Calculates the cosine similarity between the embeddings of the given keywords and the control item.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + control_dicts + – +
    +

    The control item to be compared with the plans.

    +
    +
  • +
  • + plans + – +
    +

    The plans to be used for calculating the similarity.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + – +
    +

    The filtered control items.

    +
    +
  • +
+
+
+ Source code in automator/ui_control/control_filter.py +
104
+105
+106
+107
+108
+109
+110
+111
+112
@abstractmethod
+def control_filter(self, control_dicts, plans, **kwargs):
+    """
+    Calculates the cosine similarity between the embeddings of the given keywords and the control item.
+    :param control_dicts: The control item to be compared with the plans.
+    :param plans: The plans to be used for calculating the similarity.
+    :return: The filtered control items.
+    """
+    pass
+
+
+
+ +
+ +
+ + +

+ cos_sim(embedding1, embedding2) + + + staticmethod + + +

+ + +
+ +

Computes the cosine similarity between two embeddings.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + embedding1 + – +
    +

    The first embedding.

    +
    +
  • +
  • + embedding2 + – +
    +

    The second embedding.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + float + – +
    +

    The cosine similarity between the two embeddings.

    +
    +
  • +
+
+
+ Source code in automator/ui_control/control_filter.py +
153
+154
+155
+156
+157
+158
+159
+160
+161
+162
+163
@staticmethod
+def cos_sim(embedding1, embedding2) -> float:
+    """
+    Computes the cosine similarity between two embeddings.
+    :param embedding1: The first embedding.
+    :param embedding2: The second embedding.
+    :return: The cosine similarity between the two embeddings.
+    """
+    import sentence_transformers
+
+    return sentence_transformers.util.cos_sim(embedding1, embedding2)
+
+
+
+ +
+ +
+ + +

+ get_embedding(content) + +

+ + +
+ +

Encodes the given object into an embedding.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + content + – +
    +

    The content to encode.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + – +
    +

    The embedding of the object.

    +
    +
  • +
+
+
+ Source code in automator/ui_control/control_filter.py +
 95
+ 96
+ 97
+ 98
+ 99
+100
+101
+102
def get_embedding(self, content):
+    """
+    Encodes the given object into an embedding.
+    :param content: The content to encode.
+    :return: The embedding of the object.
+    """
+
+    return self.model.encode(content)
+
+
+
+ +
+ +
+ + +

+ load_model(model_path) + + + staticmethod + + +

+ + +
+ +

Loads the model from the given model path.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + model_path + – +
    +

    The path to the model.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + – +
    +

    The loaded model.

    +
    +
  • +
+
+
+ Source code in automator/ui_control/control_filter.py +
84
+85
+86
+87
+88
+89
+90
+91
+92
+93
@staticmethod
+def load_model(model_path):
+    """
+    Loads the model from the given model path.
+    :param model_path: The path to the model.
+    :return: The loaded model.
+    """
+    import sentence_transformers
+
+    return sentence_transformers.SentenceTransformer(model_path)
+
+
+
+ +
+ +
+ + +

+ plans_to_keywords(plans) + + + staticmethod + + +

+ + +
+ +

Gets keywords from the plan. We only consider the words in the plan that are alphabetic or Chinese characters.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + plans + (List[str]) + – +
    +

    The plan to be parsed.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + List[str] + – +
    +

    A list of keywords extracted from the plan.

    +
    +
  • +
+
+
+ Source code in automator/ui_control/control_filter.py +
114
+115
+116
+117
+118
+119
+120
+121
+122
+123
+124
+125
+126
+127
+128
+129
+130
+131
@staticmethod
+def plans_to_keywords(plans: List[str]) -> List[str]:
+    """
+    Gets keywords from the plan. We only consider the words in the plan that are alphabetic or Chinese characters.
+    :param plans: The plan to be parsed.
+    :return: A list of keywords extracted from the plan.
+    """
+
+    keywords = []
+    for plan in plans:
+        words = plan.replace("'", "").strip(".").split()
+        words = [
+            word
+            for word in words
+            if word.isalpha() or bool(re.fullmatch(r"[\u4e00-\u9fa5]+", word))
+        ]
+        keywords.extend(words)
+    return keywords
+
+
+
+ +
+ +
+ + +

+ remove_stopwords(keywords) + + + staticmethod + + +

+ + +
+ +

Removes stopwords from the given list of keywords. If you are using stopwords for the first time, you need to download them using nltk.download('stopwords').

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + keywords + – +
    +

    The list of keywords to be filtered.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + – +
    +

    The list of keywords with the stopwords removed.

    +
    +
  • +
+
+
+ Source code in automator/ui_control/control_filter.py +
133
+134
+135
+136
+137
+138
+139
+140
+141
+142
+143
+144
+145
+146
+147
+148
+149
+150
+151
@staticmethod
+def remove_stopwords(keywords):
+    """
+    Removes stopwords from the given list of keywords. If you are using stopwords for the first time, you need to download them using nltk.download('stopwords').
+    :param keywords: The list of keywords to be filtered.
+    :return: The list of keywords with the stopwords removed.
+    """
+
+    try:
+        from nltk.corpus import stopwords
+
+        stopwords_list = stopwords.words("english")
+    except LookupError as e:
+        import nltk
+
+        nltk.download("stopwords")
+        stopwords_list = nltk.corpus.stopwords.words("english")
+
+    return [keyword for keyword in keywords if keyword not in stopwords_list]
+
+
+
+ +
+ + + +
+ +
+ +
+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/advanced_usage/control_filtering/semantic_filtering/index.html b/advanced_usage/control_filtering/semantic_filtering/index.html new file mode 100644 index 00000000..9c1eb52c --- /dev/null +++ b/advanced_usage/control_filtering/semantic_filtering/index.html @@ -0,0 +1,583 @@ + + + + + + + + Semantic Filtering - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + + +
  • +
  • +
+
+
+
+
+ +

Semantic Control Filter

+

The semantic control filter is a method to filter the controls based on the semantic similarity between the agent's plan and the control's text using their embeddings.

+

Configuration

+

To activate the semantic control filtering, you need to add SEMANTIC to the CONTROL_FILTER list in the config_dev.yaml file. Below is the detailed semantic control filter configuration in the config_dev.yaml file:

+
    +
  • CONTROL_FILTER: A list of filtering methods that you want to apply to the controls. To activate the semantic control filtering, add SEMANTIC to the list.
  • +
  • CONTROL_FILTER_TOP_K_SEMANTIC: The number of controls to keep after filtering.
  • +
  • CONTROL_FILTER_MODEL_SEMANTIC_NAME: The control filter model name for semantic similarity. By default, it is set to "all-MiniLM-L6-v2".
  • +
+
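A minimal config_dev.yaml fragment that activates the semantic filter might look like the following sketch; the top-k value here is illustrative, and only the model name is the documented default:

```yaml
CONTROL_FILTER: ["SEMANTIC"]                            # activate semantic control filtering
CONTROL_FILTER_TOP_K_SEMANTIC: 15                       # illustrative: keep the top-15 controls
CONTROL_FILTER_MODEL_SEMANTIC_NAME: "all-MiniLM-L6-v2"  # default semantic similarity model
```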

Reference

+ + +
+ + + + +
+

+ Bases: BasicControlFilter

+ + +

A class that represents a semantic model for control filtering.

+ + + + + + + + + +
+ + + + + + + + + +
+ + +

+ control_filter(control_dicts, plans, top_k) + +

+ + +
+ +

Filters control items based on their similarity to a set of keywords.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + control_dicts + – +
    +

    The dictionary of control items to be filtered.

    +
    +
  • +
  • + plans + – +
    +

    The list of plans to be used for filtering.

    +
    +
  • +
  • + top_k + – +
    +

    The number of top control items to return.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + – +
    +

    The filtered control items.

    +
    +
  • +
+
+
+ Source code in automator/ui_control/control_filter.py +
209
+210
+211
+212
+213
+214
+215
+216
+217
+218
+219
+220
+221
+222
+223
+224
+225
+226
+227
+228
+229
+230
+231
+232
def control_filter(self, control_dicts, plans, top_k):
+    """
+    Filters control items based on their similarity to a set of keywords.
+    :param control_dicts: The dictionary of control items to be filtered.
+    :param plans: The list of plans to be used for filtering.
+    :param top_k: The number of top control items to return.
+    :return: The filtered control items.
+    """
+    scores_items = []
+    filtered_control_dict = {}
+
+    for label, control_item in control_dicts.items():
+        control_text = control_item.element_info.name.lower()
+        score = self.control_filter_score(control_text, plans)
+        scores_items.append((label, score))
+    topk_scores_items = heapq.nlargest(top_k, scores_items, key=lambda x: x[1])
+    topk_labels = [
+        score_item[0] for score_item in topk_scores_items
+    ]
+
+    for label, control_item in control_dicts.items():
+        if label in topk_labels:
+            filtered_control_dict[label] = control_item
+    return filtered_control_dict
+
+
+
+ +
+ +
+ + +

+ control_filter_score(control_text, plans) + +

+ + +
+ +

Calculates the score for a control item based on the similarity between its text and a set of keywords.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + control_text + – +
    +

    The text of the control item.

    +
    +
  • +
  • + plans + – +
    +

    The plan to be used for calculating the similarity.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + – +
    +

    The score (0-1) indicating the similarity between the control text and the keywords.

    +
    +
  • +
+
+
+ Source code in automator/ui_control/control_filter.py +
197
+198
+199
+200
+201
+202
+203
+204
+205
+206
+207
def control_filter_score(self, control_text, plans):
+    """
+    Calculates the score for a control item based on the similarity between its text and a set of keywords.
+    :param control_text: The text of the control item.
+    :param plans: The plan to be used for calculating the similarity.
+    :return: The score (0-1) indicating the similarity between the control text and the keywords.
+    """
+
+    plan_embedding = self.get_embedding(plans)
+    control_text_embedding = self.get_embedding(control_text)
+    return max(self.cos_sim(control_text_embedding, plan_embedding).tolist()[0])
+
+
+
+ +
+ + + +
+ +
+ +
+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/advanced_usage/control_filtering/text_filtering/index.html b/advanced_usage/control_filtering/text_filtering/index.html new file mode 100644 index 00000000..41f20f9b --- /dev/null +++ b/advanced_usage/control_filtering/text_filtering/index.html @@ -0,0 +1,476 @@ + + + + + + + + Text Filtering - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + + +
  • +
  • +
+
+
+
+
+ +

Text Control Filter

+

The text control filter is a method to filter the controls based on the control text. The agent's plan on the current step usually contains some keywords or phrases. This method filters the controls based on the matching between the control text and the keywords or phrases in the agent's plan.

+

Configuration

+

To activate the text control filtering, you need to add TEXT to the CONTROL_FILTER list in the config_dev.yaml file. Below is the detailed text control filter configuration in the config_dev.yaml file:

+
    +
  • CONTROL_FILTER: A list of filtering methods that you want to apply to the controls. To activate the text control filtering, add TEXT to the list.
  • +
  • CONTROL_FILTER_TOP_K_PLAN: The number of agent's plan keywords or phrases to use for filtering the controls.
  • +
+
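A minimal config_dev.yaml fragment that activates the text filter might look like the following sketch; the top-k value here is illustrative:

```yaml
CONTROL_FILTER: ["TEXT"]         # activate keyword-based control filtering
CONTROL_FILTER_TOP_K_PLAN: 2     # illustrative: number of plan keywords/phrases to use
```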

Reference

+ + +
+ + + + +
+ + +

A class that provides methods for filtering control items based on plans.

+ + + + + + + + + +
+ + + + + + + + + +
+ + +

+ control_filter(control_dicts, plans) + + + staticmethod + + +

+ + +
+ +

Filters control items based on keywords.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + control_dicts + (Dict) + – +
    +

    The dictionary of control items to be filtered.

    +
    +
  • +
  • + plans + (List[str]) + – +
    +

    The list of plans to be used for filtering.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + Dict + – +
    +

    The filtered control items.

    +
    +
  • +
+
+
+ Source code in automator/ui_control/control_filter.py +
171
+172
+173
+174
+175
+176
+177
+178
+179
+180
+181
+182
+183
+184
+185
+186
+187
+188
+189
@staticmethod
+def control_filter(control_dicts: Dict, plans: List[str]) -> Dict:
+    """
+    Filters control items based on keywords.
+    :param control_dicts: The dictionary of control items to be filtered.
+    :param plans: The list of plans to be used for filtering.
+    :return: The filtered control items.
+    """
+    filtered_control_dict = {}
+
+    keywords = BasicControlFilter.plans_to_keywords(plans)
+    for label, control_item in control_dicts.items():
+        control_text = control_item.element_info.name.lower()
+        if any(
+            keyword in control_text or control_text in keyword
+            for keyword in keywords
+        ):
+            filtered_control_dict[label] = control_item
+    return filtered_control_dict
+
+
+
+ +
+ + + +
+ +
+ +
+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/advanced_usage/customization/index.html b/advanced_usage/customization/index.html new file mode 100644 index 00000000..985faeea --- /dev/null +++ b/advanced_usage/customization/index.html @@ -0,0 +1,358 @@ + + + + + + + + Customization - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

Customization

+

Sometimes, UFO may need additional context or information to complete a task. This information is important and customized for each user. UFO can ask the user for additional information and save it in the local memory for future reference. This customization feature allows UFO to provide a more personalized experience to the user.

+

Scenario

+

Let's consider a scenario where UFO needs additional information to complete a task. UFO is tasked with booking a cab for the user. To book a cab, UFO needs to know the exact address of the user. UFO will ask the user for the address and save it in the local memory for future reference. Next time, when UFO is asked to complete a task that requires the user's address, UFO will use the saved address to complete the task, without asking the user again.

+

Implementation

+

We currently implement the customization feature in the HostAgent class. When the HostAgent needs additional information, it will transition to the PENDING state and ask the user for the information. The user provides the information, and the HostAgent saves it in the local memory base for future reference. The saved information is stored in the blackboard and can be accessed by all agents in the session.

+
+

Note

+

The customization memory base is only saved in a local file. This information will not be uploaded to the cloud or any other storage, in order to protect the user's privacy.

+
+

Configuration

+

You can configure the customization feature by setting the following field in the config_dev.yaml file.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Configuration OptionDescriptionTypeDefault Value
USE_CUSTOMIZATIONWhether to enable the customization.BooleanTrue
QA_PAIR_FILEThe path for the historical QA pairs.String"customization/historical_qa.txt"
QA_PAIR_NUMThe number of QA pairs for the customization.Integer20
+ +
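Written out in config_dev.yaml, the default values from the table above are:

```yaml
USE_CUSTOMIZATION: True                            # enable the customization feature
QA_PAIR_FILE: "customization/historical_qa.txt"    # path for the historical QA pairs
QA_PAIR_NUM: 20                                    # number of QA pairs for the customization
```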
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/advanced_usage/follower_mode/index.html b/advanced_usage/follower_mode/index.html new file mode 100644 index 00000000..259efd34 --- /dev/null +++ b/advanced_usage/follower_mode/index.html @@ -0,0 +1,1234 @@ + + + + + + + + Follower Mode - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

Follower Mode

+

The Follower mode is a feature of UFO in which the agent follows a list of pre-defined steps in natural language to take actions on applications. Different from the normal mode, this mode creates a FollowerAgent that follows the plan list provided by the user to interact with the application, instead of generating the plan itself. This mode is useful for debugging, software testing, and verification.

+

Quick Start

+

Step 1: Create a Plan file

+

Before starting the Follower mode, you need to create a plan file that contains the list of steps for the agent to follow. The plan file is a JSON file that contains the following fields:

+ + + + + + + + + + + + + + + + + + + + + + + + + +
FieldDescriptionType
taskThe task description.String
stepsThe list of steps for the agent to follow.List of Strings
objectThe application or file to interact with.String
+

Below is an example of a plan file:

+
{
+    "task": "Type in a text of 'Test For Fun' with heading 1 level",
+    "steps": 
+    [
+        "1.type in 'Test For Fun'", 
+        "2.Select the 'Test For Fun' text",
+        "3.Click 'Home' tab to show the 'Styles' ribbon tab",
+        "4.Click 'Styles' ribbon tab to show the style 'Heading 1'",
+        "5.Click 'Heading 1' style to apply the style to the selected text"
+    ],
+    "object": "draft.docx"
+}
+
+
+

Note

+

The object field is the application or file that the agent will interact with. The object must be active (it can be minimized) when starting the Follower mode.

+
+

Step 2: Start the Follower Mode

+

To start the Follower mode, run the following command:

+
# assume you are in the cloned UFO folder
+python ufo.py --task_name {task_name} --mode follower --plan {plan_file}
+
+
+

Tip

+

Replace {task_name} with the name of the task and {plan_file} with the path to the plan file.

+
+

Step 3: Run in Batch (Optional)

+

You can also run the Follower mode in batch mode by providing a folder containing multiple plan files. The agent will follow the plans in the folder one by one. To run in batch mode, run the following command:

+
# assume you are in the cloned UFO folder
+python ufo.py --task_name {task_name} --mode follower --plan {plan_folder}
+
+

UFO will automatically detect the plan files in the folder and run them one by one.

+
+

Tip

+

Replace {task_name} with the name of the task and {plan_folder} with the path to the folder containing plan files.

+
+

Evaluation

+

You may want to evaluate whether the task was completed successfully by following the plan. UFO calls the EvaluationAgent to evaluate the task if EVA_SESSION is set to True in the config_dev.yaml file.

+

You can check the evaluation log in the logs/{task_name}/evaluation.log file.

+

References

+

The follower mode employs a PlanReader to parse the plan file and create a FollowerSession to follow the plan.

+

PlanReader

+

The PlanReader is located in the ufo/module/sessions/plan_reader.py file.

+ + +
+ + + + +
+ + +

The reader for a plan file.

+ +

Initialize a plan reader.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + plan_file + (str) + – +
    +

    The path of the plan file.

    +
    +
  • +
+
+ + + + + +
+ Source code in module/sessions/plan_reader.py +
17
+18
+19
+20
+21
+22
+23
+24
+25
def __init__(self, plan_file: str):
+    """
+    Initialize a plan reader.
+    :param plan_file: The path of the plan file.
+    """
+
+    with open(plan_file, "r") as f:
+        self.plan = json.load(f)
+    self.remaining_steps = self.get_steps()
+
+
+ + + +
+ + + + + + + + + +
+ + +

+ get_host_agent_request() + +

+ + +
+ +

Get the request for the host agent.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The request for the host agent.

    +
    +
  • +
+
+
+ Source code in module/sessions/plan_reader.py +
64
+65
+66
+67
+68
+69
+70
+71
+72
+73
+74
+75
+76
+77
def get_host_agent_request(self) -> str:
+    """
+    Get the request for the host agent.
+    :return: The request for the host agent.
+    """
+
+    object_name = self.get_operation_object()
+
+    request = (
+        f"Open and select the application of {object_name}, and output the FINISH status immediately. "
+        "You must output the selected application with their control text and label even if it is already open."
+    )
+
+    return request
+
+
+
+ +
+ +
+ + +

+ get_initial_request() + +

+ + +
+ +

Get the initial request in the plan.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The initial request.

    +
    +
  • +
+
+
+ Source code in module/sessions/plan_reader.py +
51
+52
+53
+54
+55
+56
+57
+58
+59
+60
+61
+62
def get_initial_request(self) -> str:
+    """
+    Get the initial request in the plan.
+    :return: The initial request.
+    """
+
+    task = self.get_task()
+    object_name = self.get_operation_object()
+
+    request = f"{task} in {object_name}"
+
+    return request
+
+
+
+ +
+ +
+ + +

+ get_operation_object() + +

+ + +
+ +

Get the operation object in the step.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The operation object.

    +
    +
  • +
+
+
+ Source code in module/sessions/plan_reader.py +
43
+44
+45
+46
+47
+48
+49
def get_operation_object(self) -> str:
+    """
+    Get the operation object in the step.
+    :return: The operation object.
+    """
+
+    return self.plan.get("object", "")
+
+
+
+ +
+ +
+ + +

+ get_steps() + +

+ + +
+ +

Get the steps in the plan.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + List[str] + – +
    +

    The steps in the plan.

    +
    +
  • +
+
+
+ Source code in module/sessions/plan_reader.py +
35
+36
+37
+38
+39
+40
+41
def get_steps(self) -> List[str]:
+    """
+    Get the steps in the plan.
+    :return: The steps in the plan.
+    """
+
+    return self.plan.get("steps", [])
+
+
+
+ +
+ +
+ + +

+ get_task() + +

+ + +
+ +

Get the task name.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The task name.

    +
    +
  • +
+
+
+ Source code in module/sessions/plan_reader.py +
27
+28
+29
+30
+31
+32
+33
def get_task(self) -> str:
+    """
+    Get the task name.
+    :return: The task name.
+    """
+
+    return self.plan.get("task", "")
+
+
+
+ +
+ +
+ + +

+ next_step() + +

+ + +
+ +

Get the next step in the plan.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + Optional[str] + – +
    +

    The next step.

    +
    +
  • +
+
+
+ Source code in module/sessions/plan_reader.py +
79
+80
+81
+82
+83
+84
+85
+86
+87
+88
+89
def next_step(self) -> Optional[str]:
+    """
+    Get the next step in the plan.
+    :return: The next step.
+    """
+
+    if self.remaining_steps:
+        step = self.remaining_steps.pop(0)
+        return step
+
+    return None
+
+
+
+ +
+ +
+ + +

+ task_finished() + +

+ + +
+ +

Check if the task is finished.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + bool + – +
    +

    True if the task is finished, False otherwise.

    +
    +
  • +
+
+
+ Source code in module/sessions/plan_reader.py +
91
+92
+93
+94
+95
+96
+97
def task_finished(self) -> bool:
+    """
+    Check if the task is finished.
+    :return: True if the task is finished, False otherwise.
+    """
+
+    return not self.remaining_steps
+
+
+
+ +
+ + + +
+ +
+ +


+

FollowerSession

+

The FollowerSession is also located in the ufo/module/sessions/session.py file.

+ + +
+ + + + +
+

+ Bases: BaseSession

+ + +

A session for following a list of planned actions. +This session is used for the follower agent, which accepts a plan file to follow using the PlanReader.

+ +

Initialize a session.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + task + (str) + – +
    +

    The name of current task.

    +
    +
  • +
  • + plan_file + (str) + – +
    +

    The path of the plan file to follow.

    +
    +
  • +
  • + should_evaluate + (bool) + – +
    +

    Whether to evaluate the session.

    +
    +
  • +
  • + id + (int) + – +
    +

    The id of the session.

    +
    +
  • +
+
+ + + + + +
+ Source code in module/sessions/session.py +
197
+198
+199
+200
+201
+202
+203
+204
+205
+206
+207
+208
+209
+210
def __init__(
+    self, task: str, plan_file: str, should_evaluate: bool, id: int
+) -> None:
+    """
+    Initialize a session.
+    :param task: The name of current task.
+    :param plan_file: The path of the plan file to follow.
+    :param should_evaluate: Whether to evaluate the session.
+    :param id: The id of the session.
+    """
+
+    super().__init__(task, should_evaluate, id)
+
+    self.plan_reader = PlanReader(plan_file)
+
+
+ + + +
+ + + + + + + + + +
+ + +

+ create_new_round() + +

+ + +
+ +

Create a new round.

+ +
+ Source code in module/sessions/session.py +
220
+221
+222
+223
+224
+225
+226
+227
+228
+229
+230
+231
+232
+233
+234
+235
+236
+237
+238
+239
+240
+241
+242
+243
+244
+245
+246
+247
+248
+249
+250
+251
+252
+253
+254
+255
def create_new_round(self) -> None:
+    """
+    Create a new round.
+    """
+
+    # Get a request for the new round.
+    request = self.next_request()
+
+    # Create a new round and return None if the session is finished.
+    if self.is_finished():
+        return None
+
+    if self.total_rounds == 0:
+        utils.print_with_color("Complete the following request:", "yellow")
+        utils.print_with_color(self.plan_reader.get_initial_request(), "cyan")
+        agent = self._host_agent
+    else:
+        agent = self._host_agent.get_active_appagent()
+
+        # Clear the memory and set the state to continue the app agent.
+        agent.clear_memory()
+        agent.blackboard.requests.clear()
+
+        agent.set_state(ContinueAppAgentState())
+
+    round = BaseRound(
+        request=request,
+        agent=agent,
+        context=self.context,
+        should_evaluate=configs.get("EVA_ROUND", False),
+        id=self.total_rounds,
+    )
+
+    self.add_round(round.id, round)
+
+    return round
+
+
+
+ +
+ +
+ + +

+ next_request() + +

+ + +
+ +

Get the request for the new round.

+ +
+ Source code in module/sessions/session.py +
257
+258
+259
+260
+261
+262
+263
+264
+265
+266
+267
+268
+269
+270
+271
def next_request(self) -> str:
+    """
+    Get the request for the new round.
+    """
+
+    # If the task is finished, return an empty string.
+    if self.plan_reader.task_finished():
+        self._finish = True
+        return ""
+
+    # Get the request from the plan reader.
+    if self.total_rounds == 0:
+        return self.plan_reader.get_host_agent_request()
+    else:
+        return self.plan_reader.next_step()
+
+
+
+ +
+ +
+ + +

+ request_to_evaluate() + +

+ + +
+ +

Check if the session should be evaluated.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + bool + – +
    +

    True if the session should be evaluated, False otherwise.

    +
    +
  • +
+
+
+ Source code in module/sessions/session.py +
273
+274
+275
+276
+277
+278
+279
def request_to_evaluate(self) -> bool:
+    """
+    Check if the session should be evaluated.
+    :return: True if the session should be evaluated, False otherwise.
+    """
+
+    return self.plan_reader.get_task()
+
+
+
+ +
+ + + +
+ +
+ +
+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/advanced_usage/reinforce_appagent/experience_learning/index.html b/advanced_usage/reinforce_appagent/experience_learning/index.html new file mode 100644 index 00000000..27d18ea2 --- /dev/null +++ b/advanced_usage/reinforce_appagent/experience_learning/index.html @@ -0,0 +1,1264 @@ + + + + + + + + Experience Learning - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + + +
  • +
  • +
+
+
+
+
+ +

Learning from Self-Experience

+

When UFO successfully completes a task, the user can choose to save the successful experience to reinforce the AppAgent. The AppAgent can learn from its own successful experiences to improve its performance in the future.

+

Mechanism

+

Step 1: Complete a Session

+
    +
  • Event: UFO completes a session
  • +
+

Step 2: Ask User to Save Experience

+
    +
  • Action: The agent prompts the user with a choice to save the successful experience
  • +
+

+ Save Experience +

+ +

Step 3: User Chooses to Save

+
    +
  • Action: If the user chooses to save the experience
  • +
+

Step 4: Summarize and Save the Experience

+
    +
  • Tool: ExperienceSummarizer
  • +
  • Process:
  • +
  • Summarize the experience into a demonstration example
  • +
  • Save the demonstration example in the EXPERIENCE_SAVED_PATH as specified in the config_dev.yaml file
  • +
  • The demonstration example includes similar fields as those used in the AppAgent's prompt
  • +
+

Step 5: Retrieve and Utilize Saved Experience

+
    +
  • When: The AppAgent encounters a similar task in the future
  • +
  • Action: Retrieve the saved experience from the experience database
  • +
  • Outcome: Use the retrieved experience to generate a plan
  • +
+

Workflow Diagram

+
graph TD;
+    A[Complete Session] --> B[Ask User to Save Experience]
+    B --> C[User Chooses to Save]
+    C --> D[Summarize with ExperienceSummarizer]
+    D --> E[Save in EXPERIENCE_SAVED_PATH]
+    F[AppAgent Encounters Similar Task] --> G[Retrieve Saved Experience]
+    G --> H[Generate Plan]
+
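The "retrieve saved experience" step in the diagram can be illustrated with a naive stand-in ranker. UFO actually retrieves with FAISS over HuggingFace embeddings (see the ExperienceRetriever reference below); the sketch here only shows the intent — rank saved examples by similarity to the new request and keep the top k — using token overlap instead of embeddings.

```python
from typing import List


def retrieve_top_k(request: str, experiences: List[dict], top_k: int = 5) -> List[dict]:
    """Rank saved experiences by naive token overlap with the new request.

    Illustrative stand-in only: UFO's real retrieval uses a FAISS vector
    database with embedding similarity, not token overlap.
    """
    req_tokens = set(request.lower().split())

    def score(exp: dict) -> int:
        # Count shared words between the new request and the saved request.
        return len(req_tokens & set(exp["request"].lower().split()))

    return sorted(experiences, key=score, reverse=True)[:top_k]
```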
+

Activate the Learning from Self-Experience

+

Step 1: Configure the AppAgent

+

Configure the following parameters to allow UFO to use RAG from its self-experience:

+ + + + + + + + + + + + + + + + + + + + + + + +
Configuration OptionDescriptionTypeDefault Value
RAG_EXPERIENCEWhether to use the RAG from its self-experienceBooleanFalse
RAG_EXPERIENCE_RETRIEVED_TOPKThe topk for the offline retrieved documentsInteger5
+
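The corresponding fragment of config_dev.yaml would look like the following (values shown are the defaults from the table above; set RAG_EXPERIENCE to True to enable the feature):

```yaml
# config_dev.yaml — self-experience RAG settings
RAG_EXPERIENCE: False               # Enable RAG from self-experience
RAG_EXPERIENCE_RETRIEVED_TOPK: 5    # Top-k offline retrieved documents
```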

Reference

+

Experience Summarizer

+

The ExperienceSummarizer class is located in the ufo/experience/experience_summarizer.py file. The ExperienceSummarizer class provides the following methods to summarize the experience:

+ + +
+ + + + +
+ + +

The ExperienceSummarizer class is the summarizer for the experience learning.

+ +

Initialize the ExperienceSummarizer.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + is_visual + (bool) + – +
    +

    Whether the request is for visual model.

    +
    +
  • +
  • + prompt_template + (str) + – +
    +

    The path of the prompt template.

    +
    +
  • +
  • + example_prompt_template + (str) + – +
    +

    The path of the example prompt template.

    +
    +
  • +
  • + api_prompt_template + (str) + – +
    +

    The path of the api prompt template.

    +
    +
  • +
+
+ + + + + +
+ Source code in experience/summarizer.py +
22
+23
+24
+25
+26
+27
+28
+29
+30
+31
+32
+33
+34
+35
+36
+37
+38
+39
def __init__(
+    self,
+    is_visual: bool,
+    prompt_template: str,
+    example_prompt_template: str,
+    api_prompt_template: str,
+):
+    """
+    Initialize the ExperienceSummarizer.
+    :param is_visual: Whether the request is for visual model.
+    :param prompt_template: The path of the prompt template.
+    :param example_prompt_template: The path of the example prompt template.
+    :param api_prompt_template: The path of the api prompt template.
+    """
+    self.is_visual = is_visual
+    self.prompt_template = prompt_template
+    self.example_prompt_template = example_prompt_template
+    self.api_prompt_template = api_prompt_template
+
+
+ + + +
+ + + + + + + + + +
+ + +

+ build_prompt(log_partition) + +

+ + +
+ +

Build the prompt.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + log_partition + (dict) + – +
    +

    The log partition. return: The prompt.

    +
    +
  • +
+
+
+ Source code in experience/summarizer.py +
41
+42
+43
+44
+45
+46
+47
+48
+49
+50
+51
+52
+53
+54
+55
+56
+57
+58
+59
+60
+61
def build_prompt(self, log_partition: dict) -> list:
+    """
+    Build the prompt.
+    :param log_partition: The log partition.
+    return: The prompt.
+    """
+    experience_prompter = ExperiencePrompter(
+        self.is_visual,
+        self.prompt_template,
+        self.example_prompt_template,
+        self.api_prompt_template,
+    )
+    experience_system_prompt = experience_prompter.system_prompt_construction()
+    experience_user_prompt = experience_prompter.user_content_construction(
+        log_partition
+    )
+    experience_prompt = experience_prompter.prompt_construction(
+        experience_system_prompt, experience_user_prompt
+    )
+
+    return experience_prompt
+
+
+
+ +
+ +
+ + +

+ create_or_update_vector_db(summaries, db_path) + + + staticmethod + + +

+ + +
+ +

Create or update the vector database.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + summaries + (list) + – +
    +

    The summaries.

    +
    +
  • +
  • + db_path + (str) + – +
    +

    The path of the vector database.

    +
    +
  • +
+
+
+ Source code in experience/summarizer.py +
163
+164
+165
+166
+167
+168
+169
+170
+171
+172
+173
+174
+175
+176
+177
+178
+179
+180
+181
+182
+183
+184
+185
+186
@staticmethod
+def create_or_update_vector_db(summaries: list, db_path: str):
+    """
+    Create or update the vector database.
+    :param summaries: The summaries.
+    :param db_path: The path of the vector database.
+    """
+
+    document_list = []
+
+    for summary in summaries:
+        request = summary["request"]
+        document_list.append(Document(page_content=request, metadata=summary))
+
+    db = FAISS.from_documents(document_list, get_hugginface_embedding())
+
+    # Check if the db exists, if not, create a new one.
+    if os.path.exists(db_path):
+        prev_db = FAISS.load_local(db_path, get_hugginface_embedding())
+        db.merge_from(prev_db)
+
+    db.save_local(db_path)
+
+    print(f"Updated vector DB successfully: {db_path}")
+
+
+
+ +
+ +
+ + +

+ create_or_update_yaml(summaries, yaml_path) + + + staticmethod + + +

+ + +
+ +

Create or update the YAML file.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + summaries + (list) + – +
    +

    The summaries.

    +
    +
  • +
  • + yaml_path + (str) + – +
    +

    The path of the YAML file.

    +
    +
  • +
+
+
+ Source code in experience/summarizer.py +
127
+128
+129
+130
+131
+132
+133
+134
+135
+136
+137
+138
+139
+140
+141
+142
+143
+144
+145
+146
+147
+148
+149
+150
+151
+152
+153
+154
+155
+156
+157
+158
+159
+160
+161
@staticmethod
+def create_or_update_yaml(summaries: list, yaml_path: str):
+    """
+    Create or update the YAML file.
+
+    :param summaries: The summaries.
+    :param yaml_path: The path of the YAML file.
+    """
+
+    # Check if the file exists, if not, create a new one
+    if not os.path.exists(yaml_path):
+        with open(yaml_path, "w"):
+            pass
+        print(f"Created new YAML file: {yaml_path}")
+
+    # Read existing data from the YAML file
+    with open(yaml_path, "r") as file:
+        existing_data = yaml.safe_load(file)
+
+    # Initialize index and existing_data if file is empty
+    index = len(existing_data) if existing_data else 0
+    existing_data = existing_data or {}
+
+    # Update data with new summaries
+    for i, summary in enumerate(summaries):
+        example = {f"example{index + i}": summary}
+        existing_data.update(example)
+
+    # Write updated data back to the YAML file
+    with open(yaml_path, "w") as file:
+        yaml.safe_dump(
+            existing_data, file, default_flow_style=False, sort_keys=False
+        )
+
+    print(f"Updated existing YAML file successfully: {yaml_path}")
+
+
+
+ +
+ +
+ + +

+ get_summary(prompt_message) + +

+ + +
+ +

Get the summary.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + prompt_message + (list) + – +
    +

    The prompt message. return: The summary and the cost.

    +
    +
  • +
+
+
+ Source code in experience/summarizer.py +
63
+64
+65
+66
+67
+68
+69
+70
+71
+72
+73
+74
+75
+76
+77
+78
+79
+80
+81
+82
+83
+84
+85
+86
+87
+88
+89
+90
+91
+92
+93
+94
+95
+96
+97
def get_summary(self, prompt_message: list) -> Tuple[dict, float]:
+    """
+    Get the summary.
+    :param prompt_message: The prompt message.
+    return: The summary and the cost.
+    """
+
+    # Get the completion for the prompt message
+    response_string, cost = get_completion(
+        prompt_message, "APPAGENT", use_backup_engine=True
+    )
+    try:
+        response_json = json_parser(response_string)
+    except:
+        response_json = None
+
+    # Restructure the response
+    if response_json:
+        summary = dict()
+        summary["example"] = {}
+        for key in [
+            "Observation",
+            "Thought",
+            "ControlLabel",
+            "ControlText",
+            "Function",
+            "Args",
+            "Status",
+            "Plan",
+            "Comment",
+        ]:
+            summary["example"][key] = response_json.get(key, "")
+        summary["Tips"] = response_json.get("Tips", "")
+
+    return summary, cost
+
+
+
+ +
+ +
+ + +

+ get_summary_list(logs) + +

+ + +
+ +

Get the summary list.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + logs + (list) + – +
    +

    The logs. return: The summary list and the total cost.

    +
    +
  • +
+
+
+ Source code in experience/summarizer.py +
 99
+100
+101
+102
+103
+104
+105
+106
+107
+108
+109
+110
+111
+112
+113
+114
+115
def get_summary_list(self, logs: list) -> Tuple[list, float]:
+    """
+    Get the summary list.
+    :param logs: The logs.
+    return: The summary list and the total cost.
+    """
+    summaries = []
+    total_cost = 0.0
+    for log_partition in logs:
+        prompt = self.build_prompt(log_partition)
+        summary, cost = self.get_summary(prompt)
+        summary["request"] = ExperienceLogLoader.get_user_request(log_partition)
+        summary["app_list"] = ExperienceLogLoader.get_app_list(log_partition)
+        summaries.append(summary)
+        total_cost += cost
+
+    return summaries, total_cost
+
+
+
+ +
+ +
+ + +

+ read_logs(log_path) + + + staticmethod + + +

+ + +
+ +

Read the log.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + log_path + (str) + – +
    +

    The path of the log file.

    +
    +
  • +
+
+
+ Source code in experience/summarizer.py +
117
+118
+119
+120
+121
+122
+123
+124
+125
@staticmethod
+def read_logs(log_path: str) -> list:
+    """
+    Read the log.
+    :param log_path: The path of the log file.
+    """
+    replay_loader = ExperienceLogLoader(log_path)
+    logs = replay_loader.create_logs()
+    return logs
+
+
+
+ +
+ + + +
+ +
+ +


+

Experience Retriever

+

The ExperienceRetriever class is located in the ufo/rag/retriever.py file. The ExperienceRetriever class provides the following methods to retrieve the experience:

+ + +
+ + + + +
+

+ Bases: Retriever

+ + +

Class to create experience retrievers.

+ +

Create a new ExperienceRetriever.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + db_path + – +
    +

    The path to the database.

    +
    +
  • +
+
+ + + + + +
+ Source code in rag/retriever.py +
131
+132
+133
+134
+135
+136
def __init__(self, db_path) -> None:
+    """
+    Create a new ExperienceRetriever.
+    :param db_path: The path to the database.
+    """
+    self.indexer = self.get_indexer(db_path)
+
+
+ + + +
+ + + + + + + + + +
+ + +

+ get_indexer(db_path) + +

+ + +
+ +

Create an experience indexer.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + db_path + (str) + – +
    +

    The path to the database.

    +
    +
  • +
+
+
+ Source code in rag/retriever.py +
138
+139
+140
+141
+142
+143
+144
+145
+146
+147
+148
+149
+150
+151
+152
+153
+154
def get_indexer(self, db_path: str):
+    """
+    Create an experience indexer.
+    :param db_path: The path to the database.
+    """
+
+    try:
+        db = FAISS.load_local(db_path, get_hugginface_embedding())
+        return db
+    except:
+        # print_with_color(
+        #     "Warning: Failed to load experience indexer from {path}.".format(
+        #         path=db_path
+        #     ),
+        #     "yellow",
+        # )
+        return None
+
+
+
+ +
+ + + +
+ +
+ +
+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/advanced_usage/reinforce_appagent/learning_from_bing_search/index.html b/advanced_usage/reinforce_appagent/learning_from_bing_search/index.html new file mode 100644 index 00000000..96eb8268 --- /dev/null +++ b/advanced_usage/reinforce_appagent/learning_from_bing_search/index.html @@ -0,0 +1,535 @@ + + + + + + + + Learning from Bing Search - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + + +
  • +
  • +
+
+
+
+
+ +

Learning from Bing Search

+

UFO provides the capability to reinforce the AppAgent by searching for information on Bing, obtaining up-to-date knowledge for niche tasks or applications that are beyond the AppAgent's built-in knowledge.

+

Mechanism

+

Upon receiving a request, the AppAgent constructs a Bing search query based on the request and retrieves the search results from Bing. It then extracts the relevant information from the top-k results and generates a plan based on the retrieved information.
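As a rough sketch of the first half of this flow, the request can be turned into a Bing Web Search API call. The helper names below are hypothetical (UFO's actual query construction lives in its web_search module), though the endpoint and the `Ocp-Apim-Subscription-Key` header follow the Bing Web Search v7 API.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlencode

# Bing Web Search v7 endpoint (see the Azure Bing Search API documentation).
BING_ENDPOINT = "https://api.bing.microsoft.com/v7.0/search"


def build_bing_query(request_text: str, api_key: str, top_k: int = 5) -> Tuple[str, Dict[str, str]]:
    """Build the URL and headers for a Bing Web Search call.

    Hypothetical helper for illustration; only the endpoint and header
    names are taken from the Bing v7 API.
    """
    headers = {"Ocp-Apim-Subscription-Key": api_key}
    params = {"q": request_text, "count": top_k}
    return f"{BING_ENDPOINT}?{urlencode(params)}", headers


def top_k_snippets(search_results: dict, top_k: int = 1) -> List[str]:
    """Keep the snippets of the top-k ranked web pages from a Bing response."""
    pages = search_results.get("webPages", {}).get("value", [])
    return [page.get("snippet", "") for page in pages[:top_k]]
```

The second helper corresponds to RAG_ONLINE_RETRIEVED_TOPK: only the top-k retrieved results are kept for plan generation.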

+ +

Step 1: Obtain Bing API Key

+

To use the Bing search, you need to obtain a Bing API key. You can follow the instructions on the Microsoft Azure Bing Search API to get the API key.

+

Step 2: Configure the AppAgent

+

Configure the following parameters to allow UFO to use online Bing search for the decision-making process:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Configuration OptionDescriptionTypeDefault Value
RAG_ONLINE_SEARCHWhether to use the Bing searchBooleanFalse
BING_API_KEYThe Bing search API keyString""
RAG_ONLINE_SEARCH_TOPKThe topk for the online searchInteger5
RAG_ONLINE_RETRIEVED_TOPKThe topk for the online retrieved searched resultsInteger1
+
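The corresponding fragment of config_dev.yaml would look like the following (values shown are the defaults from the table above; set RAG_ONLINE_SEARCH to True and fill in BING_API_KEY to enable the feature):

```yaml
# config_dev.yaml — Bing search RAG settings
RAG_ONLINE_SEARCH: False          # Enable online Bing search
BING_API_KEY: ""                  # Your Bing search API key
RAG_ONLINE_SEARCH_TOPK: 5         # Top-k for the online search
RAG_ONLINE_RETRIEVED_TOPK: 1      # Top-k for the retrieved searched results
```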

Reference

+ + +
+ + + + +
+

+ Bases: Retriever

+ + +

Class to create online retrievers.

+ +

Create a new OnlineDocRetriever. +:query: The query to create an indexer for. +:top_k: The number of documents to retrieve.

+ + + + + + +
+ Source code in rag/retriever.py +
162
+163
+164
+165
+166
+167
+168
+169
def __init__(self, query: str, top_k: int) -> None:
+    """
+    Create a new OnlineDocRetriever.
+    :query: The query to create an indexer for.
+    :top_k: The number of documents to retrieve.
+    """
+    self.query = query
+    self.indexer = self.get_indexer(top_k)
+
+
+ + + +
+ + + + + + + + + +
+ + +

+ get_indexer(top_k) + +

+ + +
+ +

Create an online search indexer.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + top_k + (int) + – +
    +

    The number of documents to retrieve.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + – +
    +

    The created indexer.

    +
    +
  • +
+
+
+ Source code in rag/retriever.py +
171
+172
+173
+174
+175
+176
+177
+178
+179
+180
+181
+182
+183
+184
+185
+186
+187
+188
+189
+190
def get_indexer(self, top_k: int):
+    """
+    Create an online search indexer.
+    :param top_k: The number of documents to retrieve.
+    :return: The created indexer.
+    """
+
+    bing_retriever = web_search.BingSearchWeb()
+    result_list = bing_retriever.search(self.query, top_k=top_k)
+    documents = bing_retriever.create_documents(result_list)
+    if len(documents) == 0:
+        return None
+    indexer = bing_retriever.create_indexer(documents)
+    print_with_color(
+        "Online indexer created successfully for {num} searched results.".format(
+            num=len(documents)
+        ),
+        "cyan",
+    )
+    return indexer
+
+
+
+ +
+ + + +
+ +
+ +
+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/advanced_usage/reinforce_appagent/learning_from_demonstration/index.html b/advanced_usage/reinforce_appagent/learning_from_demonstration/index.html new file mode 100644 index 00000000..94635102 --- /dev/null +++ b/advanced_usage/reinforce_appagent/learning_from_demonstration/index.html @@ -0,0 +1,1165 @@ + + + + + + + + Learning from User Demonstration - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + + +
  • +
  • +
+
+
+
+
+ +


+

Learning from User Demonstration

+

For complex tasks, users can demonstrate the task using the Step Recorder to record the action trajectories. UFO can learn from these user demonstrations to improve the AppAgent's performance.

+

Mechanism

+

UFO uses the Step Recorder tool to record the task and action trajectories. The recorded demonstration is saved as a zip file. The DemonstrationSummarizer class extracts and summarizes the demonstration, and the summarized demonstration is saved in the DEMONSTRATION_SAVED_PATH as specified in the config_dev.yaml file. When the AppAgent encounters a similar task, the DemonstrationRetriever class retrieves the saved demonstration from the demonstration database and generates a plan based on the retrieved demonstration.

+
+

Info

+

See the User Demonstration Provision document for how to record the task and action trajectories using the Step Recorder tool.

+
+

You can find a demo video of learning from user demonstrations:

+ + +


+

Activating Learning from User Demonstrations

+

Step 1: User Demonstration

+

Please follow the steps in the User Demonstration Provision document to provide user demonstrations.

+

Step 2: Configure the AppAgent

+

Configure the following parameters to allow UFO to use RAG from user demonstrations:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Configuration OptionDescriptionTypeDefault Value
RAG_DEMONSTRATIONWhether to use RAG from user demonstrationsBooleanFalse
RAG_DEMONSTRATION_RETRIEVED_TOPKThe top K documents to retrieve offlineInteger5
RAG_DEMONSTRATION_COMPLETION_NThe number of completion choices for the demonstration resultInteger3
+
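The corresponding fragment of config_dev.yaml would look like the following (values shown are the defaults from the table above; set RAG_DEMONSTRATION to True to enable the feature):

```yaml
# config_dev.yaml — demonstration RAG settings
RAG_DEMONSTRATION: False              # Enable RAG from user demonstrations
RAG_DEMONSTRATION_RETRIEVED_TOPK: 5   # Top-k documents to retrieve offline
RAG_DEMONSTRATION_COMPLETION_N: 3     # Completion choices for the demonstration result
```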

Reference

+

Demonstration Summarizer

+

The DemonstrationSummarizer class is located in the record_processor/summarizer/summarizer.py file. The DemonstrationSummarizer class provides methods to summarize the demonstration:

+ + +
+ + + + +
+ + +

The DemonstrationSummarizer class is the summarizer for the demonstration learning. +It summarizes the demonstration record to a list of summaries, +and save the summaries to the YAML file and the vector database. +A sample of the summary is as follows: +{ + "example": { + "Observation": "Word.exe is opened.", + "Thought": "The user is trying to create a new file.", + "ControlLabel": "1", + "ControlText": "Sample Control Text", + "Function": "CreateFile", + "Args": "filename='new_file.txt'", + "Status": "Success", + "Plan": "Create a new file named 'new_file.txt'.", + "Comment": "The user successfully created a new file." + }, + "Tips": "You can use the 'CreateFile' function to create a new file." +}

+ +

Initialize the DemonstrationSummarizer.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + is_visual + (bool) + – +
    +

    Whether the request is for visual model.

    +
    +
  • +
  • + prompt_template + (str) + – +
    +

    The path of the prompt template.

    +
    +
  • +
  • + demonstration_prompt_template + (str) + – +
    +

    The path of the example prompt template for demonstration.

    +
    +
  • +
  • + api_prompt_template + (str) + – +
    +

    The path of the api prompt template.

    +
    +
  • +
+
+ + + + + +
+ Source code in summarizer/summarizer.py +
39
+40
+41
+42
+43
+44
+45
+46
+47
+48
+49
+50
+51
+52
+53
+54
+55
+56
+57
+58
def __init__(
+    self,
+    is_visual: bool,
+    prompt_template: str,
+    demonstration_prompt_template: str,
+    api_prompt_template: str,
+    completion_num: int = 1,
+):
+    """
+    Initialize the DemonstrationSummarizer.
+    :param is_visual: Whether the request is for visual model.
+    :param prompt_template: The path of the prompt template.
+    :param demonstration_prompt_template: The path of the example prompt template for demonstration.
+    :param api_prompt_template: The path of the api prompt template.
+    :param completion_num: The number of completion choices for the demonstration result.
+    """
+    self.is_visual = is_visual
+    self.prompt_template = prompt_template
+    self.demonstration_prompt_template = demonstration_prompt_template
+    self.api_prompt_template = api_prompt_template
+    self.completion_num = completion_num
+
+
+ + + +
+ + + + + + + + + +
+ + +

+ __build_prompt(demo_record) + +

+ + +
+ +

Build the prompt by the user demonstration record.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + demo_record + (DemonstrationRecord) + – +
    +

    The user demonstration record. Returns the constructed prompt.

    +
    +
  • +
+
+
+ Source code in summarizer/summarizer.py +
 81
+ 82
+ 83
+ 84
+ 85
+ 86
+ 87
+ 88
+ 89
+ 90
+ 91
+ 92
+ 93
+ 94
+ 95
+ 96
+ 97
+ 98
+ 99
+100
+101
+102
+103
def __build_prompt(self, demo_record: DemonstrationRecord) -> list:
+    """
+    Build the prompt by the user demonstration record.
+    :param demo_record: The user demonstration record.
+    return: The prompt.
+    """
+    demonstration_prompter = DemonstrationPrompter(
+        self.is_visual,
+        self.prompt_template,
+        self.demonstration_prompt_template,
+        self.api_prompt_template,
+    )
+    demonstration_system_prompt = (
+        demonstration_prompter.system_prompt_construction()
+    )
+    demonstration_user_prompt = demonstration_prompter.user_content_construction(
+        demo_record
+    )
+    demonstration_prompt = demonstration_prompter.prompt_construction(
+        demonstration_system_prompt, demonstration_user_prompt
+    )
+
+    return demonstration_prompt
+
+
+
+ +
+ +
+ + +

+ __parse_response(response_string) + +

+ + +
+ +

Parse the response string to a dict of summary.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + response_string + (str) + – +
    +

    The response string. Returns the summary dict.

    +
    +
  • +
+
+
+ Source code in summarizer/summarizer.py +
105
+106
+107
+108
+109
+110
+111
+112
+113
+114
+115
+116
+117
+118
+119
+120
+121
+122
+123
+124
+125
+126
+127
+128
+129
+130
+131
+132
+133
+134
def __parse_response(self, response_string: str) -> dict:
+    """
+    Parse the response string to a dict of summary.
+    :param response_string: The response string.
+    return: The summary dict.
+    """
+    try:
+        response_json = json_parser(response_string)
+    except:
+        response_json = None
+
+    # Restructure the response, in case any of the keys are missing, set them to empty string.
+    if response_json:
+        summary = dict()
+        summary["example"] = {}
+        for key in [
+            "Observation",
+            "Thought",
+            "ControlLabel",
+            "ControlText",
+            "Function",
+            "Args",
+            "Status",
+            "Plan",
+            "Comment",
+        ]:
+            summary["example"][key] = response_json.get(key, "")
+        summary["Tips"] = response_json.get("Tips", "")
+
+        return summary
+
+
+
+ +
+ +
+ + +

+ create_or_update_vector_db(summaries, db_path) + + + staticmethod + + +

+ + +
+ +

Create or update the vector database.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + summaries + (list) + – +
    +

    The summaries.

    +
    +
  • +
  • + db_path + (str) + – +
    +

    The path of the vector database.

    +
    +
  • +
+
+
+ Source code in summarizer/summarizer.py +
171
+172
+173
+174
+175
+176
+177
+178
+179
+180
+181
+182
+183
+184
+185
+186
+187
+188
+189
+190
+191
+192
+193
+194
@staticmethod
+def create_or_update_vector_db(summaries: list, db_path: str):
+    """
+    Create or update the vector database.
+    :param summaries: The summaries.
+    :param db_path: The path of the vector database.
+    """
+
+    document_list = []
+
+    for summary in summaries:
+        request = summary["request"]
+        document_list.append(Document(page_content=request, metadata=summary))
+
+    db = FAISS.from_documents(document_list, get_hugginface_embedding())
+
+    # Check if the db exists, if not, create a new one.
+    if os.path.exists(db_path):
+        prev_db = FAISS.load_local(db_path, get_hugginface_embedding())
+        db.merge_from(prev_db)
+
+    db.save_local(db_path)
+
+    print(f"Updated vector DB successfully: {db_path}")
+
+
+
+ +
+ +
+ + +

+ create_or_update_yaml(summaries, yaml_path) + + + staticmethod + + +

+ + +
+ +

Create or update the YAML file.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + summaries + (list) + – +
    +

    The summaries.

    +
    +
  • +
  • + yaml_path + (str) + – +
    +

    The path of the YAML file.

    +
    +
  • +
+
+
+ Source code in summarizer/summarizer.py +
136
+137
+138
+139
+140
+141
+142
+143
+144
+145
+146
+147
+148
+149
+150
+151
+152
+153
+154
+155
+156
+157
+158
+159
+160
+161
+162
+163
+164
+165
+166
+167
+168
+169
@staticmethod
+def create_or_update_yaml(summaries: list, yaml_path: str):
+    """
+    Create or update the YAML file.
+    :param summaries: The summaries.
+    :param yaml_path: The path of the YAML file.
+    """
+
+    # Check if the file exists, if not, create a new one
+    if not os.path.exists(yaml_path):
+        with open(yaml_path, "w"):
+            pass
+        print(f"Created new YAML file: {yaml_path}")
+
+    # Read existing data from the YAML file
+    with open(yaml_path, "r") as file:
+        existing_data = yaml.safe_load(file)
+
+    # Initialize index and existing_data if file is empty
+    index = len(existing_data) if existing_data else 0
+    existing_data = existing_data or {}
+
+    # Update data with new summaries
+    for i, summary in enumerate(summaries):
+        example = {f"example{index + i}": summary}
+        existing_data.update(example)
+
+    # Write updated data back to the YAML file
+    with open(yaml_path, "w") as file:
+        yaml.safe_dump(
+            existing_data, file, default_flow_style=False, sort_keys=False
+        )
+
+    print(f"Updated existing YAML file successfully: {yaml_path}")
+
+
+
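The append-by-index behavior of create_or_update_yaml can be illustrated without file I/O (a sketch of the keying logic only, not the UFO code):

```python
def merge_summaries(existing_data: dict, summaries: list) -> dict:
    # New entries continue numbering from the current count, mirroring
    # the example{index + i} keying used when updating the YAML file.
    index = len(existing_data) if existing_data else 0
    merged = dict(existing_data or {})
    for i, summary in enumerate(summaries):
        merged[f"example{index + i}"] = summary
    return merged

data = merge_summaries({}, [{"Tips": "first"}])
data = merge_summaries(data, [{"Tips": "second"}, {"Tips": "third"}])
print(sorted(data))  # ['example0', 'example1', 'example2']
```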
+ +
+ +
+ + +

+ get_summary_list(record) + +

+ + +
+ +

Get the summary list for a record

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + record + (DemonstrationRecord) + – +
    +

    The demonstration record. Returns the summary list for the user-defined completion number, along with the cost.

    +
    +
  • +
+
+
+ Source code in summarizer/summarizer.py +
60
+61
+62
+63
+64
+65
+66
+67
+68
+69
+70
+71
+72
+73
+74
+75
+76
+77
+78
+79
def get_summary_list(self, record: DemonstrationRecord) -> Tuple[list, float]:
+    """
+    Get the summary list for a record
+    :param record: The demonstration record.
+    return: The summary list for the user defined completion number and the cost
+    """
+
+    prompt = self.__build_prompt(record)
+    response_string_list, cost = get_completions(
+        prompt, "APPAGENT", use_backup_engine=True, n=self.completion_num
+    )
+    summaries = []
+    for response_string in response_string_list:
+        summary = self.__parse_response(response_string)
+        if summary:
+            summary["request"] = record.get_request()
+            summary["app_list"] = record.get_applications()
+            summaries.append(summary)
+
+    return summaries, cost
+
+
+
+ +
+ + + +
+ +
+ +


+

Demonstration Retriever

+

The DemonstrationRetriever class, located in the rag/retriever.py file, provides methods to retrieve demonstrations:

+ + +
+ + + + +
+

+ Bases: Retriever

+ + +

Class to create demonstration retrievers.

+ +

Create a new DemonstrationRetriever. +The db_path argument is the path to the database.

+ + + + + + +
+ Source code in rag/retriever.py +
198
+199
+200
+201
+202
+203
def __init__(self, db_path) -> None:
+    """
+    Create a new DemonstrationRetriever.
+    :db_path: The path to the database.
+    """
+    self.indexer = self.get_indexer(db_path)
+
+
+ + + +
+ + + + + + + + + +
+ + +

+ get_indexer(db_path) + +

+ + +
+ +

Create a demonstration indexer. +The db_path argument is the path to the database.

+ +
+ Source code in rag/retriever.py +
205
+206
+207
+208
+209
+210
+211
+212
+213
+214
+215
+216
+217
+218
+219
+220
+221
def get_indexer(self, db_path: str):
+    """
+    Create a demonstration indexer.
+    :db_path: The path to the database.
+    """
+
+    try:
+        db = FAISS.load_local(db_path, get_hugginface_embedding())
+        return db
+    except:
+        # print_with_color(
+        #     "Warning: Failed to load demonstration indexer from {path}.".format(
+        #         path=db_path
+        #     ),
+        #     "yellow",
+        # )
+        return None
+
+
+
+ +
+ + + +
+ +
+ +
+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/advanced_usage/reinforce_appagent/learning_from_help_document/index.html b/advanced_usage/reinforce_appagent/learning_from_help_document/index.html new file mode 100644 index 00000000..bfe73833 --- /dev/null +++ b/advanced_usage/reinforce_appagent/learning_from_help_document/index.html @@ -0,0 +1,601 @@ + + + + + + + + Learning from Help Document - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + + +
  • +
  • +
+
+
+
+
+ +

Learning from Help Documents

+

Users or applications can provide help documents to the AppAgent to reinforce its capabilities. The AppAgent retrieves knowledge from these documents to improve its understanding of the task, generate high-quality plans, and interact more efficiently with the application. You can find instructions for providing help documents to the AppAgent in the Help Document Provision section.

+

Mechanism

+

The help documents are provided in the form of task-solution pairs. Upon receiving a request, the AppAgent retrieves the relevant help documents by matching the request against their task descriptions, and generates a plan based on the retrieved solutions.
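The matching step can be illustrated with a toy retriever. In UFO, matching is done with vector similarity over a FAISS index; this standalone sketch substitutes difflib string similarity purely for illustration, and the task-solution pairs are made up:

```python
from difflib import SequenceMatcher

# Toy task-solution store; in practice the pairs come from help documents
# and matching uses embeddings rather than string similarity.
help_docs = {
    "insert a table in a document": "Use Insert > Table and choose the size.",
    "save a document as PDF": "Use File > Save As and select PDF as the format.",
}

def retrieve(request: str, top_k: int = 1) -> list:
    """Rank task descriptions by string similarity to the request."""
    scored = sorted(
        help_docs.items(),
        key=lambda kv: SequenceMatcher(None, request.lower(), kv[0].lower()).ratio(),
        reverse=True,
    )
    return [solution for _, solution in scored[:top_k]]

print(retrieve("How do I save my document as a PDF?"))
# ['Use File > Save As and select PDF as the format.']
```

The retrieved solutions are then treated only as references when the agent drafts its plan, since the best match may still be irrelevant.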

+
+

Note

+

Since the retrieved help documents may not be relevant to the request, the AppAgent will only take them as references to generate the plan.

+
+

Activate the Learning from Help Documents

+

Follow the steps below to activate the learning from help documents:

+

Step 1: Provide Help Documents

+

Please follow the steps in the Help Document Provision document to provide help documents to the AppAgent.

+

Step 2: Configure the AppAgent

+

Configure the following parameters in the config.yaml file to activate the learning from help documents:

+ + + + + + + + + + + + + + + + + + + + + + + +
Configuration OptionDescriptionTypeDefault Value
RAG_OFFLINE_DOCSWhether to use the offline RAGBooleanFalse
RAG_OFFLINE_DOCS_RETRIEVED_TOPKThe topk for the offline retrieved documentsInteger1
+
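For example, the following config.yaml fragment (the values are illustrative, not authoritative defaults) enables offline-document retrieval:

```yaml
# Illustrative values; the option names are listed in the table above.
RAG_OFFLINE_DOCS: True                 # Use the offline RAG
RAG_OFFLINE_DOCS_RETRIEVED_TOPK: 1     # Top-k for retrieved documents
```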

Reference

+ + +
+ + + + +
+

+ Bases: Retriever

+ + +

Class to create offline retrievers.

+ +

Create a new OfflineDocRetriever. +The app_name argument is the name of the application.

+ + + + + + +
+ Source code in rag/retriever.py +
78
+79
+80
+81
+82
+83
+84
+85
def __init__(self, app_name: str) -> None:
+    """
+    Create a new OfflineDocRetriever.
+    :appname: The name of the application.
+    """
+    self.app_name = app_name
+    indexer_path = self.get_offline_indexer_path()
+    self.indexer = self.get_indexer(indexer_path)
+
+
+ + + +
+ + + + + + + + + +
+ + +

+ get_indexer(path) + +

+ + +
+ +

Load the retriever.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + path + (str) + – +
    +

    The path to load the retriever from.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + – +
    +

    The loaded retriever.

    +
    +
  • +
+
+
+ Source code in rag/retriever.py +
 99
+100
+101
+102
+103
+104
+105
+106
+107
+108
+109
+110
+111
+112
+113
+114
+115
+116
+117
+118
+119
+120
+121
+122
+123
def get_indexer(self, path: str):
+    """
+    Load the retriever.
+    :param path: The path to load the retriever from.
+    :return: The loaded retriever.
+    """
+
+    if path:
+        print_with_color(
+            "Loading offline indexer from {path}...".format(path=path), "cyan"
+        )
+    else:
+        return None
+
+    try:
+        db = FAISS.load_local(path, get_hugginface_embedding())
+        return db
+    except:
+        # print_with_color(
+        #     "Warning: Failed to load offline indexer from {path}.".format(
+        #         path=path
+        #     ),
+        #     "yellow",
+        # )
+        return None
+
+
+
+ +
+ +
+ + +

+ get_offline_indexer_path() + +

+ + +
+ +

Get the path to the offline indexer.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + – +
    +

    The path to the offline indexer.

    +
    +
  • +
+
+
+ Source code in rag/retriever.py +
87
+88
+89
+90
+91
+92
+93
+94
+95
+96
+97
def get_offline_indexer_path(self):
+    """
+    Get the path to the offline indexer.
+    :return: The path to the offline indexer.
+    """
+    offline_records = get_offline_learner_indexer_config()
+    for key in offline_records:
+        if key.lower() in self.app_name.lower():
+            return offline_records[key]
+
+    return None
+
+
+
+ +
+ + + +
+ +
+ +
+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/advanced_usage/reinforce_appagent/overview/index.html b/advanced_usage/reinforce_appagent/overview/index.html new file mode 100644 index 00000000..4fe367bb --- /dev/null +++ b/advanced_usage/reinforce_appagent/overview/index.html @@ -0,0 +1,604 @@ + + + + + + + + Overview - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + + +
  • +
  • +
+
+
+
+
+ +

Reinforcing AppAgent

+

UFO provides versatile mechanisms to reinforce the AppAgent's capabilities through RAG (Retrieval-Augmented Generation) and other techniques. These techniques enhance the AppAgent's understanding of the task, improve the quality of the generated plans, and increase the efficiency of the AppAgent's interactions with the application.

+

We currently support the following reinforcement methods:

+ + + + + + + + + + + + + + + + + + + + + + + + + +
Reinforcement MethodDescription
Learning from Help DocumentsReinforce the AppAgent by retrieving knowledge from help documents.
Learning from Bing SearchReinforce the AppAgent by searching for information on Bing to obtain up-to-date knowledge.
Learning from Self-ExperienceReinforce the AppAgent by learning from its own successful experiences.
Learning from User DemonstrationsReinforce the AppAgent by learning from action trajectories demonstrated by users.
+

Knowledge Provision

+

UFO provides the knowledge to the AppAgent through a context_provision method defined in the AppAgent class:

+
def context_provision(self, request: str = "") -> None:
+    """
+    Provision the context for the app agent.
+    :param request: The Bing search query.
+    """
+
+    # Load the offline document indexer for the app agent if available.
+    if configs["RAG_OFFLINE_DOCS"]:
+        utils.print_with_color(
+            "Loading offline help document indexer for {app}...".format(
+                app=self._process_name
+            ),
+            "magenta",
+        )
+        self.build_offline_docs_retriever()
+
+    # Load the online search indexer for the app agent if available.
+
+    if configs["RAG_ONLINE_SEARCH"] and request:
+        utils.print_with_color("Creating a Bing search indexer...", "magenta")
+        self.build_online_search_retriever(
+            request, configs["RAG_ONLINE_SEARCH_TOPK"]
+        )
+
+    # Load the experience indexer for the app agent if available.
+    if configs["RAG_EXPERIENCE"]:
+        utils.print_with_color("Creating an experience indexer...", "magenta")
+        experience_path = configs["EXPERIENCE_SAVED_PATH"]
+        db_path = os.path.join(experience_path, "experience_db")
+        self.build_experience_retriever(db_path)
+
+    # Load the demonstration indexer for the app agent if available.
+    if configs["RAG_DEMONSTRATION"]:
+        utils.print_with_color("Creating an demonstration indexer...", "magenta")
+        demonstration_path = configs["DEMONSTRATION_SAVED_PATH"]
+        db_path = os.path.join(demonstration_path, "demonstration_db")
+        self.build_human_demonstration_retriever(db_path)
+
+

The context_provision method loads the offline document indexer, online search indexer, experience indexer, and demonstration indexer for the AppAgent based on the configuration settings in the config_dev.yaml file.
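Putting these switches together, a config_dev.yaml fragment enabling all four knowledge sources might look as follows (the key names appear in the code above; the values and paths are illustrative):

```yaml
RAG_OFFLINE_DOCS: True            # Offline help-document indexer
RAG_ONLINE_SEARCH: True           # Bing search indexer
RAG_ONLINE_SEARCH_TOPK: 5         # Top-k results for the Bing search indexer
RAG_EXPERIENCE: True              # Self-experience indexer
EXPERIENCE_SAVED_PATH: vectordb/experience        # Parent folder of experience_db
RAG_DEMONSTRATION: True           # Human-demonstration indexer
DEMONSTRATION_SAVED_PATH: vectordb/demonstration  # Parent folder of demonstration_db
```

Note that the online search indexer is only built when a request string is supplied alongside the flag.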

+

Reference

+

UFO employs the Retriever class located in the ufo/rag/retriever.py file to retrieve knowledge from various sources. The Retriever class provides the following methods to retrieve knowledge:

+ + +
+ + + + +
+

+ Bases: ABC

+ + +

Class to retrieve documents.

+ +

Create a new Retriever.

+ + + + + + +
+ Source code in rag/retriever.py +
42
+43
+44
+45
+46
+47
+48
+49
def __init__(self) -> None:
+    """
+    Create a new Retriever.
+    """
+
+    self.indexer = self.get_indexer()
+
+    pass
+
+
+ + + +
+ + + + + + + + + +
+ + +

+ get_indexer() + + + abstractmethod + + +

+ + +
+ +

Get the indexer.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + – +
    +

    The indexer.

    +
    +
  • +
+
+
+ Source code in rag/retriever.py +
51
+52
+53
+54
+55
+56
+57
@abstractmethod
+def get_indexer(self):
+    """
+    Get the indexer.
+    :return: The indexer.
+    """
+    pass
+
+
+
+ +
+ +
+ + +

+ retrieve(query, top_k, filter=None) + +

+ + +
+ +

Retrieve the document for the given query. +An optional filter can be applied to the retrieved documents.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + query + (str) + – +
    +

    The query to retrieve the document from.

    +
    +
  • +
  • + top_k + (int) + – +
    +

    The number of documents to retrieve.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + – +
    +

    The document from the given query.

    +
    +
  • +
+
+
+ Source code in rag/retriever.py +
59
+60
+61
+62
+63
+64
+65
+66
+67
+68
+69
+70
def retrieve(self, query: str, top_k: int, filter=None):
+    """
+    Retrieve the document from the given query.
+    :param query: The query to retrieve the document from.
+    :param top_k: The number of documents to retrieve.
+    :filter: The filter to apply to the retrieved documents.
+    :return: The document from the given query.
+    """
+    if not self.indexer:
+        return None
+
+    return self.indexer.similarity_search(query, top_k, filter=filter)
+
+
+
+ +
+ + + +
+ +
+ +
+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/agents/app_agent/index.html b/agents/app_agent/index.html new file mode 100644 index 00000000..993c2869 --- /dev/null +++ b/agents/app_agent/index.html @@ -0,0 +1,2363 @@ + + + + + + + + AppAgent - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

AppAgent 👾

+

An AppAgent iteratively executes actions on the selected application until the task is successfully concluded within it. The AppAgent is created by the HostAgent to fulfill a sub-task within a Round, executing the necessary actions within the application to fulfill the user's request. The AppAgent has the following features:

+
    +
  1. ReAct with the Application - The AppAgent recursively interacts with the application in a workflow of observation->thought->action, leveraging the multi-modal capabilities of Visual Language Models (VLMs) to comprehend the application UI and fulfill the user's request.
  2. +
  3. Comprehension Enhancement - The AppAgent is enhanced by Retrieval Augmented Generation (RAG) from heterogeneous sources, including external knowledge bases and demonstration libraries, making the agent an application "expert".
  4. +
  5. Versatile Skill Set - The AppAgent is equipped with a diverse set of skills to support comprehensive automation, such as mouse, keyboard, native APIs, and "Copilot".
  6. +
+
+

Tip

+

You can find out how to enhance the AppAgent with external knowledge bases and demonstration libraries in the Reinforcing AppAgent documentation.

+
+

We show the framework of the AppAgent in the following diagram:

+

+ AppAgent Image +

+ +

AppAgent Input

+

To interact with the application, the AppAgent receives the following inputs:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
InputDescriptionType
User RequestThe user's request in natural language.String
Sub-TaskThe sub-task description to be executed by the AppAgent, assigned by the HostAgent.String
Current ApplicationThe name of the application to be interacted with.String
Control InformationIndex, name and control type of available controls in the application.List of Dictionaries
Application ScreenshotsScreenshots of the application, including a clean screenshot, an annotated screenshot with labeled controls, and a screenshot with a rectangle around the selected control at the previous step (optional).List of Strings
Previous Sub-TasksThe previous sub-tasks and their completion status.List of Strings
Previous PlanThe previous plan for the following steps.List of Strings
HostAgent MessageThe message from the HostAgent for the completion of the sub-task.String
Retrieved InformationThe retrieved information from external knowledge bases or demonstration libraries.String
BlackboardThe shared memory space for storing and sharing information among the agents.Dictionary
+

Below is an example of the annotated application screenshot with labeled controls. This follows the Set-of-Mark paradigm.

+

+ AppAgent Image +

+ +

By processing these inputs, the AppAgent determines the necessary actions to fulfill the user's request within the application.

+
+

Tip

+

Whether to concatenate the clean screenshot and annotated screenshot can be configured in the CONCAT_SCREENSHOT field in the config_dev.yaml file.

+
+
+

Tip

+

Whether to include the screenshot with a rectangle around the selected control at the previous step can be configured in the INCLUDE_LAST_SCREENSHOT field in the config_dev.yaml file.

+
+
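As a concrete example, both screenshot switches mentioned in the tips above are plain boolean fields in config_dev.yaml (the values shown are illustrative, not authoritative defaults):

```yaml
CONCAT_SCREENSHOT: False        # Concatenate the clean and annotated screenshots into one image
INCLUDE_LAST_SCREENSHOT: True   # Include the previous step's screenshot with the selected control highlighted
```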

AppAgent Output

+

With the inputs provided, the AppAgent generates the following outputs:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
OutputDescriptionType
ObservationThe observation of the current application screenshots.String
ThoughtThe logical reasoning process of the AppAgent.String
ControlLabelThe index of the selected control to interact with.String
ControlTextThe name of the selected control to interact with.String
FunctionThe function to be executed on the selected control.String
ArgsThe arguments required for the function execution.List of Strings
StatusThe status of the agent, mapped to the AgentState.String
PlanThe plan for the following steps after the current action.List of Strings
CommentAdditional comments or information provided to the user.String
SaveScreenshotThe flag to save the screenshot of the application to the blackboard for future reference.Boolean
+

Below is an example of the AppAgent output:

+
{
+    "Observation": "Application screenshot",
+    "Thought": "Logical reasoning process",
+    "ControlLabel": "Control index",
+    "ControlText": "Control name",
+    "Function": "Function name",
+    "Args": ["arg1", "arg2"],
+    "Status": "AgentState",
+    "Plan": ["Step 1", "Step 2"],
+    "Comment": "Additional comments",
+    "SaveScreenshot": true
+}
+
+
+

Info

+

The AppAgent output is formatted as a JSON object by LLMs and can be parsed by the json.loads method in Python.

+
+
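As a minimal illustration, the sample output above parses directly with json.loads:

```python
import json

# The sample AppAgent response from above, as a raw string.
response_string = """{
    "Observation": "Application screenshot",
    "Thought": "Logical reasoning process",
    "ControlLabel": "Control index",
    "ControlText": "Control name",
    "Function": "Function name",
    "Args": ["arg1", "arg2"],
    "Status": "AgentState",
    "Plan": ["Step 1", "Step 2"],
    "Comment": "Additional comments",
    "SaveScreenshot": true
}"""

response = json.loads(response_string)
print(response["Status"])          # AgentState
print(response["SaveScreenshot"])  # True
```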

AppAgent State

+

The AppAgent state is managed by a state machine that determines the next action to be executed based on the current state, as defined in the ufo/agents/states/app_agent_states.py module. The states include:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
StateDescription
CONTINUEThe AppAgent continues executing the current action.
FINISHThe AppAgent has completed the current sub-task.
ERRORThe AppAgent encountered an error during execution.
FAILThe AppAgent believes the current sub-task is unachievable.
CONFIRMThe AppAgent is confirming the user's input or action.
SCREENSHOTThe AppAgent believes the current screenshot is not clear in annotating the control and requests a new screenshot.
+

The state machine diagram for the AppAgent is shown below:

+

+ +

+ +

The AppAgent progresses through these states to execute the necessary actions within the application and fulfill the sub-task assigned by the HostAgent.

+

Knowledge Enhancement

+

The AppAgent is enhanced by Retrieval Augmented Generation (RAG) from heterogeneous sources, including external knowledge bases and demonstration libraries. The AppAgent leverages this knowledge to enhance its comprehension of the application and learn from demonstrations to improve its performance.

+

Learning from Help Documents

+

Users can provide help documents to the AppAgent, configured in the config.yaml file, to enhance its comprehension of the application and improve its performance.

+
+

Tip

+

Please find the detailed configuration in the documentation.

+
+
+

Tip

+

You may also refer here for how to provide help documents to the AppAgent.

+
+

The AppAgent calls build_offline_docs_retriever to build a help-document retriever, and uses retrived_documents_prompt_helper to construct its prompt.

+ +

Since help documents may not cover all the information or the information may be outdated, the AppAgent can also leverage Bing search to retrieve the latest information. You can activate Bing search and configure the search engine in the config.yaml file.

+
+

Tip

+

Please find the detailed configuration in the documentation.

+
+
+

Tip

+

You may also refer here for the implementation of Bing search in the AppAgent.

+
+

The AppAgent calls build_online_search_retriever to build a Bing search retriever, and uses retrived_documents_prompt_helper to construct its prompt.

+

Learning from Self-Demonstrations

+

You may save successful action trajectories in the AppAgent to learn from self-demonstrations and improve its performance. After the completion of a session, the AppAgent will ask the user whether to save the action trajectories for future reference. You may configure the use of self-demonstrations in the config.yaml file.

+
+

Tip

+

You can find details of the configuration in the documentation.

+
+
+

Tip

+

You may also refer here for the implementation of self-demonstrations in the AppAgent.

+
+

The AppAgent calls build_experience_retriever to build a self-demonstration retriever, and uses rag_experience_retrieve to retrieve demonstrations for the AppAgent.

+

Learning from Human Demonstrations

+

In addition to self-demonstrations, you can also provide human demonstrations to the AppAgent by using the Step Recorder tool built into the Windows OS. The AppAgent will learn from the human demonstrations to improve its performance and achieve better personalization. The use of human demonstrations can be configured in the config.yaml file.

+
+

Tip

+

You can find details of the configuration in the documentation.

+
+
+

Tip

+

You may also refer here for the implementation of human demonstrations in the AppAgent.

+
+

The AppAgent calls build_human_demonstration_retriever to build a human demonstration retriever, and uses rag_experience_retrieve to retrieve demonstrations for the AppAgent.

+

Skill Set for Automation

+

The AppAgent is equipped with a versatile skill set to support comprehensive automation within the application by calling the create_puppeteer_interface method. The skills include:

+ + + + + + + + + + + + + + + + + + + + + +
SkillDescription
UI AutomationMimicking user interactions with the application UI controls using the UI Automation and Win32 API.
Native APIAccessing the application's native API to execute specific functions and actions.
In-App AgentLeveraging the in-app agent to interact with the application's internal functions and features.
+

By utilizing these skills, the AppAgent can efficiently interact with the application and fulfill the user's request. You can find more details in the Automator documentation and the code in the ufo/automator module.

+

Reference

+ + +
+ + + + +
+

+ Bases: BasicAgent

+ + +

The AppAgent class that manages the interaction with the application.

+ +

Initialize the AppAgent. +The name argument is the name of the agent.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + process_name + (str) + – +
    +

    The process name of the app.

    +
    +
  • +
  • + app_root_name + (str) + – +
    +

    The root name of the app.

    +
    +
  • +
  • + is_visual + (bool) + – +
    +

    The flag indicating whether the agent is visual or not.

    +
    +
  • +
  • + main_prompt + (str) + – +
    +

    The main prompt file path.

    +
    +
  • +
  • + example_prompt + (str) + – +
    +

    The example prompt file path.

    +
    +
  • +
  • + api_prompt + (str) + – +
    +

    The API prompt file path.

    +
    +
  • +
  • + skip_prompter + (bool, default: + False +) + – +
    +

    The flag indicating whether to skip the prompter initialization.

    +
    +
  • +
+
+ + + + + +
+ Source code in agents/agent/app_agent.py +
28
+29
+30
+31
+32
+33
+34
+35
+36
+37
+38
+39
+40
+41
+42
+43
+44
+45
+46
+47
+48
+49
+50
+51
+52
+53
+54
+55
+56
+57
+58
+59
+60
+61
+62
+63
+64
def __init__(
+    self,
+    name: str,
+    process_name: str,
+    app_root_name: str,
+    is_visual: bool,
+    main_prompt: str,
+    example_prompt: str,
+    api_prompt: str,
+    skip_prompter: bool = False,
+) -> None:
+    """
+    Initialize the AppAgent.
+    :name: The name of the agent.
+    :param process_name: The process name of the app.
+    :param app_root_name: The root name of the app.
+    :param is_visual: The flag indicating whether the agent is visual or not.
+    :param main_prompt: The main prompt file path.
+    :param example_prompt: The example prompt file path.
+    :param api_prompt: The API prompt file path.
+    :param skip_prompter: The flag indicating whether to skip the prompter initialization.
+    """
+    super().__init__(name=name)
+    if not skip_prompter:
+        self.prompter = self.get_prompter(
+            is_visual, main_prompt, example_prompt, api_prompt, app_root_name
+        )
+    self._process_name = process_name
+    self._app_root_name = app_root_name
+    self.offline_doc_retriever = None
+    self.online_doc_retriever = None
+    self.experience_retriever = None
+    self.human_demonstration_retriever = None
+
+    self.Puppeteer = self.create_puppeteer_interface()
+
+    self.set_state(ContinueAppAgentState())
+
+
+ + + +
+ + + + + + + +
+ + + +

+ status_manager: AppAgentStatus + + + property + + +

+ + +
+ +

Get the status manager.

+
+ +
+ + + +
+ + +

+ build_experience_retriever(db_path) + +

+ + +
+ +

Build the experience retriever.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + db_path + (str) + – +
    +

    The path to the experience database.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + None + – +
    +

    The experience retriever.

    +
    +
  • +
+
+
+ Source code in agents/agent/app_agent.py +
346
+347
+348
+349
+350
+351
+352
+353
+354
def build_experience_retriever(self, db_path: str) -> None:
+    """
+    Build the experience retriever.
+    :param db_path: The path to the experience database.
+    :return: The experience retriever.
+    """
+    self.experience_retriever = self.retriever_factory.create_retriever(
+        "experience", db_path
+    )
+
+
+
+ +
+ +
+ + +

+ build_human_demonstration_retriever(db_path) + +

+ + +
+ +

Build the human demonstration retriever.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + db_path + (str) + – +
    +

    The path to the human demonstration database.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + None + – +
    +

    The human demonstration retriever.

    +
    +
  • +
+
+
+ Source code in agents/agent/app_agent.py +
356
+357
+358
+359
+360
+361
+362
+363
+364
def build_human_demonstration_retriever(self, db_path: str) -> None:
+    """
+    Build the human demonstration retriever.
+    :param db_path: The path to the human demonstration database.
+    :return: None. The retriever is stored in self.human_demonstration_retriever.
+    """
+    self.human_demonstration_retriever = self.retriever_factory.create_retriever(
+        "demonstration", db_path
+    )
+
+
+
+ +
+ +
+ + +

+ build_offline_docs_retriever() + +

+ + +
+ +

Build the offline docs retriever.

+ +
+ Source code in agents/agent/app_agent.py +
328
+329
+330
+331
+332
+333
+334
def build_offline_docs_retriever(self) -> None:
+    """
+    Build the offline docs retriever.
+    """
+    self.offline_doc_retriever = self.retriever_factory.create_retriever(
+        "offline", self._app_root_name
+    )
+
+
+
+ +
+ +
+ + +

+ build_online_search_retriever(request, top_k) + +

+ + +
+ +

Build the online search retriever.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + request + (str) + – +
    +

    The request for online Bing search.

    +
    +
  • +
  • + top_k + (int) + – +
    +

    The number of documents to retrieve.

    +
    +
  • +
+
+
+ Source code in agents/agent/app_agent.py +
336
+337
+338
+339
+340
+341
+342
+343
+344
def build_online_search_retriever(self, request: str, top_k: int) -> None:
+    """
+    Build the online search retriever.
+    :param request: The request for online Bing search.
+    :param top_k: The number of documents to retrieve.
+    """
+    self.online_doc_retriever = self.retriever_factory.create_retriever(
+        "online", request, top_k
+    )
+
+
+
+ +
+ +
+ + +

+ context_provision(request='') + +

+ + +
+ +

Provision the context for the app agent.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + request + (str, default: + '' +) + – +
    +

    The request sent to the Bing search retriever.

    +
    +
  • +
+
+
+ Source code in agents/agent/app_agent.py +
366
+367
+368
+369
+370
+371
+372
+373
+374
+375
+376
+377
+378
+379
+380
+381
+382
+383
+384
+385
+386
+387
+388
+389
+390
+391
+392
+393
+394
+395
+396
+397
+398
+399
+400
+401
+402
def context_provision(self, request: str = "") -> None:
+    """
+    Provision the context for the app agent.
+    :param request: The request sent to the Bing search retriever.
+    """
+
+    # Load the offline document indexer for the app agent if available.
+    if configs["RAG_OFFLINE_DOCS"]:
+        utils.print_with_color(
+            "Loading offline help document indexer for {app}...".format(
+                app=self._process_name
+            ),
+            "magenta",
+        )
+        self.build_offline_docs_retriever()
+
+    # Load the online search indexer for the app agent if available.
+
+    if configs["RAG_ONLINE_SEARCH"] and request:
+        utils.print_with_color("Creating a Bing search indexer...", "magenta")
+        self.build_online_search_retriever(
+            request, configs["RAG_ONLINE_SEARCH_TOPK"]
+        )
+
+    # Load the experience indexer for the app agent if available.
+    if configs["RAG_EXPERIENCE"]:
+        utils.print_with_color("Creating an experience indexer...", "magenta")
+        experience_path = configs["EXPERIENCE_SAVED_PATH"]
+        db_path = os.path.join(experience_path, "experience_db")
+        self.build_experience_retriever(db_path)
+
+    # Load the demonstration indexer for the app agent if available.
+    if configs["RAG_DEMONSTRATION"]:
+        utils.print_with_color("Creating a demonstration indexer...", "magenta")
+        demonstration_path = configs["DEMONSTRATION_SAVED_PATH"]
+        db_path = os.path.join(demonstration_path, "demonstration_db")
+        self.build_human_demonstration_retriever(db_path)
+
+
+
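The four RAG switches above independently gate which retrievers get built. The following self-contained sketch mimics that gating logic with a plain dictionary standing in for UFO's real config loader (the key names follow the source above; the `plan_retrievers` helper itself is hypothetical, for illustration only):

```python
import os

# Hypothetical configuration; in UFO these values come from config_dev.yaml.
configs = {
    "RAG_OFFLINE_DOCS": False,
    "RAG_ONLINE_SEARCH": True,
    "RAG_ONLINE_SEARCH_TOPK": 5,
    "RAG_EXPERIENCE": True,
    "RAG_DEMONSTRATION": False,
    "EXPERIENCE_SAVED_PATH": "vectordb/experience",
    "DEMONSTRATION_SAVED_PATH": "vectordb/demonstration",
}


def plan_retrievers(request: str) -> list:
    """Return (kind, argument) pairs for the retrievers context_provision would build."""
    planned = []
    if configs["RAG_OFFLINE_DOCS"]:
        planned.append(("offline", None))
    # The online retriever is only built when a non-empty request is provided.
    if configs["RAG_ONLINE_SEARCH"] and request:
        planned.append(("online", configs["RAG_ONLINE_SEARCH_TOPK"]))
    if configs["RAG_EXPERIENCE"]:
        db = os.path.join(configs["EXPERIENCE_SAVED_PATH"], "experience_db")
        planned.append(("experience", db))
    if configs["RAG_DEMONSTRATION"]:
        db = os.path.join(configs["DEMONSTRATION_SAVED_PATH"], "demonstration_db")
        planned.append(("demonstration", db))
    return planned
```

Note that an empty request disables the online retriever even when `RAG_ONLINE_SEARCH` is enabled, mirroring the `and request` guard in the source.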
+ +
+ +
+ + +

+ create_puppeteer_interface() + +

+ + +
+ +

Create the Puppeteer interface to automate the app.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + AppPuppeteer + – +
    +

    The Puppeteer interface.

    +
    +
  • +
+
+
+ Source code in agents/agent/app_agent.py +
299
+300
+301
+302
+303
+304
def create_puppeteer_interface(self) -> puppeteer.AppPuppeteer:
+    """
+    Create the Puppeteer interface to automate the app.
+    :return: The Puppeteer interface.
+    """
+    return puppeteer.AppPuppeteer(self._process_name, self._app_root_name)
+
+
+
+ +
+ +
+ + +

+ external_knowledge_prompt_helper(request, offline_top_k, online_top_k) + +

+ + +
+ +

Retrieve the external knowledge and construct the prompt.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + request + (str) + – +
    +

    The request.

    +
    +
  • +
  • + offline_top_k + (int) + – +
    +

    The number of offline documents to retrieve.

    +
    +
  • +
  • + online_top_k + (int) + – +
    +

    The number of online documents to retrieve.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The prompt message containing the retrieved external knowledge.

    +
    +
  • +
+
+
+ Source code in agents/agent/app_agent.py +
200
+201
+202
+203
+204
+205
+206
+207
+208
+209
+210
+211
+212
+213
+214
+215
+216
+217
+218
+219
+220
+221
+222
+223
+224
+225
+226
+227
+228
+229
+230
+231
+232
+233
+234
+235
+236
+237
+238
+239
+240
+241
def external_knowledge_prompt_helper(
+    self, request: str, offline_top_k: int, online_top_k: int
+) -> str:
+    """
+    Retrieve the external knowledge and construct the prompt.
+    :param request: The request.
+    :param offline_top_k: The number of offline documents to retrieve.
+    :param online_top_k: The number of online documents to retrieve.
+    :return: The prompt message containing the retrieved external knowledge.
+    """
+
+    retrieved_docs = ""
+
+    # Retrieve offline documents and construct the prompt
+    if self.offline_doc_retriever:
+        offline_docs = self.offline_doc_retriever.retrieve(
+            "How to {query} for {app}".format(
+                query=request, app=self._process_name
+            ),
+            offline_top_k,
+            filter=None,
+        )
+        offline_docs_prompt = self.prompter.retrived_documents_prompt_helper(
+            "Help Documents",
+            "Document",
+            [doc.metadata["text"] for doc in offline_docs],
+        )
+        retrieved_docs += offline_docs_prompt
+
+    # Retrieve online documents and construct the prompt
+    if self.online_doc_retriever:
+        online_search_docs = self.online_doc_retriever.retrieve(
+            request, online_top_k, filter=None
+        )
+        online_docs_prompt = self.prompter.retrived_documents_prompt_helper(
+            "Online Search Results",
+            "Search Result",
+            [doc.page_content for doc in online_search_docs],
+        )
+        retrieved_docs += online_docs_prompt
+
+    return retrieved_docs
+
+
+
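The helper above concatenates two labeled prompt sections, one per retriever, each produced by retrived_documents_prompt_helper. A minimal sketch of such a section builder; the header and numbering format here are illustrative, not UFO's exact template:

```python
from typing import List


def documents_prompt_section(header: str, label: str, docs: List[str]) -> str:
    """Format retrieved documents into a labeled prompt section (illustrative format)."""
    if not docs:
        # An unavailable retriever contributes nothing to the prompt.
        return ""
    lines = [f"[{header}]"]
    for i, doc in enumerate(docs, start=1):
        lines.append(f"{label} {i}: {doc}")
    return "\n".join(lines) + "\n"


# Offline docs were retrieved; the online retriever returned nothing.
prompt = documents_prompt_section("Help Documents", "Document", ["Use Ctrl+S to save."])
prompt += documents_prompt_section("Online Search Results", "Search Result", [])
```

As in the real method, sections for empty result sets simply disappear from the final prompt string.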
+ +
+ +
+ + +

+ get_prompter(is_visual, main_prompt, example_prompt, api_prompt, app_root_name) + +

+ + +
+ +

Get the prompt for the agent.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + is_visual + (bool) + – +
    +

    The flag indicating whether the agent is visual or not.

    +
    +
  • +
  • + main_prompt + (str) + – +
    +

    The main prompt file path.

    +
    +
  • +
  • + example_prompt + (str) + – +
    +

    The example prompt file path.

    +
    +
  • +
  • + api_prompt + (str) + – +
    +

    The API prompt file path.

    +
    +
  • +
  • + app_root_name + (str) + – +
    +

    The root name of the app.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + AppAgentPrompter + – +
    +

    The prompter instance.

    +
    +
  • +
+
+
+ Source code in agents/agent/app_agent.py +
66
+67
+68
+69
+70
+71
+72
+73
+74
+75
+76
+77
+78
+79
+80
+81
+82
+83
+84
+85
def get_prompter(
+    self,
+    is_visual: bool,
+    main_prompt: str,
+    example_prompt: str,
+    api_prompt: str,
+    app_root_name: str,
+) -> AppAgentPrompter:
+    """
+    Get the prompt for the agent.
+    :param is_visual: The flag indicating whether the agent is visual or not.
+    :param main_prompt: The main prompt file path.
+    :param example_prompt: The example prompt file path.
+    :param api_prompt: The API prompt file path.
+    :param app_root_name: The root name of the app.
+    :return: The prompter instance.
+    """
+    return AppAgentPrompter(
+        is_visual, main_prompt, example_prompt, api_prompt, app_root_name
+    )
+
+
+
+ +
+ +
+ + +

+ message_constructor(dynamic_examples, dynamic_tips, dynamic_knowledge, image_list, control_info, prev_subtask, plan, request, subtask, host_message, include_last_screenshot) + +

+ + +
+ +

Construct the prompt message for the AppAgent.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + dynamic_examples + (str) + – +
    +

    The dynamic examples retrieved from the self-demonstration and human demonstration.

    +
    +
  • +
  • + dynamic_tips + (str) + – +
    +

    The dynamic tips retrieved from the self-demonstration and human demonstration.

    +
    +
  • +
  • + dynamic_knowledge + (str) + – +
    +

    The dynamic knowledge retrieved from the external knowledge base.

    +
    +
  • +
  • + image_list + (List) + – +
    +

    The list of screenshot images.

    +
    +
  • +
  • + control_info + (str) + – +
    +

    The control information.

    +
    +
  • +
  • + plan + (List[str]) + – +
    +

    The plan list.

    +
    +
  • +
  • + request + (str) + – +
    +

    The overall user request.

    +
    +
  • +
  • + subtask + (str) + – +
    +

    The subtask for the current AppAgent to process.

    +
    +
  • +
  • + host_message + (List[str]) + – +
    +

    The message from the HostAgent.

    +
    +
  • +
  • + include_last_screenshot + (bool) + – +
    +

    The flag indicating whether to include the last screenshot.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + List[Dict[str, Union[str, List[Dict[str, str]]]]] + – +
    +

    The prompt message.

    +
    +
  • +
+
+
+ Source code in agents/agent/app_agent.py +
 87
+ 88
+ 89
+ 90
+ 91
+ 92
+ 93
+ 94
+ 95
+ 96
+ 97
+ 98
+ 99
+100
+101
+102
+103
+104
+105
+106
+107
+108
+109
+110
+111
+112
+113
+114
+115
+116
+117
+118
+119
+120
+121
+122
+123
+124
+125
+126
+127
+128
+129
+130
+131
+132
+133
+134
+135
+136
+137
+138
+139
+140
+141
+142
+143
def message_constructor(
+    self,
+    dynamic_examples: str,
+    dynamic_tips: str,
+    dynamic_knowledge: str,
+    image_list: List,
+    control_info: str,
+    prev_subtask: List[Dict[str, str]],
+    plan: List[str],
+    request: str,
+    subtask: str,
+    host_message: List[str],
+    include_last_screenshot: bool,
+) -> List[Dict[str, Union[str, List[Dict[str, str]]]]]:
+    """
+    Construct the prompt message for the AppAgent.
+    :param dynamic_examples: The dynamic examples retrieved from the self-demonstration and human demonstration.
+    :param dynamic_tips: The dynamic tips retrieved from the self-demonstration and human demonstration.
+    :param dynamic_knowledge: The dynamic knowledge retrieved from the external knowledge base.
+    :param image_list: The list of screenshot images.
+    :param control_info: The control information.
+    :param plan: The plan list.
+    :param request: The overall user request.
+    :param subtask: The subtask for the current AppAgent to process.
+    :param host_message: The message from the HostAgent.
+    :param include_last_screenshot: The flag indicating whether to include the last screenshot.
+    :return: The prompt message.
+    """
+    appagent_prompt_system_message = self.prompter.system_prompt_construction(
+        dynamic_examples, dynamic_tips
+    )
+
+    appagent_prompt_user_message = self.prompter.user_content_construction(
+        image_list=image_list,
+        control_item=control_info,
+        prev_subtask=prev_subtask,
+        prev_plan=plan,
+        user_request=request,
+        subtask=subtask,
+        current_application=self._process_name,
+        host_message=host_message,
+        retrieved_docs=dynamic_knowledge,
+        include_last_screenshot=include_last_screenshot,
+    )
+
+    if not self.blackboard.is_empty():
+
+        blackboard_prompt = self.blackboard.blackboard_to_prompt()
+        appagent_prompt_user_message = (
+            blackboard_prompt + appagent_prompt_user_message
+        )
+
+    appagent_prompt_message = self.prompter.prompt_construction(
+        appagent_prompt_system_message, appagent_prompt_user_message
+    )
+
+    return appagent_prompt_message
+
+
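The constructed prompt is a standard chat-completions message list: a system message built from examples and tips, plus a user message carrying the observation payload, with blackboard content prepended when the blackboard is non-empty. A self-contained sketch of that assembly, assuming the common `{role, content}` message shape rather than UFO's exact prompter output:

```python
from typing import Dict, List


def construct_prompt(
    system_text: str,
    user_parts: List[Dict[str, str]],
    blackboard_parts: List[Dict[str, str]],
) -> List[Dict]:
    """Assemble a chat message list; blackboard content is prepended to the user turn."""
    if blackboard_parts:
        user_parts = blackboard_parts + user_parts
    return [
        {"role": "system", "content": system_text},
        {"role": "user", "content": user_parts},
    ]


messages = construct_prompt(
    "You are an AppAgent...",
    [{"type": "text", "text": "[Current Subtask:] Save the document"}],
    [{"type": "text", "text": "[Blackboard:]"}],
)
```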
+
+ +
+ +
+ + +

+ print_response(response_dict) + +

+ + +
+ +

Print the response.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + response_dict + (Dict) + – +
    +

    The response dictionary to print.

    +
    +
  • +
+
+
+ Source code in agents/agent/app_agent.py +
145
+146
+147
+148
+149
+150
+151
+152
+153
+154
+155
+156
+157
+158
+159
+160
+161
+162
+163
+164
+165
+166
+167
+168
+169
+170
+171
+172
+173
+174
+175
+176
+177
+178
+179
+180
+181
+182
+183
+184
+185
+186
+187
+188
+189
+190
+191
+192
+193
+194
+195
+196
+197
+198
def print_response(self, response_dict: Dict) -> None:
+    """
+    Print the response.
+    :param response_dict: The response dictionary to print.
+    """
+
+    control_text = response_dict.get("ControlText")
+    control_label = response_dict.get("ControlLabel")
+    if not control_text and not control_label:
+        control_text = "[No control selected.]"
+        control_label = "[No control label selected.]"
+    observation = response_dict.get("Observation")
+    thought = response_dict.get("Thought")
+    plan = response_dict.get("Plan")
+    status = response_dict.get("Status")
+    comment = response_dict.get("Comment")
+    function_call = response_dict.get("Function")
+    args = utils.revise_line_breaks(response_dict.get("Args"))
+
+    # Generate the function call string
+    action = self.Puppeteer.get_command_string(function_call, args)
+
+    utils.print_with_color(
+        "Observations👀: {observation}".format(observation=observation), "cyan"
+    )
+    utils.print_with_color("Thoughts💡: {thought}".format(thought=thought), "green")
+    utils.print_with_color(
+        "Selected item🕹️: {control_text}, Label: {label}".format(
+            control_text=control_text, label=control_label
+        ),
+        "yellow",
+    )
+    utils.print_with_color(
+        "Action applied⚒️: {action}".format(action=action), "blue"
+    )
+    utils.print_with_color("Status📊: {status}".format(status=status), "blue")
+    utils.print_with_color(
+        "Next Plan📚: {plan}".format(plan="\n".join(plan)), "cyan"
+    )
+    utils.print_with_color("Comment💬: {comment}".format(comment=comment), "green")
+
+    screenshot_saving = response_dict.get("SaveScreenshot", {})
+
+    if screenshot_saving.get("save", False):
+        utils.print_with_color(
+            "Notice: The current screenshot📸 is saved to the blackboard.",
+            "yellow",
+        )
+        utils.print_with_color(
+            "Saving reason: {reason}".format(
+                reason=screenshot_saving.get("reason")
+            ),
+            "yellow",
+        )
+
+
+
+ +
+ +
+ + +

+ process(context) + +

+ + +
+ +

Process the agent.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + context + (Context) + – +
    +

    The context.

    +
    +
  • +
+
+
+ Source code in agents/agent/app_agent.py +
290
+291
+292
+293
+294
+295
+296
+297
def process(self, context: Context) -> None:
+    """
+    Process the agent.
+    :param context: The context.
+    """
+    self.processor = AppAgentProcessor(agent=self, context=context)
+    self.processor.process()
+    self.status = self.processor.status
+
+
+
+ +
+ +
+ + +

+ process_comfirmation() + +

+ + +
+ +

Process the user confirmation.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + bool + – +
    +

    The user's decision: True to proceed with the action, False to cancel it.

    +
    +
  • +
+
+
+ Source code in agents/agent/app_agent.py +
306
+307
+308
+309
+310
+311
+312
+313
+314
+315
+316
+317
+318
+319
def process_comfirmation(self) -> bool:
+    """
+    Process the user confirmation.
+    :return: The user's decision: True to proceed, False to cancel.
+    """
+    action = self.processor.action
+    control_text = self.processor.control_text
+
+    decision = interactor.sensitive_step_asker(action, control_text)
+
+    if not decision:
+        utils.print_with_color("The user has canceled the action.", "red")
+
+    return decision
+
+
+
+ +
+ +
+ + +

+ rag_demonstration_retrieve(request, demonstration_top_k) + +

+ + +
+ +

Retrieve demonstration examples for the user request.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + request + (str) + – +
    +

    The user request.

    +
    +
  • +
  • + demonstration_top_k + (int) + – +
    +

    The number of documents to retrieve.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    A tuple of the retrieved example list and tips list.

    +
    +
  • +
+
+
+ Source code in agents/agent/app_agent.py +
268
+269
+270
+271
+272
+273
+274
+275
+276
+277
+278
+279
+280
+281
+282
+283
+284
+285
+286
+287
+288
def rag_demonstration_retrieve(self, request: str, demonstration_top_k: int) -> str:
+    """
+    Retrieve demonstration examples for the user request.
+    :param request: The user request.
+    :param demonstration_top_k: The number of documents to retrieve.
+    :return: A tuple of the retrieved example list and tips list.
+    """
+
+    # Retrieve demonstration examples.
+    demonstration_docs = self.human_demonstration_retriever.retrieve(
+        request, demonstration_top_k
+    )
+
+    if demonstration_docs:
+        examples = [doc.metadata.get("example", {}) for doc in demonstration_docs]
+        tips = [doc.metadata.get("Tips", "") for doc in demonstration_docs]
+    else:
+        examples = []
+        tips = []
+
+    return examples, tips
+
+
+
+ +
+ +
+ + +

+ rag_experience_retrieve(request, experience_top_k) + +

+ + +
+ +

Retrieve experience examples for the user request.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + request + (str) + – +
    +

    The user request.

    +
    +
  • +
  • + experience_top_k + (int) + – +
    +

    The number of documents to retrieve.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    A tuple of the retrieved example list and tips list.

    +
    +
  • +
+
+
+ Source code in agents/agent/app_agent.py +
243
+244
+245
+246
+247
+248
+249
+250
+251
+252
+253
+254
+255
+256
+257
+258
+259
+260
+261
+262
+263
+264
+265
+266
def rag_experience_retrieve(self, request: str, experience_top_k: int) -> str:
+    """
+    Retrieve experience examples for the user request.
+    :param request: The user request.
+    :param experience_top_k: The number of documents to retrieve.
+    :return: A tuple of the retrieved example list and tips list.
+    """
+
+    # Retrieve experience examples. Only retrieve the examples that are related to the current application.
+    experience_docs = self.experience_retriever.retrieve(
+        request,
+        experience_top_k,
+        filter=lambda x: self._app_root_name.lower()
+        in [app.lower() for app in x["app_list"]],
+    )
+
+    if experience_docs:
+        examples = [doc.metadata.get("example", {}) for doc in experience_docs]
+        tips = [doc.metadata.get("Tips", "") for doc in experience_docs]
+    else:
+        examples = []
+        tips = []
+
+    return examples, tips
+
+
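Experience retrieval only keeps documents whose app_list contains the current application, compared case-insensitively. A minimal sketch of that filter, with plain dictionaries standing in for real retriever document metadata (make_app_filter and the sample docs are illustrative, not part of UFO's API):

```python
def make_app_filter(app_root_name: str):
    """Build the case-insensitive app filter used when retrieving experience docs."""
    def _filter(metadata: dict) -> bool:
        # Lower-case both sides so "WINWORD.EXE" matches "winword.exe".
        return app_root_name.lower() in [app.lower() for app in metadata["app_list"]]
    return _filter


docs = [
    {"app_list": ["WINWORD.EXE", "EXCEL.EXE"], "example": "demo-1"},
    {"app_list": ["chrome.exe"], "example": "demo-2"},
]
word_filter = make_app_filter("winword.exe")
matched = [d["example"] for d in docs if word_filter(d)]
```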
+
+ +
+ + + +
+ +
+ +
+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/agents/design/blackboard/index.html b/agents/design/blackboard/index.html new file mode 100644 index 00000000..ee78a146 --- /dev/null +++ b/agents/design/blackboard/index.html @@ -0,0 +1,1830 @@ + + + + + + + + Blackboard - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + + +
  • +
  • +
+
+
+
+
+ +

Agent Blackboard

+

The Blackboard is a shared memory space that is visible to all agents in the UFO framework. It stores information required for agents to interact with the user and applications at every step. The Blackboard is a key component of the UFO framework, enabling agents to share information and collaborate to fulfill user requests. The Blackboard is implemented as a class in the ufo/agents/memory/blackboard.py file.

+

Components

+

The Blackboard consists of the following data components:

+ + + + + + + + + + + + + + + + + + + + + + + + + +
ComponentDescription
questionsA list of questions that UFO asks the user, along with their corresponding answers.
requestsA list of historical user requests received in previous rounds.
trajectoriesA list of step-wise trajectories that record the agent's actions and decisions at each step.
screenshotsA list of screenshots taken by the agent when it believes the current state is important for future reference.
+
+

Tip

+

The keys stored in the trajectories are configured as HISTORY_KEYS in the config_dev.yaml file. You can customize the keys based on your requirements and the agent's logic.

+
+
+

Tip

+

Whether to save the screenshots is determined by the AppAgent. You can enable or disable screenshot capture by setting the SCREENSHOT_TO_MEMORY flag in the config_dev.yaml file.

+
+

Blackboard to Prompt

+

Data in the Blackboard is based on the MemoryItem class. The Blackboard provides a blackboard_to_prompt method that converts the stored information into a prompt for the LLM's inference; agents call this method when constructing their prompt messages. The blackboard_to_prompt method is defined as follows:

+
def blackboard_to_prompt(self) -> List[str]:
+    """
+    Convert the blackboard to a prompt.
+    :return: The prompt.
+    """
+    prefix = [
+        {
+            "type": "text",
+            "text": "[Blackboard:]",
+        }
+    ]
+
+    blackboard_prompt = (
+        prefix
+        + self.texts_to_prompt(self.questions, "[Questions & Answers:]")
+        + self.texts_to_prompt(self.requests, "[Request History:]")
+        + self.texts_to_prompt(self.trajectories, "[Step Trajectories Completed Previously:]")
+        + self.screenshots_to_prompt()
+    )
+
+    return blackboard_prompt
+
+
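As a concrete illustration, the following self-contained sketch mimics texts_to_prompt and shows the shape of the list that blackboard_to_prompt returns (simplified; the real implementation also serializes screenshots, and this stand-in function is not UFO's actual helper):

```python
from typing import Dict, List


def texts_to_prompt(items: List[str], title: str) -> List[Dict[str, str]]:
    """Render a memory section as a titled list of text parts (simplified stand-in)."""
    if not items:
        return []
    return [{"type": "text", "text": title}] + [
        {"type": "text", "text": item} for item in items
    ]


questions = ["Q: Which file? A: report.docx"]
requests = ["Summarize the report"]

# Mirrors blackboard_to_prompt: a prefix marker followed by each section.
prompt = [{"type": "text", "text": "[Blackboard:]"}]
prompt += texts_to_prompt(questions, "[Questions & Answers:]")
prompt += texts_to_prompt(requests, "[Request History:]")
```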

Reference

+ + +
+ + + + +
+ + +

Class for the blackboard, which stores the data and images that are visible to all agents.

+ +

Initialize the blackboard.

+ + + + + + +
+ Source code in agents/memory/blackboard.py +
41
+42
+43
+44
+45
+46
+47
+48
+49
+50
+51
+52
+53
def __init__(self) -> None:
+    """
+    Initialize the blackboard.
+    """
+    self._questions: Memory = Memory()
+    self._requests: Memory = Memory()
+    self._trajectories: Memory = Memory()
+    self._screenshots: Memory = Memory()
+
+    if configs.get("USE_CUSTOMIZATION", False):
+        self.load_questions(
+            configs.get("QA_PAIR_FILE", ""), configs.get("QA_PAIR_NUM", -1)
+        )
+
+
+ + + +
+ + + + + + + +
+ + + +

+ questions: Memory + + + property + + +

+ + +
+ +

Get the questions from the blackboard.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + Memory + – +
    +

    The questions from the blackboard.

    +
    +
  • +
+
+ +
+ +
+ + + +

+ requests: Memory + + + property + + +

+ + +
+ +

Get the requests from the blackboard.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + Memory + – +
    +

    The requests from the blackboard.

    +
    +
  • +
+
+ +
+ +
+ + + +

+ screenshots: Memory + + + property + + +

+ + +
+ +

Get the images from the blackboard.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + Memory + – +
    +

    The images from the blackboard.

    +
    +
  • +
+
+ +
+ +
+ + + +

+ trajectories: Memory + + + property + + +

+ + +
+ +

Get the trajectories from the blackboard.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + Memory + – +
    +

    The trajectories from the blackboard.

    +
    +
  • +
+
+ +
+ + + +
+ + +

+ add_data(data, memory) + +

+ + +
+ +

Add the data to a memory in the blackboard.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + data + (Union[MemoryItem, Dict[str, str], str]) + – +
    +

    The data to be added. It can be a dictionary or a MemoryItem or a string.

    +
    +
  • +
  • + memory + (Memory) + – +
    +

    The memory to add the data to.

    +
    +
  • +
+
+
+ Source code in agents/memory/blackboard.py +
 87
+ 88
+ 89
+ 90
+ 91
+ 92
+ 93
+ 94
+ 95
+ 96
+ 97
+ 98
+ 99
+100
+101
+102
+103
+104
+105
def add_data(
+    self, data: Union[MemoryItem, Dict[str, str], str], memory: Memory
+) -> None:
+    """
+    Add the data to a memory in the blackboard.
+    :param data: The data to be added. It can be a dictionary or a MemoryItem or a string.
+    :param memory: The memory to add the data to.
+    """
+
+    if isinstance(data, dict):
+        data_memory = MemoryItem()
+        data_memory.add_values_from_dict(data)
+        memory.add_memory_item(data_memory)
+    elif isinstance(data, MemoryItem):
+        memory.add_memory_item(data)
+    elif isinstance(data, str):
+        data_memory = MemoryItem()
+        data_memory.add_values_from_dict({"text": data})
+        memory.add_memory_item(data_memory)
+
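add_data normalizes three input shapes (a dictionary, a MemoryItem, or a plain string) before storing. A minimal, dependency-free sketch of that dispatch, with a thin stand-in class and a plain list replacing UFO's MemoryItem and Memory:

```python
from typing import Dict, List, Union


class MiniMemoryItem:
    """Stand-in for UFO's MemoryItem: a thin wrapper over key/value pairs."""

    def __init__(self) -> None:
        self.values: Dict[str, str] = {}

    def add_values_from_dict(self, data: Dict[str, str]) -> None:
        self.values.update(data)


def add_data(
    data: Union[MiniMemoryItem, Dict[str, str], str],
    memory: List[MiniMemoryItem],
) -> None:
    """Wrap dicts and strings into memory items; store memory items as-is."""
    if isinstance(data, dict):
        item = MiniMemoryItem()
        item.add_values_from_dict(data)
        memory.append(item)
    elif isinstance(data, MiniMemoryItem):
        memory.append(data)
    elif isinstance(data, str):
        # Plain strings are stored under a "text" key.
        item = MiniMemoryItem()
        item.add_values_from_dict({"text": data})
        memory.append(item)


memory: List[MiniMemoryItem] = []
add_data({"question": "Which sheet?", "answer": "Sheet1"}, memory)
add_data("plain note", memory)
```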
+
+
+ +
+ +
+ + +

+ add_image(screenshot_path='', metadata=None) + +

+ + +
+ +

Add the image to the blackboard.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + screenshot_path + (str, default: + '' +) + – +
    +

    The path of the image.

    +
    +
  • +
  • + metadata + (Optional[Dict[str, str]], default: + None +) + – +
    +

    The metadata of the image.

    +
    +
  • +
+
+
+ Source code in agents/memory/blackboard.py +
131
+132
+133
+134
+135
+136
+137
+138
+139
+140
+141
+142
+143
+144
+145
+146
+147
+148
+149
+150
+151
+152
+153
+154
+155
+156
+157
+158
+159
+160
+161
+162
def add_image(
+    self,
+    screenshot_path: str = "",
+    metadata: Optional[Dict[str, str]] = None,
+) -> None:
+    """
+    Add the image to the blackboard.
+    :param screenshot_path: The path of the image.
+    :param metadata: The metadata of the image.
+    """
+
+    if os.path.exists(screenshot_path):
+
+        screenshot_str = PhotographerFacade().encode_image_from_path(
+            screenshot_path
+        )
+    else:
+        print(f"Screenshot path {screenshot_path} does not exist.")
+        screenshot_str = ""
+
+    image_memory_item = ImageMemoryItem()
+    image_memory_item.add_values_from_dict(
+        {
+            ImageMemoryItemNames.METADATA: metadata.get(
+                ImageMemoryItemNames.METADATA
+            ),
+            ImageMemoryItemNames.IMAGE_PATH: screenshot_path,
+            ImageMemoryItemNames.IMAGE_STR: screenshot_str,
+        }
+    )
+
+    self.screenshots.add_memory_item(image_memory_item)
+
+
+
+ +
+ +
+ + +

+ add_questions(questions) + +

+ + +
+ +

Add the questions to the blackboard.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + questions + (Union[MemoryItem, Dict[str, str]]) + – +
    +

    The data to be added. It can be a dictionary or a MemoryItem or a string.

    +
    +
  • +
+
+
+ Source code in agents/memory/blackboard.py +
107
+108
+109
+110
+111
+112
+113
def add_questions(self, questions: Union[MemoryItem, Dict[str, str]]) -> None:
+    """
+    Add the questions to the blackboard.
+    :param questions: The data to be added. It can be a dictionary or a MemoryItem or a string.
+    """
+
+    self.add_data(questions, self.questions)
+
+
+
+ +
+ +
+ + +

+ add_requests(requests) + +

+ + +
+ +

Add the requests to the blackboard.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + requests + (Union[MemoryItem, Dict[str, str]]) + – +
    +

    The data to be added. It can be a dictionary or a MemoryItem or a string.

    +
    +
  • +
+
+
+ Source code in agents/memory/blackboard.py +
115
+116
+117
+118
+119
+120
+121
def add_requests(self, requests: Union[MemoryItem, Dict[str, str]]) -> None:
+    """
+    Add the requests to the blackboard.
+    :param requests: The data to be added. It can be a dictionary or a MemoryItem or a string.
+    """
+
+    self.add_data(requests, self.requests)
+
+
+
+ +
+ +
+ + +

+ add_trajectories(trajectories) + +

+ + +
+ +

Add the trajectories to the blackboard.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + trajectories + (Union[MemoryItem, Dict[str, str]]) + – +
    +

    The data to be added. It can be a dictionary or a MemoryItem or a string.

    +
    +
  • +
+
+
+ Source code in agents/memory/blackboard.py +
123
+124
+125
+126
+127
+128
+129
def add_trajectories(self, trajectories: Union[MemoryItem, Dict[str, str]]) -> None:
+    """
+    Add the trajectories to the blackboard.
+    :param trajectories: The data to be added. It can be a dictionary or a MemoryItem or a string.
+    """
+
+    self.add_data(trajectories, self.trajectories)
+
+
+
+ +
+ +
+ + +

+ blackboard_to_prompt() + +

+ + +
+ +

Convert the blackboard to a prompt.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + List[str] + – +
    +

    The prompt.

    +
    +
  • +
+
+
+ Source code in agents/memory/blackboard.py +
241
+242
+243
+244
+245
+246
+247
+248
+249
+250
+251
+252
+253
+254
+255
+256
+257
+258
+259
+260
+261
+262
+263
def blackboard_to_prompt(self) -> List[str]:
+    """
+    Convert the blackboard to a prompt.
+    :return: The prompt.
+    """
+    prefix = [
+        {
+            "type": "text",
+            "text": "[Blackboard:]",
+        }
+    ]
+
+    blackboard_prompt = (
+        prefix
+        + self.texts_to_prompt(self.questions, "[Questions & Answers:]")
+        + self.texts_to_prompt(self.requests, "[Request History:]")
+        + self.texts_to_prompt(
+            self.trajectories, "[Step Trajectories Completed Previously:]"
+        )
+        + self.screenshots_to_prompt()
+    )
+
+    return blackboard_prompt
+
+
+
+ +
+ +
+ + +

+ clear() + +

+ + +
+ +

Clear the blackboard.

+ +
+ Source code in agents/memory/blackboard.py +
277
+278
+279
+280
+281
+282
+283
+284
def clear(self) -> None:
+    """
+    Clear the blackboard.
+    """
+    self.questions.clear()
+    self.requests.clear()
+    self.trajectories.clear()
+    self.screenshots.clear()
+
+
+
+ +
+ +
+ + +

+ is_empty() + +

+ + +
+ +

Check if the blackboard is empty.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + bool + – +
    +

    True if the blackboard is empty, False otherwise.

    +
    +
  • +
+
+
+ Source code in agents/memory/blackboard.py +
265
+266
+267
+268
+269
+270
+271
+272
+273
+274
+275
def is_empty(self) -> bool:
+    """
+    Check if the blackboard is empty.
+    :return: True if the blackboard is empty, False otherwise.
+    """
+    return (
+        self.questions.is_empty()
+        and self.requests.is_empty()
+        and self.trajectories.is_empty()
+        and self.screenshots.is_empty()
+    )
+
+
+
+ +
+ +
+ + +

+ load_questions(file_path, last_k=-1) + +

+ + +
+ +

Load the data from a file.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + file_path + (str) + – +
    +

    The path of the file.

    +
    +
  • +
  • + last_k + – +
    +

    The number of lines to read from the end of the file. If -1, read all lines.

    +
    +
  • +
+
+
+ Source code in agents/memory/blackboard.py +
192
+193
+194
+195
+196
+197
+198
+199
+200
def load_questions(self, file_path: str, last_k=-1) -> None:
+    """
+    Load the data from a file.
+    :param file_path: The path of the file.
+    :param last_k: The number of lines to read from the end of the file. If -1, read all lines.
+    """
+    qa_list = self.read_json_file(file_path, last_k)
+    for qa in qa_list:
+        self.add_questions(qa)
+
+
+
+ +
+ +
+ + +

+ questions_to_json() + +

+ + +
+ +

Convert the questions to a JSON string.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The questions in JSON string format.

    +
    +
  • +
+
+
+ Source code in agents/memory/blackboard.py +
164
+165
+166
+167
+168
+169
def questions_to_json(self) -> str:
+    """
+    Convert the questions to a JSON string.
+    :return: The questions in JSON string format.
+    """
+    return self.questions.to_json()
+
+
+
+ +
+ +
+ + +

+ read_json_file(file_path, last_k=-1) + + + staticmethod + + +

+ + +
+ +

Read the json file.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + file_path + (str) + – +
    +

    The path of the file.

    +
    +
  • +
  • + last_k + – +
    +

    The number of lines to read from the end of the file. If -1, read all lines.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + Dict[str, str] + – +
    +

    The data in the file.

    +
    +
  • +
+
+
+ Source code in agents/memory/blackboard.py +
286
+287
+288
+289
+290
+291
+292
+293
+294
+295
+296
+297
+298
+299
+300
+301
+302
+303
+304
+305
+306
+307
+308
+309
+310
+311
+312
+313
+314
+315
@staticmethod
+def read_json_file(file_path: str, last_k=-1) -> Dict[str, str]:
+    """
+    Read the json file.
+    :param file_path: The path of the file.
+    :param last_k: The number of lines to read from the end of the file. If -1, read all lines.
+    :return: The data in the file.
+    """
+
+    data_list = []
+
+    # Check if the file exists
+    if os.path.exists(file_path):
+        # Open the file and read the lines
+        with open(file_path, "r", encoding="utf-8") as file:
+            lines = file.readlines()
+
+        # If last_k is not -1, only read the last k lines
+        if last_k != -1:
+            lines = lines[-last_k:]
+
+        # Parse the lines as JSON
+        for line in lines:
+            try:
+                data = json.loads(line.strip())
+                data_list.append(data)
+            except json.JSONDecodeError:
+                print(f"Warning: Unable to parse line as JSON: {line}")
+
+    return data_list
+
+
+
+ +
+ +
+ + +

+ requests_to_json() + +

+ + +
+ +

Convert the data to a dictionary.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The data in the dictionary format.

    +
    +
  • +
+
+
+ Source code in agents/memory/blackboard.py +
171
+172
+173
+174
+175
+176
def requests_to_json(self) -> str:
+    """
+    Convert the data to a dictionary.
+    :return: The data in the dictionary format.
+    """
+    return self.requests.to_json()
+
+
+
+ +
+ +
+ + +

+ screenshots_to_json() + +

+ + +
+ +

Convert the images to a dictionary.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The images in the dictionary format.

    +
    +
  • +
+
+
+ Source code in agents/memory/blackboard.py +
185
+186
+187
+188
+189
+190
def screenshots_to_json(self) -> str:
+    """
+    Convert the images to a dictionary.
+    :return: The images in the dictionary format.
+    """
+    return self.screenshots.to_json()
+
+
+
+ +
+ +
+ + +

+ screenshots_to_prompt() + +

+ + +
+ +

Convert the images to a prompt.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + List[str] + – +
    +

    The prompt.

    +
    +
  • +
+
+
+ Source code in agents/memory/blackboard.py +
214
+215
+216
+217
+218
+219
+220
+221
+222
+223
+224
+225
+226
+227
+228
+229
+230
+231
+232
+233
+234
+235
+236
+237
+238
+239
def screenshots_to_prompt(self) -> List[str]:
+    """
+    Convert the images to a prompt.
+    :return: The prompt.
+    """
+
+    user_content = []
+    for screenshot_dict in self.screenshots.list_content:
+        user_content.append(
+            {
+                "type": "text",
+                "text": json.dumps(
+                    screenshot_dict.get(ImageMemoryItemNames.METADATA, "")
+                ),
+            }
+        )
+        user_content.append(
+            {
+                "type": "image_url",
+                "image_url": {
+                    "url": screenshot_dict.get(ImageMemoryItemNames.IMAGE_STR, "")
+                },
+            }
+        )
+
+    return user_content
+
+
+
+ +
+ +
+ + +

+ texts_to_prompt(memory, prefix) + +

+ + +
+ +

Convert the data to a prompt.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + List[str] + – +
    +

    The prompt.

    +
    +
  • +
+
+
+ Source code in agents/memory/blackboard.py +
202
+203
+204
+205
+206
+207
+208
+209
+210
+211
+212
def texts_to_prompt(self, memory: Memory, prefix: str) -> List[str]:
+    """
+    Convert the data to a prompt.
+    :return: The prompt.
+    """
+
+    user_content = [
+        {"type": "text", "text": f"{prefix}\n {json.dumps(memory.list_content)}"}
+    ]
+
+    return user_content
+
+
+
+ +
+ +
+ + +

+ trajectories_to_json() + +

+ + +
+ +

Convert the data to a dictionary.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The data in the dictionary format.

    +
    +
  • +
+
+
+ Source code in agents/memory/blackboard.py +
178
+179
+180
+181
+182
+183
def trajectories_to_json(self) -> str:
+    """
+    Convert the data to a dictionary.
+    :return: The data in the dictionary format.
+    """
+    return self.trajectories.to_json()
+
+
+
+ +
+ + + +
+ +
+ +
+

Note

+

You can customize the class to tailor the Blackboard to your requirements.

+
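For instance, the `load_questions` / `read_json_file` pair documented above consumes a JSON-lines file, one JSON object per line. Below is a self-contained sketch of that read pattern, mirroring the documented `read_json_file` logic; the file name `questions.jsonl` is illustrative:

```python
import json
import os
from typing import Any, Dict, List


def read_json_file(file_path: str, last_k: int = -1) -> List[Dict[str, Any]]:
    """Read a JSON-lines file; if last_k != -1, keep only the last k lines."""
    data_list = []
    if os.path.exists(file_path):
        with open(file_path, "r", encoding="utf-8") as file:
            lines = file.readlines()
        # If last_k is not -1, only read the last k lines.
        if last_k != -1:
            lines = lines[-last_k:]
        # Parse each line as a standalone JSON object.
        for line in lines:
            try:
                data_list.append(json.loads(line.strip()))
            except json.JSONDecodeError:
                print(f"Warning: Unable to parse line as JSON: {line}")
    return data_list


# Write two question records, then read back only the most recent one.
with open("questions.jsonl", "w", encoding="utf-8") as f:
    f.write(json.dumps({"question": "Open the file?", "answer": "Yes"}) + "\n")
    f.write(json.dumps({"question": "Save changes?", "answer": "No"}) + "\n")

print(read_json_file("questions.jsonl", last_k=1))
# [{'question': 'Save changes?', 'answer': 'No'}]
```

A missing file simply yields an empty list, so a fresh blackboard can call the loader unconditionally.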
+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/agents/design/memory/index.html b/agents/design/memory/index.html new file mode 100644 index 00000000..6b685edc --- /dev/null +++ b/agents/design/memory/index.html @@ -0,0 +1,1657 @@ + + + + + + + + Memory - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + + +
  • +
  • +
+
+
+
+
+ +

Agent Memory

+

The Memory manages the agent's memory and stores the information the agent needs to interact with the user and applications at every step. Parts of the Memory are visible to the agent for decision-making.

+

MemoryItem

+

A MemoryItem is a dataclass that represents a single step in the agent's memory. The fields of a MemoryItem are flexible and can be customized based on the requirements of the agent. The MemoryItem class is defined as follows:
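A minimal sketch of how a MemoryItem behaves, based on the methods documented below: `set_value` records which keys belong to the memory, and `to_dict` exports only those. The class name and the field values here are illustrative:

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class MemoryItemSketch:
    """Illustrative re-implementation of the MemoryItem interface."""

    _memory_attributes: List[str] = field(default_factory=list)

    def set_value(self, key: str, value: Any) -> None:
        # Add a field and remember its key so to_dict() can export it.
        setattr(self, key, value)
        if key not in self._memory_attributes:
            self._memory_attributes.append(key)

    def add_values_from_dict(self, values: Dict[str, Any]) -> None:
        for key, value in values.items():
            self.set_value(key, value)

    def to_dict(self) -> Dict[str, Any]:
        # Only keys registered via set_value are exported.
        return {
            k: v for k, v in self.__dict__.items() if k in self._memory_attributes
        }

    def to_json(self) -> str:
        return json.dumps(self.to_dict())


item = MemoryItemSketch()
item.add_values_from_dict({"step": 1, "action": "click", "status": "CONTINUE"})
print(item.to_json())  # {"step": 1, "action": "click", "status": "CONTINUE"}
```

Because fields are registered dynamically, each agent can attach whatever per-step attributes its logic requires.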

+ + +
+ + + + +
+ + +

This data class represents a memory item of an agent at one step.

+ + + + + + + + + +
+ + + + + + + +
+ + + +

+ attributes: List[str] + + + property + + +

+ + +
+ +

Get the attributes of the memory item.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + List[str] + – +
    +

    The attributes.

    +
    +
  • +
+
+ +
+ + + +
+ + +

+ add_values_from_dict(values) + +

+ + +
+ +

Add fields to the memory item.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + values + (Dict[str, Any]) + – +
    +

    The values of the fields.

    +
    +
  • +
+
+
+ Source code in agents/memory/memory.py +
57
+58
+59
+60
+61
+62
+63
def add_values_from_dict(self, values: Dict[str, Any]) -> None:
+    """
+    Add fields to the memory item.
+    :param values: The values of the fields.
+    """
+    for key, value in values.items():
+        self.set_value(key, value)
+
+
+
+ +
+ +
+ + +

+ filter(keys=[]) + +

+ + +
+ +

Fetch the memory item.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + keys + (List[str], default: + [] +) + – +
    +

    The keys to fetch.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + None + – +
    +

    The filtered memory item.

    +
    +
  • +
+
+
+ Source code in agents/memory/memory.py +
37
+38
+39
+40
+41
+42
+43
+44
def filter(self, keys: List[str] = []) -> None:
+    """
+    Fetch the memory item.
+    :param keys: The keys to fetch.
+    :return: The filtered memory item.
+    """
+
+    return {key: value for key, value in self.to_dict().items() if key in keys}
+
+
+
+ +
+ +
+ + +

+ get_value(key) + +

+ + +
+ +

Get the value of the field.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + key + (str) + – +
    +

    The key of the field.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + Optional[str] + – +
    +

    The value of the field.

    +
    +
  • +
+
+
+ Source code in agents/memory/memory.py +
65
+66
+67
+68
+69
+70
+71
+72
def get_value(self, key: str) -> Optional[str]:
+    """
+    Get the value of the field.
+    :param key: The key of the field.
+    :return: The value of the field.
+    """
+
+    return getattr(self, key, None)
+
+
+
+ +
+ +
+ + +

+ get_values(keys) + +

+ + +
+ +

Get the values of the fields.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + keys + (List[str]) + – +
    +

    The keys of the fields.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + dict + – +
    +

    The values of the fields.

    +
    +
  • +
+
+
+ Source code in agents/memory/memory.py +
74
+75
+76
+77
+78
+79
+80
def get_values(self, keys: List[str]) -> dict:
+    """
+    Get the values of the fields.
+    :param keys: The keys of the fields.
+    :return: The values of the fields.
+    """
+    return {key: self.get_value(key) for key in keys}
+
+
+
+ +
+ +
+ + +

+ set_value(key, value) + +

+ + +
+ +

Add a field to the memory item.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + key + (str) + – +
    +

    The key of the field.

    +
    +
  • +
  • + value + (str) + – +
    +

    The value of the field.

    +
    +
  • +
+
+
+ Source code in agents/memory/memory.py +
46
+47
+48
+49
+50
+51
+52
+53
+54
+55
def set_value(self, key: str, value: str) -> None:
+    """
+    Add a field to the memory item.
+    :param key: The key of the field.
+    :param value: The value of the field.
+    """
+    setattr(self, key, value)
+
+    if key not in self._memory_attributes:
+        self._memory_attributes.append(key)
+
+
+
+ +
+ +
+ + +

+ to_dict() + +

+ + +
+ +

Convert the MemoryItem to a dictionary.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + Dict[str, str] + – +
    +

    The dictionary.

    +
    +
  • +
+
+
+ Source code in agents/memory/memory.py +
19
+20
+21
+22
+23
+24
+25
+26
+27
+28
def to_dict(self) -> Dict[str, str]:
+    """
+    Convert the MemoryItem to a dictionary.
+    :return: The dictionary.
+    """
+    return {
+        key: value
+        for key, value in self.__dict__.items()
+        if key in self._memory_attributes
+    }
+
+
+
+ +
+ +
+ + +

+ to_json() + +

+ + +
+ +

Convert the memory item to a JSON string.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The JSON string.

    +
    +
  • +
+
+
+ Source code in agents/memory/memory.py +
30
+31
+32
+33
+34
+35
def to_json(self) -> str:
+    """
+    Convert the memory item to a JSON string.
+    :return: The JSON string.
+    """
+    return json.dumps(self.to_dict())
+
+
+
+ +
+ + + +
+ +
+ +
+

Info

+

At each step, an instance of MemoryItem is created and stored in the Memory to record the information of the agent's interaction with the user and applications.

+
+

Memory

+

The Memory class is responsible for managing the memory of the agent. It stores a list of MemoryItem instances that represent the agent's memory at each step. The Memory class is defined as follows:
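A minimal sketch of how the Memory container works: it holds an ordered list of per-step records and supports filtering by step or by key, mirroring the methods documented below. For brevity this sketch stores plain dictionaries instead of MemoryItem instances; the names are illustrative:

```python
import json
from typing import Any, Dict, List


class MemorySketch:
    """Illustrative container mirroring the Memory interface."""

    def __init__(self) -> None:
        self._content: List[Dict[str, Any]] = []  # one record per step

    def add_memory_item(self, item: Dict[str, Any]) -> None:
        self._content.append(item)

    def filter_memory_from_steps(self, steps: List[int]) -> List[Dict[str, Any]]:
        return [item for item in self._content if item.get("step") in steps]

    def filter_memory_from_keys(self, keys: List[str]) -> List[Dict[str, Any]]:
        # Keys an item does not have are simply ignored.
        return [{k: v for k, v in item.items() if k in keys} for item in self._content]

    def to_json(self) -> str:
        return json.dumps(self._content)


memory = MemorySketch()
memory.add_memory_item({"step": 0, "action": "open app"})
memory.add_memory_item({"step": 1, "action": "click", "status": "FINISH"})
print(memory.filter_memory_from_steps([1]))
# [{'step': 1, 'action': 'click', 'status': 'FINISH'}]
print(memory.filter_memory_from_keys(["action"]))
# [{'action': 'open app'}, {'action': 'click'}]
```

The key-based filter is how an agent exposes only selected parts of its memory to the LLM prompt.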

+ + +
+ + + + +
+ + +

This data class represents a memory of an agent.

+ + + + + + + + + +
+ + + + + + + +
+ + + +

+ content: List[MemoryItem] + + + property + + +

+ + +
+ +

Get the content of the memory.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + List[MemoryItem] + – +
    +

    The content of the memory.

    +
    +
  • +
+
+ +
+ +
+ + + +

+ length: int + + + property + + +

+ + +
+ +

Get the length of the memory.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + int + – +
    +

    The length of the memory.

    +
    +
  • +
+
+ +
+ +
+ + + +

+ list_content: List[Dict[str, str]] + + + property + + +

+ + +
+ +

List the content of the memory.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + List[Dict[str, str]] + – +
    +

    The content of the memory.

    +
    +
  • +
+
+ +
+ + + +
+ + +

+ add_memory_item(memory_item) + +

+ + +
+ +

Add a memory item to the memory.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + memory_item + (MemoryItem) + – +
    +

    The memory item to add.

    +
    +
  • +
+
+
+ Source code in agents/memory/memory.py +
122
+123
+124
+125
+126
+127
def add_memory_item(self, memory_item: MemoryItem) -> None:
+    """
+    Add a memory item to the memory.
+    :param memory_item: The memory item to add.
+    """
+    self._content.append(memory_item)
+
+
+
+ +
+ +
+ + +

+ clear() + +

+ + +
+ +

Clear the memory.

+ +
+ Source code in agents/memory/memory.py +
129
+130
+131
+132
+133
def clear(self) -> None:
+    """
+    Clear the memory.
+    """
+    self._content = []
+
+
+
+ +
+ +
+ + +

+ delete_memory_item(step) + +

+ + +
+ +

Delete a memory item from the memory.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + step + (int) + – +
    +

    The step of the memory item to delete.

    +
    +
  • +
+
+
+ Source code in agents/memory/memory.py +
143
+144
+145
+146
+147
+148
def delete_memory_item(self, step: int) -> None:
+    """
+    Delete a memory item from the memory.
+    :param step: The step of the memory item to delete.
+    """
+    self._content = [item for item in self._content if item.step != step]
+
+
+
+ +
+ +
+ + +

+ filter_memory_from_keys(keys) + +

+ + +
+ +

Filter the memory from the keys. If an item does not have the key, the key will be ignored.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + keys + (List[str]) + – +
    +

    The keys to filter.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + List[Dict[str, str]] + – +
    +

    The filtered memory.

    +
    +
  • +
+
+
+ Source code in agents/memory/memory.py +
114
+115
+116
+117
+118
+119
+120
def filter_memory_from_keys(self, keys: List[str]) -> List[Dict[str, str]]:
+    """
+    Filter the memory from the keys. If an item does not have the key, the key will be ignored.
+    :param keys: The keys to filter.
+    :return: The filtered memory.
+    """
+    return [item.filter(keys) for item in self._content]
+
+
+
+ +
+ +
+ + +

+ filter_memory_from_steps(steps) + +

+ + +
+ +

Filter the memory from the steps.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + steps + (List[int]) + – +
    +

    The steps to filter.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + List[Dict[str, str]] + – +
    +

    The filtered memory.

    +
    +
  • +
+
+
+ Source code in agents/memory/memory.py +
106
+107
+108
+109
+110
+111
+112
def filter_memory_from_steps(self, steps: List[int]) -> List[Dict[str, str]]:
+    """
+    Filter the memory from the steps.
+    :param steps: The steps to filter.
+    :return: The filtered memory.
+    """
+    return [item.to_dict() for item in self._content if item.step in steps]
+
+
+
+ +
+ +
+ + +

+ get_latest_item() + +

+ + +
+ +

Get the latest memory item.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + MemoryItem + – +
    +

    The latest memory item.

    +
    +
  • +
+
+
+ Source code in agents/memory/memory.py +
160
+161
+162
+163
+164
+165
+166
+167
def get_latest_item(self) -> MemoryItem:
+    """
+    Get the latest memory item.
+    :return: The latest memory item.
+    """
+    if self.length == 0:
+        return None
+    return self._content[-1]
+
+
+
+ +
+ +
+ + +

+ is_empty() + +

+ + +
+ +

Check if the memory is empty.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + bool + – +
    +

    The boolean value indicating if the memory is empty.

    +
    +
  • +
+
+
+ Source code in agents/memory/memory.py +
185
+186
+187
+188
+189
+190
def is_empty(self) -> bool:
+    """
+    Check if the memory is empty.
+    :return: The boolean value indicating if the memory is empty.
+    """
+    return self.length == 0
+
+
+
+ +
+ +
+ + +

+ load(content) + +

+ + +
+ +

Load the data from the memory.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + content + (List[MemoryItem]) + – +
    +

    The content to load.

    +
    +
  • +
+
+
+ Source code in agents/memory/memory.py +
 99
+100
+101
+102
+103
+104
def load(self, content: List[MemoryItem]) -> None:
+    """
+    Load the data from the memory.
+    :param content: The content to load.
+    """
+    self._content = content
+
+
+
+ +
+ +
+ + +

+ to_json() + +

+ + +
+ +

Convert the memory to a JSON string.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The JSON string.

    +
    +
  • +
+
+
+ Source code in agents/memory/memory.py +
150
+151
+152
+153
+154
+155
+156
+157
+158
def to_json(self) -> str:
+    """
+    Convert the memory to a JSON string.
+    :return: The JSON string.
+    """
+
+    return json.dumps(
+        [item.to_dict() for item in self._content if item is not None]
+    )
+
+
+
+ +
+ + + +
+ +
+ +
+

Info

+

Each agent has its own Memory instance to store its information.

+
+
+

Info

+

Not all information in the Memory is provided to the agent for decision-making. The agent accesses parts of the memory based on the requirements of its logic.

+
+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/agents/design/processor/index.html b/agents/design/processor/index.html new file mode 100644 index 00000000..685aa689 --- /dev/null +++ b/agents/design/processor/index.html @@ -0,0 +1,3293 @@ + + + + + + + + Processor - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + + +
  • +
  • +
+
+
+
+
+ +

Agents Processor

+

The Processor is a key component of the agent that carries out the core logic for processing the user's request. The Processor is implemented as a class in the ufo/agents/processors folder, and each agent has its own Processor class within that folder.

+

Core Process

+

Once called, an agent processes the user's request by calling the process method of its Processor class, which executes a series of steps. The workflow of the process is as follows:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
StepDescriptionFunction
1Print the step information.print_step_info
2Capture the screenshot of the application.capture_screenshot
3Get the control information of the application.get_control_info
4Get the prompt message for the LLM.get_prompt_message
5Generate the response from the LLM.get_response
6Update the cost of the step.update_cost
7Parse the response from the LLM.parse_response
8Execute the action based on the response.execute_action
9Update the memory and blackboard.update_memory
10Update the status of the agent.update_status
+

At each step, the Processor invokes the corresponding methods sequentially to execute the actions needed to fulfill the user's request.

+

The process may be paused, and can be resumed via the resume method, based on the agent's logic and the user's request.

+
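The ten-step workflow above can be sketched as a single process method that invokes each stage in order. This is a simplified illustration with stand-in stage bodies; the real implementation additionally wraps each stage with exception capture, supports pausing/resuming, and logs results:

```python
class ProcessorSketch:
    """Illustrative skeleton of one agent step (no error handling)."""

    def __init__(self) -> None:
        self.trace = []  # records the order in which stages run

    def _stage(self, name: str) -> None:
        # Stand-in for a real stage (screenshot capture, LLM call, ...).
        self.trace.append(name)

    def process(self) -> None:
        self._stage("print_step_info")     # 1. print the step information
        self._stage("capture_screenshot")  # 2. screenshot the application
        self._stage("get_control_info")    # 3. collect control information
        self._stage("get_prompt_message")  # 4. build the LLM prompt
        self._stage("get_response")        # 5. query the LLM
        self._stage("update_cost")         # 6. update the step cost
        self._stage("parse_response")      # 7. parse the LLM response
        self._stage("execute_action")      # 8. execute the chosen action
        self._stage("update_memory")       # 9. update memory and blackboard
        self._stage("update_status")       # 10. update the agent status


p = ProcessorSketch()
p.process()
print(len(p.trace))  # 10
```

In the real Processor, the abstract stages (capture_screenshot, get_control_info, get_prompt_message, get_response, execute_action) are implemented per agent, while the sequencing lives in the base class.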

Reference

+

Below is the basic structure of the Processor class: +

+ + +
+ + + + +
+

+ Bases: ABC

+ + +

The base processor for the session. A session consists of multiple rounds of conversation with the user, completing a task. +At each round, the HostAgent and AppAgent interact with the user and the application with the processor. +Each processor is responsible for processing the user request and updating the HostAgent and AppAgent at a single step in a round.

+ +

Initialize the processor.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + context + (Context) + – +
    +

    The context of the session.

    +
    +
  • +
  • + agent + (BasicAgent) + – +
    +

    The agent who executes the processor.

    +
    +
  • +
+
+ + + + + +
+ Source code in agents/processors/basic.py +
35
+36
+37
+38
+39
+40
+41
+42
+43
+44
+45
+46
+47
+48
+49
+50
+51
+52
+53
+54
+55
+56
+57
+58
+59
+60
+61
+62
+63
+64
+65
+66
+67
+68
+69
+70
+71
def __init__(self, agent: BasicAgent, context: Context) -> None:
+    """
+    Initialize the processor.
+    :param context: The context of the session.
+    :param agent: The agent who executes the processor.
+    """
+
+    self._context = context
+    self._agent = agent
+
+    self.photographer = PhotographerFacade()
+    self.control_inspector = ControlInspectorFacade(BACKEND)
+
+    self._prompt_message = None
+    self._status = None
+    self._response = None
+    self._cost = 0
+    self._control_label = None
+    self._control_text = None
+    self._response_json = {}
+    self._memory_data = MemoryItem()
+    self._results = None
+    self._question_list = []
+    self._agent_status_manager = self.agent.status_manager
+    self._is_resumed = False
+    self._action = None
+    self._plan = None
+
+    self._control_log = {
+        "control_class": None,
+        "control_type": None,
+        "control_automation_id": None,
+    }
+
+    self._total_time_cost = 0
+    self._time_cost = {}
+    self._exeception_traceback = {}
+
+
+ + + +
+ + + + + + + +
+ + + +

+ action: str + + + property + writable + + +

+ + +
+ +

Get the action.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The action.

    +
    +
  • +
+
+ +
+ +
+ + + +

+ agent: BasicAgent + + + property + + +

+ + +
+ +

Get the agent.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + BasicAgent + – +
    +

    The agent.

    +
    +
  • +
+
+ +
+ +
+ + + +

+ app_root: str + + + property + writable + + +

+ + +
+ +

Get the application root.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The application root.

    +
    +
  • +
+
+ +
+ +
+ + + +

+ application_process_name: str + + + property + writable + + +

+ + +
+ +

Get the application process name.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The application process name.

    +
    +
  • +
+
+ +
+ +
+ + + +

+ application_window: UIAWrapper + + + property + writable + + +

+ + +
+ +

Get the active window.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + UIAWrapper + – +
    +

    The active window.

    +
    +
  • +
+
+ +
+ +
+ + + +

+ context: Context + + + property + + +

+ + +
+ +

Get the context.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + Context + – +
    +

    The context.

    +
    +
  • +
+
+ +
+ +
+ + + +

+ control_label: str + + + property + writable + + +

+ + +
+ +

Get the control label.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The control label.

    +
    +
  • +
+
+ +
+ +
+ + + +

+ control_reannotate: List[str] + + + property + writable + + +

+ + +
+ +

Get the control reannotation.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + List[str] + – +
    +

    The control reannotation.

    +
    +
  • +
+
+ +
+ +
+ + + +

+ control_text: str + + + property + writable + + +

+ + +
+ +

Get the active application.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The active application.

    +
    +
  • +
+
+ +
+ +
+ + + +

+ cost: float + + + property + writable + + +

+ + +
+ +

Get the cost of the processor.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + float + – +
    +

    The cost of the processor.

    +
    +
  • +
+
+ +
+ +
+ + + +

+ host_message: List[str] + + + property + writable + + +

+ + +
+ +

Get the host message.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + List[str] + – +
    +

    The host message.

    +
    +
  • +
+
+ +
+ +
+ + + +

+ log_path: str + + + property + + +

+ + +
+ +

Get the log path.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The log path.

    +
    +
  • +
+
+ +
+ +
+ + + +

+ logger: str + + + property + + +

+ + +
+ +

Get the logger.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The logger.

    +
    +
  • +
+
+ +
+ +
+ + + +

+ name: str + + + property + + +

+ + +
+ +

Get the name of the processor.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The name of the processor.

    +
    +
  • +
+
+ +
+ +
+ + + +

+ plan: str + + + property + writable + + +

+ + +
+ +

Get the plan of the agent.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The plan.

    +
    +
  • +
+
+ +
+ +
+ + + +

+ prev_plan: List[str] + + + property + + +

+ + +
+ +

Get the previous plan.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + List[str] + – +
    +

    The previous plan of the agent.

    +
    +
  • +
+
+ +
+ +
+ + + +

+ previous_subtasks: List[str] + + + property + writable + + +

+ + +
+ +

Get the previous subtasks.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + List[str] + – +
    +

    The previous subtasks.

    +
    +
  • +
+
+ +
+ +
+ + + +

+ question_list: List[str] + + + property + writable + + +

+ + +
+ +

Get the question list.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + List[str] + – +
    +

    The question list.

    +
    +
  • +
+
+ +
+ +
+ + + +

+ request: str + + + property + + +

+ + +
+ +

Get the request.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The request.

    +
    +
  • +
+
+ +
+ +
+ + + +

+ request_logger: str + + + property + + +

+ + +
+ +

Get the request logger.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The request logger.

    +
    +
  • +
+
+ +
+ +
+ + + +

+ round_cost: float + + + property + writable + + +

+ + +
+ +

Get the round cost.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + float + – +
    +

    The round cost.

    +
    +
  • +
+
+ +
+ +
+ + + +

+ round_num: int + + + property + + +

+ + +
+ +

Get the round number.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + int + – +
    +

    The round number.

    +
    +
  • +
+
+ +
+ +
+ + + +

+ round_step: int + + + property + writable + + +

+ + +
+ +

Get the round step.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + int + – +
    +

    The round step.

    +
    +
  • +
+
+ +
+ +
+ + + +

+ round_subtask_amount: int + + + property + + +

+ + +
+ +

Get the round subtask amount.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + int + – +
    +

    The round subtask amount.

    +
    +
  • +
+
+ +
+ +
+ + + +

+ session_cost: float + + + property + writable + + +

+ + +
+ +

Get the session cost.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + float + – +
    +

    The session cost.

    +
    +
  • +
+
+ +
+ +
+ + + +

+ session_step: int + + + property + writable + + +

+ + +
+ +

Get the session step.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + int + – +
    +

    The session step.

    +
    +
  • +
+
+ +
+ +
+ + + +

+ status: str + + + property + writable + + +

+ + +
+ +

Get the status of the processor.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The status of the processor.

    +
    +
  • +
+
+ +
+ +
+ + + +

+ subtask: str + + + property + writable + + +

+ + +
+ +

Get the subtask.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The subtask.

    +
    +
  • +
+
+ +
+ +
+ + + +

+ ui_tree_path: str + + + property + + +

+ + +
+ +

Get the UI tree path.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The UI tree path.

    +
    +
  • +
+
+ +
+ + + +
+ + +

+ add_to_memory(data_dict) + +

+ + +
+ +

Add the data to the memory.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + data_dict + (Dict[str, Any]) + – +
    +

    The data dictionary to be added to the memory.

    +
    +
  • +
+
+
+ Source code in agents/processors/basic.py +
297
+298
+299
+300
+301
+302
def add_to_memory(self, data_dict: Dict[str, Any]) -> None:
+    """
+    Add the data to the memory.
+    :param data_dict: The data dictionary to be added to the memory.
+    """
+    self._memory_data.add_values_from_dict(data_dict)
+
+
+
+ +
+ +
+ + +

+ capture_screenshot() + + + abstractmethod + + +

+ + +
+ +

Capture the screenshot.

+ +
+ Source code in agents/processors/basic.py +
235
+236
+237
+238
+239
+240
@abstractmethod
+def capture_screenshot(self) -> None:
+    """
+    Capture the screenshot.
+    """
+    pass
+
+
+
+ +
+ +
+ + +

+ exception_capture(func) + + + classmethod + + +

+ + +
+ +

Decorator to capture the exception of the method.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + func + – +
    +

    The method to be decorated.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + – +
    +

    The decorated method.

    +
    +
  • +
+
+
+ Source code in agents/processors/basic.py +
185
+186
+187
+188
+189
+190
+191
+192
+193
+194
+195
+196
+197
+198
+199
+200
+201
+202
+203
+204
+205
+206
+207
+208
+209
+210
+211
+212
+213
+214
+215
+216
+217
+218
+219
@classmethod
+def exception_capture(cls, func):
+    """
+    Decorator to capture the exception of the method.
+    :param func: The method to be decorated.
+    :return: The decorated method.
+    """
+
+    @wraps(func)
+    def wrapper(self, *args, **kwargs):
+        try:
+            func(self, *args, **kwargs)
+        except Exception as e:
+            self._exeception_traceback[func.__name__] = {
+                "type": str(type(e).__name__),
+                "message": str(e),
+                "traceback": traceback.format_exc(),
+            }
+
+            utils.print_with_color(f"Error Occurs at {func.__name__}", "red")
+            utils.print_with_color(
+                self._exeception_traceback[func.__name__]["traceback"], "red"
+            )
+            if self._response is not None:
+                utils.print_with_color("Response: ", "red")
+                utils.print_with_color(self._response, "red")
+            self._status = self._agent_status_manager.ERROR.value
+            self.sync_memory()
+            self.add_to_memory({"error": self._exeception_traceback})
+            self.add_to_memory({"Status": self._status})
+            self.log_save()
+
+            raise StopIteration("Error occurred during step.")
+
+    return wrapper
+
+
+
+ +
+ +
+ + +

+ execute_action() + + + abstractmethod + + +

+ + +
+ +

Execute the action.

+ +
+ Source code in agents/processors/basic.py +
270
+271
+272
+273
+274
+275
@abstractmethod
+def execute_action(self) -> None:
+    """
+    Execute the action.
+    """
+    pass
+
+
+
+ +
+ +
+ + +

+ get_control_info() + + + abstractmethod + + +

+ + +
+ +

Get the control information.

+ +
+ Source code in agents/processors/basic.py +
242
+243
+244
+245
+246
+247
@abstractmethod
+def get_control_info(self) -> None:
+    """
+    Get the control information.
+    """
+    pass
+
+
+
+ +
+ +
+ + +

+ get_prompt_message() + + + abstractmethod + + +

+ + +
+ +

Get the prompt message.

+ +
+ Source code in agents/processors/basic.py +
249
+250
+251
+252
+253
+254
@abstractmethod
+def get_prompt_message(self) -> None:
+    """
+    Get the prompt message.
+    """
+    pass
+
+
+
+ +
+ +
+ + +

+ get_response() + + + abstractmethod + + +

+ + +
+ +

Get the response from the LLM.

+ +
+ Source code in agents/processors/basic.py +
256
+257
+258
+259
+260
+261
@abstractmethod
+def get_response(self) -> None:
+    """
+    Get the response from the LLM.
+    """
+    pass
+
+
+
+ +
+ +
+ + +

+ is_confirm() + +

+ + +
+ +

Check if the process is confirm.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + bool + – +
    +

    The boolean value indicating if the process is confirm.

    +
    +
  • +
+
+
+ Source code in agents/processors/basic.py +
736
+737
+738
+739
+740
+741
+742
+743
+744
def is_confirm(self) -> bool:
+    """
+    Check if the process is confirm.
+    :return: The boolean value indicating if the process is confirm.
+    """
+
+    self.agent.status = self.status
+
+    return self.status == self._agent_status_manager.CONFIRM.value
+
+
+
+ +
+ +
+ + +

+ is_error() + +

+ + +
+ +

Check if the process is in error.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + bool + – +
    +

    The boolean value indicating if the process is in error.

    +
    +
  • +
+
+
+ Source code in agents/processors/basic.py +
704
+705
+706
+707
+708
+709
+710
+711
def is_error(self) -> bool:
+    """
+    Check if the process is in error.
+    :return: The boolean value indicating if the process is in error.
+    """
+
+    self.agent.status = self.status
+    return self.status == self._agent_status_manager.ERROR.value
+
+
+
+ +
+ +
+ + +

+ is_paused() + +

+ + +
+ +

Check if the process is paused.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + bool + – +
    +

    The boolean value indicating if the process is paused.

    +
    +
  • +
+
+
+ Source code in agents/processors/basic.py +
713
+714
+715
+716
+717
+718
+719
+720
+721
+722
+723
+724
def is_paused(self) -> bool:
+    """
+    Check if the process is paused.
+    :return: The boolean value indicating if the process is paused.
+    """
+
+    self.agent.status = self.status
+
+    return (
+        self.status == self._agent_status_manager.PENDING.value
+        or self.status == self._agent_status_manager.CONFIRM.value
+    )
+
+
+
+ +
+ +
+ + +

+ is_pending() + +

+ + +
+ +

Check if the process is pending.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + bool + – +
    +

    The boolean value indicating if the process is pending.

    +
    +
  • +
+
+
+ Source code in agents/processors/basic.py +
726
+727
+728
+729
+730
+731
+732
+733
+734
def is_pending(self) -> bool:
+    """
+    Check if the process is pending.
+    :return: The boolean value indicating if the process is pending.
+    """
+
+    self.agent.status = self.status
+
+    return self.status == self._agent_status_manager.PENDING.value
+
+
+
+ +
+ +
+ + +

+ log(response_json) + +

+ + +
+ +

Set the result of the session, and log the result. +result: The result of the session. +response_json: The response json. +return: The response json.

+ +
+ Source code in agents/processors/basic.py +
746
+747
+748
+749
+750
+751
+752
+753
+754
def log(self, response_json: Dict[str, Any]) -> None:
+    """
+    Set the result of the session, and log the result.
+    result: The result of the session.
+    response_json: The response json.
+    return: The response json.
+    """
+
+    self.logger.info(json.dumps(response_json))
+
+
+
+ +
+ +
+ + +

+ log_save() + +

+ + +
+ +

Save the log.

+ +
+ Source code in agents/processors/basic.py +
304
+305
+306
+307
+308
+309
+310
+311
+312
def log_save(self) -> None:
+    """
+    Save the log.
+    """
+
+    self._memory_data.add_values_from_dict(
+        {"total_time_cost": self._total_time_cost}
+    )
+    self.log(self._memory_data.to_dict())
+
+
+
+ +
+ +
+ + +

+ method_timer(func) + + + classmethod + + +

+ + +
+ +

Decorator to calculate the time cost of the method.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + func + – +
    +

    The method to be decorated.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + – +
    +

    The decorated method.

    +
    +
  • +
+
+
+ Source code in agents/processors/basic.py +
167
+168
+169
+170
+171
+172
+173
+174
+175
+176
+177
+178
+179
+180
+181
+182
+183
@classmethod
+def method_timer(cls, func):
+    """
+    Decorator to calculate the time cost of the method.
+    :param func: The method to be decorated.
+    :return: The decorated method.
+    """
+
+    @wraps(func)
+    def wrapper(self, *args, **kwargs):
+        start_time = time.time()
+        result = func(self, *args, **kwargs)
+        end_time = time.time()
+        self._time_cost[func.__name__] = end_time - start_time
+        return result
+
+    return wrapper
+
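The decorator above can be exercised in isolation. The sketch below reproduces the same pattern outside the class (the `Demo` class is illustrative, not part of UFO) to show how per-method time cost is recorded into `self._time_cost`:

```python
import time
from functools import wraps


def method_timer(func):
    """Record the wall-clock time of a method into self._time_cost."""

    @wraps(func)
    def wrapper(self, *args, **kwargs):
        start_time = time.time()
        result = func(self, *args, **kwargs)
        self._time_cost[func.__name__] = time.time() - start_time
        return result

    return wrapper


class Demo:
    def __init__(self):
        self._time_cost = {}

    @method_timer
    def work(self):
        return 42


d = Demo()
print(d.work())                 # 42
print("work" in d._time_cost)   # True
```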
+
+
+ +
+ +
+ + +

+ parse_response() + + + abstractmethod + + +

+ + +
+ +

Parse the response.

+ +
+ Source code in agents/processors/basic.py +
263
+264
+265
+266
+267
+268
@abstractmethod
+def parse_response(self) -> None:
+    """
+    Parse the response.
+    """
+    pass
+
+
+
+ +
+ +
+ + +

+ print_step_info() + + + abstractmethod + + +

+ + +
+ +

Print the step information.

+ +
+ Source code in agents/processors/basic.py +
228
+229
+230
+231
+232
+233
@abstractmethod
+def print_step_info(self) -> None:
+    """
+    Print the step information.
+    """
+    pass
+
+
+
+ +
+ +
+ + +

+ process() + +

+ + +
+ +

Process a single step in a round. +The process includes the following steps: +1. Print the step information. +2. Capture the screenshot. +3. Get the control information. +4. Get the prompt message. +5. Get the response. +6. Update the cost. +7. Parse the response. +8. Execute the action. +9. Update the memory. +10. Update the step and status. +11. Save the log.

+ +
+ Source code in agents/processors/basic.py +
 73
+ 74
+ 75
+ 76
+ 77
+ 78
+ 79
+ 80
+ 81
+ 82
+ 83
+ 84
+ 85
+ 86
+ 87
+ 88
+ 89
+ 90
+ 91
+ 92
+ 93
+ 94
+ 95
+ 96
+ 97
+ 98
+ 99
+100
+101
+102
+103
+104
+105
+106
+107
+108
+109
+110
+111
+112
+113
+114
+115
+116
+117
+118
+119
+120
+121
+122
+123
+124
+125
+126
+127
+128
+129
+130
+131
+132
+133
+134
+135
+136
+137
+138
+139
+140
def process(self) -> None:
+    """
+    Process a single step in a round.
+    The process includes the following steps:
+    1. Print the step information.
+    2. Capture the screenshot.
+    3. Get the control information.
+    4. Get the prompt message.
+    5. Get the response.
+    6. Update the cost.
+    7. Parse the response.
+    8. Execute the action.
+    9. Update the memory.
+    10. Update the step and status.
+    11. Save the log.
+    """
+
+    start_time = time.time()
+
+    try:
+        # Step 1: Print the step information.
+        self.print_step_info()
+
+        # Step 2: Capture the screenshot.
+        self.capture_screenshot()
+
+        # Step 3: Get the control information.
+        self.get_control_info()
+
+        # Step 4: Get the prompt message.
+        self.get_prompt_message()
+
+        # Step 5: Get the response.
+        self.get_response()
+
+        # Step 6: Update the context.
+        self.update_cost()
+
+        # Step 7: Parse the response, if there is no error.
+        self.parse_response()
+
+        if self.is_pending() or self.is_paused():
+            # If the session is pending, update the step and memory, and return.
+            if self.is_pending():
+                self.update_status()
+                self.update_memory()
+
+            return
+
+        # Step 8: Execute the action.
+        self.execute_action()
+
+        # Step 9: Update the memory.
+        self.update_memory()
+
+        # Step 10: Update the status.
+        self.update_status()
+
+        self._total_time_cost = time.time() - start_time
+
+        # Step 11: Save the log.
+        self.log_save()
+
+    except StopIteration:
+        # Error was handled and logged in the exception capture decorator.
+        # Simply return here to stop the process early.
+
+        return
+
+
+
+ +
+ +
+ + +

+ resume() + +

+ + +
+ +

Resume the process of action execution after the session is paused.

+ +
+ Source code in agents/processors/basic.py +
142
+143
+144
+145
+146
+147
+148
+149
+150
+151
+152
+153
+154
+155
+156
+157
+158
+159
+160
+161
+162
+163
+164
+165
def resume(self) -> None:
+    """
+    Resume the process of action execution after the session is paused.
+    """
+
+    self._is_resumed = True
+
+    try:
+        # Step 1: Execute the action.
+        self.execute_action()
+
+        # Step 2: Update the memory.
+        self.update_memory()
+
+        # Step 3: Update the status.
+        self.update_status()
+
+    except StopIteration:
+        # Error was handled and logged in the exception capture decorator.
+        # Simply return here to stop the process early.
+        pass
+
+    finally:
+        self._is_resumed = False
+
+
+
+ +
+ +
+ + +

+ string2list(string) + + + staticmethod + + +

+ + +
+ +

Convert a string to a list of string if the input is a string.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + string + (Any) + – +
    +

    The string.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + List[str] + – +
    +

    The list.

    +
    +
  • +
+
+
+ Source code in agents/processors/basic.py +
764
+765
+766
+767
+768
+769
+770
+771
+772
+773
+774
@staticmethod
+def string2list(string: Any) -> List[str]:
+    """
+    Convert a string to a list of string if the input is a string.
+    :param string: The string.
+    :return: The list.
+    """
+    if isinstance(string, str):
+        return [string]
+    else:
+        return string
+
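The behavior of `string2list` can be checked standalone: a bare string is wrapped in a list, while any other value (typically an existing list) is passed through unchanged.

```python
from typing import Any, List


def string2list(string: Any) -> List[str]:
    # Wrap a bare string in a single-element list; pass other values through.
    if isinstance(string, str):
        return [string]
    return string


print(string2list("hello"))     # ['hello']
print(string2list(["a", "b"]))  # ['a', 'b']
```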
+
+
+ +
+ +
+ + +

+ sync_memory() + + + abstractmethod + + +

+ + +
+ +

Sync the memory of the Agent.

+ +
+ Source code in agents/processors/basic.py +
221
+222
+223
+224
+225
+226
@abstractmethod
+def sync_memory(self) -> None:
+    """
+    Sync the memory of the Agent.
+    """
+    pass
+
+
+
+ +
+ +
+ + +

+ update_cost() + +

+ + +
+ +

Update the cost.

+ +
+ Source code in agents/processors/basic.py +
322
+323
+324
+325
+326
+327
+328
def update_cost(self) -> None:
+    """
+    Update the cost.
+    """
+
+    self.round_cost += self.cost
+    self.session_cost += self.cost
+
+
+
+ +
+ +
+ + +

+ update_memory() + + + abstractmethod + + +

+ + +
+ +

Update the memory of the Agent.

+ +
+ Source code in agents/processors/basic.py +
277
+278
+279
+280
+281
+282
@abstractmethod
+def update_memory(self) -> None:
+    """
+    Update the memory of the Agent.
+    """
+    pass
+
+
+
+ +
+ +
+ + +

+ update_status() + +

+ + +
+ +

Update the status of the session.

+ +
+ Source code in agents/processors/basic.py +
284
+285
+286
+287
+288
+289
+290
+291
+292
+293
+294
+295
def update_status(self) -> None:
+    """
+    Update the status of the session.
+    """
+    self.agent.step += 1
+    self.agent.status = self.status
+
+    if self.status != self._agent_status_manager.FINISH.value:
+        time.sleep(configs["SLEEP_TIME"])
+
+    self.round_step += 1
+    self.session_step += 1
+
+
+
+ +
+ + + +
+ +
+ +
+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/agents/design/prompter/index.html b/agents/design/prompter/index.html new file mode 100644 index 00000000..fca2e5c7 --- /dev/null +++ b/agents/design/prompter/index.html @@ -0,0 +1,997 @@ + + + + + + + + Prompter - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + + +
  • +
  • +
+
+
+
+
+ +

Agent Prompter

+

The Prompter is a key component of the UFO framework, responsible for constructing prompts for the LLM to generate responses. The Prompter is implemented in the ufo/prompts folder. Each agent has its own Prompter class that defines the structure of the prompt and the information to be fed to the LLM.

+

Components

+

A prompt fed to the LLM is usually a list of dictionaries, where each dictionary contains the following keys:

+ + + + + + + + + + + + + + + + + +
KeyDescription
roleThe role of the text in the prompt, can be system, user, or assistant.
contentThe content of the text for the specific role.
+
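For example, a minimal prompt message in this role/content structure looks like the following (the content strings are illustrative, not actual UFO prompts):

```python
# A minimal prompt message in the role/content structure described above.
prompt_message = [
    {"role": "system", "content": "You are an agent that operates Windows applications."},
    {"role": "user", "content": "Click the 'Send' button in the current window."},
]

for message in prompt_message:
    print(message["role"])  # system, then user
```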
+

Tip

+

You may find the official documentation helpful for constructing the prompt.

+
+

In the __init__ method of the Prompter class, you can define the template of the prompt for each component, and the final prompt message is constructed by combining the templates of each component using the prompt_construction method.

+

System Prompt

+

The system prompt uses the template configured in the config_dev.yaml file for each agent. It usually contains the instructions for the agent's role, actions, tips, response format, etc. +You need to use the system_prompt_construction method to construct the system prompt.

+

Prompts for the API instructions and demonstration examples are also included in the system prompt; these are constructed by the api_prompt_helper and examples_prompt_helper methods, respectively. Below are the sub-components of the system prompt:

+ + + + + + + + + + + + + + + + + + + + +
ComponentDescriptionMethod
apisThe API instructions for the agent.api_prompt_helper
examplesThe demonstration examples for the agent.examples_prompt_helper
+
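As a hedged illustration of how these sub-components may be combined, the sketch below formats API and example strings into a system template with placeholder slots. The template text and helper outputs are assumptions for illustration, not the actual UFO templates:

```python
# Illustrative only: a system template with {apis} and {examples} slots.
system_template = (
    "You are an AppAgent that completes tasks on Windows.\n"
    "Available APIs:\n{apis}\n"
    "Examples:\n{examples}"
)

# Assumed outputs of api_prompt_helper / examples_prompt_helper.
apis = "- click_input(button, double): Click the selected control item."
examples = "- Request: 'Open the File menu' -> Action: click_input('File', False)"

system_prompt = system_template.format(apis=apis, examples=examples)
print("click_input" in system_prompt)  # True
```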

User Prompt

+

The user prompt is constructed based on the information from the agent's observation, external knowledge, and Blackboard. You can use the user_prompt_construction method to construct the user prompt. Below are the sub-components of the user prompt:

+ + + + + + + + + + + + + + + + + + + + + + + + + +
ComponentDescriptionMethod
observationThe observation of the agent.user_content_construction
retrieved_docsThe knowledge retrieved from the external knowledge base.retrived_documents_prompt_helper
blackboardThe information stored in the Blackboard.blackboard_to_prompt
+

Reference

+

You can find the implementation of the Prompter in the ufo/prompts folder. Below is the basic structure of the Prompter class:

+ + +
+ + + + +
+

+ Bases: ABC

+ + +

The BasicPrompter class is the abstract class for the prompter.

+ +

Initialize the BasicPrompter.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + is_visual + (bool) + – +
    +

    Whether the request is for visual model.

    +
    +
  • +
  • + prompt_template + (str) + – +
    +

    The path of the prompt template.

    +
    +
  • +
  • + example_prompt_template + (str) + – +
    +

    The path of the example prompt template.

    +
    +
  • +
+
+ + + + + +
+ Source code in prompter/basic.py +
18
+19
+20
+21
+22
+23
+24
+25
+26
+27
+28
+29
+30
+31
+32
+33
+34
+35
+36
+37
def __init__(
+    self, is_visual: bool, prompt_template: str, example_prompt_template: str
+):
+    """
+    Initialize the BasicPrompter.
+    :param is_visual: Whether the request is for visual model.
+    :param prompt_template: The path of the prompt template.
+    :param example_prompt_template: The path of the example prompt template.
+    """
+    self.is_visual = is_visual
+    if prompt_template:
+        self.prompt_template = self.load_prompt_template(prompt_template, is_visual)
+    else:
+        self.prompt_template = ""
+    if example_prompt_template:
+        self.example_prompt_template = self.load_prompt_template(
+            example_prompt_template, is_visual
+        )
+    else:
+        self.example_prompt_template = ""
+
+
+ + + +
+ + + + + + + + + +
+ + +

+ api_prompt_helper() + +

+ + +
+ +

A helper function to construct the API list and descriptions for the prompt.

+ +
+ Source code in prompter/basic.py +
139
+140
+141
+142
+143
+144
def api_prompt_helper(self) -> str:
+    """
+    A helper function to construct the API list and descriptions for the prompt.
+    """
+
+    pass
+
+
+
+ +
+ +
+ + +

+ examples_prompt_helper() + +

+ + +
+ +

A helper function to construct the examples prompt for in-context learning.

+ +
+ Source code in prompter/basic.py +
132
+133
+134
+135
+136
+137
def examples_prompt_helper(self) -> str:
+    """
+    A helper function to construct the examples prompt for in-context learning.
+    """
+
+    pass
+
+
+
+ +
+ +
+ + +

+ load_prompt_template(template_path, is_visual=None) + + + staticmethod + + +

+ + +
+ +

Load the prompt template.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + Dict[str, str] + – +
    +

    The prompt template.

    +
    +
  • +
+
+
+ Source code in prompter/basic.py +
39
+40
+41
+42
+43
+44
+45
+46
+47
+48
+49
+50
+51
+52
+53
+54
+55
+56
+57
+58
+59
+60
+61
+62
+63
+64
@staticmethod
+def load_prompt_template(template_path: str, is_visual=None) -> Dict[str, str]:
+    """
+    Load the prompt template.
+    :return: The prompt template.
+    """
+
+    if is_visual == None:
+        path = template_path
+    else:
+        path = template_path.format(
+            mode="visual" if is_visual == True else "nonvisual"
+        )
+
+    if not path:
+        return {}
+
+    if os.path.exists(path):
+        try:
+            prompt = yaml.safe_load(open(path, "r", encoding="utf-8"))
+        except yaml.YAMLError as exc:
+            print_with_color(f"Error loading prompt template: {exc}", "yellow")
+    else:
+        raise FileNotFoundError(f"Prompt template not found at {path}")
+
+    return prompt
+
+
+
+ +
+ +
+ + +

+ prompt_construction(system_prompt, user_content) + + + staticmethod + + +

+ + +
+ +

Construct the prompt for summarizing the experience into an example.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + user_content + (List[Dict[str, str]]) + – +
    +

    The user content. return: The prompt for summarizing the experience into an example.

    +
    +
  • +
+
+
+ Source code in prompter/basic.py +
66
+67
+68
+69
+70
+71
+72
+73
+74
+75
+76
+77
+78
+79
+80
+81
+82
@staticmethod
+def prompt_construction(
+    system_prompt: str, user_content: List[Dict[str, str]]
+) -> List:
+    """
+    Construct the prompt for summarizing the experience into an example.
+    :param user_content: The user content.
+    return: The prompt for summarizing the experience into an example.
+    """
+
+    system_message = {"role": "system", "content": system_prompt}
+
+    user_message = {"role": "user", "content": user_content}
+
+    prompt_message = [system_message, user_message]
+
+    return prompt_message
+
+
+
+ +
+ +
+ + +

+ retrived_documents_prompt_helper(header, separator, documents) + + + staticmethod + + +

+ + +
+ +

Construct the prompt for retrieved documents.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + header + (str) + – +
    +

    The header of the prompt.

    +
    +
  • +
  • + separator + (str) + – +
    +

    The separator of the prompt.

    +
    +
  • +
  • + documents + (List[str]) + – +
    +

    The retrieved documents. return: The prompt for retrieved documents.

    +
    +
  • +
+
+
+ Source code in prompter/basic.py +
 84
+ 85
+ 86
+ 87
+ 88
+ 89
+ 90
+ 91
+ 92
+ 93
+ 94
+ 95
+ 96
+ 97
+ 98
+ 99
+100
+101
+102
+103
+104
+105
+106
@staticmethod
+def retrived_documents_prompt_helper(
+    header: str, separator: str, documents: List[str]
+) -> str:
+    """
+    Construct the prompt for retrieved documents.
+    :param header: The header of the prompt.
+    :param separator: The separator of the prompt.
+    :param documents: The retrieved documents.
+    return: The prompt for retrieved documents.
+    """
+
+    if header:
+        prompt = "\n<{header}:>\n".format(header=header)
+    else:
+        prompt = ""
+    for i, document in enumerate(documents):
+        if separator:
+            prompt += "[{separator} {i}:]".format(separator=separator, i=i + 1)
+            prompt += "\n"
+        prompt += document
+        prompt += "\n\n"
+    return prompt
+
+
+
+ +
+ +
+ + +

+ system_prompt_construction() + + + abstractmethod + + +

+ + +
+ +

Construct the system prompt for LLM.

+ +
+ Source code in prompter/basic.py +
108
+109
+110
+111
+112
+113
+114
@abstractmethod
+def system_prompt_construction(self) -> str:
+    """
+    Construct the system prompt for LLM.
+    """
+
+    pass
+
+
+
+ +
+ +
+ + +

+ user_content_construction() + + + abstractmethod + + +

+ + +
+ +

Construct the full user content for LLM, including the user prompt and images.

+ +
+ Source code in prompter/basic.py +
124
+125
+126
+127
+128
+129
+130
@abstractmethod
+def user_content_construction(self) -> str:
+    """
+    Construct the full user content for LLM, including the user prompt and images.
+    """
+
+    pass
+
+
+
+ +
+ +
+ + +

+ user_prompt_construction() + + + abstractmethod + + +

+ + +
+ +

Construct the textual user prompt for LLM based on the user field in the prompt template.

+ +
+ Source code in prompter/basic.py +
116
+117
+118
+119
+120
+121
+122
@abstractmethod
+def user_prompt_construction(self) -> str:
+    """
+    Construct the textual user prompt for LLM based on the `user` field in the prompt template.
+    """
+
+    pass
+
+
+
+ +
+ + + +
+ +
+ +
+

Tip

+

You can customize the Prompter class to tailor the prompt to your requirements.

+
+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/agents/design/state/index.html b/agents/design/state/index.html new file mode 100644 index 00000000..ea65f540 --- /dev/null +++ b/agents/design/state/index.html @@ -0,0 +1,977 @@ + + + + + + + + State - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + + +
  • +
  • +
+
+
+
+
+ +

Agent State

+

The State class is a fundamental component of the UFO agent framework. It represents the current state of the agent and determines the next action and agent to handle the request. Each agent has a specific set of states that define the agent's behavior and workflow.

+

AgentStatus

+

The set of states for an agent is defined in the AgentStatus class:

+
class AgentStatus(Enum):
+    """
+    The status class for the agent.
+    """
+
+    ERROR = "ERROR"
+    FINISH = "FINISH"
+    CONTINUE = "CONTINUE"
+    FAIL = "FAIL"
+    PENDING = "PENDING"
+    CONFIRM = "CONFIRM"
+    SCREENSHOT = "SCREENSHOT"
+
+

Each agent implements its own set of AgentStatus to define the states of the agent.

+

AgentStateManager

+

The class AgentStateManager manages the state mapping from a string to the corresponding state class. Each state class is registered with the AgentStateManager using the register decorator to associate the state class with a specific agent, e.g.,

+
@AgentStateManager.register
+class SomeAgentState(AgentState):
+    """
+    The state class for the some agent.
+    """
+
+
+

Tip

+

You can find examples on how to register the state class for the AppAgent in the ufo/agents/states/app_agent_state.py file.

+
+

Below is the basic structure of the AgentStateManager class:

+
class AgentStateManager(ABC, metaclass=SingletonABCMeta):
+    """
+    An abstract class to manage the states of the agent.
+    """
+
+    _state_mapping: Dict[str, Type[AgentState]] = {}
+
+    def __init__(self):
+        """
+        Initialize the state manager.
+        """
+
+        self._state_instance_mapping: Dict[str, AgentState] = {}
+
+    def get_state(self, status: str) -> AgentState:
+        """
+        Get the state for the status.
+        :param status: The status string.
+        :return: The state object.
+        """
+
+        # Lazy load the state class
+        if status not in self._state_instance_mapping:
+            state_class = self._state_mapping.get(status)
+            if state_class:
+                self._state_instance_mapping[status] = state_class()
+            else:
+                self._state_instance_mapping[status] = self.none_state
+
+        state = self._state_instance_mapping.get(status, self.none_state)
+
+        return state
+
+    def add_state(self, status: str, state: AgentState) -> None:
+        """
+        Add a new state to the state mapping.
+        :param status: The status string.
+        :param state: The state object.
+        """
+        self.state_map[status] = state
+
+    @property
+    def state_map(self) -> Dict[str, AgentState]:
+        """
+        The state mapping of status to state.
+        :return: The state mapping.
+        """
+        return self._state_instance_mapping
+
+    @classmethod
+    def register(cls, state_class: Type[AgentState]) -> Type[AgentState]:
+        """
+        Decorator to register the state class to the state manager.
+        :param state_class: The state class to be registered.
+        :return: The state class.
+        """
+        cls._state_mapping[state_class.name()] = state_class
+        return state_class
+
+    @property
+    @abstractmethod
+    def none_state(self) -> AgentState:
+        """
+        The none state of the state manager.
+        """
+        pass
+
+
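The register/get_state pattern above can be reduced to a self-contained sketch. The class names below are illustrative; only the mapping mechanics mirror AgentStateManager:

```python
from abc import ABC
from typing import Dict, Type


class AgentState(ABC):
    @classmethod
    def name(cls) -> str:
        return cls.__name__


class StateManager:
    _state_mapping: Dict[str, Type[AgentState]] = {}

    @classmethod
    def register(cls, state_class: Type[AgentState]) -> Type[AgentState]:
        # Map the state's name to its class, as AgentStateManager.register does.
        cls._state_mapping[state_class.name()] = state_class
        return state_class

    def get_state(self, status: str) -> AgentState:
        # Instantiate the registered class for a status string.
        return self._state_mapping[status]()


@StateManager.register
class ContinueState(AgentState):
    pass


state = StateManager().get_state("ContinueState")
print(type(state).__name__)  # ContinueState
```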

AgentState

+

Each state class inherits from the AgentState class and must implement the handle method to process the action in that state. In addition, the next_state and next_agent methods determine the next state and the next agent for the transition. Please find below the reference for the State class in UFO.

+ + +
+ + + + +
+

+ Bases: ABC

+ + +

The abstract class for the agent state.

+ + + + + + + + + +
+ + + + + + + + + +
+ + +

+ agent_class() + + + abstractmethod + classmethod + + +

+ + +
+ +

The class of the agent.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + Type[BasicAgent] + – +
    +

    The class of the agent.

    +
    +
  • +
+
+
+ Source code in agents/states/basic.py +
165
+166
+167
+168
+169
+170
+171
+172
@classmethod
+@abstractmethod
+def agent_class(cls) -> Type[BasicAgent]:
+    """
+    The class of the agent.
+    :return: The class of the agent.
+    """
+    pass
+
+
+
+ +
+ +
+ + +

+ handle(agent, context=None) + + + abstractmethod + + +

+ + +
+ +

Handle the agent for the current step.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + agent + (BasicAgent) + – +
    +

    The agent to handle.

    +
    +
  • +
  • + context + (Optional['Context'], default: + None +) + – +
    +

    The context for the agent and session.

    +
    +
  • +
+
+
+ Source code in agents/states/basic.py +
122
+123
+124
+125
+126
+127
+128
+129
@abstractmethod
+def handle(self, agent: BasicAgent, context: Optional["Context"] = None) -> None:
+    """
+    Handle the agent for the current step.
+    :param agent: The agent to handle.
+    :param context: The context for the agent and session.
+    """
+    pass
+
+
+
+ +
+ +
+ + +

+ is_round_end() + + + abstractmethod + + +

+ + +
+ +

Check if the round ends.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + bool + – +
    +

    True if the round ends, False otherwise.

    +
    +
  • +
+
+
+ Source code in agents/states/basic.py +
149
+150
+151
+152
+153
+154
+155
@abstractmethod
+def is_round_end(self) -> bool:
+    """
+    Check if the round ends.
+    :return: True if the round ends, False otherwise.
+    """
+    pass
+
+
+
+ +
+ +
+ + +

+ is_subtask_end() + + + abstractmethod + + +

+ + +
+ +

Check if the subtask ends.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + bool + – +
    +

    True if the subtask ends, False otherwise.

    +
    +
  • +
+
+
+ Source code in agents/states/basic.py +
157
+158
+159
+160
+161
+162
+163
@abstractmethod
+def is_subtask_end(self) -> bool:
+    """
+    Check if the subtask ends.
+    :return: True if the subtask ends, False otherwise.
+    """
+    pass
+
+
+
+ +
+ +
+ + +

+ name() + + + abstractmethod + classmethod + + +

+ + +
+ +

The class name of the state.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The class name of the state.

    +
    +
  • +
+
+
+ Source code in agents/states/basic.py +
174
+175
+176
+177
+178
+179
+180
+181
@classmethod
+@abstractmethod
+def name(cls) -> str:
+    """
+    The class name of the state.
+    :return: The class name of the state.
+    """
+    return ""
+
+
+
+ +
+ +
+ + +

+ next_agent(agent) + + + abstractmethod + + +

+ + +
+ +

Get the agent for the next step.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + agent + (BasicAgent) + – +
    +

    The agent for the current step.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + BasicAgent + – +
    +

    The agent for the next step.

    +
    +
  • +
+
+
+ Source code in agents/states/basic.py +
131
+132
+133
+134
+135
+136
+137
+138
@abstractmethod
+def next_agent(self, agent: BasicAgent) -> BasicAgent:
+    """
+    Get the agent for the next step.
+    :param agent: The agent for the current step.
+    :return: The agent for the next step.
+    """
+    return agent
+
+
+
+ +
+ +
+ + +

+ next_state(agent) + + + abstractmethod + + +

+ + +
+ +

Get the state for the next step.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + agent + (BasicAgent) + – +
    +

    The agent for the current step.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + AgentState + – +
    +

    The state for the next step.

    +
    +
  • +
+
+
+ Source code in agents/states/basic.py +
140
+141
+142
+143
+144
+145
+146
+147
@abstractmethod
+def next_state(self, agent: BasicAgent) -> AgentState:
+    """
+    Get the state for the next step.
+    :param agent: The agent for the current step.
+    :return: The state for the next step.
+    """
+    pass
+
+
+
+ +
+ + + +
+ +
+ +
+

Tip

+

The state machine diagrams for the HostAgent and AppAgent are shown in their respective documents.

+
+
+

Tip

+

A Round calls the handle, next_state, and next_agent methods of the current state to process the user request and determine the next state and agent to handle the request, and orchestrates the agents to execute the necessary actions.

+
+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/agents/evaluation_agent/index.html b/agents/evaluation_agent/index.html new file mode 100644 index 00000000..e9cdc3eb --- /dev/null +++ b/agents/evaluation_agent/index.html @@ -0,0 +1,986 @@ + + + + + + + + EvaluationAgent - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

EvaluationAgent 🧐

+

The objective of the EvaluationAgent is to evaluate whether a Session or Round has been successfully completed. The EvaluationAgent assesses the performance of the HostAgent and AppAgent in fulfilling the request. You can configure whether to enable the EvaluationAgent in the config_dev.yaml file and the detailed documentation can be found here.

+
+

Note

+

The EvaluationAgent is fully LLM-driven and conducts evaluations based on the action trajectories and screenshots. It may not be 100% accurate, since the LLM may make mistakes.

+
+

Configuration

+

To enable the EvaluationAgent, you can configure the following parameters in the config_dev.yaml file to evaluate the task completion status at different levels:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Configuration OptionDescriptionTypeDefault Value
EVA_SESSIONWhether to include the session in the evaluation.BooleanTrue
EVA_ROUNDWhether to include the round in the evaluation.BooleanFalse
EVA_ALL_SCREENSHOTSWhether to include all the screenshots in the evaluation.BooleanTrue
+

Evaluation Inputs

+

The EvaluationAgent takes the following inputs for evaluation:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
InputDescriptionType
User RequestThe user's request to be evaluated.String
APIs DescriptionThe description of the APIs used in the execution.List of Strings
Action TrajectoriesThe action trajectories executed by the HostAgent and AppAgent.List of Strings
ScreenshotsThe screenshots captured during the execution.List of Images
+

For more details on how to construct the inputs, please refer to the EvaluationAgentPrompter class in ufo/prompter/eva_prompter.py.

+
+

Tip

+

You can configure whether to use all screenshots or only the first and last screenshot for evaluation in the EVA_ALL_SCREENSHOTS of the config_dev.yaml file.

+
+

Evaluation Outputs

+

The EvaluationAgent generates the following outputs after evaluation:

+ + + + + + + + + + + + + + + + + + + + + + + + + +
OutputDescriptionType
reasonThe detailed reason for the judgment, based on the observed screenshot differences and action trajectories.String
sub_scoresThe sub-score of the evaluation in decomposing the evaluation into multiple sub-goals.List of Dictionaries
completeThe completion status of the evaluation, can be yes, no, or unsure.String
+

Below is an example of the evaluation output:

+
{
+    "reason": "The agent successfully completed the task of sending 'hello' to Zac on Microsoft Teams. 
+    The initial screenshot shows the Microsoft Teams application with the chat window of Chaoyun Zhang open. 
+    The agent then focused on the chat window, input the message 'hello', and clicked the Send button. 
+    The final screenshot confirms that the message 'hello' was sent to Zac.", 
+    "sub_scores": {
+        "correct application focus": "yes", 
+        "correct message input": "yes", 
+        "message sent successfully": "yes"
+        }, 
+    "complete": "yes"}
+
+
+

Info

+

The log of the evaluation results will be saved in the logs/{task_name}/evaluation.log file.

+
+

The EvaluationAgent employs the Chain-of-Thought (CoT) mechanism to first decompose the evaluation into multiple sub-goals and then evaluate each sub-goal separately. The sub-scores are then aggregated to determine the overall completion status of the evaluation.

+

Reference

+ + +
+ + + + +
+

+ Bases: BasicAgent

+ + +

The agent for evaluation.

+ +

Initialize the EvaluationAgent. +:agent_type: The type of the agent. +:is_visual: The flag indicating whether the agent is visual or not.

+ + + + + + +
+ Source code in agents/agent/evaluation_agent.py +
27
+28
+29
+30
+31
+32
+33
+34
+35
+36
+37
+38
+39
+40
+41
+42
+43
+44
+45
+46
+47
+48
+49
+50
+51
def __init__(
+    self,
+    name: str,
+    app_root_name: str,
+    is_visual: bool,
+    main_prompt: str,
+    example_prompt: str,
+    api_prompt: str,
+):
+    """
+    Initialize the EvaluationAgent.
+    :agent_type: The type of the agent.
+    :is_visual: The flag indicating whether the agent is visual or not.
+    """
+
+    super().__init__(name=name)
+
+    self._app_root_name = app_root_name
+    self.prompter = self.get_prompter(
+        is_visual,
+        main_prompt,
+        example_prompt,
+        api_prompt,
+        app_root_name,
+    )
+
+
+ + + +
+ + + + + + + +
+ + + +

+ status_manager: EvaluatonAgentStatus + + + property + + +

+ + +
+ +

Get the status manager.

+
+ +
+ + + +
+ + +

+ evaluate(request, log_path, eva_all_screenshots=True) + +

+ + +
+ +

Evaluate the task completion.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + log_path + (str) + – +
    +

    The path to the log file.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + Tuple[Dict[str, str], float] + – +
    +

    The evaluation result and the cost of LLM.

    +
    +
  • +
+
+
+ Source code in agents/agent/evaluation_agent.py +
104
+105
+106
+107
+108
+109
+110
+111
+112
+113
+114
+115
+116
+117
+118
+119
+120
+121
+122
def evaluate(
+    self, request: str, log_path: str, eva_all_screenshots: bool = True
+) -> Tuple[Dict[str, str], float]:
+    """
+    Evaluate the task completion.
+    :param log_path: The path to the log file.
+    :return: The evaluation result and the cost of LLM.
+    """
+
+    message = self.message_constructor(
+        log_path=log_path, request=request, eva_all_screenshots=eva_all_screenshots
+    )
+    result, cost = self.get_response(
+        message=message, namescope="app", use_backup_engine=True
+    )
+
+    result = json_parser(result)
+
+    return result, cost
+
+
+
+ +
+ +
+ + +

+ get_prompter(is_visual, prompt_template, example_prompt_template, api_prompt_template, root_name=None) + +

+ + +
+ +

Get the prompter for the agent.

+ +
+ Source code in agents/agent/evaluation_agent.py +
53
+54
+55
+56
+57
+58
+59
+60
+61
+62
+63
+64
+65
+66
+67
+68
+69
+70
+71
def get_prompter(
+    self,
+    is_visual,
+    prompt_template: str,
+    example_prompt_template: str,
+    api_prompt_template: str,
+    root_name: Optional[str] = None,
+) -> EvaluationAgentPrompter:
+    """
+    Get the prompter for the agent.
+    """
+
+    return EvaluationAgentPrompter(
+        is_visual=is_visual,
+        prompt_template=prompt_template,
+        example_prompt_template=example_prompt_template,
+        api_prompt_template=api_prompt_template,
+        root_name=root_name,
+    )
+
+
+
+ +
+ +
+ + +

+ message_constructor(log_path, request, eva_all_screenshots=True) + +

+ + +
+ +

Construct the message.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + log_path + (str) + – +
    +

    The path to the log file.

    +
    +
  • +
  • + request + (str) + – +
    +

    The request.

    +
    +
  • +
  • + eva_all_screenshots + (bool, default: + True +) + – +
    +

    The flag indicating whether to evaluate all screenshots.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + Dict[str, Any] + – +
    +

    The message.

    +
    +
  • +
+
+
+ Source code in agents/agent/evaluation_agent.py +
73
+74
+75
+76
+77
+78
+79
+80
+81
+82
+83
+84
+85
+86
+87
+88
+89
+90
+91
+92
+93
+94
def message_constructor(
+    self, log_path: str, request: str, eva_all_screenshots: bool = True
+) -> Dict[str, Any]:
+    """
+    Construct the message.
+    :param log_path: The path to the log file.
+    :param request: The request.
+    :param eva_all_screenshots: The flag indicating whether to evaluate all screenshots.
+    :return: The message.
+    """
+
+    evaagent_prompt_system_message = self.prompter.system_prompt_construction()
+
+    evaagent_prompt_user_message = self.prompter.user_content_construction(
+        log_path=log_path, request=request, eva_all_screenshots=eva_all_screenshots
+    )
+
+    evaagent_prompt_message = self.prompter.prompt_construction(
+        evaagent_prompt_system_message, evaagent_prompt_user_message
+    )
+
+    return evaagent_prompt_message
+
+
+
+ +
+ +
+ + +

+ print_response(response_dict) + +

+ + +
+ +

Print the response of the evaluation.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + response_dict + (Dict[str, Any]) + – +
    +

    The response dictionary.

    +
    +
  • +
+
+
+ Source code in agents/agent/evaluation_agent.py +
130
+131
+132
+133
+134
+135
+136
+137
+138
+139
+140
+141
+142
+143
+144
+145
+146
+147
+148
+149
+150
+151
+152
+153
+154
+155
+156
+157
+158
+159
+160
+161
def print_response(self, response_dict: Dict[str, Any]) -> None:
+    """
+    Print the response of the evaluation.
+    :param response_dict: The response dictionary.
+    """
+
+    emoji_map = {
+        "yes": "✅",
+        "no": "❌",
+        "maybe": "❓",
+    }
+
+    complete = emoji_map.get(
+        response_dict.get("complete"), response_dict.get("complete")
+    )
+
+    sub_scores = response_dict.get("sub_scores", {})
+    reason = response_dict.get("reason", "")
+
+    print_with_color(f"Evaluation result🧐:", "magenta")
+    print_with_color(f"[Sub-scores📊:]", "green")
+
+    for score, evaluation in sub_scores.items():
+        print_with_color(
+            f"{score}: {emoji_map.get(evaluation, evaluation)}", "green"
+        )
+
+    print_with_color(
+        "[Task is complete💯:] {complete}".format(complete=complete), "cyan"
+    )
+
+    print_with_color(f"[Reason🤔:] {reason}".format(reason=reason), "blue")
+
+
+
+ +
+ +
+ + +

+ process_comfirmation() + +

+ + +
+ +

Confirmation; currently does nothing.

+ +
+ Source code in agents/agent/evaluation_agent.py +
124
+125
+126
+127
+128
def process_comfirmation(self) -> None:
+    """
+    Comfirmation, currently do nothing.
+    """
+    pass
+
+
+
+ +
+ + + +
+ +
+ +
+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/agents/follower_agent/index.html b/agents/follower_agent/index.html new file mode 100644 index 00000000..4ba2349a --- /dev/null +++ b/agents/follower_agent/index.html @@ -0,0 +1,977 @@ + + + + + + + + FollowerAgent - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

Follower Agent 🚶🏽‍♂️

+

The FollowerAgent inherits from the AppAgent and is responsible for following the user's instructions to perform specific tasks within the application. The FollowerAgent is designed to execute a series of actions based on the user's guidance. It is particularly useful for software testing, where clear instructions are provided to validate the application's behavior.

+

Different from the AppAgent

+

The FollowerAgent shares most of its functionality with the AppAgent, but it is designed to follow the step-by-step instructions provided by the user, instead of performing its own reasoning to determine the next action.

+

Usage

+

The FollowerAgent is available in follower mode. You can find more details in the documentation. It also uses a different Session and Processor to handle the user's instructions. The step-wise instructions are provided by the user in a JSON file, which is then parsed by the FollowerAgent to execute the actions. An example of the JSON file is shown below:

+
{
+    "task": "Type in a bold text of 'Test For Fun'",
+    "steps": 
+    [
+        "1.type in 'Test For Fun'",
+        "2.select the text of 'Test For Fun'",
+        "3.click on the bold"
+    ],
+    "object": "draft.docx"
+}
+
+
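A plan file with the schema shown above can be read with the standard library; this is a minimal sketch (the inlined text stands in for a file on disk), not UFO's actual parsing code:

```python
import json

# Inlined plan text following the schema shown above (normally read from a file).
plan_text = '''{
    "task": "Type in a bold text of 'Test For Fun'",
    "steps": [
        "1.type in 'Test For Fun'",
        "2.select the text of 'Test For Fun'",
        "3.click on the bold"
    ],
    "object": "draft.docx"
}'''

plan = json.loads(plan_text)
print(plan["task"])         # the overall task description
for step in plan["steps"]:  # step-wise instructions, executed in order
    print(step)
print(plan["object"])       # the target document
```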

Reference

+ + +
+ + + + +
+

+ Bases: AppAgent

+ + +

The FollowerAgent class follows the step-by-step instructions for action execution within an application. +It is a subclass of the AppAgent, which completes the action execution within the application.

+ +

Initialize the FollowAgent.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + name + (str) + – +
    +

    The name of the agent.

    +
    +
  • +
  • + process_name + (str) + – +
    +

    The process name of the app.

    +
    +
  • +
  • + app_root_name + (str) + – +
    +

    The root name of the app.

    +
    +
  • +
  • + is_visual + (bool) + – +
    +

    The flag indicating whether the agent is visual or not.

    +
    +
  • +
  • + main_prompt + (str) + – +
    +

    The main prompt file path.

    +
    +
  • +
  • + example_prompt + (str) + – +
    +

    The example prompt file path.

    +
    +
  • +
  • + api_prompt + (str) + – +
    +

    The API prompt file path.

    +
    +
  • +
  • + app_info_prompt + (str) + – +
    +

    The app information prompt file path.

    +
    +
  • +
+
+ + + + + +
+ Source code in agents/agent/follower_agent.py +
21
+22
+23
+24
+25
+26
+27
+28
+29
+30
+31
+32
+33
+34
+35
+36
+37
+38
+39
+40
+41
+42
+43
+44
+45
+46
+47
+48
+49
+50
+51
+52
+53
+54
+55
+56
+57
+58
+59
+60
+61
def __init__(
+    self,
+    name: str,
+    process_name: str,
+    app_root_name: str,
+    is_visual: bool,
+    main_prompt: str,
+    example_prompt: str,
+    api_prompt: str,
+    app_info_prompt: str,
+):
+    """
+    Initialize the FollowAgent.
+    :param name: The name of the agent.
+    :param process_name: The process name of the app.
+    :param app_root_name: The root name of the app.
+    :param is_visual: The flag indicating whether the agent is visual or not.
+    :param main_prompt: The main prompt file path.
+    :param example_prompt: The example prompt file path.
+    :param api_prompt: The API prompt file path.
+    :param app_info_prompt: The app information prompt file path.
+    """
+    super().__init__(
+        name=name,
+        process_name=process_name,
+        app_root_name=app_root_name,
+        is_visual=is_visual,
+        main_prompt=main_prompt,
+        example_prompt=example_prompt,
+        api_prompt=api_prompt,
+        skip_prompter=True,
+    )
+
+    self.prompter = self.get_prompter(
+        is_visual,
+        main_prompt,
+        example_prompt,
+        api_prompt,
+        app_info_prompt,
+        app_root_name,
+    )
+
+
+ + + +
+ + + + + + + + + +
+ + +

+ get_prompter(is_visual, main_prompt, example_prompt, api_prompt, app_info_prompt, app_root_name='') + +

+ + +
+ +

Get the prompter for the follower agent.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + is_visual + (str) + – +
    +

    The flag indicating whether the agent is visual or not.

    +
    +
  • +
  • + main_prompt + (str) + – +
    +

    The main prompt file path.

    +
    +
  • +
  • + example_prompt + (str) + – +
    +

    The example prompt file path.

    +
    +
  • +
  • + api_prompt + (str) + – +
    +

    The API prompt file path.

    +
    +
  • +
  • + app_info_prompt + (str) + – +
    +

    The app information prompt file path.

    +
    +
  • +
  • + app_root_name + (str, default: + '' +) + – +
    +

    The root name of the app.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + FollowerAgentPrompter + – +
    +

    The prompter instance.

    +
    +
  • +
+
+
+ Source code in agents/agent/follower_agent.py +
63
+64
+65
+66
+67
+68
+69
+70
+71
+72
+73
+74
+75
+76
+77
+78
+79
+80
+81
+82
+83
+84
+85
+86
+87
+88
+89
def get_prompter(
+    self,
+    is_visual: str,
+    main_prompt: str,
+    example_prompt: str,
+    api_prompt: str,
+    app_info_prompt: str,
+    app_root_name: str = "",
+) -> FollowerAgentPrompter:
+    """
+    Get the prompter for the follower agent.
+    :param is_visual: The flag indicating whether the agent is visual or not.
+    :param main_prompt: The main prompt file path.
+    :param example_prompt: The example prompt file path.
+    :param api_prompt: The API prompt file path.
+    :param app_info_prompt: The app information prompt file path.
+    :param app_root_name: The root name of the app.
+    :return: The prompter instance.
+    """
+    return FollowerAgentPrompter(
+        is_visual,
+        main_prompt,
+        example_prompt,
+        api_prompt,
+        app_info_prompt,
+        app_root_name,
+    )
+
+
+
+ +
+ +
+ + +

+ message_constructor(dynamic_examples, dynamic_tips, dynamic_knowledge, image_list, control_info, prev_subtask, plan, request, subtask, host_message, current_state, state_diff, include_last_screenshot) + +

+ + +
+ +

Construct the prompt message for the FollowAgent.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + dynamic_examples + (str) + – +
    +

    The dynamic examples retrieved from the self-demonstration and human demonstration.

    +
    +
  • +
  • + dynamic_tips + (str) + – +
    +

    The dynamic tips retrieved from the self-demonstration and human demonstration.

    +
    +
  • +
  • + dynamic_knowledge + (str) + – +
    +

    The dynamic knowledge retrieved from the self-demonstration and human demonstration.

    +
    +
  • +
  • + image_list + (List[str]) + – +
    +

    The list of screenshot images.

    +
    +
  • +
  • + control_info + (str) + – +
    +

    The control information.

    +
    +
  • +
  • + prev_subtask + (List[str]) + – +
    +

    The previous subtask.

    +
    +
  • +
  • + plan + (List[str]) + – +
    +

    The plan.

    +
    +
  • +
  • + request + (str) + – +
    +

    The request.

    +
    +
  • +
  • + subtask + (str) + – +
    +

    The subtask.

    +
    +
  • +
  • + host_message + (List[str]) + – +
    +

    The host message.

    +
    +
  • +
  • + current_state + (Dict[str, str]) + – +
    +

    The current state of the app.

    +
    +
  • +
  • + state_diff + (Dict[str, str]) + – +
    +

    The state difference between the current state and the previous state.

    +
    +
  • +
  • + include_last_screenshot + (bool) + – +
    +

    The flag indicating whether the last screenshot should be included.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + List[Dict[str, str]] + – +
    +

    The prompt message.

    +
    +
  • +
+
+
+ Source code in agents/agent/follower_agent.py +
 91
+ 92
+ 93
+ 94
+ 95
+ 96
+ 97
+ 98
+ 99
+100
+101
+102
+103
+104
+105
+106
+107
+108
+109
+110
+111
+112
+113
+114
+115
+116
+117
+118
+119
+120
+121
+122
+123
+124
+125
+126
+127
+128
+129
+130
+131
+132
+133
+134
+135
+136
+137
+138
+139
+140
+141
+142
+143
+144
+145
+146
def message_constructor(
+    self,
+    dynamic_examples: str,
+    dynamic_tips: str,
+    dynamic_knowledge: str,
+    image_list: List[str],
+    control_info: str,
+    prev_subtask: List[str],
+    plan: List[str],
+    request: str,
+    subtask: str,
+    host_message: List[str],
+    current_state: Dict[str, str],
+    state_diff: Dict[str, str],
+    include_last_screenshot: bool,
+) -> List[Dict[str, str]]:
+    """
+    Construct the prompt message for the FollowAgent.
+    :param dynamic_examples: The dynamic examples retrieved from the self-demonstration and human demonstration.
+    :param dynamic_tips: The dynamic tips retrieved from the self-demonstration and human demonstration.
+    :param dynamic_knowledge: The dynamic knowledge retrieved from the self-demonstration and human demonstration.
+    :param image_list: The list of screenshot images.
+    :param control_info: The control information.
+    :param prev_subtask: The previous subtask.
+    :param plan: The plan.
+    :param request: The request.
+    :param subtask: The subtask.
+    :param host_message: The host message.
+    :param current_state: The current state of the app.
+    :param state_diff: The state difference between the current state and the previous state.
+    :param include_last_screenshot: The flag indicating whether the last screenshot should be included.
+    :return: The prompt message.
+    """
+    followagent_prompt_system_message = self.prompter.system_prompt_construction(
+        dynamic_examples, dynamic_tips
+    )
+    followagent_prompt_user_message = self.prompter.user_content_construction(
+        image_list=image_list,
+        control_item=control_info,
+        prev_subtask=prev_subtask,
+        prev_plan=plan,
+        user_request=request,
+        subtask=subtask,
+        current_application=self._process_name,
+        host_message=host_message,
+        retrieved_docs=dynamic_knowledge,
+        current_state=current_state,
+        state_diff=state_diff,
+        include_last_screenshot=include_last_screenshot,
+    )
+
+    followagent_prompt_message = self.prompter.prompt_construction(
+        followagent_prompt_system_message, followagent_prompt_user_message
+    )
+
+    return followagent_prompt_message
+
+
+
+ +
+ + + +
+ +
+ +
+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/agents/host_agent/index.html b/agents/host_agent/index.html new file mode 100644 index 00000000..8c83e1ad --- /dev/null +++ b/agents/host_agent/index.html @@ -0,0 +1,1879 @@ + + + + + + + + HostAgent - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

HostAgent 🤖

+

The HostAgent assumes five primary responsibilities:

+
    +
  1. User Engagement: The HostAgent engages with the user to understand their request and analyze their intent. It also converses with the user to gather additional information when necessary.
  2. +
  3. AppAgent Management: The HostAgent manages the creation and registration of AppAgents to fulfill the user's request. It also orchestrates the interaction between the AppAgents and the application.
  4. +
  5. Task Management: The HostAgent analyzes the user's request, to decompose it into sub-tasks and distribute them among the AppAgents. It also manages the scheduling, orchestration, coordination, and monitoring of the AppAgents to ensure the successful completion of the user's request.
  6. +
  7. Bash Command Execution: The HostAgent can execute bash commands to open applications or execute system commands to support the user's request and the AppAgents' execution.
  8. +
  9. Communication: The HostAgent communicates with the AppAgents to exchange information. It also manages the Blackboard to store and share information among the agents, as shown below:
  10. +
+

+ Blackboard Image +

+ +

The HostAgent activates its Processor to process the user's request and decompose it into sub-tasks. Each sub-task is then assigned to an AppAgent for execution. The HostAgent monitors the progress of the AppAgents and ensures the successful completion of the user's request.

+

HostAgent Input

+

The HostAgent receives the following inputs:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
InputDescriptionType
User RequestThe user's request in natural language.String
Application InformationInformation about the existing active applications.List of Strings
Desktop ScreenshotsScreenshots of the desktop to provide context to the HostAgent.Image
Previous Sub-TasksThe previous sub-tasks and their completion status.List of Strings
Previous PlanThe previous plan for the following sub-tasks.List of Strings
BlackboardThe shared memory space for storing and sharing information among the agents.Dictionary
+

By processing these inputs, the HostAgent determines the appropriate application to fulfill the user's request and orchestrates the AppAgents to execute the necessary actions.

+

HostAgent Output

+

With the inputs provided, the HostAgent generates the following outputs:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
OutputDescriptionType
ObservationThe observation of current desktop screenshots.String
ThoughtThe logical reasoning process of the HostAgent.String
Current Sub-TaskThe current sub-task to be executed by the AppAgent.String
MessageThe message to be sent to the AppAgent for the completion of the sub-task.String
ControlLabelThe index of the selected application to execute the sub-task.String
ControlTextThe name of the selected application to execute the sub-task.String
PlanThe plan for the following sub-tasks after the current sub-task.List of Strings
StatusThe status of the agent, mapped to the AgentState.String
CommentAdditional comments or information provided to the user.String
QuestionsThe questions to be asked to the user for additional information.List of Strings
BashThe bash command to be executed by the HostAgent. It can be used to open applications or execute system commands.String
+

Below is an example of the HostAgent output:

+
{
+    "Observation": "Desktop screenshot",
+    "Thought": "Logical reasoning process",
+    "Current Sub-Task": "Sub-task description",
+    "Message": "Message to AppAgent",
+    "ControlLabel": "Application index",
+    "ControlText": "Application name",
+    "Plan": ["Sub-task 1", "Sub-task 2"],
+    "Status": "AgentState",
+    "Comment": "Additional comments",
+    "Questions": ["Question 1", "Question 2"],
+    "Bash": "Bash command"
+}
+
+
+

Info

+

The HostAgent output is formatted as a JSON object by LLMs and can be parsed by the json.loads method in Python.

+
+
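As the note above states, the response can be parsed with json.loads; a minimal sketch (the raw response text is an illustrative example following the schema above):

```python
import json

# Illustrative raw LLM response following the HostAgent output schema above.
raw_response = '''{
    "Observation": "Desktop screenshot",
    "Thought": "Logical reasoning process",
    "Current Sub-Task": "Sub-task description",
    "ControlText": "Application name",
    "Plan": ["Sub-task 1", "Sub-task 2"],
    "Status": "CONTINUE"
}'''

response = json.loads(raw_response)
print(response["Current Sub-Task"])  # the sub-task handed to an AppAgent
print(response["Status"])            # maps to the HostAgent state machine
```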

HostAgent State

+

The HostAgent progresses through different states, as defined in the ufo/agents/states/host_agent_states.py module. The states include:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
StateDescription
CONTINUEThe HostAgent is ready to process the user's request and employ the Processor to decompose it into sub-tasks.
ASSIGNThe HostAgent is assigning the sub-tasks to the AppAgents for execution.
FINISHThe overall task is completed, and the HostAgent is ready to return the results to the user.
ERRORAn error occurred during the processing of the user's request, and the HostAgent is unable to proceed.
FAILThe HostAgent believes the task is unachievable and cannot proceed further.
PENDINGThe HostAgent is waiting for additional information from the user to proceed.
+ + +

The state machine diagram for the HostAgent is shown below:

+

+ +

+ +

The HostAgent transitions between these states based on the user's request, the application information, and the progress of the AppAgents in executing the sub-tasks.

+
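The states in the table above can be modeled as a simple enumeration; this is a sketch with an illustrative terminal-state check, not the actual ufo/agents/states/host_agent_states.py implementation:

```python
from enum import Enum

class HostAgentState(Enum):
    """The HostAgent states listed in the table above."""
    CONTINUE = "CONTINUE"
    ASSIGN = "ASSIGN"
    FINISH = "FINISH"
    ERROR = "ERROR"
    FAIL = "FAIL"
    PENDING = "PENDING"

def is_terminal(state: HostAgentState) -> bool:
    """Illustrative rule: FINISH, ERROR, and FAIL end the session loop."""
    return state in (HostAgentState.FINISH, HostAgentState.ERROR, HostAgentState.FAIL)

print(is_terminal(HostAgentState.FINISH))    # -> True
print(is_terminal(HostAgentState.CONTINUE))  # -> False
```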

Task Decomposition

+

Upon receiving the user's request, the HostAgent decomposes it into sub-tasks and assigns each sub-task to an AppAgent for execution. The HostAgent determines the appropriate application to fulfill the user's request based on the application information and the user's request. It then orchestrates the AppAgents to execute the necessary actions to complete the sub-tasks. We show the task decomposition process in the following figure:

+

+ Task Decomposition Image +

+ +

Creating and Registering AppAgents

+

When the HostAgent determines the need for a new AppAgent to fulfill a sub-task, it creates an instance of the AppAgent and registers it with the HostAgent by calling the create_subagent method:

+
def create_subagent(
+        self,
+        agent_type: str,
+        agent_name: str,
+        process_name: str,
+        app_root_name: str,
+        is_visual: bool,
+        main_prompt: str,
+        example_prompt: str,
+        api_prompt: str,
+        *args,
+        **kwargs,
+    ) -> BasicAgent:
+        """
+        Create an SubAgent hosted by the HostAgent.
+        :param agent_type: The type of the agent to create.
+        :param agent_name: The name of the SubAgent.
+        :param process_name: The process name of the app.
+        :param app_root_name: The root name of the app.
+        :param is_visual: The flag indicating whether the agent is visual or not.
+        :param main_prompt: The main prompt file path.
+        :param example_prompt: The example prompt file path.
+        :param api_prompt: The API prompt file path.
+        :return: The created SubAgent.
+        """
+        app_agent = self.agent_factory.create_agent(
+            agent_type,
+            agent_name,
+            process_name,
+            app_root_name,
+            is_visual,
+            main_prompt,
+            example_prompt,
+            api_prompt,
+            *args,
+            **kwargs,
+        )
+        self.appagent_dict[agent_name] = app_agent
+        app_agent.host = self
+        self._active_appagent = app_agent
+
+        return app_agent
+
+

The HostAgent then assigns the sub-task to the AppAgent for execution and monitors its progress.

+

Reference

+ + +
+ + + + +
+

+ Bases: BasicAgent

+ + +

The HostAgent class is the manager of AppAgents.

+ +

Initialize the HostAgent. +:name: The name of the agent.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + is_visual + (bool) + – +
    +

    The flag indicating whether the agent is visual or not.

    +
    +
  • +
  • + main_prompt + (str) + – +
    +

    The main prompt file path.

    +
    +
  • +
  • + example_prompt + (str) + – +
    +

    The example prompt file path.

    +
    +
  • +
  • + api_prompt + (str) + – +
    +

    The API prompt file path.

    +
    +
  • +
+
+ + + + + +
+ Source code in agents/agent/host_agent.py +
51
+52
+53
+54
+55
+56
+57
+58
+59
+60
+61
+62
+63
+64
+65
+66
+67
+68
+69
+70
+71
+72
+73
+74
+75
+76
+77
+78
+79
+80
def __init__(
+    self,
+    name: str,
+    is_visual: bool,
+    main_prompt: str,
+    example_prompt: str,
+    api_prompt: str,
+) -> None:
+    """
+    Initialize the HostAgent.
+    :name: The name of the agent.
+    :param is_visual: The flag indicating whether the agent is visual or not.
+    :param main_prompt: The main prompt file path.
+    :param example_prompt: The example prompt file path.
+    :param api_prompt: The API prompt file path.
+    """
+    super().__init__(name=name)
+    self.prompter = self.get_prompter(
+        is_visual, main_prompt, example_prompt, api_prompt
+    )
+    self.offline_doc_retriever = None
+    self.online_doc_retriever = None
+    self.experience_retriever = None
+    self.human_demonstration_retriever = None
+    self.agent_factory = AgentFactory()
+    self.appagent_dict = {}
+    self._active_appagent = None
+    self._blackboard = Blackboard()
+    self.set_state(ContinueHostAgentState())
+    self.Puppeteer = self.create_puppeteer_interface()
+
+
+ + + +
+ + + + + + + +
+ + + +

+ blackboard + + + property + + +

+ + +
+ +

Get the blackboard.

+
+ +
+ +
+ + + +

+ status_manager: HostAgentStatus + + + property + + +

+ + +
+ +

Get the status manager.

+
+ +
+ +
+ + + +

+ sub_agent_amount: int + + + property + + +

+ + +
+ +

Get the amount of sub agents.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + int + – +
    +

    The amount of sub agents.

    +
    +
  • +
+
+ +
+ + + +
+ + +

+ create_app_agent(application_window_name, application_root_name, request, mode) + +

+ + +
+ +

Create the app agent for the host agent.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + application_window_name + (str) + – +
    +

    The name of the application window.

    +
    +
  • +
  • + application_root_name + (str) + – +
    +

    The name of the application root.

    +
    +
  • +
  • + request + (str) + – +
    +

    The user request.

    +
    +
  • +
  • + mode + (str) + – +
    +

    The mode of the session.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + AppAgent + – +
    +

    The app agent.

    +
    +
  • +
+
+
+ Source code in agents/agent/host_agent.py +
220
+221
+222
+223
+224
+225
+226
+227
+228
+229
+230
+231
+232
+233
+234
+235
+236
+237
+238
+239
+240
+241
+242
+243
+244
+245
+246
+247
+248
+249
+250
+251
+252
+253
+254
+255
+256
+257
+258
+259
+260
+261
+262
+263
+264
+265
+266
+267
+268
+269
+270
+271
+272
+273
+274
+275
+276
+277
+278
+279
+280
+281
+282
+283
+284
+285
+286
+287
def create_app_agent(
+    self,
+    application_window_name: str,
+    application_root_name: str,
+    request: str,
+    mode: str,
+) -> AppAgent:
+    """
+    Create the app agent for the host agent.
+    :param application_window_name: The name of the application window.
+    :param application_root_name: The name of the application root.
+    :param request: The user request.
+    :param mode: The mode of the session.
+    :return: The app agent.
+    """
+
+    if mode == "normal":
+
+        agent_name = "AppAgent/{root}/{process}".format(
+            root=application_root_name, process=application_window_name
+        )
+
+        app_agent: AppAgent = self.create_subagent(
+            agent_type="app",
+            agent_name=agent_name,
+            process_name=application_window_name,
+            app_root_name=application_root_name,
+            is_visual=configs["APP_AGENT"]["VISUAL_MODE"],
+            main_prompt=configs["APPAGENT_PROMPT"],
+            example_prompt=configs["APPAGENT_EXAMPLE_PROMPT"],
+            api_prompt=configs["API_PROMPT"],
+        )
+
+    elif mode == "follower":
+
+        # Load additional app info prompt.
+        app_info_prompt = configs.get("APP_INFO_PROMPT", None)
+
+        agent_name = "FollowerAgent/{root}/{process}".format(
+            root=application_root_name, process=application_window_name
+        )
+
+        # Create the app agent in the follower mode.
+        app_agent = self.create_subagent(
+            agent_type="follower",
+            agent_name=agent_name,
+            process_name=application_window_name,
+            app_root_name=application_root_name,
+            is_visual=configs["APP_AGENT"]["VISUAL_MODE"],
+            main_prompt=configs["FOLLOWERAHENT_PROMPT"],
+            example_prompt=configs["APPAGENT_EXAMPLE_PROMPT"],
+            api_prompt=configs["API_PROMPT"],
+            app_info_prompt=app_info_prompt,
+        )
+
+    else:
+        raise ValueError(f"The {mode} mode is not supported.")
+
+    # Create the COM receiver for the app agent.
+    if configs.get("USE_APIS", False):
+        app_agent.Puppeteer.receiver_manager.create_api_receiver(
+            application_root_name, application_window_name
+        )
+
+    # Provision the context for the app agent, including the all retrievers.
+    app_agent.context_provision(request)
+
+    return app_agent
+
+
+
+ +
+ +
+ + +

+ create_puppeteer_interface() + +

+ + +
+ +

Create the Puppeteer interface to automate the app.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + AppPuppeteer + – +
    +

    The Puppeteer interface.

    +
    +
  • +
+
+
+ Source code in agents/agent/host_agent.py +
213
+214
+215
+216
+217
+218
def create_puppeteer_interface(self) -> puppeteer.AppPuppeteer:
+    """
+    Create the Puppeteer interface to automate the app.
+    :return: The Puppeteer interface.
+    """
+    return puppeteer.AppPuppeteer("", "")
+
+
+
+ +
+ +
+ + +

+ create_subagent(agent_type, agent_name, process_name, app_root_name, is_visual, main_prompt, example_prompt, api_prompt, *args, **kwargs) + +

+ + +
+ +

Create a SubAgent hosted by the HostAgent.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + agent_type + (str) + – +
    +

    The type of the agent to create.

    +
    +
  • +
  • + agent_name + (str) + – +
    +

    The name of the SubAgent.

    +
    +
  • +
  • + process_name + (str) + – +
    +

    The process name of the app.

    +
    +
  • +
  • + app_root_name + (str) + – +
    +

    The root name of the app.

    +
    +
  • +
  • + is_visual + (bool) + – +
    +

    The flag indicating whether the agent is visual or not.

    +
    +
  • +
  • + main_prompt + (str) + – +
    +

    The main prompt file path.

    +
    +
  • +
  • + example_prompt + (str) + – +
    +

    The example prompt file path.

    +
    +
  • +
  • + api_prompt + (str) + – +
    +

    The API prompt file path.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + BasicAgent + – +
    +

    The created SubAgent.

    +
    +
  • +
+
+
+ Source code in agents/agent/host_agent.py +
 99
+100
+101
+102
+103
+104
+105
+106
+107
+108
+109
+110
+111
+112
+113
+114
+115
+116
+117
+118
+119
+120
+121
+122
+123
+124
+125
+126
+127
+128
+129
+130
+131
+132
+133
+134
+135
+136
+137
+138
+139
+140
def create_subagent(
+    self,
+    agent_type: str,
+    agent_name: str,
+    process_name: str,
+    app_root_name: str,
+    is_visual: bool,
+    main_prompt: str,
+    example_prompt: str,
+    api_prompt: str,
+    *args,
+    **kwargs,
+) -> BasicAgent:
+    """
+    Create a SubAgent hosted by the HostAgent.
+    :param agent_type: The type of the agent to create.
+    :param agent_name: The name of the SubAgent.
+    :param process_name: The process name of the app.
+    :param app_root_name: The root name of the app.
+    :param is_visual: The flag indicating whether the agent is visual or not.
+    :param main_prompt: The main prompt file path.
+    :param example_prompt: The example prompt file path.
+    :param api_prompt: The API prompt file path.
+    :return: The created SubAgent.
+    """
+    app_agent = self.agent_factory.create_agent(
+        agent_type,
+        agent_name,
+        process_name,
+        app_root_name,
+        is_visual,
+        main_prompt,
+        example_prompt,
+        api_prompt,
+        *args,
+        **kwargs,
+    )
+    self.appagent_dict[agent_name] = app_agent
+    app_agent.host = self
+    self._active_appagent = app_agent
+
+    return app_agent
+
+
+
+ +
+ +
+ + +

+ get_active_appagent() + +

+ + +
+ +

Get the active app agent.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + AppAgent + – +
    +

    The active app agent.

    +
    +
  • +
+
+
+ Source code in agents/agent/host_agent.py +
150
+151
+152
+153
+154
+155
def get_active_appagent(self) -> AppAgent:
+    """
+    Get the active app agent.
+    :return: The active app agent.
+    """
+    return self._active_appagent
+
+
+
+ +
+ +
+ + +

+ get_prompter(is_visual, main_prompt, example_prompt, api_prompt) + +

+ + +
+ +

Get the prompt for the agent.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + is_visual + (bool) + – +
    +

    The flag indicating whether the agent is visual or not.

    +
    +
  • +
  • + main_prompt + (str) + – +
    +

    The main prompt file path.

    +
    +
  • +
  • + example_prompt + (str) + – +
    +

    The example prompt file path.

    +
    +
  • +
  • + api_prompt + (str) + – +
    +

    The API prompt file path.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + HostAgentPrompter + – +
    +

    The prompter instance.

    +
    +
  • +
+
+
+ Source code in agents/agent/host_agent.py +
82
+83
+84
+85
+86
+87
+88
+89
+90
+91
+92
+93
+94
+95
+96
+97
def get_prompter(
+    self,
+    is_visual: bool,
+    main_prompt: str,
+    example_prompt: str,
+    api_prompt: str,
+) -> HostAgentPrompter:
+    """
+    Get the prompt for the agent.
+    :param is_visual: The flag indicating whether the agent is visual or not.
+    :param main_prompt: The main prompt file path.
+    :param example_prompt: The example prompt file path.
+    :param api_prompt: The API prompt file path.
+    :return: The prompter instance.
+    """
+    return HostAgentPrompter(is_visual, main_prompt, example_prompt, api_prompt)
+
+
+
+ +
+ +
+ + +

+ message_constructor(image_list, os_info, plan, prev_subtask, request) + +

+ + +
+ +

Construct the message.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + image_list + (List[str]) + – +
    +

    The list of screenshot images.

    +
    +
  • +
  • + os_info + (str) + – +
    +

    The OS information.

    +
    +
  • +
  • + prev_subtask + (List[Dict[str, str]]) + – +
    +

    The previous subtask.

    +
    +
  • +
  • + plan + (List[str]) + – +
    +

    The plan.

    +
    +
  • +
  • + request + (str) + – +
    +

    The request.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + List[Dict[str, Union[str, List[Dict[str, str]]]]] + – +
    +

    The message.

    +
    +
  • +
+
+
+ Source code in agents/agent/host_agent.py +
164
+165
+166
+167
+168
+169
+170
+171
+172
+173
+174
+175
+176
+177
+178
+179
+180
+181
+182
+183
+184
+185
+186
+187
+188
+189
+190
+191
+192
+193
+194
+195
+196
+197
+198
+199
+200
def message_constructor(
+    self,
+    image_list: List[str],
+    os_info: str,
+    plan: List[str],
+    prev_subtask: List[Dict[str, str]],
+    request: str,
+) -> List[Dict[str, Union[str, List[Dict[str, str]]]]]:
+    """
+    Construct the message.
+    :param image_list: The list of screenshot images.
+    :param os_info: The OS information.
+    :param prev_subtask: The previous subtask.
+    :param plan: The plan.
+    :param request: The request.
+    :return: The message.
+    """
+    hostagent_prompt_system_message = self.prompter.system_prompt_construction()
+    hostagent_prompt_user_message = self.prompter.user_content_construction(
+        image_list=image_list,
+        control_item=os_info,
+        prev_subtask=prev_subtask,
+        prev_plan=plan,
+        user_request=request,
+    )
+
+    if not self.blackboard.is_empty():
+        blackboard_prompt = self.blackboard.blackboard_to_prompt()
+        hostagent_prompt_user_message = (
+            blackboard_prompt + hostagent_prompt_user_message
+        )
+
+    hostagent_prompt_message = self.prompter.prompt_construction(
+        hostagent_prompt_system_message, hostagent_prompt_user_message
+    )
+
+    return hostagent_prompt_message
+
+
+
+ +
+ +
+ + +

+ print_response(response_dict) + +

+ + +
+ +

Print the response.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + response_dict + (Dict) + – +
    +

    The response dictionary to print.

    +
    +
  • +
+
+
+ Source code in agents/agent/host_agent.py +
295
+296
+297
+298
+299
+300
+301
+302
+303
+304
+305
+306
+307
+308
+309
+310
+311
+312
+313
+314
+315
+316
+317
+318
+319
+320
+321
+322
+323
+324
+325
+326
+327
+328
+329
+330
+331
+332
+333
+334
+335
+336
+337
+338
+339
+340
+341
+342
+343
+344
def print_response(self, response_dict: Dict) -> None:
+    """
+    Print the response.
+    :param response_dict: The response dictionary to print.
+    """
+
+    application = response_dict.get("ControlText")
+    if not application:
+        application = "[The required application needs to be opened.]"
+    observation = response_dict.get("Observation")
+    thought = response_dict.get("Thought")
+    bash_command = response_dict.get("Bash", None)
+    subtask = response_dict.get("CurrentSubtask")
+
+    # Convert the message from a list to a string.
+    message = list(response_dict.get("Message", ""))
+    message = "\n".join(message)
+
+    # Concatenate the subtask with the plan and convert the plan from a list to a string.
+    plan = list(response_dict.get("Plan"))
+    plan = [subtask] + plan
+    plan = "\n".join([f"({i+1}) " + str(item) for i, item in enumerate(plan)])
+
+    status = response_dict.get("Status")
+    comment = response_dict.get("Comment")
+
+    utils.print_with_color(
+        "Observations👀: {observation}".format(observation=observation), "cyan"
+    )
+    utils.print_with_color("Thoughts💡: {thought}".format(thought=thought), "green")
+    if bash_command:
+        utils.print_with_color(
+            "Running Bash Command🔧: {bash}".format(bash=bash_command), "yellow"
+        )
+    utils.print_with_color(
+        "Plans📚: {plan}".format(plan=plan),
+        "cyan",
+    )
+    utils.print_with_color(
+        "Next Selected application📲: {application}".format(
+            application=application
+        ),
+        "yellow",
+    )
+    utils.print_with_color(
+        "Messages to AppAgent📩: {message}".format(message=message), "cyan"
+    )
+    utils.print_with_color("Status📊: {status}".format(status=status), "blue")
+
+    utils.print_with_color("Comment💬: {comment}".format(comment=comment), "green")
+
+
+
+ +
+ +
+ + +

+ process(context) + +

+ + +
+ +

Process the agent.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + context + (Context) + – +
    +

    The context.

    +
    +
  • +
+
+
+ Source code in agents/agent/host_agent.py +
202
+203
+204
+205
+206
+207
+208
+209
+210
+211
def process(self, context: Context) -> None:
+    """
+    Process the agent.
+    :param context: The context.
+    """
+    self.processor = HostAgentProcessor(agent=self, context=context)
+    self.processor.process()
+
+    # Sync the status with the processor.
+    self.status = self.processor.status
+
+
+
+ +
+ +
+ + +

+ process_comfirmation() + +

+ + +
+ +

TODO: Process the confirmation.

+ +
+ Source code in agents/agent/host_agent.py +
289
+290
+291
+292
+293
def process_comfirmation(self) -> None:
+    """
+    TODO: Process the confirmation.
+    """
+    pass
+
+
+
+ +
+ + + +
+ +
+ +
+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/agents/overview/index.html b/agents/overview/index.html new file mode 100644 index 00000000..a65d8910 --- /dev/null +++ b/agents/overview/index.html @@ -0,0 +1,2042 @@ + + + + + + + + Overview - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

Agents

+

In UFO, there are four types of agents: HostAgent, AppAgent, FollowerAgent, and EvaluationAgent. Each agent has a specific role in the UFO system and is responsible for different aspects of the user interaction process:

+ + + + + + + + + + + + + + + + + + + + + + + + + +
AgentDescription
HostAgentDecomposes the user request into sub-tasks and selects the appropriate application to fulfill the request.
AppAgentExecutes actions on the selected application.
FollowerAgentFollows the user's instructions to complete the task.
EvaluationAgentEvaluates the completeness of a session or a round.
+

In the normal workflow, only the HostAgent and AppAgent are involved in the user interaction process. The FollowerAgent and EvaluationAgent are used for specific tasks.

+

Please see below the orchestration of the agents in UFO:

+

+ +

+ +

Main Components

+

An agent in UFO is composed of the following main components to fulfill its role in the UFO system:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
ComponentDescription
StateRepresents the current state of the agent and determines the next action and agent to handle the request.
MemoryStores information about the user request, application state, and other relevant data.
BlackboardStores information shared between agents.
PrompterGenerates prompts for the language model based on the user request and application state.
ProcessorProcesses the workflow of the agent, including handling user requests, executing actions, and memory management.
+

Reference

+

Below is the reference for the Agent class in UFO. All agents in UFO inherit from the Agent class and implement the necessary methods to fulfill their roles in the UFO system.

+ + +
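The abstract contract can be illustrated with a minimal stand-in. The sketch below mirrors the shape of the BasicAgent interface documented here (a name passed to the constructor, abstract `get_prompter` and `message_constructor` methods), but it is not UFO's actual class: `BasicAgentSketch`, `DemoAgent`, and the returned values are invented for illustration.

```python
from abc import ABC, abstractmethod

# Illustrative stand-in mirroring the BasicAgent contract; not UFO's real class.
class BasicAgentSketch(ABC):
    def __init__(self, name: str) -> None:
        self._name = name
        self._step = 0

    @property
    def name(self) -> str:
        return self._name

    @abstractmethod
    def get_prompter(self) -> str:
        """Return the system prompt for the agent."""

    @abstractmethod
    def message_constructor(self) -> list:
        """Construct the message sent to the LLM."""

# A concrete agent must implement the abstract methods before instantiation.
class DemoAgent(BasicAgentSketch):
    def get_prompter(self) -> str:
        return "system prompt"

    def message_constructor(self) -> list:
        return [{"role": "system", "content": self.get_prompter()}]

agent = DemoAgent("demo")
print(agent.name)  # demo
```

Attempting to instantiate `BasicAgentSketch` directly raises `TypeError`, which is how the framework forces each concrete agent (HostAgent, AppAgent, etc.) to supply its own prompter and message construction.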
+ + + + +
+

+ Bases: ABC

+ + +

The BasicAgent class is the abstract class for the agent.

+ +

Initialize the BasicAgent.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + name + (str) + – +
    +

    The name of the agent.

    +
    +
  • +
+
+ + + + + +
+ Source code in agents/agent/basic.py +
37
+38
+39
+40
+41
+42
+43
+44
+45
+46
+47
+48
+49
+50
+51
+52
def __init__(self, name: str) -> None:
+    """
+    Initialize the BasicAgent.
+    :param name: The name of the agent.
+    """
+    self._step = 0
+    self._complete = False
+    self._name = name
+    self._status = self.status_manager.CONTINUE.value
+    self._register_self()
+    self.retriever_factory = retriever.RetrieverFactory()
+    self._memory = Memory()
+    self._host = None
+    self._processor: Optional[BaseProcessor] = None
+    self._state = None
+    self.Puppeteer: puppeteer.AppPuppeteer = None
+
+
+ + + +
+ + + + + + + +
+ + + +

+ blackboard: Blackboard + + + property + + +

+ + +
+ +

Get the blackboard.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + Blackboard + – +
    +

    The blackboard.

    +
    +
  • +
+
+ +
+ +
+ + + +

+ host: HostAgent + + + property + writable + + +

+ + +
+ +

Get the host of the agent.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + HostAgent + – +
    +

    The host of the agent.

    +
    +
  • +
+
+ +
+ +
+ + + +

+ memory: Memory + + + property + + +

+ + +
+ +

Get the memory of the agent.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + Memory + – +
    +

    The memory of the agent.

    +
    +
  • +
+
+ +
+ +
+ + + +

+ name: str + + + property + + +

+ + +
+ +

Get the name of the agent.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The name of the agent.

    +
    +
  • +
+
+ +
+ +
+ + + +

+ processor: BaseProcessor + + + property + writable + + +

+ + +
+ +

Get the processor.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + BaseProcessor + – +
    +

    The processor.

    +
    +
  • +
+
+ +
+ +
+ + + +

+ state: AgentState + + + property + + +

+ + +
+ +

Get the state of the agent.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + AgentState + – +
    +

    The state of the agent.

    +
    +
  • +
+
+ +
+ +
+ + + +

+ status: str + + + property + writable + + +

+ + +
+ +

Get the status of the agent.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The status of the agent.

    +
    +
  • +
+
+ +
+ +
+ + + +

+ status_manager: AgentStatus + + + property + + +

+ + +
+ +

Get the status manager.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + AgentStatus + – +
    +

    The status manager.

    +
    +
  • +
+
+ +
+ +
+ + + +

+ step: int + + + property + writable + + +

+ + +
+ +

Get the step of the agent.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + int + – +
    +

    The step of the agent.

    +
    +
  • +
+
+ +
+ + + +
+ + +

+ add_memory(memory_item) + +

+ + +
+ +

Update the memory of the agent.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + memory_item + (MemoryItem) + – +
    +

    The memory item to add.

    +
    +
  • +
+
+
+ Source code in agents/agent/basic.py +
181
+182
+183
+184
+185
+186
def add_memory(self, memory_item: MemoryItem) -> None:
+    """
+    Update the memory of the agent.
+    :param memory_item: The memory item to add.
+    """
+    self._memory.add_memory_item(memory_item)
+
+
+
+ +
+ +
+ + +

+ build_experience_retriever() + +

+ + +
+ +

Build the experience retriever.

+ +
+ Source code in agents/agent/basic.py +
323
+324
+325
+326
+327
def build_experience_retriever(self) -> None:
+    """
+    Build the experience retriever.
+    """
+    pass
+
+
+
+ +
+ +
+ + +

+ build_human_demonstration_retriever() + +

+ + +
+ +

Build the human demonstration retriever.

+ +
+ Source code in agents/agent/basic.py +
329
+330
+331
+332
+333
def build_human_demonstration_retriever(self) -> None:
+    """
+    Build the human demonstration retriever.
+    """
+    pass
+
+
+
+ +
+ +
+ + +

+ build_offline_docs_retriever() + +

+ + +
+ +

Build the offline docs retriever.

+ +
+ Source code in agents/agent/basic.py +
311
+312
+313
+314
+315
def build_offline_docs_retriever(self) -> None:
+    """
+    Build the offline docs retriever.
+    """
+    pass
+
+
+
+ +
+ +
+ + +

+ build_online_search_retriever() + +

+ + +
+ +

Build the online search retriever.

+ +
+ Source code in agents/agent/basic.py +
317
+318
+319
+320
+321
def build_online_search_retriever(self) -> None:
+    """
+    Build the online search retriever.
+    """
+    pass
+
+
+
+ +
+ +
+ + +

+ clear_memory() + +

+ + +
+ +

Clear the memory of the agent.

+ +
+ Source code in agents/agent/basic.py +
195
+196
+197
+198
+199
def clear_memory(self) -> None:
+    """
+    Clear the memory of the agent.
+    """
+    self._memory.clear()
+
+
+
+ +
+ +
+ + +

+ create_puppeteer_interface() + +

+ + +
+ +

Create the puppeteer interface.

+ +
+ Source code in agents/agent/basic.py +
233
+234
+235
+236
+237
def create_puppeteer_interface(self) -> puppeteer.AppPuppeteer:
+    """
+    Create the puppeteer interface.
+    """
+    pass
+
+
+
+ +
+ +
+ + +

+ delete_memory(step) + +

+ + +
+ +

Delete the memory of the agent.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + step + (int) + – +
    +

    The step of the memory item to delete.

    +
    +
  • +
+
+
+ Source code in agents/agent/basic.py +
188
+189
+190
+191
+192
+193
def delete_memory(self, step: int) -> None:
+    """
+    Delete the memory of the agent.
+    :param step: The step of the memory item to delete.
+    """
+    self._memory.delete_memory_item(step)
+
+
+
+ +
+ +
+ + +

+ get_cls(name) + + + classmethod + + +

+ + +
+ +

Retrieves an agent class from the registry.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + name + (str) + – +
    +

    The name of the agent class.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + Type['BasicAgent'] + – +
    +

    The agent class.

    +
    +
  • +
+
+
+ Source code in agents/agent/basic.py +
350
+351
+352
+353
+354
+355
+356
+357
@classmethod
+def get_cls(cls, name: str) -> Type["BasicAgent"]:
+    """
+    Retrieves an agent class from the registry.
+    :param name: The name of the agent class.
+    :return: The agent class.
+    """
+    return AgentRegistry().get_cls(name)
+
+
+
+ +
+ +
+ + +

+ get_prompter() + + + abstractmethod + + +

+ + +
+ +

Get the prompt for the agent.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The prompt.

    +
    +
  • +
+
+
+ Source code in agents/agent/basic.py +
124
+125
+126
+127
+128
+129
+130
@abstractmethod
+def get_prompter(self) -> str:
+    """
+    Get the prompt for the agent.
+    :return: The prompt.
+    """
+    pass
+
+
+
+ +
+ +
+ + +

+ get_response(message, namescope, use_backup_engine, configs=configs) + + + classmethod + + +

+ + +
+ +

Get the response for the prompt.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + message + (List[dict]) + – +
    +

    The message for LLMs.

    +
    +
  • +
  • + namescope + (str) + – +
    +

    The namescope for the LLMs.

    +
    +
  • +
  • + use_backup_engine + (bool) + – +
    +

    Whether to use the backup engine.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The response.

    +
    +
  • +
+
+
+ Source code in agents/agent/basic.py +
140
+141
+142
+143
+144
+145
+146
+147
+148
+149
+150
+151
+152
+153
+154
@classmethod
+def get_response(
+    cls, message: List[dict], namescope: str, use_backup_engine: bool, configs = configs
+) -> str:
+    """
+    Get the response for the prompt.
+    :param message: The message for LLMs.
+    :param namescope: The namescope for the LLMs.
+    :param use_backup_engine: Whether to use the backup engine.
+    :return: The response.
+    """
+    response_string, cost = llm_call.get_completion(
+        message, namescope, use_backup_engine=use_backup_engine, configs = configs
+    )
+    return response_string, cost
+
+
+
+ +
+ +
+ + +

+ handle(context) + +

+ + +
+ +

Handle the agent.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + context + (Context) + – +
    +

    The context for the agent.

    +
    +
  • +
+
+
+ Source code in agents/agent/basic.py +
220
+221
+222
+223
+224
+225
def handle(self, context: Context) -> None:
+    """
+    Handle the agent.
+    :param context: The context for the agent.
+    """
+    self.state.handle(self, context)
+
+
+
+ +
+ +
+ + +

+ message_constructor() + + + abstractmethod + + +

+ + +
+ +

Construct the message.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + List[Dict[str, Union[str, List[Dict[str, str]]]]] + – +
    +

    The message.

    +
    +
  • +
+
+
+ Source code in agents/agent/basic.py +
132
+133
+134
+135
+136
+137
+138
@abstractmethod
+def message_constructor(self) -> List[Dict[str, Union[str, List[Dict[str, str]]]]]:
+    """
+    Construct the message.
+    :return: The message.
+    """
+    pass
+
+
+
+ +
+ +
+ + +

+ print_response() + +

+ + +
+ +

Print the response.

+ +
+ Source code in agents/agent/basic.py +
335
+336
+337
+338
+339
def print_response(self) -> None:
+    """
+    Print the response.
+    """
+    pass
+
+
+
+ +
+ +
+ + +

+ process(context) + +

+ + +
+ +

Process the agent.

+ +
+ Source code in agents/agent/basic.py +
227
+228
+229
+230
+231
def process(self, context: Context) -> None:
+    """
+    Process the agent.
+    """
+    pass
+
+
+
+ +
+ +
+ + +

+ process_asker(ask_user=True) + +

+ + +
+ +

Ask for the process.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + ask_user + (bool, default: + True +) + – +
    +

    Whether to ask the user for the questions.

    +
    +
  • +
+
+
+ Source code in agents/agent/basic.py +
247
+248
+249
+250
+251
+252
+253
+254
+255
+256
+257
+258
+259
+260
+261
+262
+263
+264
+265
+266
+267
+268
+269
+270
+271
+272
+273
+274
+275
+276
+277
+278
def process_asker(self, ask_user: bool = True) -> None:
+    """
+    Ask for the process.
+    :param ask_user: Whether to ask the user for the questions.
+    """
+    if self.processor:
+        question_list = self.processor.question_list
+
+        if ask_user:
+            utils.print_with_color(
+                "Could you please answer the following questions to help me understand your needs and complete the task?",
+                "yellow",
+            )
+
+        for index, question in enumerate(question_list):
+            if ask_user:
+                answer = question_asker(question, index + 1)
+                if not answer.strip():
+                    continue
+                qa_pair = {"question": question, "answer": answer}
+
+                utils.append_string_to_file(
+                    configs["QA_PAIR_FILE"], json.dumps(qa_pair)
+                )
+
+            else:
+                qa_pair = {
+                    "question": question,
+                    "answer": "The answer for the question is not available, please proceed with your own knowledge or experience, or leave it as a placeholder. Do not ask the same question again.",
+                }
+
+            self.blackboard.add_questions(qa_pair)
+
+
+
+ +
+ +
+ + +

+ process_comfirmation() + + + abstractmethod + + +

+ + +
+ +

Confirm the process.

+ +
+ Source code in agents/agent/basic.py +
280
+281
+282
+283
+284
+285
@abstractmethod
+def process_comfirmation(self) -> None:
+    """
+    Confirm the process.
+    """
+    pass
+
+
+
+ +
+ +
+ + +

+ process_resume() + +

+ + +
+ +

Resume the process.

+ +
+ Source code in agents/agent/basic.py +
239
+240
+241
+242
+243
+244
def process_resume(self) -> None:
+    """
+    Resume the process.
+    """
+    if self.processor:
+        self.processor.resume()
+
+
+
+ +
+ +
+ + +

+ reflection() + +

+ + +
+ +

TODO: +Reflect on the action.

+ +
+ Source code in agents/agent/basic.py +
201
+202
+203
+204
+205
+206
def reflection(self) -> None:
+    """
+    TODO:
+    Reflect on the action.
+    """
+    pass
+
+
+
+ +
+ +
+ + +

+ response_to_dict(response) + + + staticmethod + + +

+ + +
+ +

Convert the response to a dictionary.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + response + (str) + – +
    +

    The response.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + Dict[str, str] + – +
    +

    The dictionary.

    +
    +
  • +
+
+
+ Source code in agents/agent/basic.py +
156
+157
+158
+159
+160
+161
+162
+163
@staticmethod
+def response_to_dict(response: str) -> Dict[str, str]:
+    """
+    Convert the response to a dictionary.
+    :param response: The response.
+    :return: The dictionary.
+    """
+    return utils.json_parser(response)
+
+
+
+ +
+ +
+ + +

+ set_state(state) + +

+ + +
+ +

Set the state of the agent.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + state + (AgentState) + – +
    +

    The state of the agent.

    +
    +
  • +
+
+
+ Source code in agents/agent/basic.py +
208
+209
+210
+211
+212
+213
+214
+215
+216
+217
+218
def set_state(self, state: AgentState) -> None:
+    """
+    Set the state of the agent.
+    :param state: The state of the agent.
+    """
+
+    assert issubclass(
+        type(self), state.agent_class()
+    ), f"The state is only for agent type of {state.agent_class()}, but the current agent is {type(self)}."
+
+    self._state = state
+
+
+
+ +
+ + + +
+ +
+ +
+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/assets/_mkdocstrings.css b/assets/_mkdocstrings.css new file mode 100644 index 00000000..8a5d3cff --- /dev/null +++ b/assets/_mkdocstrings.css @@ -0,0 +1,57 @@ + +/* Avoid breaking parameters name, etc. in table cells. */ +.doc-contents td code { + word-break: normal !important; +} + +/* No line break before first paragraph of descriptions. */ +.doc-md-description, +.doc-md-description>p:first-child { + display: inline; +} + +/* Avoid breaking code headings. */ +.doc-heading code { + white-space: normal; +} + +/* Improve rendering of parameters, returns and exceptions. */ +.doc-contents .field-name { + min-width: 100px; +} + +/* Other curious-spacing fixes. */ +.doc-contents .field-name, +.doc-contents .field-body { + border: none !important; + padding: 0 !important; +} + +.doc-contents p { + margin: 1em 0 1em; +} + +.doc-contents .field-list { + margin: 0 !important; +} + +.doc-contents pre { + padding: 0 !important; +} + +.doc-contents .wy-table-responsive { + margin-bottom: 0 !important; +} + +.doc-contents td.code { + padding: 0 !important; +} + +.doc-contents td.linenos { + padding: 0 8px !important; +} + +.doc-children, +footer { + margin-top: 20px; +} \ No newline at end of file diff --git a/automator/ai_tool_automator/index.html b/automator/ai_tool_automator/index.html new file mode 100644 index 00000000..0a33792f --- /dev/null +++ b/automator/ai_tool_automator/index.html @@ -0,0 +1,367 @@ + + + + + + + + AI Tool - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

AI Tool Automator

+

The AI Tool Automator is a component of the UFO framework that enables the agent to interact with AI tools based on large language models (LLMs). It is designed to facilitate the integration of LLM-based AI tools into the UFO framework, allowing the agent to leverage their capabilities to perform complex tasks.

+
+

Note

+

UFO can also call in-app AI tools, such as Copilot, to assist with the automation process. This is achieved by using either UI Automation or API to interact with the in-app AI tool. These in-app AI tools differ from the AI Tool Automator, which is designed to interact with external AI tools based on LLMs that are not integrated into the application.

+
+

Configuration

+

The AI Tool Automator shares the same prompt configuration options as the UI Automator:

+ + + + + + + + + + + + + + + + + +
Configuration OptionDescriptionTypeDefault Value
API_PROMPTThe prompt for the UI automation API.String"ufo/prompts/share/base/api.yaml"
+

Receiver

+

The AI Tool Automator shares the same receiver structure as the UI Automator. Please refer to the UI Automator Receiver section for more details.

+

Command

+

The command of the AI Tool Automator shares the same structure as the UI Automator. Please refer to the UI Automator Command section for more details. The list of available commands in the AI Tool Automator is shown below:

+ + + + + + + + + + + + + + + + + + + + +
Command NameFunction NameDescription
AnnotationCommandannotationAnnotate the control items on the screenshot.
SummaryCommandsummarySummarize the observation of the current application window.
+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/automator/bash_automator/index.html b/automator/bash_automator/index.html new file mode 100644 index 00000000..2d2a98ce --- /dev/null +++ b/automator/bash_automator/index.html @@ -0,0 +1,506 @@ + + + + + + + + Bash Automator - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

Bash Automator

+

UFO allows the HostAgent to execute bash commands on the host machine. The bash commands can be used to open applications or execute system commands. The Bash Automator is implemented in the ufo/automator/app_apis/shell module.

+
+

Note

+

The Bash Automator is currently only available to the HostAgent.

+
+

Receiver

+

The Bash Automator receiver is the ShellReceiver class defined in the ufo/automator/app_apis/shell/shell_client.py file.

+ + +
+ + + + +
+

+ Bases: ReceiverBasic

+ + +

The receiver class for executing shell commands on the host machine.

+ +

Initialize the shell client.

+ + + + + + +
+ Source code in automator/app_apis/shell/shell_client.py +
19
+20
+21
+22
def __init__(self) -> None:
+    """
+    Initialize the shell client.
+    """
+
+
+ + + +
+ + + + + + + + + +
+ + +

+ run_shell(params) + +

+ + +
+ +

Run the command.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + params + (Dict[str, Any]) + – +
    +

    The parameters of the command.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + Any + – +
    +

    The result content.

    +
    +
  • +
+
+
+ Source code in automator/app_apis/shell/shell_client.py +
24
+25
+26
+27
+28
+29
+30
+31
+32
+33
+34
def run_shell(self, params: Dict[str, Any]) -> Any:
+    """
+    Run the command.
+    :param params: The parameters of the command.
+    :return: The result content.
+    """
+    bash_command = params.get("command")
+    result = subprocess.run(
+        bash_command, shell=True, capture_output=True, text=True
+    )
+    return result.stdout
+
+
+
+ +
+ + + +
+ +
+ +


+

Command

+

The Bash Automator currently supports only one command, which executes a bash command on the host machine.

+
@ShellReceiver.register
+class RunShellCommand(ShellCommand):
+    """
+    The command to run a shell command on the host machine.
+    """
+
+    def execute(self):
+        """
+        Execute the shell command.
+        :return: The result content.
+        """
+        return self.receiver.run_shell(params=self.params)
+
+    @classmethod
+    def name(cls) -> str:
+        """
+        The name of the command.
+        """
+        return "run_shell"
+
+

Below is the list of available commands in the Bash Automator that are currently supported by UFO:

+ + + + + + + + + + + + + + + +
Command NameFunction NameDescription
RunShellCommandrun_shellRun a bash command on the host machine and return its standard output.
+ +
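As a rough illustration, the `run_shell` logic shown above boils down to a single `subprocess.run` call. The standalone function below mirrors it outside UFO's command registry; in the real framework the call is dispatched through `RunShellCommand` and the registered `ShellReceiver`.

```python
import subprocess
from typing import Any, Dict


def run_shell(params: Dict[str, Any]) -> str:
    """Run a bash command taken from the params dict and return its stdout,
    mirroring ShellReceiver.run_shell shown above."""
    bash_command = params.get("command")
    result = subprocess.run(
        bash_command, shell=True, capture_output=True, text=True
    )
    return result.stdout


out = run_shell({"command": "echo hello"})
print(out.strip())  # hello
```

Note that, as in the source above, only standard output is returned; stderr and the exit code are captured by `subprocess.run` but discarded.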
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/automator/overview/index.html b/automator/overview/index.html new file mode 100644 index 00000000..177fb374 --- /dev/null +++ b/automator/overview/index.html @@ -0,0 +1,2328 @@ + + + + + + + + Overview - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

Application Automator

+

The Automator is a tool that allows UFO to automate and take actions on applications. Currently, UFO supports five types of actions: UI Automation, API, Web, Bash, and AI Tool.

+
+

Note

+

UFO can also call in-app AI tools, such as Copilot, to assist with the automation process. This is achieved by using either UI Automation or API to interact with the in-app AI tool.

+
+
    +
  • UI Automator - This action type is used to interact with the application's UI controls, such as buttons, text boxes, and menus. UFO uses the UIA or Win32 APIs to interact with the application's UI controls.
  • +
  • API - This action type is used to interact with the application's native API. Users and app developers can create their own API actions to interact with specific applications.
  • +
  • Web - This action type is used to interact with web applications. UFO uses the crawl4ai library to extract information from web pages.
  • +
  • Bash - This action type is used to interact with the command line interface (CLI) of an application.
  • +
  • AI Tool - This action type is used to interact with the LLM-based AI tools.
  • +
+

Action Design Patterns

+

Actions in UFO are implemented using the command design pattern, which encapsulates a receiver, a command, and an invoker. The receiver is the object that performs the action, the command is the object that encapsulates the action, and the invoker is the object that triggers the action.

+
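The three roles can be sketched in a few lines of Python. The class names below (`Receiver`, `TypeTextCommand`, `Invoker`) are illustrative stand-ins, not the actual UFO classes:

```python
from abc import ABC, abstractmethod
from collections import deque

class Receiver:
    """Performs the actual work (e.g., driving an application)."""
    def type_text(self, text: str) -> str:
        return f"typed: {text}"

class Command(ABC):
    """Encapsulates a single action on a receiver."""
    def __init__(self, receiver: Receiver, params: dict = None):
        self.receiver = receiver
        self.params = params or {}

    @abstractmethod
    def execute(self) -> str: ...

class TypeTextCommand(Command):
    def execute(self) -> str:
        return self.receiver.type_text(self.params["text"])

class Invoker:
    """Queues commands and triggers them without knowing their details."""
    def __init__(self):
        self.command_queue = deque()

    def add_command(self, command: Command) -> None:
        self.command_queue.append(command)

    def execute_all_commands(self) -> list:
        results = []
        while self.command_queue:
            results.append(self.command_queue.popleft().execute())
        return results

invoker = Invoker()
invoker.add_command(TypeTextCommand(Receiver(), {"text": "hello"}))
print(invoker.execute_all_commands())  # ['typed: hello']
```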

The basic classes for implementing actions in UFO are as follows:

+ + + + + + + + + + + + + + + + + + + + + + + + + +
RoleClassDescription
Receiverufo.automator.basic.ReceiverBasicThe base class for all receivers in UFO. Receivers are objects that perform actions on applications.
Commandufo.automator.basic.CommandBasicThe base class for all commands in UFO. Commands are objects that encapsulate actions to be performed by receivers.
Invokerufo.automator.puppeteer.AppPuppeteerThe base class for the invoker in UFO. Invokers are objects that trigger commands to be executed by receivers.
+

The advantage of using the command design pattern in the agent framework is that it allows for the decoupling of the sender and receiver of the action. This decoupling enables the agent to execute actions on different objects without knowing the details of the object or the action being performed, making the agent more flexible and extensible for new actions.

+

Receiver

+

The Receiver is a central component in the Automator application that performs actions on the application. It provides functionalities to interact with the application and execute actions. All available actions are registered with the ReceiverManager class.

+

You can find the reference for a basic Receiver class below:

+ + +
+ + + + +
+

+ Bases: ABC

+ + +

The abstract receiver interface.

+ + + + + + + + + +
+ + + + + + + +
+ + + +

+ command_registry: Dict[str, Type[CommandBasic]] + + + property + + +

+ + +
+ +

Get the command registry.

+
+ +
+ +
+ + + +

+ supported_command_names: List[str] + + + property + + +

+ + +
+ +

Get the command name list.

+
+ +
+ + + +
+ + +

+ register(command_class) + + + classmethod + + +

+ + +
+ +

Decorator to register the command class to the receiver's command registry.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + command_class + (Type[CommandBasic]) + – +
    +

    The command class to be registered.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: + +
+
+ Source code in automator/basic.py +
46
+47
+48
+49
+50
+51
+52
+53
+54
@classmethod
+def register(cls, command_class: Type[CommandBasic]) -> Type[CommandBasic]:
+    """
+    Decorator to register the command class to the receiver's command registry.
+    :param command_class: The command class to be registered.
+    :return: The command class.
+    """
+    cls._command_registry[command_class.name()] = command_class
+    return command_class
+
+
+
+ +
+ +
+ + +

+ register_command(command_name, command) + +

+ + +
+ +

Add to the command registry.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + command_name + (str) + – +
    +

    The command name.

    +
    +
  • +
  • + command + (CommandBasic) + – +
    +

    The command.

    +
    +
  • +
+
+
+ Source code in automator/basic.py +
24
+25
+26
+27
+28
+29
+30
+31
def register_command(self, command_name: str, command: CommandBasic) -> None:
+    """
+    Add to the command registry.
+    :param command_name: The command name.
+    :param command: The command.
+    """
+
+    self.command_registry[command_name] = command
+
+
+
+ +
+ +
+ + +

+ self_command_mapping() + +

+ + +
+ +

Get the command-receiver mapping.

+ +
+ Source code in automator/basic.py +
40
+41
+42
+43
+44
def self_command_mapping(self) -> Dict[str, CommandBasic]:
+    """
+    Get the command-receiver mapping.
+    """
+    return {command_name: self for command_name in self.supported_command_names}
+
+
+
+ +
+ + + +
+ +
+ +


+

Command

+

The Command is a specific action that the Receiver can perform on the application. It encapsulates the function and parameters required to execute the action. The Command class is a base class for all commands in the Automator application.

+
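A minimal, self-contained sketch of a command that supports execute, undo, and redo; the `Editor` and `SetTextCommand` classes are hypothetical and only illustrate the interface:

```python
class Editor:
    """Hypothetical receiver holding a piece of state."""
    def __init__(self):
        self.text = ""

class SetTextCommand:
    """Hypothetical command that can revert its own effect."""
    def __init__(self, receiver, params=None):
        self.receiver = receiver
        self.params = params or {}
        self._previous = None

    def execute(self):
        self._previous = self.receiver.text           # remember prior state
        self.receiver.text = self.params.get("text", "")

    def undo(self):
        if self._previous is not None:
            self.receiver.text = self._previous       # restore prior state

    def redo(self):
        self.execute()

editor = Editor()
cmd = SetTextCommand(editor, {"text": "hello"})
cmd.execute()   # editor.text == "hello"
cmd.undo()      # editor.text == ""
cmd.redo()      # editor.text == "hello"
```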

You can find the reference for a basic Command class below:

+ + +
+ + + + +
+

+ Bases: ABC

+ + +

The abstract command interface.

+ +

Initialize the command.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + receiver + (ReceiverBasic) + – +
    +

    The receiver of the command.

    +
    +
  • +
+
+ + + + + +
+ Source code in automator/basic.py +
67
+68
+69
+70
+71
+72
+73
def __init__(self, receiver: ReceiverBasic, params: Dict = None) -> None:
+    """
+    Initialize the command.
+    :param receiver: The receiver of the command.
+    """
+    self.receiver = receiver
+    self.params = params if params is not None else {}
+
+
+ + + +
+ + + + + + + + + +
+ + +

+ execute() + + + abstractmethod + + +

+ + +
+ +

Execute the command.

+ +
+ Source code in automator/basic.py +
75
+76
+77
+78
+79
+80
@abstractmethod
+def execute(self):
+    """
+    Execute the command.
+    """
+    pass
+
+
+
+ +
+ +
+ + +

+ redo() + +

+ + +
+ +

Redo the command.

+ +
+ Source code in automator/basic.py +
88
+89
+90
+91
+92
def redo(self):
+    """
+    Redo the command.
+    """
+    self.execute()
+
+
+
+ +
+ +
+ + +

+ undo() + +

+ + +
+ +

Undo the command.

+ +
+ Source code in automator/basic.py +
82
+83
+84
+85
+86
def undo(self):
+    """
+    Undo the command.
+    """
+    pass
+
+
+
+ +
+ + + +
+ +
+ +


+
+

Note

+

Each command must be registered with a specific Receiver using the register decorator before it can be executed. For example: + @ReceiverExample.register + class CommandExample(CommandBasic): + ...

+
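The registration mechanism in the note can be sketched standalone; `ReceiverExample` and `CommandExample` are the placeholder names from the note, and the registry below is a simplified stand-in for the UFO implementation:

```python
class ReceiverExample:
    """Simplified receiver with a class-level command registry."""
    _command_registry = {}

    @classmethod
    def register(cls, command_class):
        # Store the command class under its declared name.
        cls._command_registry[command_class.name()] = command_class
        return command_class

@ReceiverExample.register
class CommandExample:
    @classmethod
    def name(cls):
        return "command_example"

print("command_example" in ReceiverExample._command_registry)  # True
```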
+

Invoker (AppPuppeteer)

+

The AppPuppeteer plays the role of the invoker in the Automator application. It triggers the commands to be executed by the receivers. The AppPuppeteer equips the AppAgent with the capability to interact with the application's UI controls. It provides functionalities to translate action strings into specific actions and execute them. All available actions are registered in the Puppeteer with the ReceiverManager class.

+

You can find the implementation of the AppPuppeteer class in the ufo/automator/puppeteer.py file, and its reference is shown below.

+ + +
+ + + + +
+ + +

The class for the app puppeteer to automate the app in the Windows environment.

+ +

Initialize the app puppeteer.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + process_name + (str) + – +
    +

    The process name of the app.

    +
    +
  • +
  • + app_root_name + (str) + – +
    +

    The app root name, e.g., WINWORD.EXE.

    +
    +
  • +
+
+ + + + + +
+ Source code in automator/puppeteer.py +
22
+23
+24
+25
+26
+27
+28
+29
+30
+31
+32
def __init__(self, process_name: str, app_root_name: str) -> None:
+    """
+    Initialize the app puppeteer.
+    :param process_name: The process name of the app.
+    :param app_root_name: The app root name, e.g., WINWORD.EXE.
+    """
+
+    self._process_name = process_name
+    self._app_root_name = app_root_name
+    self.command_queue: Deque[CommandBasic] = deque()
+    self.receiver_manager = ReceiverManager()
+
+
+ + + +
+ + + + + + + +
+ + + +

+ full_path: str + + + property + + +

+ + +
+ +

Get the full path of the process. Only works for COM receiver.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The full path of the process.

    +
    +
  • +
+
+ +
+ + + +
+ + +

+ add_command(command_name, params, *args, **kwargs) + +

+ + +
+ +

Add the command to the command queue.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + command_name + (str) + – +
    +

    The command name.

    +
    +
  • +
  • + params + (Dict[str, Any]) + – +
    +

    The arguments.

    +
    +
  • +
+
+
+ Source code in automator/puppeteer.py +
 94
+ 95
+ 96
+ 97
+ 98
+ 99
+100
+101
+102
+103
def add_command(
+    self, command_name: str, params: Dict[str, Any], *args, **kwargs
+) -> None:
+    """
+    Add the command to the command queue.
+    :param command_name: The command name.
+    :param params: The arguments.
+    """
+    command = self.create_command(command_name, params, *args, **kwargs)
+    self.command_queue.append(command)
+
+
+
+ +
+ +
+ + +

+ close() + +

+ + +
+ +

Close the app. Only works for COM receiver.

+ +
+ Source code in automator/puppeteer.py +
145
+146
+147
+148
+149
+150
+151
def close(self) -> None:
+    """
+    Close the app. Only works for COM receiver.
+    """
+    com_receiver = self.receiver_manager.com_receiver
+    if com_receiver is not None:
+        com_receiver.close()
+
+
+
+ +
+ +
+ + +

+ create_command(command_name, params, *args, **kwargs) + +

+ + +
+ +

Create the command.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + command_name + (str) + – +
    +

    The command name.

    +
    +
  • +
  • + params + (Dict[str, Any]) + – +
    +

    The arguments for the command.

    +
    +
  • +
+
+
+ Source code in automator/puppeteer.py +
34
+35
+36
+37
+38
+39
+40
+41
+42
+43
+44
+45
+46
+47
+48
+49
+50
+51
def create_command(
+    self, command_name: str, params: Dict[str, Any], *args, **kwargs
+) -> Optional[CommandBasic]:
+    """
+    Create the command.
+    :param command_name: The command name.
+    :param params: The arguments for the command.
+    """
+    receiver = self.receiver_manager.get_receiver_from_command_name(command_name)
+    command = receiver.command_registry.get(command_name.lower(), None)
+
+    if receiver is None:
+        raise ValueError(f"Receiver for command {command_name} is not found.")
+
+    if command is None:
+        raise ValueError(f"Command {command_name} is not supported.")
+
+    return command(receiver, params, *args, **kwargs)
+
+
+
+ +
+ +
+ + +

+ execute_all_commands() + +

+ + +
+ +

Execute all the commands in the command queue.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + List[Any] + – +
    +

    The execution results.

    +
    +
  • +
+
+
+ Source code in automator/puppeteer.py +
82
+83
+84
+85
+86
+87
+88
+89
+90
+91
+92
def execute_all_commands(self) -> List[Any]:
+    """
+    Execute all the commands in the command queue.
+    :return: The execution results.
+    """
+    results = []
+    while self.command_queue:
+        command = self.command_queue.popleft()
+        results.append(command.execute())
+
+    return results
+
+
+
+ +
+ +
+ + +

+ execute_command(command_name, params, *args, **kwargs) + +

+ + +
+ +

Execute the command.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + command_name + (str) + – +
    +

    The command name.

    +
    +
  • +
  • + params + (Dict[str, Any]) + – +
    +

    The arguments.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The execution result.

    +
    +
  • +
+
+
+ Source code in automator/puppeteer.py +
68
+69
+70
+71
+72
+73
+74
+75
+76
+77
+78
+79
+80
def execute_command(
+    self, command_name: str, params: Dict[str, Any], *args, **kwargs
+) -> str:
+    """
+    Execute the command.
+    :param command_name: The command name.
+    :param params: The arguments.
+    :return: The execution result.
+    """
+
+    command = self.create_command(command_name, params, *args, **kwargs)
+
+    return command.execute()
+
+
+
+ +
+ +
+ + +

+ get_command_queue_length() + +

+ + +
+ +

Get the length of the command queue.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + int + – +
    +

    The length of the command queue.

    +
    +
  • +
+
+
+ Source code in automator/puppeteer.py +
105
+106
+107
+108
+109
+110
def get_command_queue_length(self) -> int:
+    """
+    Get the length of the command queue.
+    :return: The length of the command queue.
+    """
+    return len(self.command_queue)
+
+
+
+ +
+ +
+ + +

+ get_command_string(command_name, params) + + + staticmethod + + +

+ + +
+ +

Generate a function call string.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + command_name + (str) + – +
    +

    The function name.

    +
    +
  • +
  • + params + (Dict[str, str]) + – +
    +

    The arguments as a dictionary.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The function call string.

    +
    +
  • +
+
+
+ Source code in automator/puppeteer.py +
153
+154
+155
+156
+157
+158
+159
+160
+161
+162
+163
+164
+165
@staticmethod
+def get_command_string(command_name: str, params: Dict[str, str]) -> str:
+    """
+    Generate a function call string.
+    :param command_name: The function name.
+    :param params: The arguments as a dictionary.
+    :return: The function call string.
+    """
+    # Format the arguments
+    args_str = ", ".join(f"{k}={v!r}" for k, v in params.items())
+
+    # Return the function call string
+    return f"{command_name}({args_str})"
+
+
+
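Since get_command_string is a pure string formatter, its behavior is easy to reproduce standalone. The function below re-implements the body shown above for illustration:

```python
def get_command_string(command_name, params):
    """Format a command name and its parameters as a function call string."""
    args_str = ", ".join(f"{k}={v!r}" for k, v in params.items())
    return f"{command_name}({args_str})"

print(get_command_string("click_input", {"button": "left", "double": False}))
# click_input(button='left', double=False)
```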
+ +
+ +
+ + +

+ get_command_types(command_name) + +

+ + +
+ +

Get the command types.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + command_name + (str) + – +
    +

    The command name.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The command types.

    +
    +
  • +
+
+
+ Source code in automator/puppeteer.py +
53
+54
+55
+56
+57
+58
+59
+60
+61
+62
+63
+64
+65
+66
def get_command_types(self, command_name: str) -> str:
+    """
+    Get the command types.
+    :param command_name: The command name.
+    :return: The command types.
+    """
+
+    try:
+        receiver = self.receiver_manager.get_receiver_from_command_name(
+            command_name
+        )
+        return receiver.type_name
+    except Exception:
+        return ""
+
+
+
+ +
+ +
+ + +

+ save() + +

+ + +
+ +

Save the current state of the app. Only works for COM receiver.

+ +
+ Source code in automator/puppeteer.py +
124
+125
+126
+127
+128
+129
+130
def save(self) -> None:
+    """
+    Save the current state of the app. Only works for COM receiver.
+    """
+    com_receiver = self.receiver_manager.com_receiver
+    if com_receiver is not None:
+        com_receiver.save()
+
+
+
+ +
+ +
+ + +

+ save_to_xml(file_path) + +

+ + +
+ +

Save the current state of the app to XML. Only works for COM receiver.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + file_path + (str) + – +
    +

    The file path to save the XML.

    +
    +
  • +
+
+
+ Source code in automator/puppeteer.py +
132
+133
+134
+135
+136
+137
+138
+139
+140
+141
+142
+143
def save_to_xml(self, file_path: str) -> None:
+    """
+    Save the current state of the app to XML. Only works for COM receiver.
+    :param file_path: The file path to save the XML.
+    """
+    com_receiver = self.receiver_manager.com_receiver
+    dir_path = os.path.dirname(file_path)
+    if not os.path.exists(dir_path):
+        os.makedirs(dir_path)
+
+    if com_receiver is not None:
+        com_receiver.save_to_xml(file_path)
+
+
+
+ +
+ + + +
+ +
+ +


+

Receiver Manager

+

The ReceiverManager manages all the receivers and commands in the Automator application. It provides functionalities to register and retrieve receivers and commands. It is a complementary component to the AppPuppeteer.

+ + +
+ + + + +
+ + +

The class for the receiver manager.

+ +

Initialize the receiver manager.

+ + + + + + +
+ Source code in automator/puppeteer.py +
175
+176
+177
+178
+179
+180
+181
+182
+183
def __init__(self):
+    """
+    Initialize the receiver manager.
+    """
+
+    self.receiver_registry = {}
+    self.ui_control_receiver: Optional[ControlReceiver] = None
+
+    self._receiver_list: List[ReceiverBasic] = []
+
+
+ + + +
+ + + + + + + +
+ + + +

+ com_receiver: WinCOMReceiverBasic + + + property + + +

+ + +
+ +

Get the COM receiver.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + WinCOMReceiverBasic + – +
    +

    The COM receiver.

    +
    +
  • +
+
+ +
+ +
+ + + +

+ receiver_factory_registry: Dict[str, Dict[str, Union[str, ReceiverFactory]]] + + + property + + +

+ + +
+ +

Get the receiver factory registry.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + Dict[str, Dict[str, Union[str, ReceiverFactory]]] + – +
    +

    The receiver factory registry.

    +
    +
  • +
+
+ +
+ +
+ + + +

+ receiver_list: List[ReceiverBasic] + + + property + + +

+ + +
+ +

Get the receiver list.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + List[ReceiverBasic] + – +
    +

    The receiver list.

    +
    +
  • +
+
+ +
+ + + +
+ + +

+ create_api_receiver(app_root_name, process_name) + +

+ + +
+ +

Create the API receivers for the given application.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + app_root_name + (str) + – +
    +

    The app root name.

    +
    +
  • +
  • + process_name + (str) + – +
    +

    The process name.

    +
    +
  • +
+
+
+ Source code in automator/puppeteer.py +
208
+209
+210
+211
+212
+213
+214
+215
+216
+217
+218
+219
+220
+221
+222
+223
+224
def create_api_receiver(self, app_root_name: str, process_name: str) -> None:
+    """
+    Get the API receiver.
+    :param app_root_name: The app root name.
+    :param process_name: The process name.
+    """
+    for receiver_factory_dict in self.receiver_factory_registry.values():
+
+        # Check if the receiver is API
+        if receiver_factory_dict.get("is_api"):
+            receiver = receiver_factory_dict.get("factory").create_receiver(
+                app_root_name, process_name
+            )
+            if receiver is not None:
+                self.receiver_list.append(receiver)
+
+    self._update_receiver_registry()
+
+
+
+ +
+ +
+ + +

+ create_ui_control_receiver(control, application) + +

+ + +
+ +

Build the UI controller.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + control + (UIAWrapper) + – +
    +

    The control element.

    +
    +
  • +
  • + application + (UIAWrapper) + – +
    +

    The application window.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + ControlReceiver + – +
    +

    The UI controller receiver.

    +
    +
  • +
+
+
+ Source code in automator/puppeteer.py +
185
+186
+187
+188
+189
+190
+191
+192
+193
+194
+195
+196
+197
+198
+199
+200
+201
+202
+203
+204
+205
+206
def create_ui_control_receiver(
+    self, control: UIAWrapper, application: UIAWrapper
+) -> "ControlReceiver":
+    """
+    Build the UI controller.
+    :param control: The control element.
+    :param application: The application window.
+    :return: The UI controller receiver.
+    """
+
+    # control can be None
+    if not application:
+        return None
+
+    factory: ReceiverFactory = self.receiver_factory_registry.get("UIControl").get(
+        "factory"
+    )
+    self.ui_control_receiver = factory.create_receiver(control, application)
+    self.receiver_list.append(self.ui_control_receiver)
+    self._update_receiver_registry()
+
+    return self.ui_control_receiver
+
+
+
+ +
+ +
+ + +

+ get_receiver_from_command_name(command_name) + +

+ + +
+ +

Get the receiver from the command name.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + command_name + (str) + – +
    +

    The command name.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + ReceiverBasic + – +
    +

    The mapped receiver.

    +
    +
  • +
+
+
+ Source code in automator/puppeteer.py +
235
+236
+237
+238
+239
+240
+241
+242
+243
+244
def get_receiver_from_command_name(self, command_name: str) -> ReceiverBasic:
+    """
+    Get the receiver from the command name.
+    :param command_name: The command name.
+    :return: The mapped receiver.
+    """
+    receiver = self.receiver_registry.get(command_name, None)
+    if receiver is None:
+        raise ValueError(f"Receiver for command {command_name} is not found.")
+    return receiver
+
+
+
+ +
+ +
+ + +

+ register(receiver_factory_class) + + + classmethod + + +

+ + +
+ +

Decorator to register the receiver factory class to the receiver manager.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + receiver_factory_class + (Type[ReceiverFactory]) + – +
    +

    The receiver factory class to be registered.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + ReceiverFactory + – +
    +

    The receiver factory class instance.

    +
    +
  • +
+
+
+ Source code in automator/puppeteer.py +
276
+277
+278
+279
+280
+281
+282
+283
+284
+285
+286
+287
+288
+289
@classmethod
+def register(cls, receiver_factory_class: Type[ReceiverFactory]) -> ReceiverFactory:
+    """
+    Decorator to register the receiver factory class to the receiver manager.
+    :param receiver_factory_class: The receiver factory class to be registered.
+    :return: The receiver factory class instance.
+    """
+
+    cls._receiver_factory_registry[receiver_factory_class.name()] = {
+        "factory": receiver_factory_class(),
+        "is_api": receiver_factory_class.is_api(),
+    }
+
+    return receiver_factory_class()
+
+
+
+ +
+ + + +
+ +
+ +


+

For further details, refer to the specific documentation for each component and class in the Automator module.

+ +
+
+ +
+
+ +
+ +
+ +
+ + + + + + + + + diff --git a/automator/ui_automator/index.html b/automator/ui_automator/index.html new file mode 100644 index 00000000..2e8e683d --- /dev/null +++ b/automator/ui_automator/index.html @@ -0,0 +1,1931 @@ + + + + + + + + UI Automator - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

UI Automator

+

The UI Automator enables UFO to mimic mouse and keyboard operations on the application's UI controls. UFO uses the UIA or Win32 APIs to interact with the application's UI controls, such as buttons, edit boxes, and menus.

+

Configuration

+

There are several configurations that need to be set up before using the UI Automator in the config_dev.yaml file. Below is the list of configurations related to the UI Automator:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Configuration OptionDescriptionTypeDefault Value
CONTROL_BACKENDThe backend for control action, currently supporting uia and win32.String"uia"
CONTROL_LISTThe list of widgets allowed to be selected.List["Button", "Edit", "TabItem", "Document", "ListItem", "MenuItem", "ScrollBar", "TreeItem", "Hyperlink", "ComboBox", "RadioButton", "DataItem"]
ANNOTATION_COLORSThe colors assigned to different control types for annotation.Dictionary{"Button": "#FFF68F", "Edit": "#A5F0B5", "TabItem": "#A5E7F0", "Document": "#FFD18A", "ListItem": "#D9C3FE", "MenuItem": "#E7FEC3", "ScrollBar": "#FEC3F8", "TreeItem": "#D6D6D6", "Hyperlink": "#91FFEB", "ComboBox": "#D8B6D4"}
API_PROMPTThe prompt for the UI automation API.String"ufo/prompts/share/base/api.yaml"
CLICK_APIThe API used for click action, can be click_input or click.String"click_input"
INPUT_TEXT_APIThe API used for input text action, can be type_keys or set_text.String"type_keys"
INPUT_TEXT_ENTERWhether to press enter after typing the text.BooleanFalse
+
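For reference, the defaults above correspond to a config_dev.yaml fragment along these lines (an illustrative excerpt, not the complete file):

```yaml
CONTROL_BACKEND: "uia"            # Backend for control actions: uia or win32
API_PROMPT: "ufo/prompts/share/base/api.yaml"
CLICK_API: "click_input"          # click_input or click
INPUT_TEXT_API: "type_keys"       # type_keys or set_text
INPUT_TEXT_ENTER: False           # Press Enter after typing
```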

Receiver

+

The receiver of the UI Automator is the ControlReceiver class defined in the ufo/automator/ui_control/controller/control_receiver module. It is initialized with the control element and the application window on which the actions are executed. The ControlReceiver provides functionalities to interact with the application's UI controls. Below is the reference for the ControlReceiver class:

+ + +
+ + + + +
+

+ Bases: ReceiverBasic

+ + +

The control receiver class.

+ +

Initialize the control receiver.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + control + (Optional[UIAWrapper]) + – +
    +

    The control element.

    +
    +
  • +
  • + application + (Optional[UIAWrapper]) + – +
    +

    The application element.

    +
    +
  • +
+
+ + + + + +
+ Source code in automator/ui_control/controller.py +
33
+34
+35
+36
+37
+38
+39
+40
+41
+42
+43
+44
+45
+46
+47
+48
+49
def __init__(
+    self, control: Optional[UIAWrapper], application: Optional[UIAWrapper]
+) -> None:
+    """
+    Initialize the control receiver.
+    :param control: The control element.
+    :param application: The application element.
+    """
+
+    self.control = control
+    self.application = application
+
+    if control:
+        self.control.set_focus()
+        self.wait_enabled()
+    elif application:
+        self.application.set_focus()
+
+
+ + + +
+ + + + + + + + + +
+ + +

+ annotation(params, annotation_dict) + +

+ + +
+ +

Take a screenshot of the current application window and annotate the control item on the screenshot.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + params + (Dict[str, str]) + – +
    +

    The arguments of the annotation method.

    +
    +
  • +
  • + annotation_dict + (Dict[str, UIAWrapper]) + – +
    +

    The dictionary of the control labels.

    +
    +
  • +
+
+
+ Source code in automator/ui_control/controller.py +
240
+241
+242
+243
+244
+245
+246
+247
+248
+249
+250
+251
+252
+253
+254
def annotation(
+    self, params: Dict[str, str], annotation_dict: Dict[str, UIAWrapper]
+) -> List[str]:
+    """
+    Take a screenshot of the current application window and annotate the control item on the screenshot.
+    :param params: The arguments of the annotation method.
+    :param annotation_dict: The dictionary of the control labels.
+    """
+    selected_controls_labels = params.get("control_labels", [])
+
+    control_reannotate = [
+        annotation_dict[str(label)] for label in selected_controls_labels
+    ]
+
+    return control_reannotate
+
+
+
+ +
+ +
+ + +

+ atomic_execution(method_name, params) + +

+ + +
+ +

Atomic execution of the action on the control elements.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + method_name + (str) + – +
    +

    The name of the method to execute.

    +
    +
  • +
  • + params + (Dict[str, Any]) + – +
    +

    The arguments of the method.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The result of the action.

    +
    +
  • +
+
+
+ Source code in automator/ui_control/controller.py +
55
+56
+57
+58
+59
+60
+61
+62
+63
+64
+65
+66
+67
+68
+69
+70
+71
+72
+73
+74
+75
+76
+77
def atomic_execution(self, method_name: str, params: Dict[str, Any]) -> str:
+    """
+    Atomic execution of the action on the control elements.
+    :param method_name: The name of the method to execute.
+    :param params: The arguments of the method.
+    :return: The result of the action.
+    """
+
+    import traceback
+
+    try:
+        method = getattr(self.control, method_name)
+        result = method(**params)
+    except AttributeError:
+        message = f"{self.control} doesn't have a method named {method_name}"
+        print_with_color(f"Warning: {message}", "yellow")
+        result = message
+    except Exception as e:
+        full_traceback = traceback.format_exc()
+        message = f"An error occurred: {full_traceback}"
+        print_with_color(f"Warning: {message}", "yellow")
+        result = message
+    return result
+
+
+
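The getattr-based dispatch at the heart of atomic_execution can be illustrated standalone; `DummyControl` is a hypothetical stand-in for a pywinauto control wrapper:

```python
class DummyControl:
    """Hypothetical stand-in for a pywinauto control wrapper."""
    def click_input(self, button="left", double=False):
        return f"clicked with {button} button (double={double})"

def atomic_execution(control, method_name, params):
    # Resolve the method by name and call it with keyword arguments,
    # returning an error message when the method does not exist.
    try:
        method = getattr(control, method_name)
        return method(**params)
    except AttributeError:
        return f"{control} doesn't have a method named {method_name}"

control = DummyControl()
print(atomic_execution(control, "click_input", {"button": "left"}))
# clicked with left button (double=False)
print(atomic_execution(control, "missing", {}))
```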
+ +
+ +
+ + +

+ click_input(params) + +

+ + +
+ +

Click the control element.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + params + (Dict[str, Union[str, bool]]) + – +
    +

    The arguments of the click method.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The result of the click action.

    +
    +
  • +
+
+
+ Source code in automator/ui_control/controller.py +
79
+80
+81
+82
+83
+84
+85
+86
+87
+88
+89
+90
+91
def click_input(self, params: Dict[str, Union[str, bool]]) -> str:
+    """
+    Click the control element.
+    :param params: The arguments of the click method.
+    :return: The result of the click action.
+    """
+
+    api_name = configs.get("CLICK_API", "click_input")
+
+    if api_name == "click":
+        return self.atomic_execution("click", params)
+    else:
+        return self.atomic_execution("click_input", params)
+
+
+
+ +
+ +
+ + +

+ click_on_coordinates(params) + +

+ + +
+ +

Click on the coordinates of the control element.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + params + (Dict[str, str]) + – +
    +

    The arguments of the click on coordinates method.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The result of the click on coordinates action.

    +
    +
  • +
+
+
+ Source code in automator/ui_control/controller.py +
 93
+ 94
+ 95
+ 96
+ 97
+ 98
+ 99
+100
+101
+102
+103
+104
+105
+106
+107
+108
+109
+110
+111
+112
+113
+114
+115
+116
def click_on_coordinates(self, params: Dict[str, str]) -> str:
+    """
+    Click on the coordinates of the control element.
+    :param params: The arguments of the click on coordinates method.
+    :return: The result of the click on coordinates action.
+    """
+
+    # Get the relative coordinates fraction of the application window.
+    x = float(params.get("x", 0))
+    y = float(params.get("y", 0))
+
+    button = params.get("button", "left")
+    double = params.get("double", False)
+
+    # Get the absolute coordinates of the application window.
+    tranformed_x, tranformed_y = self.transform_point(x, y)
+
+    self.application.set_focus()
+
+    pyautogui.click(
+        tranformed_x, tranformed_y, button=button, clicks=2 if double else 1
+    )
+
+    return ""
+
+
+
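click_on_coordinates treats x and y as fractions of the application window, which transform_point maps to absolute screen coordinates. A plausible standalone sketch of that mapping (the rectangle convention here is an assumption for illustration, not the actual UFO implementation):

```python
def transform_point(window_rect, x_fraction, y_fraction):
    """Map fractional window coordinates to absolute screen coordinates.

    window_rect is (left, top, right, bottom) in screen pixels; the
    fractions are in [0, 1] relative to the window's width and height.
    """
    left, top, right, bottom = window_rect
    x = int(left + x_fraction * (right - left))
    y = int(top + y_fraction * (bottom - top))
    return x, y

print(transform_point((100, 50, 900, 650), 0.5, 0.25))  # (500, 200)
```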
+ +
+ +
+ + +

+ drag_on_coordinates(params) + +

+ + +
+ +

Drag on the coordinates of the control element.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + params + (Dict[str, str]) + – +
    +

    The arguments of the drag on coordinates method.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The result of the drag on coordinates action.

    +
    +
  • +
+
+
+ Source code in automator/ui_control/controller.py +
118
+119
+120
+121
+122
+123
+124
+125
+126
+127
+128
+129
+130
+131
+132
+133
+134
+135
+136
+137
+138
+139
def drag_on_coordinates(self, params: Dict[str, str]) -> str:
+    """
+    Drag on the coordinates of the control element.
+    :param params: The arguments of the drag on coordinates method.
+    :return: The result of the drag on coordinates action.
+    """
+
+    start = self.transform_point(
+        float(params.get("start_x", 0)), float(params.get("start_y", 0))
+    )
+    end = self.transform_point(
+        float(params.get("end_x", 0)), float(params.get("end_y", 0))
+    )
+
+    button = params.get("button", "left")
+
+    self.application.set_focus()
+
+    pyautogui.moveTo(start[0], start[1])
+    pyautogui.dragTo(end[0], end[1], button=button)
+
+    return ""
+
+
+
+ +
+ +
+ + +

+ keyboard_input(params) + +

+ + +
+ +

Keyboard input on the control element.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + params + (Dict[str, str]) + – +
    +

    The arguments of the keyboard input method.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The result of the keyboard input action.

    +
    +
  • +
+
+
+ Source code in automator/ui_control/controller.py +
201
+202
+203
+204
+205
+206
+207
+208
+209
+210
+211
+212
+213
+214
+215
def keyboard_input(self, params: Dict[str, str]) -> str:
+    """
+    Keyboard input on the control element.
+    :param params: The arguments of the keyboard input method.
+    :return: The result of the keyboard input action.
+    """
+
+    control_focus = params.get("control_focus", True)
+    keys = params.get("keys", "")
+
+    if control_focus:
+        self.atomic_execution("type_keys", {"keys": keys})
+    else:
+        pyautogui.typewrite(keys)
+    return keys
+
+
+
+ +
+ +
+ + +

+ no_action() + +

+ + +
+ +

No action on the control element.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + – +
    +

    The result of the no action.

    +
    +
  • +
+
+
+ Source code in automator/ui_control/controller.py +
232
+233
+234
+235
+236
+237
+238
def no_action(self):
+    """
+    No action on the control element.
+    :return: The result of the no action.
+    """
+
+    return ""
+
+
+
+ +
+ +
+ + +

+ set_edit_text(params) + +

+ + +
+ +

Set the edit text of the control element.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + params + (Dict[str, str]) + – +
    +

    The arguments of the set edit text method.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The result of the set edit text action.

    +
    +
  • +
+
+
+ Source code in automator/ui_control/controller.py +
150
+151
+152
+153
+154
+155
+156
+157
+158
+159
+160
+161
+162
+163
+164
+165
+166
+167
+168
+169
+170
+171
+172
+173
+174
+175
+176
+177
+178
+179
+180
+181
+182
+183
+184
+185
+186
+187
+188
+189
+190
+191
+192
+193
+194
+195
+196
+197
+198
+199
def set_edit_text(self, params: Dict[str, str]) -> str:
+    """
+    Set the edit text of the control element.
+    :param params: The arguments of the set edit text method.
+    :return: The result of the set edit text action.
+    """
+
+    text = params.get("text", "")
+    inter_key_pause = configs.get("INPUT_TEXT_INTER_KEY_PAUSE", 0.1)
+
+    if configs["INPUT_TEXT_API"] == "set_text":
+        method_name = "set_edit_text"
+        args = {"text": text}
+    else:
+        method_name = "type_keys"
+
+        # Transform the text according to the tags.
+        text = TextTransformer.transform_text(text, "all")
+
+        args = {"keys": text, "pause": inter_key_pause, "with_spaces": True}
+    try:
+        result = self.atomic_execution(method_name, args)
+        if (
+            method_name == "set_text"
+            and args["text"] not in self.control.window_text()
+        ):
+            raise Exception(f"Failed to use set_text: {args['text']}")
+        if configs["INPUT_TEXT_ENTER"] and method_name in ["type_keys", "set_text"]:
+
+            self.atomic_execution("type_keys", params={"keys": "{ENTER}"})
+        return result
+    except Exception as e:
+        if method_name == "set_text":
+            print_with_color(
+                f"{self.control} doesn't have a method named {method_name}, trying default input method",
+                "yellow",
+            )
+            method_name = "type_keys"
+            clear_text_keys = "^a{BACKSPACE}"
+            text_to_type = args["text"]
+            keys_to_send = clear_text_keys + text_to_type
+            method_name = "type_keys"
+            args = {
+                "keys": keys_to_send,
+                "pause": inter_key_pause,
+                "with_spaces": True,
+            }
+            return self.atomic_execution(method_name, args)
+        else:
+            return f"An error occurred: {e}"
+
+
+
+ +
+ +
+ + +

+ summary(params) + +

+ + +
+ +

Visual summary of the control element.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + params + (Dict[str, str]) + – +
    +

    The arguments of the visual summary method. It should contain a key "text" with the text summary.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The result of the visual summary action.

    +
    +
  • +
+
+
+ Source code in automator/ui_control/controller.py +
141
+142
+143
+144
+145
+146
+147
+148
def summary(self, params: Dict[str, str]) -> str:
+    """
+    Visual summary of the control element.
+    :param params: The arguments of the visual summary method. It should contain a key "text" with the text summary.
+    :return: The result of the visual summary action.
+    """
+
+    return params.get("text")
+
+
+
+ +
+ +
+ + +

+ texts() + +

+ + +
+ +

Get the text of the control element.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The text of the control element.

    +
    +
  • +
+
+
+ Source code in automator/ui_control/controller.py +
217
+218
+219
+220
+221
+222
def texts(self) -> str:
+    """
+    Get the text of the control element.
+    :return: The text of the control element.
+    """
+    return self.control.texts()
+
+
+
+ +
+ +
+ + +

+ transform_point(fraction_x, fraction_y) + +

+ + +
+ +

Transform the relative coordinates to the absolute coordinates.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + fraction_x + (float) + – +
    +

    The relative x coordinate.

    +
    +
  • +
  • + fraction_y + (float) + – +
    +

    The relative y coordinate.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + Tuple[int, int] + – +
    +

    The absolute coordinates.

    +
    +
  • +
+
+
+ Source code in automator/ui_control/controller.py +
282
+283
+284
+285
+286
+287
+288
+289
+290
+291
+292
+293
+294
+295
+296
+297
+298
def transform_point(self, fraction_x: float, fraction_y: float) -> Tuple[int, int]:
+    """
+    Transform the relative coordinates to the absolute coordinates.
+    :param fraction_x: The relative x coordinate.
+    :param fraction_y: The relative y coordinate.
+    :return: The absolute coordinates.
+    """
+    application_rect: RECT = self.application.rectangle()
+    application_x = application_rect.left
+    application_y = application_rect.top
+    application_width = application_rect.width()
+    application_height = application_rect.height()
+
+    x = application_x + int(application_width * fraction_x)
+    y = application_y + int(application_height * fraction_y)
+
+    return x, y
+
+
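The fractional-to-absolute mapping above is simple enough to check by hand. Below is a standalone sketch of the same arithmetic, with the signature simplified for illustration (the real method reads the rectangle from the application window):

```python
def transform_point(rect_left, rect_top, rect_width, rect_height, fx, fy):
    """Map fractional window coordinates (0.0-1.0) to absolute screen coordinates."""
    return rect_left + int(rect_width * fx), rect_top + int(rect_height * fy)

# A window at (100, 200) that is 800x600 pixels: the window centre (0.5, 0.5)
# maps to 100 + 800*0.5 = 500 and 200 + 600*0.5 = 500 on screen.
print(transform_point(100, 200, 800, 600, 0.5, 0.5))  # (500, 500)
```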
+
+ +
+ +
+ + +

+ wait_enabled(timeout=10, retry_interval=0.5) + +

+ + +
+ +

Wait until the control is enabled.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + timeout + (int, default: + 10 +) + – +
    +

    The timeout to wait.

    +
    +
  • +
  • + retry_interval + (int, default: + 0.5 +) + – +
    +

    The retry interval to wait.

    +
    +
  • +
+
+
+ Source code in automator/ui_control/controller.py +
256
+257
+258
+259
+260
+261
+262
+263
+264
+265
+266
+267
def wait_enabled(self, timeout: int = 10, retry_interval: int = 0.5) -> None:
+    """
+    Wait until the control is enabled.
+    :param timeout: The timeout to wait.
+    :param retry_interval: The retry interval to wait.
+    """
+    while not self.control.is_enabled():
+        time.sleep(retry_interval)
+        timeout -= retry_interval
+        if timeout <= 0:
+            warnings.warn(f"Timeout: {self.control} is not enabled.")
+            break
+
+
+
+ +
+ +
+ + +

+ wait_visible(timeout=10, retry_interval=0.5) + +

+ + +
+ +

Wait until the control is visible.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + timeout + (int, default: + 10 +) + – +
    +

    The timeout to wait.

    +
    +
  • +
  • + retry_interval + (int, default: + 0.5 +) + – +
    +

    The retry interval to wait.

    +
    +
  • +
+
+
+ Source code in automator/ui_control/controller.py +
269
+270
+271
+272
+273
+274
+275
+276
+277
+278
+279
+280
def wait_visible(self, timeout: int = 10, retry_interval: int = 0.5) -> None:
+    """
+    Wait until the control is visible.
+    :param timeout: The timeout to wait.
+    :param retry_interval: The retry interval to wait.
+    """
+    while not self.control.is_visible():
+        time.sleep(retry_interval)
+        timeout -= retry_interval
+        if timeout <= 0:
+            warnings.warn(f"Timeout: {self.control} is not visible.")
+            break
+
+
+
+ +
+ +
+ + +

+ wheel_mouse_input(params) + +

+ + +
+ +

Wheel mouse input on the control element.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + params + (Dict[str, str]) + – +
    +

    The arguments of the wheel mouse input method.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + – +
    +

    The result of the wheel mouse input action.

    +
    +
  • +
+
+
+ Source code in automator/ui_control/controller.py +
224
+225
+226
+227
+228
+229
+230
def wheel_mouse_input(self, params: Dict[str, str]):
+    """
+    Wheel mouse input on the control element.
+    :param params: The arguments of the wheel mouse input method.
+    :return: The result of the wheel mouse input action.
+    """
+    return self.atomic_execution("wheel_mouse_input", params)
+
+
+
+ +
+ + + +
+ +
+ +


+

Command

+

Commands in the UI Automator are represented by the ControlCommand class defined in the ufo/automator/ui_control/controller/ControlCommand module. A ControlCommand encapsulates the function and parameters required to execute an action, and serves as the base class for all commands in the UI Automator. Below is an example of the ClickInputCommand class, which inherits from ControlCommand:

+
@ControlReceiver.register
+class ClickInputCommand(ControlCommand):
+    """
+    The click input command class.
+    """
+
+    def execute(self) -> str:
+        """
+        Execute the click input command.
+        :return: The result of the click input command.
+        """
+        return self.receiver.click_input(self.params)
+
+    @classmethod
+    def name(cls) -> str:
+        """
+        Get the name of the atomic command.
+        :return: The name of the atomic command.
+        """
+        return "click_input"
+
+
+

Note

+

The concrete command classes must implement the execute method to execute the action and the name method to return the name of the atomic command.

+
+
+

Note

+

Each command must be registered with a specific ControlReceiver using the @ControlReceiver.register decorator before it can be executed.

+
+

Below is the list of available commands in the UI Automator that are currently supported by UFO:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Command NameFunction NameDescription
ClickInputCommandclick_inputClick the control item with the mouse.
ClickOnCoordinatesCommandclick_on_coordinatesClick on the specific fractional coordinates of the application window.
DragOnCoordinatesCommanddrag_on_coordinatesDrag the mouse on the specific fractional coordinates of the application window.
SetEditTextCommandset_edit_textAdd new text to the control item.
GetTextsCommandtextsGet the text of the control item.
WheelMouseInputCommandwheel_mouse_inputScroll the control item.
KeyboardInputCommandkeyboard_inputSimulate the keyboard input.
+
+

Tip

+

Please refer to the ufo/prompts/share/base/api.yaml file for the detailed API documentation of the UI Automator.

+
+
+

Tip

+

You can customize the commands by adding new command classes to the ufo/automator/ui_control/controller/ControlCommand module.

+
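For illustration, the registration-and-dispatch pattern used by the commands above can be sketched in isolation. The MiniReceiver and ScrollCommand names below are hypothetical stand-ins, not part of UFO's API; the sketch only mirrors the shape of the real ControlReceiver.register decorator and command registry:

```python
from abc import ABC, abstractmethod
from typing import Dict, Type


class Command(ABC):
    """Minimal command base class, mirroring the role of ControlCommand."""

    def __init__(self, receiver: "MiniReceiver", params: Dict[str, str]) -> None:
        self.receiver = receiver
        self.params = params

    @abstractmethod
    def execute(self) -> str: ...

    @classmethod
    @abstractmethod
    def name(cls) -> str: ...


class MiniReceiver:
    """Minimal receiver with a class-level command registry."""

    command_registry: Dict[str, Type[Command]] = {}

    @classmethod
    def register(cls, command_class: Type[Command]) -> Type[Command]:
        # The decorator stores the command class under its atomic name.
        cls.command_registry[command_class.name()] = command_class
        return command_class

    def scroll(self, params: Dict[str, str]) -> str:
        return f"scrolled {params.get('direction', 'down')}"


@MiniReceiver.register
class ScrollCommand(Command):
    """A hypothetical command that delegates to the receiver."""

    def execute(self) -> str:
        return self.receiver.scroll(self.params)

    @classmethod
    def name(cls) -> str:
        return "scroll"


# Dispatch a command by its registered name, as the framework would.
receiver = MiniReceiver()
command_cls = MiniReceiver.command_registry["scroll"]
result = command_cls(receiver, {"direction": "up"}).execute()
print(result)  # scrolled up
```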
+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/automator/web_automator/index.html b/automator/web_automator/index.html new file mode 100644 index 00000000..00d28aee --- /dev/null +++ b/automator/web_automator/index.html @@ -0,0 +1,585 @@ + + + + + + + + Web Automator - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

Web Automator

+

We also support the use of the Web Automator to get the content of a web page. The Web Automator is implemented in the ufo/automator/app_apis/web module.

+

Configuration

+

There are several configurations that need to be set up in the config_dev.yaml file before using the Web Automator. Below is the list of related configurations:

+ + + + + + + + + + + + + + + + + + + + + + + +
Configuration OptionDescriptionTypeDefault Value
USE_APISWhether to allow the use of application APIs.BooleanTrue
APP_API_PROMPT_ADDRESSThe prompt address for the application API.Dict{"WINWORD.EXE": "ufo/prompts/apps/word/api.yaml", "EXCEL.EXE": "ufo/prompts/apps/excel/api.yaml", "msedge.exe": "ufo/prompts/apps/web/api.yaml", "chrome.exe": "ufo/prompts/apps/web/api.yaml"}
+
+

Note

+

Only msedge.exe and chrome.exe are currently supported by the Web Automator.

+
+

Receiver

+

The Web Automator receiver is the WebReceiver class defined in the ufo/automator/app_apis/web/webclient.py module:

+ + +
+ + + + +
+

+ Bases: ReceiverBasic

+ + +

The base class for the Web client used to fetch web page content.

+ +

Initialize the Web COM client.

+ + + + + + +
+ Source code in automator/app_apis/web/webclient.py +
21
+22
+23
+24
+25
+26
+27
def __init__(self) -> None:
+    """
+    Initialize the Web COM client.
+    """
+    self._headers = {
+        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"
+    }
+
+
+ + + +
+ + + + + + + + + +
+ + +

+ web_crawler(url, ignore_link) + +

+ + +
+ +

Run the crawler with various options.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + url + (str) + – +
    +

    The URL of the webpage.

    +
    +
  • +
  • + ignore_link + (bool) + – +
    +

    Whether to ignore the links.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The result markdown content.

    +
    +
  • +
+
+
+ Source code in automator/app_apis/web/webclient.py +
29
+30
+31
+32
+33
+34
+35
+36
+37
+38
+39
+40
+41
+42
+43
+44
+45
+46
+47
+48
+49
+50
+51
+52
+53
+54
def web_crawler(self, url: str, ignore_link: bool) -> str:
+    """
+    Run the crawler with various options.
+    :param url: The URL of the webpage.
+    :param ignore_link: Whether to ignore the links.
+    :return: The result markdown content.
+    """
+
+    try:
+        # Get the HTML content of the webpage
+        response = requests.get(url, headers=self._headers)
+        response.raise_for_status()
+
+        html_content = response.text
+
+        # Convert the HTML content to markdown
+        h = html2text.HTML2Text()
+        h.ignore_links = ignore_link
+        markdown_content = h.handle(html_content)
+
+        return markdown_content
+
+    except requests.RequestException as e:
+        print(f"Error fetching the URL: {e}")
+
+        return f"Error fetching the URL: {e}"
+
+
+
+ +
+ + + +
+ +
+ +


+

Command

+

The Web Automator currently supports a single command, which retrieves the content of a web page in markdown format. More commands will be added to the Web Automator in the future.

+
@WebReceiver.register
+class WebCrawlerCommand(WebCommand):
+    """
+    The command to run the crawler with various options.
+    """
+
+    def execute(self):
+        """
+        Execute the command to run the crawler.
+        :return: The result content.
+        """
+        return self.receiver.web_crawler(
+            url=self.params.get("url"),
+            ignore_link=self.params.get("ignore_link", False),
+        )
+
+    @classmethod
+    def name(cls) -> str:
+        """
+        The name of the command.
+        """
+        return "web_crawler"
+
+

Below is the list of available commands in the Web Automator that are currently supported by UFO:

+ + + + + + + + + + + + + + + +
Command NameFunction NameDescription
WebCrawlerCommandweb_crawlerGet the content of a web page into a markdown format.
+
+

Tip

+

Please refer to the ufo/prompts/apps/web/api.yaml file for the prompt details for the WebCrawlerCommand command.

+
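The real WebReceiver relies on the third-party requests and html2text packages to fetch a page and convert it to markdown. As a simplified, standard-library-only sketch of the same idea, the following strips tags with html.parser instead of producing full markdown, and operates on an inline HTML string to stay self-contained:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the readable text of an HTML document, skipping script/style."""

    def __init__(self) -> None:
        super().__init__()
        self._chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return "\n".join(self._chunks)


html = "<html><body><h1>Title</h1><p>Hello <b>world</b>.</p></body></html>"
parser = TextExtractor()
parser.feed(html)
print(parser.text())
```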
+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/automator/wincom_automator/index.html b/automator/wincom_automator/index.html new file mode 100644 index 00000000..be79778d --- /dev/null +++ b/automator/wincom_automator/index.html @@ -0,0 +1,1104 @@ + + + + + + + + API Automator - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

API Automator

+

UFO currently supports a Win32-based API Automator to interact with an application's native API. It is implemented in Python using the pywin32 library. The API Automator now supports the Word and Excel applications, and we are working on extending support to other applications.

+

Configuration

+

There are several configurations that need to be set up in the config_dev.yaml file before using the API Automator. Below is the list of configurations related to the API Automator:

+ + + + + + + + + + + + + + + + + + + + + + + +
Configuration OptionDescriptionTypeDefault Value
USE_APISWhether to allow the use of application APIs.BooleanTrue
APP_API_PROMPT_ADDRESSThe prompt address for the application API.Dict{"WINWORD.EXE": "ufo/prompts/apps/word/api.yaml", "EXCEL.EXE": "ufo/prompts/apps/excel/api.yaml", "msedge.exe": "ufo/prompts/apps/web/api.yaml", "chrome.exe": "ufo/prompts/apps/web/api.yaml"}
+
+

Note

+

Only WINWORD.EXE and EXCEL.EXE are currently supported by the API Automator.

+
+

Receiver

+

The base class for the receiver of the API Automator is the WinCOMReceiverBasic class defined in the ufo/automator/app_apis/basic module. It is initialized with the application's win32 com object and provides functionalities to interact with the application's native API. Below is the reference for the WinCOMReceiverBasic class:

+ + +
+ + + + +
+

+ Bases: ReceiverBasic

+ + +

The base class for Windows COM client.

+ +

Initialize the Windows COM client.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + app_root_name + (str) + – +
    +

    The app root name.

    +
    +
  • +
  • + process_name + (str) + – +
    +

    The process name.

    +
    +
  • +
  • + clsid + (str) + – +
    +

    The CLSID of the COM object.

    +
    +
  • +
+
+ + + + + +
+ Source code in automator/app_apis/basic.py +
20
+21
+22
+23
+24
+25
+26
+27
+28
+29
+30
+31
+32
+33
+34
def __init__(self, app_root_name: str, process_name: str, clsid: str) -> None:
+    """
+    Initialize the Windows COM client.
+    :param app_root_name: The app root name.
+    :param process_name: The process name.
+    :param clsid: The CLSID of the COM object.
+    """
+
+    self.app_root_name = app_root_name
+    self.process_name = process_name
+
+    self.clsid = clsid
+
+    self.client = win32com.client.Dispatch(self.clsid)
+    self.com_object = self.get_object_from_process_name()
+
+
+ + + +
+ + + + + + + +
+ + + +

+ full_path: str + + + property + + +

+ + +
+ +

Get the full path of the process.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The full path of the process.

    +
    +
  • +
+
+ +
+ + + +
+ + +

+ app_match(object_name_list) + +

+ + +
+ +

Check if the process name matches the app root.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + object_name_list + (List[str]) + – +
    +

    The list of object name.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The matched object name.

    +
    +
  • +
+
+
+ Source code in automator/app_apis/basic.py +
57
+58
+59
+60
+61
+62
+63
+64
+65
+66
+67
+68
+69
+70
+71
+72
+73
+74
+75
+76
+77
def app_match(self, object_name_list: List[str]) -> str:
+    """
+    Check if the process name matches the app root.
+    :param object_name_list: The list of object name.
+    :return: The matched object name.
+    """
+
+    suffix = self.get_suffix_mapping()
+
+    if self.process_name.endswith(suffix):
+        clean_process_name = self.process_name[: -len(suffix)]
+    else:
+        clean_process_name = self.process_name
+
+    if not object_name_list:
+        return ""
+
+    return max(
+        object_name_list,
+        key=lambda x: self.longest_common_substring_length(clean_process_name, x),
+    )
+
+
+
+ +
+ +
+ + +

+ close() + +

+ + +
+ +

Close the app.

+ +
+ Source code in automator/app_apis/basic.py +
110
+111
+112
+113
+114
+115
+116
+117
def close(self) -> None:
+    """
+    Close the app.
+    """
+    try:
+        self.com_object.Close()
+    except:
+        pass
+
+
+
+ +
+ +
+ + +

+ get_object_from_process_name() + + + abstractmethod + + +

+ + +
+ +

Get the object from the process name.

+ +
+ Source code in automator/app_apis/basic.py +
36
+37
+38
+39
+40
+41
@abstractmethod
+def get_object_from_process_name(self) -> win32com.client.CDispatch:
+    """
+    Get the object from the process name.
+    """
+    pass
+
+
+
+ +
+ +
+ + +

+ get_suffix_mapping() + +

+ + +
+ +

Get the suffix mapping.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + Dict[str, str] + – +
    +

    The suffix mapping.

    +
    +
  • +
+
+
+ Source code in automator/app_apis/basic.py +
43
+44
+45
+46
+47
+48
+49
+50
+51
+52
+53
+54
+55
def get_suffix_mapping(self) -> Dict[str, str]:
+    """
+    Get the suffix mapping.
+    :return: The suffix mapping.
+    """
+    suffix_mapping = {
+        "WINWORD.EXE": "docx",
+        "EXCEL.EXE": "xlsx",
+        "POWERPNT.EXE": "pptx",
+        "olk.exe": "msg",
+    }
+
+    return suffix_mapping.get(self.app_root_name, None)
+
+
+
+ +
+ +
+ + +

+ longest_common_substring_length(str1, str2) + + + staticmethod + + +

+ + +
+ +

Get the longest common substring of two strings.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + str1 + (str) + – +
    +

    The first string.

    +
    +
  • +
  • + str2 + (str) + – +
    +

    The second string.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + int + – +
    +

    The length of the longest common substring.

    +
    +
  • +
+
+
+ Source code in automator/app_apis/basic.py +
127
+128
+129
+130
+131
+132
+133
+134
+135
+136
+137
+138
+139
+140
+141
+142
+143
+144
+145
+146
+147
+148
+149
+150
+151
+152
@staticmethod
+def longest_common_substring_length(str1: str, str2: str) -> int:
+    """
+    Get the longest common substring of two strings.
+    :param str1: The first string.
+    :param str2: The second string.
+    :return: The length of the longest common substring.
+    """
+
+    m = len(str1)
+    n = len(str2)
+
+    dp = [[0] * (n + 1) for _ in range(m + 1)]
+
+    max_length = 0
+
+    for i in range(1, m + 1):
+        for j in range(1, n + 1):
+            if str1[i - 1] == str2[j - 1]:
+                dp[i][j] = dp[i - 1][j - 1] + 1
+                if dp[i][j] > max_length:
+                    max_length = dp[i][j]
+            else:
+                dp[i][j] = 0
+
+    return max_length
+
+
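To make the matching concrete, here is a small, self-contained reproduction of the longest-common-substring scoring that app_match uses to pick the open document whose name best matches the suffix-stripped process name. The document names below are made up for the example:

```python
def longest_common_substring_length(str1: str, str2: str) -> int:
    """Dynamic-programming length of the longest common substring."""
    m, n = len(str1), len(str2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if str1[i - 1] == str2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
                best = max(best, dp[i][j])
    return best


# Matching a process name like "report.docx" against open documents,
# after app_match has stripped the "docx" suffix.
open_documents = ["budget.docx", "report.docx", "notes.docx"]
clean_name = "report."
matched = max(
    open_documents,
    key=lambda x: longest_common_substring_length(clean_name, x),
)
print(matched)  # report.docx
```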
+
+ +
+ +
+ + +

+ save() + +

+ + +
+ +

Save the current state of the app.

+ +
+ Source code in automator/app_apis/basic.py +
91
+92
+93
+94
+95
+96
+97
+98
def save(self) -> None:
+    """
+    Save the current state of the app.
+    """
+    try:
+        self.com_object.Save()
+    except:
+        pass
+
+
+
+ +
+ +
+ + +

+ save_to_xml(file_path) + +

+ + +
+ +

Save the current state of the app to XML.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + file_path + (str) + – +
    +

    The file path to save the XML.

    +
    +
  • +
+
+
+ Source code in automator/app_apis/basic.py +
100
+101
+102
+103
+104
+105
+106
+107
+108
def save_to_xml(self, file_path: str) -> None:
+    """
+    Save the current state of the app to XML.
+    :param file_path: The file path to save the XML.
+    """
+    try:
+        self.com_object.SaveAs(file_path, self.xml_format_code)
+    except:
+        pass
+
+
+
+ +
+ + + +
+ +
+ +

The receivers of the Word and Excel applications inherit from the WinCOMReceiverBasic class. The WordReceiver and ExcelReceiver classes are defined in the ufo/automator/app_apis/word and ufo/automator/app_apis/excel modules, respectively:

+

Command

+

The commands of the API Automator for the Word and Excel applications are located in the client module in the ufo/automator/app_apis/{app_name} folder, inheriting from the WinCOMCommand class. Each command encapsulates the function and parameters required to execute the action. Below is an example of the SelectTextCommand class, which inherits from the WinCOMCommand class:

+
@WordWinCOMReceiver.register
+class SelectTextCommand(WinCOMCommand):
+    """
+    The command to select text.
+    """
+
+    def execute(self):
+        """
+        Execute the command to select text.
+        :return: The selected text.
+        """
+        return self.receiver.select_text(self.params.get("text"))
+
+    @classmethod
+    def name(cls) -> str:
+        """
+        The name of the command.
+        """
+        return "select_text"
+
+
+

Note

+

The concrete command classes must implement the execute method to execute the action and the name method to return the name of the atomic command.

+
+
+

Note

+

Each command must be registered with a concrete WinCOMReceiver using the register decorator before it can be executed.

+
+

Below is the list of available commands in the API Automator that are currently supported by UFO:

+

Word API Commands

+ + + + + + + + + + + + + + + + + + + + + + + + + +
Command NameFunction NameDescription
InsertTableCommandinsert_tableInsert a table to a Word document.
SelectTextCommandselect_textSelect the text in a Word document.
SelectTableCommandselect_tableSelect a table in a Word document.
+

Excel API Commands

+ + + + + + + + + + + + + + + + + + + + + + + + + +
Command NameFunction NameDescription
GetSheetContentCommandget_sheet_contentGet the content of a sheet in the Excel app.
Table2MarkdownCommandtable2markdownConvert the table content in a sheet of the Excel app to markdown format.
InsertExcelTableCommandinsert_excel_tableInsert a table to the Excel sheet.
+
+

Tip

+

Please refer to the ufo/prompts/apps/{app_name}/api.yaml file for the prompt details for the commands.

+
+
+

Tip

+

You can customize the commands by adding new command classes to the ufo/automator/app_apis/{app_name}/ module.

+
+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/configurations/developer_configuration/index.html b/configurations/developer_configuration/index.html new file mode 100644 index 00000000..e2c90f68 --- /dev/null +++ b/configurations/developer_configuration/index.html @@ -0,0 +1,734 @@ + + + + + + + + Developer Configuration - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

Developer Configuration

+

This section provides detailed information on how to configure the UFO agent for developers. The configuration file config_dev.yaml is located in the ufo/config directory and contains various settings and switches to customize the UFO agent for development purposes.

+

System Configuration

+

The following parameters are included in the system configuration of the UFO agent:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Configuration OptionDescriptionTypeDefault Value
CONTROL_BACKENDThe backend for control action, currently supporting uia and win32.String"uia"
MAX_STEPThe maximum step limit for completing the user request in a session.Integer100
SLEEP_TIMEThe sleep time in seconds between each step to wait for the window to be ready.Integer5
RECTANGLE_TIMEThe time in seconds for the rectangle display around the selected control.Integer1
SAFE_GUARDWhether to use the safe guard to ask for user confirmation before performing sensitive operations.BooleanTrue
CONTROL_LISTThe list of widgets allowed to be selected.List["Button", "Edit", "TabItem", "Document", "ListItem", "MenuItem", "ScrollBar", "TreeItem", "Hyperlink", "ComboBox", "RadioButton", "DataItem"]
HISTORY_KEYSThe keys of the step history added to the Blackboard for agent decision-making.List["Step", "Thought", "ControlText", "Subtask", "Action", "Comment", "Results", "UserConfirm"]
ANNOTATION_COLORSThe colors assigned to different control types for annotation.Dictionary{"Button": "#FFF68F", "Edit": "#A5F0B5", "TabItem": "#A5E7F0", "Document": "#FFD18A", "ListItem": "#D9C3FE", "MenuItem": "#E7FEC3", "ScrollBar": "#FEC3F8", "TreeItem": "#D6D6D6", "Hyperlink": "#91FFEB", "ComboBox": "#D8B6D4"}
PRINT_LOGWhether to print the log in the console.BooleanFalse
CONCAT_SCREENSHOTWhether to concatenate the screenshots into a single image for the LLM input.BooleanFalse
INCLUDE_LAST_SCREENSHOTWhether to include the screenshot from the last step in the observation.BooleanTrue
LOG_LEVELThe log level for the UFO agent.String"DEBUG"
REQUEST_TIMEOUTThe call timeout in seconds for the LLM model.Integer250
USE_APISWhether to allow the use of application APIs.BooleanTrue
LOG_XMLWhether to log the XML file at every step.BooleanFalse
SCREENSHOT_TO_MEMORYWhether to allow the screenshot to Blackboard for the agent's decision making.BooleanTrue
SAVE_UI_TREEWhether to save the UI tree in the log.BooleanFalse
+

Main Prompt Configuration

+

Main Prompt Templates

+

The main prompt templates include the prompts in the UFO agent for both system and user roles.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Configuration OptionDescriptionTypeDefault Value
HOSTAGENT_PROMPTThe main prompt template for the HostAgent.String"ufo/prompts/share/base/host_agent.yaml"
APPAGENT_PROMPTThe main prompt template for the AppAgent.String"ufo/prompts/share/base/app_agent.yaml"
FOLLOWERAGENT_PROMPTThe main prompt template for the FollowerAgent.String"ufo/prompts/share/base/app_agent.yaml"
EVALUATION_PROMPTThe prompt template for the evaluation.String"ufo/prompts/evaluation/evaluate.yaml"
+

Lite versions of the main prompt templates can be found in the ufo/prompts/share/lite directory to reduce the input size for specific token limits.

+

Example Prompt Templates

+

Example prompt templates are used for demonstration purposes in the UFO agent.

+ + + + + + + + + + + + + + + + + + + + + + + +
Configuration OptionDescriptionTypeDefault Value
HOSTAGENT_EXAMPLE_PROMPTThe example prompt template for the HostAgent used for demonstration.String"ufo/prompts/examples/{mode}/host_agent_example.yaml"
APPAGENT_EXAMPLE_PROMPTThe example prompt template for the AppAgent used for demonstration.String"ufo/prompts/examples/{mode}/app_agent_example.yaml"
+

Lite versions of the example prompt templates can be found in the ufo/prompts/examples/lite/{mode} directory to reduce the input size for demonstration purposes.

+

Experience and Demonstration Learning

+

These configuration parameters are used for experience and demonstration learning in the UFO agent.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Configuration OptionDescriptionTypeDefault Value
EXPERIENCE_PROMPTThe prompt for self-experience learning.String"ufo/prompts/experience/experience_summary.yaml"
EXPERIENCE_SAVED_PATHThe path to save the experience learning data.String"vectordb/experience/"
DEMONSTRATION_PROMPTThe prompt for user demonstration learning.String"ufo/prompts/demonstration/demonstration_summary.yaml"
DEMONSTRATION_SAVED_PATHThe path to save the demonstration learning data.String"vectordb/demonstration/"
+

Application API Configuration

+

These prompt configuration parameters are used for the application and control APIs in the UFO agent.

+ + + + + + + + + + + + + + + + + + + + + + + +
Configuration OptionDescriptionTypeDefault Value
API_PROMPTThe prompt for the UI automation API.String"ufo/prompts/share/base/api.yaml"
APP_API_PROMPT_ADDRESSThe prompt address for the application API.Dict{"WINWORD.EXE": "ufo/prompts/apps/word/api.yaml", "EXCEL.EXE": "ufo/prompts/apps/excel/api.yaml", "msedge.exe": "ufo/prompts/apps/web/api.yaml", "chrome.exe": "ufo/prompts/apps/web/api.yaml"}
+

pywinauto Configuration

+

The API configuration parameters are used for the pywinauto API in the UFO agent.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Configuration OptionDescriptionTypeDefault Value
CLICK_APIThe API used for click action, can be click_input or click.String"click_input"
INPUT_TEXT_APIThe API used for input text action, can be type_keys or set_text.String"type_keys"
INPUT_TEXT_ENTERWhether to press enter after typing the text.BooleanFalse
+

Control Filtering

+

The control filtering configuration parameters are used for control filtering in the agent's observation.

| Configuration Option | Description | Type | Default Value |
| --- | --- | --- | --- |
| CONTROL_FILTER | The control filter type; can be TEXT, SEMANTIC, or ICON. | List | [] |
| CONTROL_FILTER_TOP_K_PLAN | The number of top-k plans from the agent on which control filtering takes effect. | Integer | 2 |
| CONTROL_FILTER_TOP_K_SEMANTIC | The top-k controls to keep for semantic similarity filtering. | Integer | 15 |
| CONTROL_FILTER_TOP_K_ICON | The top-k controls to keep for icon similarity filtering. | Integer | 15 |
| CONTROL_FILTER_MODEL_SEMANTIC_NAME | The model name used for semantic similarity filtering. | String | "all-MiniLM-L6-v2" |
| CONTROL_FILTER_MODEL_ICON_NAME | The model name used for icon similarity filtering. | String | "clip-ViT-B-32" |
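For illustration, enabling semantic control filtering in config_dev.yaml could look like the fragment below (field names are from the table above; the values are examples, not recommendations):

```yaml
CONTROL_FILTER: ["SEMANTIC"]
CONTROL_FILTER_TOP_K_SEMANTIC: 15
CONTROL_FILTER_MODEL_SEMANTIC_NAME: "all-MiniLM-L6-v2"
```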
+

Customizations

+

The customization configuration parameters are used for customizations in the UFO agent.

| Configuration Option | Description | Type | Default Value |
| --- | --- | --- | --- |
| ASK_QUESTION | Whether to allow the agent to ask the user questions. | Boolean | True |
| USE_CUSTOMIZATION | Whether to enable customization. | Boolean | True |
| QA_PAIR_FILE | The path to the historical QA pairs. | String | "customization/historical_qa.txt" |
| QA_PAIR_NUM | The number of QA pairs used for customization. | Integer | 20 |
+

Evaluation

+

The evaluation configuration parameters are used for the evaluation in the UFO agent.

| Configuration Option | Description | Type | Default Value |
| --- | --- | --- | --- |
| EVA_SESSION | Whether to include the session in the evaluation. | Boolean | True |
| EVA_ROUND | Whether to include the round in the evaluation. | Boolean | False |
| EVA_ALL_SCREENSHOTS | Whether to include all screenshots in the evaluation. | Boolean | True |
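In config_dev.yaml these options might appear as a fragment like the following (defaults from the table above):

```yaml
EVA_SESSION: True          # evaluate the whole session
EVA_ROUND: False           # do not evaluate per round
EVA_ALL_SCREENSHOTS: True  # include all screenshots in the evaluation
```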
+

You can customize the configuration parameters in the config_dev.yaml file to suit your development needs and enhance the functionality of the UFO agent.

+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/configurations/pricing_configuration/index.html b/configurations/pricing_configuration/index.html new file mode 100644 index 00000000..26ba832f --- /dev/null +++ b/configurations/pricing_configuration/index.html @@ -0,0 +1,356 @@ + + + + + + + + Model Pricing - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

Pricing Configuration

+

We provide a configuration file pricing_config.yaml to calculate the pricing of the UFO agent using different LLM APIs. The pricing configuration file is located in the ufo/config directory. Note that the pricing configuration file is only used for reference and may not be up-to-date. Please refer to the official pricing documentation of the respective LLM API provider for the most accurate pricing information.

+

You can also customize the pricing configuration file based on the configured model names and their respective input and output prices by adding or modifying the pricing information in the pricing_config.yaml file. Below is the default pricing configuration:

+
# Prices in $ per 1000 tokens
+# Last updated: 2024-05-13
+PRICES: { 
+    "openai/gpt-4-0613": {"input": 0.03, "output": 0.06},
+    "openai/gpt-3.5-turbo-0613": {"input": 0.0015, "output": 0.002},
+    "openai/gpt-4-0125-preview": {"input": 0.01, "output": 0.03},
+    "openai/gpt-4-1106-preview": {"input": 0.01, "output": 0.03},
+    "openai/gpt-4-1106-vision-preview": {"input": 0.01, "output": 0.03},
+    "openai/gpt-4": {"input": 0.03, "output": 0.06},
+    "openai/gpt-4-32k": {"input": 0.06, "output": 0.12},
+    "openai/gpt-4-turbo": {"input":0.01,"output": 0.03},
+    "openai/gpt-4o": {"input": 0.005,"output": 0.015},
+    "openai/gpt-4o-2024-05-13": {"input": 0.005, "output": 0.015},
+    "openai/gpt-3.5-turbo-0125": {"input": 0.0005, "output": 0.0015},
+    "openai/gpt-3.5-turbo-1106": {"input": 0.001, "output": 0.002},
+    "openai/gpt-3.5-turbo-instruct": {"input": 0.0015, "output": 0.002},
+    "openai/gpt-3.5-turbo-16k-0613": {"input": 0.003, "output": 0.004},
+    "openai/whisper-1": {"input": 0.006, "output": 0.006},
+    "openai/tts-1": {"input": 0.015, "output": 0.015},
+    "openai/tts-hd-1": {"input": 0.03, "output": 0.03},
+    "openai/text-embedding-ada-002-v2": {"input": 0.0001, "output": 0.0001},
+    "openai/text-davinci-003": {"input": 0.02, "output": 0.02},
+    "openai/text-ada-001": {"input": 0.0004, "output": 0.0004},
+    "azure/gpt-35-turbo-20220309":{"input": 0.0015, "output": 0.002},
+    "azure/gpt-35-turbo-20230613":{"input": 0.0015, "output": 0.002},
+    "azure/gpt-35-turbo-16k-20230613":{"input": 0.003, "output": 0.004},
+    "azure/gpt-35-turbo-1106":{"input": 0.001, "output": 0.002},
+    "azure/gpt-4-20230321":{"input": 0.03, "output": 0.06},
+    "azure/gpt-4-32k-20230321":{"input": 0.06, "output": 0.12},
+    "azure/gpt-4-1106-preview": {"input": 0.01, "output": 0.03},
+    "azure/gpt-4-0125-preview": {"input": 0.01, "output": 0.03},
+    "azure/gpt-4-visual-preview": {"input": 0.01, "output": 0.03},
+    "azure/gpt-4-turbo-20240409": {"input":0.01,"output": 0.03},
+    "azure/gpt-4o": {"input": 0.005,"output": 0.015},
+    "azure/gpt-4o-20240513": {"input": 0.005, "output": 0.015},
+    "qwen/qwen-vl-plus": {"input": 0.008, "output": 0.008},
+    "qwen/qwen-vl-max": {"input": 0.02, "output": 0.02},
+    "gemini/gemini-1.5-flash": {"input": 0.00035, "output": 0.00105},
+    "gemini/gemini-1.5-pro": {"input": 0.0035, "output": 0.0105},
+    "gemini/gemini-1.0-pro": {"input": 0.0005, "output": 0.0015},
+}
+
+

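To make the arithmetic concrete, the sketch below shows how a per-call cost can be derived from such a table; it is an illustration of the price units, not UFO's actual accounting code:

```python
# Prices are in $ per 1000 tokens, matching pricing_config.yaml.
PRICES = {
    "openai/gpt-4o": {"input": 0.005, "output": 0.015},
    "openai/gpt-3.5-turbo-0125": {"input": 0.0005, "output": 0.0015},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of one LLM call from its token counts."""
    price = PRICES[model]
    return (input_tokens / 1000) * price["input"] + (output_tokens / 1000) * price["output"]

# 2000 prompt tokens + 1000 completion tokens with gpt-4o:
print(estimate_cost("openai/gpt-4o", 2000, 1000))  # 0.025
```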

+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/configurations/user_configuration/index.html b/configurations/user_configuration/index.html new file mode 100644 index 00000000..bd37ca6c --- /dev/null +++ b/configurations/user_configuration/index.html @@ -0,0 +1,573 @@ + + + + + + + + User Configuration - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

User Configuration

+

This page provides an overview of the user configuration options available in UFO. To configure the LLMs and other custom settings, rename config.yaml.template in the ufo/config folder to config.yaml.

+

LLM Configuration

+

You can configure the LLMs for the HOST_AGENT and APP_AGENT separately in the config.yaml file. The FollowerAgent and EvaluationAgent share the same LLM configuration as the APP_AGENT. Additionally, you can configure a backup LLM engine in the BACKUP_AGENT field to handle cases where the primary engines fail during inference.

+

Below are the configuration options for the LLMs, using OpenAI and Azure OpenAI (AOAI) as examples. You can find the settings for other LLM API configurations and usage in the Supported Models section of the documentation.

| Configuration Option | Description | Type | Default Value |
| --- | --- | --- | --- |
| VISUAL_MODE | Whether to use visual mode to understand screenshots and take actions. | Boolean | True |
| API_TYPE | The API type: "openai" for the OpenAI API, "aoai" for the AOAI API. | String | "openai" |
| API_BASE | The API endpoint for the LLM. | String | "https://api.openai.com/v1/chat/completions" |
| API_KEY | The API key for the LLM. | String | "sk-" |
| API_VERSION | The version of the API. | String | "2024-02-15-preview" |
| API_MODEL | The LLM model name. | String | "gpt-4-vision-preview" |
+

For Azure OpenAI (AOAI) API

+

The following additional configuration option is available for the AOAI API:

| Configuration Option | Description | Type | Default Value |
| --- | --- | --- | --- |
| API_DEPLOYMENT_ID | The deployment ID; only required for the AOAI API. | String | "" |
+

Be sure to fill in the necessary API details for both the HOST_AGENT and the APP_AGENT so that UFO can interact with the LLMs effectively.
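For example, an OpenAI configuration for the HOST_AGENT in config.yaml might look like the fragment below (the API key is a placeholder; the APP_AGENT takes the same fields):

```yaml
HOST_AGENT:
  VISUAL_MODE: True
  API_TYPE: "openai"
  API_BASE: "https://api.openai.com/v1/chat/completions"
  API_KEY: "sk-..."                  # placeholder, use your own key
  API_VERSION: "2024-02-15-preview"
  API_MODEL: "gpt-4-vision-preview"
```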

+

LLM Parameters

+

You can also configure additional parameters for the LLMs in the config.yaml file:

| Configuration Option | Description | Type | Default Value |
| --- | --- | --- | --- |
| MAX_TOKENS | The maximum token limit for the response completion. | Integer | 2000 |
| MAX_RETRY | The maximum retry limit for the response completion. | Integer | 3 |
| TEMPERATURE | The temperature of the model; the lower the value, the more consistent the output. | Float | 0.0 |
| TOP_P | The top_p of the model; the lower the value, the more conservative the output. | Float | 0.0 |
| TIMEOUT | The call timeout in seconds. | Integer | 60 |
+

RAG Configuration to Enhance the UFO Agent

+

You can configure the RAG parameters in the config.yaml file to enhance the UFO agent with additional knowledge sources:

+

RAG Configuration for the Offline Docs

+

Configure the following parameters to allow UFO to use offline documents for the decision-making process:

| Configuration Option | Description | Type | Default Value |
| --- | --- | --- | --- |
| RAG_OFFLINE_DOCS | Whether to use the offline RAG. | Boolean | False |
| RAG_OFFLINE_DOCS_RETRIEVED_TOPK | The top-k for the offline retrieved documents. | Integer | 1 |
RAG Configuration for the Online Search

Configure the following parameters to allow UFO to use online Bing search for the decision-making process:

| Configuration Option | Description | Type | Default Value |
| --- | --- | --- | --- |
| RAG_ONLINE_SEARCH | Whether to use the Bing search. | Boolean | False |
| BING_API_KEY | The Bing search API key. | String | "" |
| RAG_ONLINE_SEARCH_TOPK | The top-k for the online search. | Integer | 5 |
| RAG_ONLINE_RETRIEVED_TOPK | The top-k for the online retrieved searched results. | Integer | 1 |
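Putting these together, turning on Bing-search RAG in config.yaml could look like the fragment below (the API key is a placeholder):

```yaml
RAG_ONLINE_SEARCH: True
BING_API_KEY: "<your-bing-api-key>"   # placeholder
RAG_ONLINE_SEARCH_TOPK: 5
RAG_ONLINE_RETRIEVED_TOPK: 1
```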
+

RAG Configuration for Experience

+

Configure the following parameters to allow UFO to use the RAG from its self-experience:

| Configuration Option | Description | Type | Default Value |
| --- | --- | --- | --- |
| RAG_EXPERIENCE | Whether to use the RAG from the agent's self-experience. | Boolean | False |
| RAG_EXPERIENCE_RETRIEVED_TOPK | The top-k for the retrieved experience records. | Integer | 5 |
+

RAG Configuration for Demonstration

+

Configure the following parameters to allow UFO to use the RAG from user demonstration:

| Configuration Option | Description | Type | Default Value |
| --- | --- | --- | --- |
| RAG_DEMONSTRATION | Whether to use the RAG from user demonstrations. | Boolean | False |
| RAG_DEMONSTRATION_RETRIEVED_TOPK | The top-k for the retrieved demonstrations. | Integer | 5 |
| RAG_DEMONSTRATION_COMPLETION_N | The number of completion choices for the demonstration result. | Integer | 3 |
+

Explore the various RAG configurations to enhance the UFO agent with additional knowledge sources and improve its decision-making capabilities.

+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/creating_app_agent/demonstration_provision/index.html b/creating_app_agent/demonstration_provision/index.html new file mode 100644 index 00000000..b4fe15c0 --- /dev/null +++ b/creating_app_agent/demonstration_provision/index.html @@ -0,0 +1,370 @@ + + + + + + + + Demonstration Provision - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

Provide Human Demonstrations to the AppAgent

+

Users or application developers can provide human demonstrations to the AppAgent to guide it in executing similar tasks in the future. The AppAgent uses these demonstrations to understand the context of the task and the steps required to execute it, effectively becoming an expert in the application.

+

How to Prepare Human Demonstrations for the AppAgent?

+

Currently, UFO supports learning from user trajectories recorded by Steps Recorder integrated within Windows. More tools will be supported in the future.

+

Step 1: Recording User Demonstrations

+

Follow the official guidance to use Steps Recorder to record user demonstrations.

+

Step 2: Add Additional Information or Comments as Needed

+

Include any specific details or instructions for UFO to notice by adding comments. Since Steps Recorder doesn't capture typed text, include any necessary typed content in the comments as well.

+

+ Adding Comments in Steps Recorder +

+ +

Step 3: Review and Save the Recorded Demonstrations

+

Review the recorded steps and save them to a ZIP file. Refer to the sample_record.zip for an example of recorded steps for a specific request, such as "sending an email to example@gmail.com to say hi."

+

Step 4: Create an Action Trajectory Indexer

+

Once you have your demonstration record ZIP file ready, you can parse it as an example to support RAG for UFO. Follow these steps:

+
# Assume you are in the cloned UFO folder
+python -m record_processor -r "<your request for the demonstration>" -p "<record ZIP file path>"
+
+
    +
  • Replace <your request for the demonstration> with the specific request, such as "sending an email to example@gmail.com to say hi."
  • +
  • Replace <record ZIP file path> with the full path to the ZIP file you just created.
  • +
+

This command will parse the record and summarize it into an execution plan. You'll see a confirmation message similar to the following:

+
Here are the plans summarized from your demonstration:
+Plan [1]
+(1) Input the email address 'example@gmail.com' in the 'To' field.
+(2) Input the subject of the email. I need to input 'Greetings'.
+(3) Input the content of the email. I need to input 'Hello,\nI hope this message finds you well. I am writing to send you a warm greeting and to wish you a great day.\nBest regards.'
+(4) Click the Send button to send the email.
+Plan [2]
+(1) ***
+(2) ***
+(3) ***
+Plan [3]
+(1) ***
+(2) ***
+(3) ***
+Would you like to save any one of them as a future reference for the agent? Press [1] [2] [3] to save the corresponding plan, or press any other key to skip.
+
+

Press 1 to save the plan into the agent's memory for future reference. A sample can be found here.

+

You can view a demonstration video below:

+ + +


+

How to Use Human Demonstrations to Enhance the AppAgent?

+

After creating the offline indexer, refer to the Learning from User Demonstrations section for guidance on how to use human demonstrations to enhance the AppAgent.

+
+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/creating_app_agent/help_document_provision/index.html b/creating_app_agent/help_document_provision/index.html new file mode 100644 index 00000000..d6d83555 --- /dev/null +++ b/creating_app_agent/help_document_provision/index.html @@ -0,0 +1,351 @@ + + + + + + + + Help Document Provision - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

Providing Help Documents to the AppAgent

+

Help documents provide guidance to the AppAgent in executing specific tasks. The AppAgent uses these documents to understand the context of the task and the steps required to execute it, effectively becoming an expert in the application.

+

How to Provide Help Documents to the AppAgent?

+

Step 1: Prepare Help Documents and Metadata

+

Currently, UFO supports processing help documents in XML format, which is the default format for official help documents of Microsoft apps. More formats will be supported in the future.

+

To create a dedicated document for a specific task of an app, save it in a file named, for example, task.xml. This document should be accompanied by a metadata file with the same prefix but with the .meta extension, such as task.xml.meta. The metadata file should include:

+
    +
  • title: Describes the task at a high level.
  • +
  • Content-Summary: Summarizes the content of the help document.
  • +
+

These two files are used for similarity search with user requests, so it is important to write them carefully. Examples of a help document and its metadata can be found here and here.
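For illustration only, a metadata file carrying the two fields above might look like the following sketch; check the linked examples for the exact on-disk format:

```yaml
title: |-
  Insert a table into a document
Content-Summary: |-
  Describes the steps to insert and format a table from the Insert tab.
```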

+

Step 2: Place Help Documents in the AppAgent Directory

+

Once you have prepared all help documents and their metadata, place them into a folder. Sub-folders for the help documents are allowed, but ensure that each help document and its corresponding metadata are placed in the same directory.

+

Step 3: Create a Help Document Indexer

+

After organizing your documents in a folder named path_of_the_docs, you can create an offline indexer to support RAG for UFO. Follow these steps:

+
# Assume you are in the cloned UFO folder
+python -m learner --app <app_name> --docs <path_of_the_docs>
+
+
    +
  • Replace <app_name> with the name of the application, such as PowerPoint or WeChat.
  • +
  • Replace <path_of_the_docs> with the full path to the folder containing all your documents.
  • +
+

This command will create an offline indexer for all documents in the path_of_the_docs folder using Faiss and embedding with sentence transformer (additional embeddings will be supported soon). By default, the created index will be placed here.

+
+

Note

+

Ensure the app_name is accurately defined, as it is used to match the offline indexer in online RAG.

+
+

How to Use Help Documents to Enhance the AppAgent?

+

After creating the offline indexer, you can find the guidance on how to use the help documents to enhance the AppAgent in the Learning from Help Documents section.

+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/creating_app_agent/overview/index.html b/creating_app_agent/overview/index.html new file mode 100644 index 00000000..c9bb9e92 --- /dev/null +++ b/creating_app_agent/overview/index.html @@ -0,0 +1,339 @@ + + + + + + + + Overview - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

Creating Your AppAgent

+

UFO provides a flexible framework and SDK for application developers to empower their applications with AI capabilities by wrapping them into an AppAgent. By creating an AppAgent, you can leverage the power of UFO to interact with your application and automate tasks.

+

To create an AppAgent, you can provide the following components:

| Component | Description | Usage Documentation |
| --- | --- | --- |
| Help Documents | Help documents for the application that guide the AppAgent in executing tasks. | Learning from Help Documents |
| User Demonstrations | User demonstrations for the application that guide the AppAgent in executing tasks. | Learning from User Demonstrations |
| Native API Wrappers | Wrappers around the application's native APIs that let the AppAgent interact with the application directly. | Automator |
+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/creating_app_agent/warpping_app_native_api/index.html b/creating_app_agent/warpping_app_native_api/index.html new file mode 100644 index 00000000..241140b1 --- /dev/null +++ b/creating_app_agent/warpping_app_native_api/index.html @@ -0,0 +1,568 @@ + + + + + + + + Warpping App-Native API - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

Wrapping Your App's Native API

+

UFO takes actions on applications based on UI controls, but providing native APIs to its toolboxes can enhance the efficiency and accuracy of these actions. This document provides guidance on wrapping your application's native API into UFO's toolboxes.

+

How to Wrap Your App's Native API?

+

Before developing the native API wrappers, we strongly recommend that you read the design of the Automator.

+

Step 1: Create a Receiver for the Native API

+

The Receiver is a class that receives the native API calls from the AppAgent and executes them. To wrap your application's native API, you need to create a Receiver class that contains the methods to execute the native API calls.

+

To create a Receiver class, follow these steps:

+

1. Create a Folder for Your Application

+
    +
  • Navigate to the ufo/automator/app_api/ directory.
  • +
  • Create a folder named after your application.
  • +
+

2. Create a Python File

+
    +
  • Inside the folder you just created, add a Python file named after your application, for example, {your_application}_client.py.
  • +
+

3. Define the Receiver Class

+
    +
  • In the Python file, define a class named {Your_Receiver}, inheriting from the ReceiverBasic class located in ufo/automator/basic.py.
  • +
  • Initialize the Your_Receiver class with the object that executes the native API calls. For example, if your API is based on a COM object, initialize the COM object in the __init__ method of the Your_Receiver class.
  • +
+

Example of WinCOMReceiverBasic class:

+
import win32com.client
from typing import Dict, Type

from ufo.automator.basic import CommandBasic, ReceiverBasic

class WinCOMReceiverBasic(ReceiverBasic):
    """
    The base class for Windows COM client.
    """

    _command_registry: Dict[str, Type[CommandBasic]] = {}

    def __init__(self, app_root_name: str, process_name: str, clsid: str) -> None:
        """
        Initialize the Windows COM client.
        :param app_root_name: The app root name.
        :param process_name: The process name.
        :param clsid: The CLSID of the COM object.
        """
        self.app_root_name = app_root_name
        self.process_name = process_name
        self.clsid = clsid
        self.client = win32com.client.Dispatch(self.clsid)
        self.com_object = self.get_object_from_process_name()
+
+
+

4. Define Methods to Execute Native API Calls

+
    +
  • Define the methods in the Your_Receiver class to execute the native API calls.
  • +
+

Example of ExcelWinCOMReceiver class:

+
def table2markdown(self, sheet_name: str) -> str:
    """
    Convert the table in the sheet to a markdown table string.
    :param sheet_name: The sheet name.
    :return: The markdown table string.
    """
    # Requires `import pandas as pd` at module level.
    sheet = self.com_object.Sheets(sheet_name)
    data = sheet.UsedRange()
    df = pd.DataFrame(data[1:], columns=data[0])
    df = df.dropna(axis=0, how="all")
    df = df.applymap(self.format_value)

    return df.to_markdown(index=False)
+
+
+

5. Create a Factory Class

+
    +
  • Create your Factory class inheriting from the APIReceiverFactory class to manage multiple Receiver classes that share the same API type.
  • +
  • Implement the create_receiver and name methods in the ReceiverFactory class. The create_receiver method should return the Receiver class.
  • +
  • By default, the create_receiver takes the app_root_name and process_name as parameters and returns the Receiver class.
  • +
  • Register the ReceiverFactory class with the decorator @ReceiverManager.register.
  • +
+

Example of the COMReceiverFactory class:

+
from ufo.automator.puppeteer import ReceiverManager
+
+@ReceiverManager.register
+class COMReceiverFactory(APIReceiverFactory):
+    """
+    The factory class for the COM receiver.
+    """
+
+    def create_receiver(self, app_root_name: str, process_name: str) -> WinCOMReceiverBasic:
+        """
+        Create the wincom receiver.
+        :param app_root_name: The app root name.
+        :param process_name: The process name.
+        :return: The receiver.
+        """
+
+        com_receiver = self.__com_client_mapper(app_root_name)
+        clsid = self.__app_root_mappping(app_root_name)
+
+        if clsid is None or com_receiver is None:
+            # print_with_color(f"Warning: Win32COM API is not supported for {process_name}.", "yellow")
+            return None
+
+        return com_receiver(app_root_name, process_name, clsid)
+
+    @classmethod
+    def name(cls) -> str:
+        """
+        Get the name of the receiver factory.
+        :return: The name of the receiver factory.
+        """
+        return "COM"
+
+
+

Note

+

The create_receiver method should return None if the application is not supported.

+
+
+

Note

+

You must register your ReceiverFactory with the decorator @ReceiverManager.register for the ReceiverManager to manage the ReceiverFactory.

+
+

The Receiver class is now ready to receive the native API calls from the AppAgent.

+

Step 2: Create a Command for the Native API

+

Commands are the actions that the AppAgent can execute on the application. To create a command for the native API, you need to create a Command class that contains the method to execute the native API calls.
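Before wiring into UFO's classes, the receiver/command pattern itself can be sketched standalone; the classes below are simplified stand-ins for UFO's base classes, not the real ones:

```python
from abc import ABC, abstractmethod
from typing import Optional

class ToyReceiver:
    """Stand-in for a Receiver: the only object that touches the 'native API'."""
    def insert_text(self, text: str) -> str:
        return f"inserted: {text}"

class ToyCommand(ABC):
    """Stand-in for CommandBasic: holds a receiver and the call parameters."""
    def __init__(self, receiver: ToyReceiver, params: Optional[dict] = None) -> None:
        self.receiver = receiver
        self.params = params if params is not None else {}

    @abstractmethod
    def execute(self) -> str: ...

class InsertTextCommand(ToyCommand):
    def execute(self) -> str:
        # The command only delegates; the receiver does the actual work.
        return self.receiver.insert_text(self.params.get("text", ""))

command = InsertTextCommand(ToyReceiver(), {"text": "hello"})
print(command.execute())  # inserted: hello
```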

+

1. Create a Command Class

+
    +
  • Create a Command class in the same Python file where the Receiver class is located. The Command class should inherit from the CommandBasic class located in ufo/automator/basic.py.
  • +
+

Example:

+
class WinCOMCommand(CommandBasic):
+    """
+    The abstract command interface.
+    """
+
+    def __init__(self, receiver: WinCOMReceiverBasic, params=None) -> None:
+        """
+        Initialize the command.
+        :param receiver: The receiver of the command.
+        """
+        self.receiver = receiver
+        self.params = params if params is not None else {}
+
+    @abstractmethod
+    def execute(self):
+        pass
+
+    @classmethod
+    def name(cls) -> str:
+        """
+        Get the name of the command.
+        :return: The name of the command.
+        """
+        return cls.__name__
+
+
+

2. Define the Execute Method

+
    +
  • Define the execute method in the Command class to call the receiver to execute the native API calls.
  • +
+

Example:

+
def execute(self):
+    """
+    Execute the command to insert a table.
+    :return: The inserted table.
+    """
+    return self.receiver.insert_excel_table(
+        sheet_name=self.params.get("sheet_name", 1),
+        table=self.params.get("table"),
+        start_row=self.params.get("start_row", 1),
+        start_col=self.params.get("start_col", 1),
+    )
+
+

3. Register the Command Class:

+
    +
  • Register the Command class in the corresponding Receiver class using the @your_receiver.register decorator.
  • +
+

Example:

+
@ExcelWinCOMReceiver.register
+class InsertExcelTable(WinCOMCommand):
+    ...
+
+

The Command class is now registered in the Receiver class and available for the AppAgent to execute the native API calls.

+

Step 3: Provide Prompt Descriptions for the Native API

+

To let the AppAgent know the usage of the native API calls, you need to provide prompt descriptions.

+

1. Create an api.yaml File

+
  • Create an api.yaml file in the ufo/prompts/apps/{your_app_name} directory.
+
+

2. Define Prompt Descriptions

+
    +
  • Define the prompt descriptions for the native API calls in the api.yaml file.
  • +
+

Example:

+
table2markdown:
  summary: |-
    "table2markdown" is to get the table content in a sheet of the Excel app and convert it to markdown format.
  class_name: |-
    GetSheetContent
  usage: |-
    [1] API call: table2markdown(sheet_name: str)
    [2] Args:
    - sheet_name: The name of the sheet in the Excel app.
    [3] Example: table2markdown(sheet_name="Sheet1")
    [4] Available control item: Any control item in the Excel app.
    [5] Return: the markdown format string of the table content of the sheet.
+
+
+

Note

+

The table2markdown is the name of the native API call. It MUST match the name() defined in the corresponding Command class!

+
+

3. Register the Prompt Address in config_dev.yaml

+
    +
  • Register the prompt address by adding to the APP_API_PROMPT_ADDRESS field of config_dev.yaml file with the application program name as the key and the prompt file address as the value.
  • +
+

Example:

+
APP_API_PROMPT_ADDRESS: {
+    "WINWORD.EXE": "ufo/prompts/apps/word/api.yaml",
+    "EXCEL.EXE": "ufo/prompts/apps/excel/api.yaml",
+    "msedge.exe": "ufo/prompts/apps/web/api.yaml",
+    "chrome.exe": "ufo/prompts/apps/web/api.yaml",
+    "your_application_program_name": "YOUR_APPLICATION_API_PROMPT"
+} 
+
+
+

Note

+

The your_application_program_name must match the name of the application program.

+
+

The AppAgent can now use the prompt descriptions to understand the usage of the native API calls.

+
+

By following these steps, you will have successfully wrapped the native API of your application into UFO's toolboxes, allowing the AppAgent to execute the native API calls on the application!

+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
tt.download span:first-child:before,.wy-alert,.wy-dropdown .caret:before,.wy-inline-validate.wy-inline-validate-danger .wy-input-context:before,.wy-inline-validate.wy-inline-validate-info .wy-input-context:before,.wy-inline-validate.wy-inline-validate-success .wy-input-context:before,.wy-inline-validate.wy-inline-validate-warning .wy-input-context:before,.wy-menu-vertical li.current>a button.toctree-expand:before,.wy-menu-vertical li.on a button.toctree-expand:before,.wy-menu-vertical li button.toctree-expand:before,input[type=color],input[type=date],input[type=datetime-local],input[type=datetime],input[type=email],input[type=month],input[type=number],input[type=password],input[type=search],input[type=tel],input[type=text],input[type=time],input[type=url],input[type=week],select,textarea{-webkit-font-smoothing:antialiased}.clearfix{*zoom:1}.clearfix:after,.clearfix:before{display:table;content:""}.clearfix:after{clear:both}/*! + * Font Awesome 4.7.0 by @davegandy - http://fontawesome.io - @fontawesome + * License - http://fontawesome.io/license (Font: SIL OFL 1.1, CSS: MIT License) + */@font-face{font-family:FontAwesome;src:url(fonts/fontawesome-webfont.eot?674f50d287a8c48dc19ba404d20fe713);src:url(fonts/fontawesome-webfont.eot?674f50d287a8c48dc19ba404d20fe713?#iefix&v=4.7.0) format("embedded-opentype"),url(fonts/fontawesome-webfont.woff2?af7ae505a9eed503f8b8e6982036873e) format("woff2"),url(fonts/fontawesome-webfont.woff?fee66e712a8a08eef5805a46892932ad) format("woff"),url(fonts/fontawesome-webfont.ttf?b06871f281fee6b241d60582ae9369b9) format("truetype"),url(fonts/fontawesome-webfont.svg?912ec66d7572ff821749319396470bde#fontawesomeregular) format("svg");font-weight:400;font-style:normal}.fa,.icon,.rst-content .admonition-title,.rst-content .code-block-caption .headerlink,.rst-content .eqno .headerlink,.rst-content code.download span:first-child,.rst-content dl dt .headerlink,.rst-content h1 .headerlink,.rst-content h2 .headerlink,.rst-content h3 
.headerlink,.rst-content h4 .headerlink,.rst-content h5 .headerlink,.rst-content h6 .headerlink,.rst-content p.caption .headerlink,.rst-content p .headerlink,.rst-content table>caption .headerlink,.rst-content tt.download span:first-child,.wy-menu-vertical li.current>a button.toctree-expand,.wy-menu-vertical li.on a button.toctree-expand,.wy-menu-vertical li button.toctree-expand{display:inline-block;font:normal normal normal 14px/1 FontAwesome;font-size:inherit;text-rendering:auto;-webkit-font-smoothing:antialiased;-moz-osx-font-smoothing:grayscale}.fa-lg{font-size:1.33333em;line-height:.75em;vertical-align:-15%}.fa-2x{font-size:2em}.fa-3x{font-size:3em}.fa-4x{font-size:4em}.fa-5x{font-size:5em}.fa-fw{width:1.28571em;text-align:center}.fa-ul{padding-left:0;margin-left:2.14286em;list-style-type:none}.fa-ul>li{position:relative}.fa-li{position:absolute;left:-2.14286em;width:2.14286em;top:.14286em;text-align:center}.fa-li.fa-lg{left:-1.85714em}.fa-border{padding:.2em .25em .15em;border:.08em solid #eee;border-radius:.1em}.fa-pull-left{float:left}.fa-pull-right{float:right}.fa-pull-left.icon,.fa.fa-pull-left,.rst-content .code-block-caption .fa-pull-left.headerlink,.rst-content .eqno .fa-pull-left.headerlink,.rst-content .fa-pull-left.admonition-title,.rst-content code.download span.fa-pull-left:first-child,.rst-content dl dt .fa-pull-left.headerlink,.rst-content h1 .fa-pull-left.headerlink,.rst-content h2 .fa-pull-left.headerlink,.rst-content h3 .fa-pull-left.headerlink,.rst-content h4 .fa-pull-left.headerlink,.rst-content h5 .fa-pull-left.headerlink,.rst-content h6 .fa-pull-left.headerlink,.rst-content p .fa-pull-left.headerlink,.rst-content table>caption .fa-pull-left.headerlink,.rst-content tt.download span.fa-pull-left:first-child,.wy-menu-vertical li.current>a button.fa-pull-left.toctree-expand,.wy-menu-vertical li.on a button.fa-pull-left.toctree-expand,.wy-menu-vertical li 
button.fa-pull-left.toctree-expand{margin-right:.3em}.fa-pull-right.icon,.fa.fa-pull-right,.rst-content .code-block-caption .fa-pull-right.headerlink,.rst-content .eqno .fa-pull-right.headerlink,.rst-content .fa-pull-right.admonition-title,.rst-content code.download span.fa-pull-right:first-child,.rst-content dl dt .fa-pull-right.headerlink,.rst-content h1 .fa-pull-right.headerlink,.rst-content h2 .fa-pull-right.headerlink,.rst-content h3 .fa-pull-right.headerlink,.rst-content h4 .fa-pull-right.headerlink,.rst-content h5 .fa-pull-right.headerlink,.rst-content h6 .fa-pull-right.headerlink,.rst-content p .fa-pull-right.headerlink,.rst-content table>caption .fa-pull-right.headerlink,.rst-content tt.download span.fa-pull-right:first-child,.wy-menu-vertical li.current>a button.fa-pull-right.toctree-expand,.wy-menu-vertical li.on a button.fa-pull-right.toctree-expand,.wy-menu-vertical li button.fa-pull-right.toctree-expand{margin-left:.3em}.pull-right{float:right}.pull-left{float:left}.fa.pull-left,.pull-left.icon,.rst-content .code-block-caption .pull-left.headerlink,.rst-content .eqno .pull-left.headerlink,.rst-content .pull-left.admonition-title,.rst-content code.download span.pull-left:first-child,.rst-content dl dt .pull-left.headerlink,.rst-content h1 .pull-left.headerlink,.rst-content h2 .pull-left.headerlink,.rst-content h3 .pull-left.headerlink,.rst-content h4 .pull-left.headerlink,.rst-content h5 .pull-left.headerlink,.rst-content h6 .pull-left.headerlink,.rst-content p .pull-left.headerlink,.rst-content table>caption .pull-left.headerlink,.rst-content tt.download span.pull-left:first-child,.wy-menu-vertical li.current>a button.pull-left.toctree-expand,.wy-menu-vertical li.on a button.pull-left.toctree-expand,.wy-menu-vertical li button.pull-left.toctree-expand{margin-right:.3em}.fa.pull-right,.pull-right.icon,.rst-content .code-block-caption .pull-right.headerlink,.rst-content .eqno .pull-right.headerlink,.rst-content .pull-right.admonition-title,.rst-content 
code.download span.pull-right:first-child,.rst-content dl dt .pull-right.headerlink,.rst-content h1 .pull-right.headerlink,.rst-content h2 .pull-right.headerlink,.rst-content h3 .pull-right.headerlink,.rst-content h4 .pull-right.headerlink,.rst-content h5 .pull-right.headerlink,.rst-content h6 .pull-right.headerlink,.rst-content p .pull-right.headerlink,.rst-content table>caption .pull-right.headerlink,.rst-content tt.download span.pull-right:first-child,.wy-menu-vertical li.current>a button.pull-right.toctree-expand,.wy-menu-vertical li.on a button.pull-right.toctree-expand,.wy-menu-vertical li button.pull-right.toctree-expand{margin-left:.3em}.fa-spin{-webkit-animation:fa-spin 2s linear infinite;animation:fa-spin 2s linear infinite}.fa-pulse{-webkit-animation:fa-spin 1s steps(8) infinite;animation:fa-spin 1s steps(8) infinite}@-webkit-keyframes fa-spin{0%{-webkit-transform:rotate(0deg);transform:rotate(0deg)}to{-webkit-transform:rotate(359deg);transform:rotate(359deg)}}@keyframes fa-spin{0%{-webkit-transform:rotate(0deg);transform:rotate(0deg)}to{-webkit-transform:rotate(359deg);transform:rotate(359deg)}}.fa-rotate-90{-ms-filter:"progid:DXImageTransform.Microsoft.BasicImage(rotation=1)";-webkit-transform:rotate(90deg);-ms-transform:rotate(90deg);transform:rotate(90deg)}.fa-rotate-180{-ms-filter:"progid:DXImageTransform.Microsoft.BasicImage(rotation=2)";-webkit-transform:rotate(180deg);-ms-transform:rotate(180deg);transform:rotate(180deg)}.fa-rotate-270{-ms-filter:"progid:DXImageTransform.Microsoft.BasicImage(rotation=3)";-webkit-transform:rotate(270deg);-ms-transform:rotate(270deg);transform:rotate(270deg)}.fa-flip-horizontal{-ms-filter:"progid:DXImageTransform.Microsoft.BasicImage(rotation=0, mirror=1)";-webkit-transform:scaleX(-1);-ms-transform:scaleX(-1);transform:scaleX(-1)}.fa-flip-vertical{-ms-filter:"progid:DXImageTransform.Microsoft.BasicImage(rotation=2, mirror=1)";-webkit-transform:scaleY(-1);-ms-transform:scaleY(-1);transform:scaleY(-1)}:root 
.fa-flip-horizontal,:root .fa-flip-vertical,:root .fa-rotate-90,:root .fa-rotate-180,:root .fa-rotate-270{filter:none}.fa-stack{position:relative;display:inline-block;width:2em;height:2em;line-height:2em;vertical-align:middle}.fa-stack-1x,.fa-stack-2x{position:absolute;left:0;width:100%;text-align:center}.fa-stack-1x{line-height:inherit}.fa-stack-2x{font-size:2em}.fa-inverse{color:#fff}.fa-glass:before{content:""}.fa-music:before{content:""}.fa-search:before,.icon-search:before{content:""}.fa-envelope-o:before{content:""}.fa-heart:before{content:""}.fa-star:before{content:""}.fa-star-o:before{content:""}.fa-user:before{content:""}.fa-film:before{content:""}.fa-th-large:before{content:""}.fa-th:before{content:""}.fa-th-list:before{content:""}.fa-check:before{content:""}.fa-close:before,.fa-remove:before,.fa-times:before{content:""}.fa-search-plus:before{content:""}.fa-search-minus:before{content:""}.fa-power-off:before{content:""}.fa-signal:before{content:""}.fa-cog:before,.fa-gear:before{content:""}.fa-trash-o:before{content:""}.fa-home:before,.icon-home:before{content:""}.fa-file-o:before{content:""}.fa-clock-o:before{content:""}.fa-road:before{content:""}.fa-download:before,.rst-content code.download span:first-child:before,.rst-content tt.download 
span:first-child:before{content:""}.fa-arrow-circle-o-down:before{content:""}.fa-arrow-circle-o-up:before{content:""}.fa-inbox:before{content:""}.fa-play-circle-o:before{content:""}.fa-repeat:before,.fa-rotate-right:before{content:""}.fa-refresh:before{content:""}.fa-list-alt:before{content:""}.fa-lock:before{content:""}.fa-flag:before{content:""}.fa-headphones:before{content:""}.fa-volume-off:before{content:""}.fa-volume-down:before{content:""}.fa-volume-up:before{content:""}.fa-qrcode:before{content:""}.fa-barcode:before{content:""}.fa-tag:before{content:""}.fa-tags:before{content:""}.fa-book:before,.icon-book:before{content:""}.fa-bookmark:before{content:""}.fa-print:before{content:""}.fa-camera:before{content:""}.fa-font:before{content:""}.fa-bold:before{content:""}.fa-italic:before{content:""}.fa-text-height:before{content:""}.fa-text-width:before{content:""}.fa-align-left:before{content:""}.fa-align-center:before{content:""}.fa-align-right:before{content:""}.fa-align-justify:before{content:""}.fa-list:before{content:""}.fa-dedent:before,.fa-outdent:before{content:""}.fa-indent:before{content:""}.fa-video-camera:before{content:""}.fa-image:before,.fa-photo:before,.fa-picture-o:before{content:""}.fa-pencil:before{content:""}.fa-map-marker:before{content:""}.fa-adjust:before{content:""}.fa-tint:before{content:""}.fa-edit:before,.fa-pencil-square-o:before{content:""}.fa-share-square-o:before{content:""}.fa-check-square-o:before{content:""}.fa-arrows:before{content:""}.fa-step-backward:before{content:""}.fa-fast-backward:before{content:""}.fa-backward:before{content:""}.fa-play:before{content:""}.fa-pause:before{content:""}.fa-stop:before{content:""}.fa-forward:before{content:""}.fa-fast-forward:before{content:""}.fa-step-forward:before{content:""}.fa-eject:before{content:""}.fa-chevron-left:before{content:""}.fa-chevron-right:before{content:""}.fa-plus-circle:before{content:""}.fa-minus-circle:before{content
:""}.fa-times-circle:before,.wy-inline-validate.wy-inline-validate-danger .wy-input-context:before{content:""}.fa-check-circle:before,.wy-inline-validate.wy-inline-validate-success .wy-input-context:before{content:""}.fa-question-circle:before{content:""}.fa-info-circle:before{content:""}.fa-crosshairs:before{content:""}.fa-times-circle-o:before{content:""}.fa-check-circle-o:before{content:""}.fa-ban:before{content:""}.fa-arrow-left:before{content:""}.fa-arrow-right:before{content:""}.fa-arrow-up:before{content:""}.fa-arrow-down:before{content:""}.fa-mail-forward:before,.fa-share:before{content:""}.fa-expand:before{content:""}.fa-compress:before{content:""}.fa-plus:before{content:""}.fa-minus:before{content:""}.fa-asterisk:before{content:""}.fa-exclamation-circle:before,.rst-content .admonition-title:before,.wy-inline-validate.wy-inline-validate-info .wy-input-context:before,.wy-inline-validate.wy-inline-validate-warning .wy-input-context:before{content:""}.fa-gift:before{content:""}.fa-leaf:before{content:""}.fa-fire:before,.icon-fire:before{content:""}.fa-eye:before{content:""}.fa-eye-slash:before{content:""}.fa-exclamation-triangle:before,.fa-warning:before{content:""}.fa-plane:before{content:""}.fa-calendar:before{content:""}.fa-random:before{content:""}.fa-comment:before{content:""}.fa-magnet:before{content:""}.fa-chevron-up:before{content:""}.fa-chevron-down:before{content:""}.fa-retweet:before{content:""}.fa-shopping-cart:before{content:""}.fa-folder:before{content:""}.fa-folder-open:before{content:""}.fa-arrows-v:before{content:""}.fa-arrows-h:before{content:""}.fa-bar-chart-o:before,.fa-bar-chart:before{content:""}.fa-twitter-square:before{content:""}.fa-facebook-square:before{content:""}.fa-camera-retro:before{content:""}.fa-key:before{content:""}.fa-cogs:before,.fa-gears:before{content:""}.fa-comments:before{content:""}.fa-thumbs-o-up:before{content:""}.fa-thumbs-o-down:before{content:""}.fa-star-half:before
{content:""}.fa-heart-o:before{content:""}.fa-sign-out:before{content:""}.fa-linkedin-square:before{content:""}.fa-thumb-tack:before{content:""}.fa-external-link:before{content:""}.fa-sign-in:before{content:""}.fa-trophy:before{content:""}.fa-github-square:before{content:""}.fa-upload:before{content:""}.fa-lemon-o:before{content:""}.fa-phone:before{content:""}.fa-square-o:before{content:""}.fa-bookmark-o:before{content:""}.fa-phone-square:before{content:""}.fa-twitter:before{content:""}.fa-facebook-f:before,.fa-facebook:before{content:""}.fa-github:before,.icon-github:before{content:""}.fa-unlock:before{content:""}.fa-credit-card:before{content:""}.fa-feed:before,.fa-rss:before{content:""}.fa-hdd-o:before{content:""}.fa-bullhorn:before{content:""}.fa-bell:before{content:""}.fa-certificate:before{content:""}.fa-hand-o-right:before{content:""}.fa-hand-o-left:before{content:""}.fa-hand-o-up:before{content:""}.fa-hand-o-down:before{content:""}.fa-arrow-circle-left:before,.icon-circle-arrow-left:before{content:""}.fa-arrow-circle-right:before,.icon-circle-arrow-right:before{content:""}.fa-arrow-circle-up:before{content:""}.fa-arrow-circle-down:before{content:""}.fa-globe:before{content:""}.fa-wrench:before{content:""}.fa-tasks:before{content:""}.fa-filter:before{content:""}.fa-briefcase:before{content:""}.fa-arrows-alt:before{content:""}.fa-group:before,.fa-users:before{content:""}.fa-chain:before,.fa-link:before,.icon-link:before{content:""}.fa-cloud:before{content:""}.fa-flask:before{content:""}.fa-cut:before,.fa-scissors:before{content:""}.fa-copy:before,.fa-files-o:before{content:""}.fa-paperclip:before{content:""}.fa-floppy-o:before,.fa-save:before{content:""}.fa-square:before{content:""}.fa-bars:before,.fa-navicon:before,.fa-reorder:before{content:""}.fa-list-ul:before{content:""}.fa-list-ol:before{content:""}.fa-strikethrough:before{content:""}.fa-underline:before{content:""}.fa-table:before{content:""}.fa-magi
c:before{content:""}.fa-truck:before{content:""}.fa-pinterest:before{content:""}.fa-pinterest-square:before{content:""}.fa-google-plus-square:before{content:""}.fa-google-plus:before{content:""}.fa-money:before{content:""}.fa-caret-down:before,.icon-caret-down:before,.wy-dropdown .caret:before{content:""}.fa-caret-up:before{content:""}.fa-caret-left:before{content:""}.fa-caret-right:before{content:""}.fa-columns:before{content:""}.fa-sort:before,.fa-unsorted:before{content:""}.fa-sort-desc:before,.fa-sort-down:before{content:""}.fa-sort-asc:before,.fa-sort-up:before{content:""}.fa-envelope:before{content:""}.fa-linkedin:before{content:""}.fa-rotate-left:before,.fa-undo:before{content:""}.fa-gavel:before,.fa-legal:before{content:""}.fa-dashboard:before,.fa-tachometer:before{content:""}.fa-comment-o:before{content:""}.fa-comments-o:before{content:""}.fa-bolt:before,.fa-flash:before{content:""}.fa-sitemap:before{content:""}.fa-umbrella:before{content:""}.fa-clipboard:before,.fa-paste:before{content:""}.fa-lightbulb-o:before{content:""}.fa-exchange:before{content:""}.fa-cloud-download:before{content:""}.fa-cloud-upload:before{content:""}.fa-user-md:before{content:""}.fa-stethoscope:before{content:""}.fa-suitcase:before{content:""}.fa-bell-o:before{content:""}.fa-coffee:before{content:""}.fa-cutlery:before{content:""}.fa-file-text-o:before{content:""}.fa-building-o:before{content:""}.fa-hospital-o:before{content:""}.fa-ambulance:before{content:""}.fa-medkit:before{content:""}.fa-fighter-jet:before{content:""}.fa-beer:before{content:""}.fa-h-square:before{content:""}.fa-plus-square:before{content:""}.fa-angle-double-left:before{content:""}.fa-angle-double-right:before{content:""}.fa-angle-double-up:before{content:""}.fa-angle-double-down:before{content:""}.fa-angle-left:before{content:""}.fa-angle-right:before{content:""}.fa-angle-up:before{content:""}.fa-angle-down:before{content:""}.fa-desktop:before{content:""}.fa-l
aptop:before{content:""}.fa-tablet:before{content:""}.fa-mobile-phone:before,.fa-mobile:before{content:""}.fa-circle-o:before{content:""}.fa-quote-left:before{content:""}.fa-quote-right:before{content:""}.fa-spinner:before{content:""}.fa-circle:before{content:""}.fa-mail-reply:before,.fa-reply:before{content:""}.fa-github-alt:before{content:""}.fa-folder-o:before{content:""}.fa-folder-open-o:before{content:""}.fa-smile-o:before{content:""}.fa-frown-o:before{content:""}.fa-meh-o:before{content:""}.fa-gamepad:before{content:""}.fa-keyboard-o:before{content:""}.fa-flag-o:before{content:""}.fa-flag-checkered:before{content:""}.fa-terminal:before{content:""}.fa-code:before{content:""}.fa-mail-reply-all:before,.fa-reply-all:before{content:""}.fa-star-half-empty:before,.fa-star-half-full:before,.fa-star-half-o:before{content:""}.fa-location-arrow:before{content:""}.fa-crop:before{content:""}.fa-code-fork:before{content:""}.fa-chain-broken:before,.fa-unlink:before{content:""}.fa-question:before{content:""}.fa-info:before{content:""}.fa-exclamation:before{content:""}.fa-superscript:before{content:""}.fa-subscript:before{content:""}.fa-eraser:before{content:""}.fa-puzzle-piece:before{content:""}.fa-microphone:before{content:""}.fa-microphone-slash:before{content:""}.fa-shield:before{content:""}.fa-calendar-o:before{content:""}.fa-fire-extinguisher:before{content:""}.fa-rocket:before{content:""}.fa-maxcdn:before{content:""}.fa-chevron-circle-left:before{content:""}.fa-chevron-circle-right:before{content:""}.fa-chevron-circle-up:before{content:""}.fa-chevron-circle-down:before{content:""}.fa-html5:before{content:""}.fa-css3:before{content:""}.fa-anchor:before{content:""}.fa-unlock-alt:before{content:""}.fa-bullseye:before{content:""}.fa-ellipsis-h:before{content:""}.fa-ellipsis-v:before{content:""}.fa-rss-square:before{content:""}.fa-play-circle:before{content:""}.fa-ticket:before{content:""}.fa-minus-square:before{content:
""}.fa-minus-square-o:before,.wy-menu-vertical li.current>a button.toctree-expand:before,.wy-menu-vertical li.on a button.toctree-expand:before{content:""}.fa-level-up:before{content:""}.fa-level-down:before{content:""}.fa-check-square:before{content:""}.fa-pencil-square:before{content:""}.fa-external-link-square:before{content:""}.fa-share-square:before{content:""}.fa-compass:before{content:""}.fa-caret-square-o-down:before,.fa-toggle-down:before{content:""}.fa-caret-square-o-up:before,.fa-toggle-up:before{content:""}.fa-caret-square-o-right:before,.fa-toggle-right:before{content:""}.fa-eur:before,.fa-euro:before{content:""}.fa-gbp:before{content:""}.fa-dollar:before,.fa-usd:before{content:""}.fa-inr:before,.fa-rupee:before{content:""}.fa-cny:before,.fa-jpy:before,.fa-rmb:before,.fa-yen:before{content:""}.fa-rouble:before,.fa-rub:before,.fa-ruble:before{content:""}.fa-krw:before,.fa-won:before{content:""}.fa-bitcoin:before,.fa-btc:before{content:""}.fa-file:before{content:""}.fa-file-text:before{content:""}.fa-sort-alpha-asc:before{content:""}.fa-sort-alpha-desc:before{content:""}.fa-sort-amount-asc:before{content:""}.fa-sort-amount-desc:before{content:""}.fa-sort-numeric-asc:before{content:""}.fa-sort-numeric-desc:before{content:""}.fa-thumbs-up:before{content:""}.fa-thumbs-down:before{content:""}.fa-youtube-square:before{content:""}.fa-youtube:before{content:""}.fa-xing:before{content:""}.fa-xing-square:before{content:""}.fa-youtube-play:before{content:""}.fa-dropbox:before{content:""}.fa-stack-overflow:before{content:""}.fa-instagram:before{content:""}.fa-flickr:before{content:""}.fa-adn:before{content:""}.fa-bitbucket:before,.icon-bitbucket:before{content:""}.fa-bitbucket-square:before{content:""}.fa-tumblr:before{content:""}.fa-tumblr-square:before{content:""}.fa-long-arrow-down:before{content:""}.fa-long-arrow-up:before{content:""}.fa-long-arrow-left:before{content:""}.fa-long-arrow-right:before{content:""}.fa-a
pple:before{content:""}.fa-windows:before{content:""}.fa-android:before{content:""}.fa-linux:before{content:""}.fa-dribbble:before{content:""}.fa-skype:before{content:""}.fa-foursquare:before{content:""}.fa-trello:before{content:""}.fa-female:before{content:""}.fa-male:before{content:""}.fa-gittip:before,.fa-gratipay:before{content:""}.fa-sun-o:before{content:""}.fa-moon-o:before{content:""}.fa-archive:before{content:""}.fa-bug:before{content:""}.fa-vk:before{content:""}.fa-weibo:before{content:""}.fa-renren:before{content:""}.fa-pagelines:before{content:""}.fa-stack-exchange:before{content:""}.fa-arrow-circle-o-right:before{content:""}.fa-arrow-circle-o-left:before{content:""}.fa-caret-square-o-left:before,.fa-toggle-left:before{content:""}.fa-dot-circle-o:before{content:""}.fa-wheelchair:before{content:""}.fa-vimeo-square:before{content:""}.fa-try:before,.fa-turkish-lira:before{content:""}.fa-plus-square-o:before,.wy-menu-vertical li button.toctree-expand:before{content:""}.fa-space-shuttle:before{content:""}.fa-slack:before{content:""}.fa-envelope-square:before{content:""}.fa-wordpress:before{content:""}.fa-openid:before{content:""}.fa-bank:before,.fa-institution:before,.fa-university:before{content:""}.fa-graduation-cap:before,.fa-mortar-board:before{content:""}.fa-yahoo:before{content:""}.fa-google:before{content:""}.fa-reddit:before{content:""}.fa-reddit-square:before{content:""}.fa-stumbleupon-circle:before{content:""}.fa-stumbleupon:before{content:""}.fa-delicious:before{content:""}.fa-digg:before{content:""}.fa-pied-piper-pp:before{content:""}.fa-pied-piper-alt:before{content:""}.fa-drupal:before{content:""}.fa-joomla:before{content:""}.fa-language:before{content:""}.fa-fax:before{content:""}.fa-building:before{content:""}.fa-child:before{content:""}.fa-paw:before{content:""}.fa-spoon:before{content:""}.fa-cube:before{content:""}.fa-cubes:before{content:""}.fa-behance:before{content:""}.fa-behance-squa
re:before{content:""}.fa-steam:before{content:""}.fa-steam-square:before{content:""}.fa-recycle:before{content:""}.fa-automobile:before,.fa-car:before{content:""}.fa-cab:before,.fa-taxi:before{content:""}.fa-tree:before{content:""}.fa-spotify:before{content:""}.fa-deviantart:before{content:""}.fa-soundcloud:before{content:""}.fa-database:before{content:""}.fa-file-pdf-o:before{content:""}.fa-file-word-o:before{content:""}.fa-file-excel-o:before{content:""}.fa-file-powerpoint-o:before{content:""}.fa-file-image-o:before,.fa-file-photo-o:before,.fa-file-picture-o:before{content:""}.fa-file-archive-o:before,.fa-file-zip-o:before{content:""}.fa-file-audio-o:before,.fa-file-sound-o:before{content:""}.fa-file-movie-o:before,.fa-file-video-o:before{content:""}.fa-file-code-o:before{content:""}.fa-vine:before{content:""}.fa-codepen:before{content:""}.fa-jsfiddle:before{content:""}.fa-life-bouy:before,.fa-life-buoy:before,.fa-life-ring:before,.fa-life-saver:before,.fa-support:before{content:""}.fa-circle-o-notch:before{content:""}.fa-ra:before,.fa-rebel:before,.fa-resistance:before{content:""}.fa-empire:before,.fa-ge:before{content:""}.fa-git-square:before{content:""}.fa-git:before{content:""}.fa-hacker-news:before,.fa-y-combinator-square:before,.fa-yc-square:before{content:""}.fa-tencent-weibo:before{content:""}.fa-qq:before{content:""}.fa-wechat:before,.fa-weixin:before{content:""}.fa-paper-plane:before,.fa-send:before{content:""}.fa-paper-plane-o:before,.fa-send-o:before{content:""}.fa-history:before{content:""}.fa-circle-thin:before{content:""}.fa-header:before{content:""}.fa-paragraph:before{content:""}.fa-sliders:before{content:""}.fa-share-alt:before{content:""}.fa-share-alt-square:before{content:""}.fa-bomb:before{content:""}.fa-futbol-o:before,.fa-soccer-ball-o:before{content:""}.fa-tty:before{content:""}.fa-binoculars:before{content:""}.fa-plug:before{content:""}.fa-slideshare:before{content:""}.fa-twitch:before{conten
t:""}.fa-yelp:before{content:""}.fa-newspaper-o:before{content:""}.fa-wifi:before{content:""}.fa-calculator:before{content:""}.fa-paypal:before{content:""}.fa-google-wallet:before{content:""}.fa-cc-visa:before{content:""}.fa-cc-mastercard:before{content:""}.fa-cc-discover:before{content:""}.fa-cc-amex:before{content:""}.fa-cc-paypal:before{content:""}.fa-cc-stripe:before{content:""}.fa-bell-slash:before{content:""}.fa-bell-slash-o:before{content:""}.fa-trash:before{content:""}.fa-copyright:before{content:""}.fa-at:before{content:""}.fa-eyedropper:before{content:""}.fa-paint-brush:before{content:""}.fa-birthday-cake:before{content:""}.fa-area-chart:before{content:""}.fa-pie-chart:before{content:""}.fa-line-chart:before{content:""}.fa-lastfm:before{content:""}.fa-lastfm-square:before{content:""}.fa-toggle-off:before{content:""}.fa-toggle-on:before{content:""}.fa-bicycle:before{content:""}.fa-bus:before{content:""}.fa-ioxhost:before{content:""}.fa-angellist:before{content:""}.fa-cc:before{content:""}.fa-ils:before,.fa-shekel:before,.fa-sheqel:before{content:""}.fa-meanpath:before{content:""}.fa-buysellads:before{content:""}.fa-connectdevelop:before{content:""}.fa-dashcube:before{content:""}.fa-forumbee:before{content:""}.fa-leanpub:before{content:""}.fa-sellsy:before{content:""}.fa-shirtsinbulk:before{content:""}.fa-simplybuilt:before{content:""}.fa-skyatlas:before{content:""}.fa-cart-plus:before{content:""}.fa-cart-arrow-down:before{content:""}.fa-diamond:before{content:""}.fa-ship:before{content:""}.fa-user-secret:before{content:""}.fa-motorcycle:before{content:""}.fa-street-view:before{content:""}.fa-heartbeat:before{content:""}.fa-venus:before{content:""}.fa-mars:before{content:""}.fa-mercury:before{content:""}.fa-intersex:before,.fa-transgender:before{content:""}.fa-transgender-alt:before{content:""}.fa-venus-double:before{content:""}.fa-mars-double:before{content:""}.fa-venus-mars:before{content:""}.fa-m
ars-stroke:before{content:""}.fa-mars-stroke-v:before{content:""}.fa-mars-stroke-h:before{content:""}.fa-neuter:before{content:""}.fa-genderless:before{content:""}.fa-facebook-official:before{content:""}.fa-pinterest-p:before{content:""}.fa-whatsapp:before{content:""}.fa-server:before{content:""}.fa-user-plus:before{content:""}.fa-user-times:before{content:""}.fa-bed:before,.fa-hotel:before{content:""}.fa-viacoin:before{content:""}.fa-train:before{content:""}.fa-subway:before{content:""}.fa-medium:before{content:""}.fa-y-combinator:before,.fa-yc:before{content:""}.fa-optin-monster:before{content:""}.fa-opencart:before{content:""}.fa-expeditedssl:before{content:""}.fa-battery-4:before,.fa-battery-full:before,.fa-battery:before{content:""}.fa-battery-3:before,.fa-battery-three-quarters:before{content:""}.fa-battery-2:before,.fa-battery-half:before{content:""}.fa-battery-1:before,.fa-battery-quarter:before{content:""}.fa-battery-0:before,.fa-battery-empty:before{content:""}.fa-mouse-pointer:before{content:""}.fa-i-cursor:before{content:""}.fa-object-group:before{content:""}.fa-object-ungroup:before{content:""}.fa-sticky-note:before{content:""}.fa-sticky-note-o:before{content:""}.fa-cc-jcb:before{content:""}.fa-cc-diners-club:before{content:""}.fa-clone:before{content:""}.fa-balance-scale:before{content:""}.fa-hourglass-o:before{content:""}.fa-hourglass-1:before,.fa-hourglass-start:before{content:""}.fa-hourglass-2:before,.fa-hourglass-half:before{content:""}.fa-hourglass-3:before,.fa-hourglass-end:before{content:""}.fa-hourglass:before{content:""}.fa-hand-grab-o:before,.fa-hand-rock-o:before{content:""}.fa-hand-paper-o:before,.fa-hand-stop-o:before{content:""}.fa-hand-scissors-o:before{content:""}.fa-hand-lizard-o:before{content:""}.fa-hand-spock-o:before{content:""}.fa-hand-pointer-o:before{content:""}.fa-hand-peace-o:before{content:""}.fa-trademark:before{content:""}.fa-registered:before{content:""}.fa-creative-commons
:before{content:""}.fa-gg:before{content:""}.fa-gg-circle:before{content:""}.fa-tripadvisor:before{content:""}.fa-odnoklassniki:before{content:""}.fa-odnoklassniki-square:before{content:""}.fa-get-pocket:before{content:""}.fa-wikipedia-w:before{content:""}.fa-safari:before{content:""}.fa-chrome:before{content:""}.fa-firefox:before{content:""}.fa-opera:before{content:""}.fa-internet-explorer:before{content:""}.fa-television:before,.fa-tv:before{content:""}.fa-contao:before{content:""}.fa-500px:before{content:""}.fa-amazon:before{content:""}.fa-calendar-plus-o:before{content:""}.fa-calendar-minus-o:before{content:""}.fa-calendar-times-o:before{content:""}.fa-calendar-check-o:before{content:""}.fa-industry:before{content:""}.fa-map-pin:before{content:""}.fa-map-signs:before{content:""}.fa-map-o:before{content:""}.fa-map:before{content:""}.fa-commenting:before{content:""}.fa-commenting-o:before{content:""}.fa-houzz:before{content:""}.fa-vimeo:before{content:""}.fa-black-tie:before{content:""}.fa-fonticons:before{content:""}.fa-reddit-alien:before{content:""}.fa-edge:before{content:""}.fa-credit-card-alt:before{content:""}.fa-codiepie:before{content:""}.fa-modx:before{content:""}.fa-fort-awesome:before{content:""}.fa-usb:before{content:""}.fa-product-hunt:before{content:""}.fa-mixcloud:before{content:""}.fa-scribd:before{content:""}.fa-pause-circle:before{content:""}.fa-pause-circle-o:before{content:""}.fa-stop-circle:before{content:""}.fa-stop-circle-o:before{content:""}.fa-shopping-bag:before{content:""}.fa-shopping-basket:before{content:""}.fa-hashtag:before{content:""}.fa-bluetooth:before{content:""}.fa-bluetooth-b:before{content:""}.fa-percent:before{content:""}.fa-gitlab:before,.icon-gitlab:before{content:""}.fa-wpbeginner:before{content:""}.fa-wpforms:before{content:""}.fa-envira:before{content:""}.fa-universal-access:before{content:""}.fa-wheelchair-alt:before{content:""}.fa-question-circle-o:before{conten
t:""}.fa-blind:before{content:""}.fa-audio-description:before{content:""}.fa-volume-control-phone:before{content:""}.fa-braille:before{content:""}.fa-assistive-listening-systems:before{content:""}.fa-american-sign-language-interpreting:before,.fa-asl-interpreting:before{content:""}.fa-deaf:before,.fa-deafness:before,.fa-hard-of-hearing:before{content:""}.fa-glide:before{content:""}.fa-glide-g:before{content:""}.fa-sign-language:before,.fa-signing:before{content:""}.fa-low-vision:before{content:""}.fa-viadeo:before{content:""}.fa-viadeo-square:before{content:""}.fa-snapchat:before{content:""}.fa-snapchat-ghost:before{content:""}.fa-snapchat-square:before{content:""}.fa-pied-piper:before{content:""}.fa-first-order:before{content:""}.fa-yoast:before{content:""}.fa-themeisle:before{content:""}.fa-google-plus-circle:before,.fa-google-plus-official:before{content:""}.fa-fa:before,.fa-font-awesome:before{content:""}.fa-handshake-o:before{content:""}.fa-envelope-open:before{content:""}.fa-envelope-open-o:before{content:""}.fa-linode:before{content:""}.fa-address-book:before{content:""}.fa-address-book-o:before{content:""}.fa-address-card:before,.fa-vcard:before{content:""}.fa-address-card-o:before,.fa-vcard-o:before{content:""}.fa-user-circle:before{content:""}.fa-user-circle-o:before{content:""}.fa-user-o:before{content:""}.fa-id-badge:before{content:""}.fa-drivers-license:before,.fa-id-card:before{content:""}.fa-drivers-license-o:before,.fa-id-card-o:before{content:""}.fa-quora:before{content:""}.fa-free-code-camp:before{content:""}.fa-telegram:before{content:""}.fa-thermometer-4:before,.fa-thermometer-full:before,.fa-thermometer:before{content:""}.fa-thermometer-3:before,.fa-thermometer-three-quarters:before{content:""}.fa-thermometer-2:before,.fa-thermometer-half:before{content:""}.fa-thermometer-1:before,.fa-thermometer-quarter:before{content:""}.fa-thermometer-0:before,.fa-thermometer-empty:before{content:""}.fa-shower:befo
re{content:""}.fa-bath:before,.fa-bathtub:before,.fa-s15:before{content:""}.fa-podcast:before{content:""}.fa-window-maximize:before{content:""}.fa-window-minimize:before{content:""}.fa-window-restore:before{content:""}.fa-times-rectangle:before,.fa-window-close:before{content:""}.fa-times-rectangle-o:before,.fa-window-close-o:before{content:""}.fa-bandcamp:before{content:""}.fa-grav:before{content:""}.fa-etsy:before{content:""}.fa-imdb:before{content:""}.fa-ravelry:before{content:""}.fa-eercast:before{content:""}.fa-microchip:before{content:""}.fa-snowflake-o:before{content:""}.fa-superpowers:before{content:""}.fa-wpexplorer:before{content:""}.fa-meetup:before{content:""}.sr-only{position:absolute;width:1px;height:1px;padding:0;margin:-1px;overflow:hidden;clip:rect(0,0,0,0);border:0}.sr-only-focusable:active,.sr-only-focusable:focus{position:static;width:auto;height:auto;margin:0;overflow:visible;clip:auto}.fa,.icon,.rst-content .admonition-title,.rst-content .code-block-caption .headerlink,.rst-content .eqno .headerlink,.rst-content code.download span:first-child,.rst-content dl dt .headerlink,.rst-content h1 .headerlink,.rst-content h2 .headerlink,.rst-content h3 .headerlink,.rst-content h4 .headerlink,.rst-content h5 .headerlink,.rst-content h6 .headerlink,.rst-content p.caption .headerlink,.rst-content p .headerlink,.rst-content table>caption .headerlink,.rst-content tt.download span:first-child,.wy-dropdown .caret,.wy-inline-validate.wy-inline-validate-danger .wy-input-context,.wy-inline-validate.wy-inline-validate-info .wy-input-context,.wy-inline-validate.wy-inline-validate-success .wy-input-context,.wy-inline-validate.wy-inline-validate-warning .wy-input-context,.wy-menu-vertical li.current>a button.toctree-expand,.wy-menu-vertical li.on a button.toctree-expand,.wy-menu-vertical li button.toctree-expand{font-family:inherit}.fa:before,.icon:before,.rst-content .admonition-title:before,.rst-content .code-block-caption 
.headerlink:before,.rst-content .eqno .headerlink:before,.rst-content code.download span:first-child:before,.rst-content dl dt .headerlink:before,.rst-content h1 .headerlink:before,.rst-content h2 .headerlink:before,.rst-content h3 .headerlink:before,.rst-content h4 .headerlink:before,.rst-content h5 .headerlink:before,.rst-content h6 .headerlink:before,.rst-content p.caption .headerlink:before,.rst-content p .headerlink:before,.rst-content table>caption .headerlink:before,.rst-content tt.download span:first-child:before,.wy-dropdown .caret:before,.wy-inline-validate.wy-inline-validate-danger .wy-input-context:before,.wy-inline-validate.wy-inline-validate-info .wy-input-context:before,.wy-inline-validate.wy-inline-validate-success .wy-input-context:before,.wy-inline-validate.wy-inline-validate-warning .wy-input-context:before,.wy-menu-vertical li.current>a button.toctree-expand:before,.wy-menu-vertical li.on a button.toctree-expand:before,.wy-menu-vertical li button.toctree-expand:before{font-family:FontAwesome;display:inline-block;font-style:normal;font-weight:400;line-height:1;text-decoration:inherit}.rst-content .code-block-caption a .headerlink,.rst-content .eqno a .headerlink,.rst-content a .admonition-title,.rst-content code.download a span:first-child,.rst-content dl dt a .headerlink,.rst-content h1 a .headerlink,.rst-content h2 a .headerlink,.rst-content h3 a .headerlink,.rst-content h4 a .headerlink,.rst-content h5 a .headerlink,.rst-content h6 a .headerlink,.rst-content p.caption a .headerlink,.rst-content p a .headerlink,.rst-content table>caption a .headerlink,.rst-content tt.download a span:first-child,.wy-menu-vertical li.current>a button.toctree-expand,.wy-menu-vertical li.on a button.toctree-expand,.wy-menu-vertical li a button.toctree-expand,a .fa,a .icon,a .rst-content .admonition-title,a .rst-content .code-block-caption .headerlink,a .rst-content .eqno .headerlink,a .rst-content code.download span:first-child,a .rst-content dl dt .headerlink,a 
.rst-content h1 .headerlink,a .rst-content h2 .headerlink,a .rst-content h3 .headerlink,a .rst-content h4 .headerlink,a .rst-content h5 .headerlink,a .rst-content h6 .headerlink,a .rst-content p.caption .headerlink,a .rst-content p .headerlink,a .rst-content table>caption .headerlink,a .rst-content tt.download span:first-child,a .wy-menu-vertical li button.toctree-expand{display:inline-block;text-decoration:inherit}.btn .fa,.btn .icon,.btn .rst-content .admonition-title,.btn .rst-content .code-block-caption .headerlink,.btn .rst-content .eqno .headerlink,.btn .rst-content code.download span:first-child,.btn .rst-content dl dt .headerlink,.btn .rst-content h1 .headerlink,.btn .rst-content h2 .headerlink,.btn .rst-content h3 .headerlink,.btn .rst-content h4 .headerlink,.btn .rst-content h5 .headerlink,.btn .rst-content h6 .headerlink,.btn .rst-content p .headerlink,.btn .rst-content table>caption .headerlink,.btn .rst-content tt.download span:first-child,.btn .wy-menu-vertical li.current>a button.toctree-expand,.btn .wy-menu-vertical li.on a button.toctree-expand,.btn .wy-menu-vertical li button.toctree-expand,.nav .fa,.nav .icon,.nav .rst-content .admonition-title,.nav .rst-content .code-block-caption .headerlink,.nav .rst-content .eqno .headerlink,.nav .rst-content code.download span:first-child,.nav .rst-content dl dt .headerlink,.nav .rst-content h1 .headerlink,.nav .rst-content h2 .headerlink,.nav .rst-content h3 .headerlink,.nav .rst-content h4 .headerlink,.nav .rst-content h5 .headerlink,.nav .rst-content h6 .headerlink,.nav .rst-content p .headerlink,.nav .rst-content table>caption .headerlink,.nav .rst-content tt.download span:first-child,.nav .wy-menu-vertical li.current>a button.toctree-expand,.nav .wy-menu-vertical li.on a button.toctree-expand,.nav .wy-menu-vertical li button.toctree-expand,.rst-content .btn .admonition-title,.rst-content .code-block-caption .btn .headerlink,.rst-content .code-block-caption .nav .headerlink,.rst-content .eqno .btn 
.headerlink,.rst-content .eqno .nav .headerlink,.rst-content .nav .admonition-title,.rst-content code.download .btn span:first-child,.rst-content code.download .nav span:first-child,.rst-content dl dt .btn .headerlink,.rst-content dl dt .nav .headerlink,.rst-content h1 .btn .headerlink,.rst-content h1 .nav .headerlink,.rst-content h2 .btn .headerlink,.rst-content h2 .nav .headerlink,.rst-content h3 .btn .headerlink,.rst-content h3 .nav .headerlink,.rst-content h4 .btn .headerlink,.rst-content h4 .nav .headerlink,.rst-content h5 .btn .headerlink,.rst-content h5 .nav .headerlink,.rst-content h6 .btn .headerlink,.rst-content h6 .nav .headerlink,.rst-content p .btn .headerlink,.rst-content p .nav .headerlink,.rst-content table>caption .btn .headerlink,.rst-content table>caption .nav .headerlink,.rst-content tt.download .btn span:first-child,.rst-content tt.download .nav span:first-child,.wy-menu-vertical li .btn button.toctree-expand,.wy-menu-vertical li.current>a .btn button.toctree-expand,.wy-menu-vertical li.current>a .nav button.toctree-expand,.wy-menu-vertical li .nav button.toctree-expand,.wy-menu-vertical li.on a .btn button.toctree-expand,.wy-menu-vertical li.on a .nav button.toctree-expand{display:inline}.btn .fa-large.icon,.btn .fa.fa-large,.btn .rst-content .code-block-caption .fa-large.headerlink,.btn .rst-content .eqno .fa-large.headerlink,.btn .rst-content .fa-large.admonition-title,.btn .rst-content code.download span.fa-large:first-child,.btn .rst-content dl dt .fa-large.headerlink,.btn .rst-content h1 .fa-large.headerlink,.btn .rst-content h2 .fa-large.headerlink,.btn .rst-content h3 .fa-large.headerlink,.btn .rst-content h4 .fa-large.headerlink,.btn .rst-content h5 .fa-large.headerlink,.btn .rst-content h6 .fa-large.headerlink,.btn .rst-content p .fa-large.headerlink,.btn .rst-content table>caption .fa-large.headerlink,.btn .rst-content tt.download span.fa-large:first-child,.btn .wy-menu-vertical li button.fa-large.toctree-expand,.nav 
.fa-large.icon,.nav .fa.fa-large,.nav .rst-content .code-block-caption .fa-large.headerlink,.nav .rst-content .eqno .fa-large.headerlink,.nav .rst-content .fa-large.admonition-title,.nav .rst-content code.download span.fa-large:first-child,.nav .rst-content dl dt .fa-large.headerlink,.nav .rst-content h1 .fa-large.headerlink,.nav .rst-content h2 .fa-large.headerlink,.nav .rst-content h3 .fa-large.headerlink,.nav .rst-content h4 .fa-large.headerlink,.nav .rst-content h5 .fa-large.headerlink,.nav .rst-content h6 .fa-large.headerlink,.nav .rst-content p .fa-large.headerlink,.nav .rst-content table>caption .fa-large.headerlink,.nav .rst-content tt.download span.fa-large:first-child,.nav .wy-menu-vertical li button.fa-large.toctree-expand,.rst-content .btn .fa-large.admonition-title,.rst-content .code-block-caption .btn .fa-large.headerlink,.rst-content .code-block-caption .nav .fa-large.headerlink,.rst-content .eqno .btn .fa-large.headerlink,.rst-content .eqno .nav .fa-large.headerlink,.rst-content .nav .fa-large.admonition-title,.rst-content code.download .btn span.fa-large:first-child,.rst-content code.download .nav span.fa-large:first-child,.rst-content dl dt .btn .fa-large.headerlink,.rst-content dl dt .nav .fa-large.headerlink,.rst-content h1 .btn .fa-large.headerlink,.rst-content h1 .nav .fa-large.headerlink,.rst-content h2 .btn .fa-large.headerlink,.rst-content h2 .nav .fa-large.headerlink,.rst-content h3 .btn .fa-large.headerlink,.rst-content h3 .nav .fa-large.headerlink,.rst-content h4 .btn .fa-large.headerlink,.rst-content h4 .nav .fa-large.headerlink,.rst-content h5 .btn .fa-large.headerlink,.rst-content h5 .nav .fa-large.headerlink,.rst-content h6 .btn .fa-large.headerlink,.rst-content h6 .nav .fa-large.headerlink,.rst-content p .btn .fa-large.headerlink,.rst-content p .nav .fa-large.headerlink,.rst-content table>caption .btn .fa-large.headerlink,.rst-content table>caption .nav .fa-large.headerlink,.rst-content tt.download .btn 
span.fa-large:first-child,.rst-content tt.download .nav span.fa-large:first-child,.wy-menu-vertical li .btn button.fa-large.toctree-expand,.wy-menu-vertical li .nav button.fa-large.toctree-expand{line-height:.9em}.btn .fa-spin.icon,.btn .fa.fa-spin,.btn .rst-content .code-block-caption .fa-spin.headerlink,.btn .rst-content .eqno .fa-spin.headerlink,.btn .rst-content .fa-spin.admonition-title,.btn .rst-content code.download span.fa-spin:first-child,.btn .rst-content dl dt .fa-spin.headerlink,.btn .rst-content h1 .fa-spin.headerlink,.btn .rst-content h2 .fa-spin.headerlink,.btn .rst-content h3 .fa-spin.headerlink,.btn .rst-content h4 .fa-spin.headerlink,.btn .rst-content h5 .fa-spin.headerlink,.btn .rst-content h6 .fa-spin.headerlink,.btn .rst-content p .fa-spin.headerlink,.btn .rst-content table>caption .fa-spin.headerlink,.btn .rst-content tt.download span.fa-spin:first-child,.btn .wy-menu-vertical li button.fa-spin.toctree-expand,.nav .fa-spin.icon,.nav .fa.fa-spin,.nav .rst-content .code-block-caption .fa-spin.headerlink,.nav .rst-content .eqno .fa-spin.headerlink,.nav .rst-content .fa-spin.admonition-title,.nav .rst-content code.download span.fa-spin:first-child,.nav .rst-content dl dt .fa-spin.headerlink,.nav .rst-content h1 .fa-spin.headerlink,.nav .rst-content h2 .fa-spin.headerlink,.nav .rst-content h3 .fa-spin.headerlink,.nav .rst-content h4 .fa-spin.headerlink,.nav .rst-content h5 .fa-spin.headerlink,.nav .rst-content h6 .fa-spin.headerlink,.nav .rst-content p .fa-spin.headerlink,.nav .rst-content table>caption .fa-spin.headerlink,.nav .rst-content tt.download span.fa-spin:first-child,.nav .wy-menu-vertical li button.fa-spin.toctree-expand,.rst-content .btn .fa-spin.admonition-title,.rst-content .code-block-caption .btn .fa-spin.headerlink,.rst-content .code-block-caption .nav .fa-spin.headerlink,.rst-content .eqno .btn .fa-spin.headerlink,.rst-content .eqno .nav .fa-spin.headerlink,.rst-content .nav .fa-spin.admonition-title,.rst-content code.download 
.btn span.fa-spin:first-child,.rst-content code.download .nav span.fa-spin:first-child,.rst-content dl dt .btn .fa-spin.headerlink,.rst-content dl dt .nav .fa-spin.headerlink,.rst-content h1 .btn .fa-spin.headerlink,.rst-content h1 .nav .fa-spin.headerlink,.rst-content h2 .btn .fa-spin.headerlink,.rst-content h2 .nav .fa-spin.headerlink,.rst-content h3 .btn .fa-spin.headerlink,.rst-content h3 .nav .fa-spin.headerlink,.rst-content h4 .btn .fa-spin.headerlink,.rst-content h4 .nav .fa-spin.headerlink,.rst-content h5 .btn .fa-spin.headerlink,.rst-content h5 .nav .fa-spin.headerlink,.rst-content h6 .btn .fa-spin.headerlink,.rst-content h6 .nav .fa-spin.headerlink,.rst-content p .btn .fa-spin.headerlink,.rst-content p .nav .fa-spin.headerlink,.rst-content table>caption .btn .fa-spin.headerlink,.rst-content table>caption .nav .fa-spin.headerlink,.rst-content tt.download .btn span.fa-spin:first-child,.rst-content tt.download .nav span.fa-spin:first-child,.wy-menu-vertical li .btn button.fa-spin.toctree-expand,.wy-menu-vertical li .nav button.fa-spin.toctree-expand{display:inline-block}.btn.fa:before,.btn.icon:before,.rst-content .btn.admonition-title:before,.rst-content .code-block-caption .btn.headerlink:before,.rst-content .eqno .btn.headerlink:before,.rst-content code.download span.btn:first-child:before,.rst-content dl dt .btn.headerlink:before,.rst-content h1 .btn.headerlink:before,.rst-content h2 .btn.headerlink:before,.rst-content h3 .btn.headerlink:before,.rst-content h4 .btn.headerlink:before,.rst-content h5 .btn.headerlink:before,.rst-content h6 .btn.headerlink:before,.rst-content p .btn.headerlink:before,.rst-content table>caption .btn.headerlink:before,.rst-content tt.download span.btn:first-child:before,.wy-menu-vertical li button.btn.toctree-expand:before{opacity:.5;-webkit-transition:opacity .05s ease-in;-moz-transition:opacity .05s ease-in;transition:opacity .05s ease-in}.btn.fa:hover:before,.btn.icon:hover:before,.rst-content 
.btn.admonition-title:hover:before,.rst-content .code-block-caption .btn.headerlink:hover:before,.rst-content .eqno .btn.headerlink:hover:before,.rst-content code.download span.btn:first-child:hover:before,.rst-content dl dt .btn.headerlink:hover:before,.rst-content h1 .btn.headerlink:hover:before,.rst-content h2 .btn.headerlink:hover:before,.rst-content h3 .btn.headerlink:hover:before,.rst-content h4 .btn.headerlink:hover:before,.rst-content h5 .btn.headerlink:hover:before,.rst-content h6 .btn.headerlink:hover:before,.rst-content p .btn.headerlink:hover:before,.rst-content table>caption .btn.headerlink:hover:before,.rst-content tt.download span.btn:first-child:hover:before,.wy-menu-vertical li button.btn.toctree-expand:hover:before{opacity:1}.btn-mini .fa:before,.btn-mini .icon:before,.btn-mini .rst-content .admonition-title:before,.btn-mini .rst-content .code-block-caption .headerlink:before,.btn-mini .rst-content .eqno .headerlink:before,.btn-mini .rst-content code.download span:first-child:before,.btn-mini .rst-content dl dt .headerlink:before,.btn-mini .rst-content h1 .headerlink:before,.btn-mini .rst-content h2 .headerlink:before,.btn-mini .rst-content h3 .headerlink:before,.btn-mini .rst-content h4 .headerlink:before,.btn-mini .rst-content h5 .headerlink:before,.btn-mini .rst-content h6 .headerlink:before,.btn-mini .rst-content p .headerlink:before,.btn-mini .rst-content table>caption .headerlink:before,.btn-mini .rst-content tt.download span:first-child:before,.btn-mini .wy-menu-vertical li button.toctree-expand:before,.rst-content .btn-mini .admonition-title:before,.rst-content .code-block-caption .btn-mini .headerlink:before,.rst-content .eqno .btn-mini .headerlink:before,.rst-content code.download .btn-mini span:first-child:before,.rst-content dl dt .btn-mini .headerlink:before,.rst-content h1 .btn-mini .headerlink:before,.rst-content h2 .btn-mini .headerlink:before,.rst-content h3 .btn-mini .headerlink:before,.rst-content h4 .btn-mini 
.headerlink:before,.rst-content h5 .btn-mini .headerlink:before,.rst-content h6 .btn-mini .headerlink:before,.rst-content p .btn-mini .headerlink:before,.rst-content table>caption .btn-mini .headerlink:before,.rst-content tt.download .btn-mini span:first-child:before,.wy-menu-vertical li .btn-mini button.toctree-expand:before{font-size:14px;vertical-align:-15%}.rst-content .admonition,.rst-content .admonition-todo,.rst-content .attention,.rst-content .caution,.rst-content .danger,.rst-content .error,.rst-content .hint,.rst-content .important,.rst-content .note,.rst-content .seealso,.rst-content .tip,.rst-content .warning,.wy-alert{padding:12px;line-height:24px;margin-bottom:24px;background:#e7f2fa}.rst-content .admonition-title,.wy-alert-title{font-weight:700;display:block;color:#fff;background:#6ab0de;padding:6px 12px;margin:-12px -12px 12px}.rst-content .danger,.rst-content .error,.rst-content .wy-alert-danger.admonition,.rst-content .wy-alert-danger.admonition-todo,.rst-content .wy-alert-danger.attention,.rst-content .wy-alert-danger.caution,.rst-content .wy-alert-danger.hint,.rst-content .wy-alert-danger.important,.rst-content .wy-alert-danger.note,.rst-content .wy-alert-danger.seealso,.rst-content .wy-alert-danger.tip,.rst-content .wy-alert-danger.warning,.wy-alert.wy-alert-danger{background:#fdf3f2}.rst-content .danger .admonition-title,.rst-content .danger .wy-alert-title,.rst-content .error .admonition-title,.rst-content .error .wy-alert-title,.rst-content .wy-alert-danger.admonition-todo .admonition-title,.rst-content .wy-alert-danger.admonition-todo .wy-alert-title,.rst-content .wy-alert-danger.admonition .admonition-title,.rst-content .wy-alert-danger.admonition .wy-alert-title,.rst-content .wy-alert-danger.attention .admonition-title,.rst-content .wy-alert-danger.attention .wy-alert-title,.rst-content .wy-alert-danger.caution .admonition-title,.rst-content .wy-alert-danger.caution .wy-alert-title,.rst-content .wy-alert-danger.hint 
.admonition-title,.rst-content .wy-alert-danger.hint .wy-alert-title,.rst-content .wy-alert-danger.important .admonition-title,.rst-content .wy-alert-danger.important .wy-alert-title,.rst-content .wy-alert-danger.note .admonition-title,.rst-content .wy-alert-danger.note .wy-alert-title,.rst-content .wy-alert-danger.seealso .admonition-title,.rst-content .wy-alert-danger.seealso .wy-alert-title,.rst-content .wy-alert-danger.tip .admonition-title,.rst-content .wy-alert-danger.tip .wy-alert-title,.rst-content .wy-alert-danger.warning .admonition-title,.rst-content .wy-alert-danger.warning .wy-alert-title,.rst-content .wy-alert.wy-alert-danger .admonition-title,.wy-alert.wy-alert-danger .rst-content .admonition-title,.wy-alert.wy-alert-danger .wy-alert-title{background:#f29f97}.rst-content .admonition-todo,.rst-content .attention,.rst-content .caution,.rst-content .warning,.rst-content .wy-alert-warning.admonition,.rst-content .wy-alert-warning.danger,.rst-content .wy-alert-warning.error,.rst-content .wy-alert-warning.hint,.rst-content .wy-alert-warning.important,.rst-content .wy-alert-warning.note,.rst-content .wy-alert-warning.seealso,.rst-content .wy-alert-warning.tip,.wy-alert.wy-alert-warning{background:#ffedcc}.rst-content .admonition-todo .admonition-title,.rst-content .admonition-todo .wy-alert-title,.rst-content .attention .admonition-title,.rst-content .attention .wy-alert-title,.rst-content .caution .admonition-title,.rst-content .caution .wy-alert-title,.rst-content .warning .admonition-title,.rst-content .warning .wy-alert-title,.rst-content .wy-alert-warning.admonition .admonition-title,.rst-content .wy-alert-warning.admonition .wy-alert-title,.rst-content .wy-alert-warning.danger .admonition-title,.rst-content .wy-alert-warning.danger .wy-alert-title,.rst-content .wy-alert-warning.error .admonition-title,.rst-content .wy-alert-warning.error .wy-alert-title,.rst-content .wy-alert-warning.hint .admonition-title,.rst-content .wy-alert-warning.hint 
.wy-alert-title,.rst-content .wy-alert-warning.important .admonition-title,.rst-content .wy-alert-warning.important .wy-alert-title,.rst-content .wy-alert-warning.note .admonition-title,.rst-content .wy-alert-warning.note .wy-alert-title,.rst-content .wy-alert-warning.seealso .admonition-title,.rst-content .wy-alert-warning.seealso .wy-alert-title,.rst-content .wy-alert-warning.tip .admonition-title,.rst-content .wy-alert-warning.tip .wy-alert-title,.rst-content .wy-alert.wy-alert-warning .admonition-title,.wy-alert.wy-alert-warning .rst-content .admonition-title,.wy-alert.wy-alert-warning .wy-alert-title{background:#f0b37e}.rst-content .note,.rst-content .seealso,.rst-content .wy-alert-info.admonition,.rst-content .wy-alert-info.admonition-todo,.rst-content .wy-alert-info.attention,.rst-content .wy-alert-info.caution,.rst-content .wy-alert-info.danger,.rst-content .wy-alert-info.error,.rst-content .wy-alert-info.hint,.rst-content .wy-alert-info.important,.rst-content .wy-alert-info.tip,.rst-content .wy-alert-info.warning,.wy-alert.wy-alert-info{background:#e7f2fa}.rst-content .note .admonition-title,.rst-content .note .wy-alert-title,.rst-content .seealso .admonition-title,.rst-content .seealso .wy-alert-title,.rst-content .wy-alert-info.admonition-todo .admonition-title,.rst-content .wy-alert-info.admonition-todo .wy-alert-title,.rst-content .wy-alert-info.admonition .admonition-title,.rst-content .wy-alert-info.admonition .wy-alert-title,.rst-content .wy-alert-info.attention .admonition-title,.rst-content .wy-alert-info.attention .wy-alert-title,.rst-content .wy-alert-info.caution .admonition-title,.rst-content .wy-alert-info.caution .wy-alert-title,.rst-content .wy-alert-info.danger .admonition-title,.rst-content .wy-alert-info.danger .wy-alert-title,.rst-content .wy-alert-info.error .admonition-title,.rst-content .wy-alert-info.error .wy-alert-title,.rst-content .wy-alert-info.hint .admonition-title,.rst-content .wy-alert-info.hint .wy-alert-title,.rst-content 
.wy-alert-info.important .admonition-title,.rst-content .wy-alert-info.important .wy-alert-title,.rst-content .wy-alert-info.tip .admonition-title,.rst-content .wy-alert-info.tip .wy-alert-title,.rst-content .wy-alert-info.warning .admonition-title,.rst-content .wy-alert-info.warning .wy-alert-title,.rst-content .wy-alert.wy-alert-info .admonition-title,.wy-alert.wy-alert-info .rst-content .admonition-title,.wy-alert.wy-alert-info .wy-alert-title{background:#6ab0de}.rst-content .hint,.rst-content .important,.rst-content .tip,.rst-content .wy-alert-success.admonition,.rst-content .wy-alert-success.admonition-todo,.rst-content .wy-alert-success.attention,.rst-content .wy-alert-success.caution,.rst-content .wy-alert-success.danger,.rst-content .wy-alert-success.error,.rst-content .wy-alert-success.note,.rst-content .wy-alert-success.seealso,.rst-content .wy-alert-success.warning,.wy-alert.wy-alert-success{background:#dbfaf4}.rst-content .hint .admonition-title,.rst-content .hint .wy-alert-title,.rst-content .important .admonition-title,.rst-content .important .wy-alert-title,.rst-content .tip .admonition-title,.rst-content .tip .wy-alert-title,.rst-content .wy-alert-success.admonition-todo .admonition-title,.rst-content .wy-alert-success.admonition-todo .wy-alert-title,.rst-content .wy-alert-success.admonition .admonition-title,.rst-content .wy-alert-success.admonition .wy-alert-title,.rst-content .wy-alert-success.attention .admonition-title,.rst-content .wy-alert-success.attention .wy-alert-title,.rst-content .wy-alert-success.caution .admonition-title,.rst-content .wy-alert-success.caution .wy-alert-title,.rst-content .wy-alert-success.danger .admonition-title,.rst-content .wy-alert-success.danger .wy-alert-title,.rst-content .wy-alert-success.error .admonition-title,.rst-content .wy-alert-success.error .wy-alert-title,.rst-content .wy-alert-success.note .admonition-title,.rst-content .wy-alert-success.note .wy-alert-title,.rst-content .wy-alert-success.seealso 
.admonition-title,.rst-content .wy-alert-success.seealso .wy-alert-title,.rst-content .wy-alert-success.warning .admonition-title,.rst-content .wy-alert-success.warning .wy-alert-title,.rst-content .wy-alert.wy-alert-success .admonition-title,.wy-alert.wy-alert-success .rst-content .admonition-title,.wy-alert.wy-alert-success .wy-alert-title{background:#1abc9c}.rst-content .wy-alert-neutral.admonition,.rst-content .wy-alert-neutral.admonition-todo,.rst-content .wy-alert-neutral.attention,.rst-content .wy-alert-neutral.caution,.rst-content .wy-alert-neutral.danger,.rst-content .wy-alert-neutral.error,.rst-content .wy-alert-neutral.hint,.rst-content .wy-alert-neutral.important,.rst-content .wy-alert-neutral.note,.rst-content .wy-alert-neutral.seealso,.rst-content .wy-alert-neutral.tip,.rst-content .wy-alert-neutral.warning,.wy-alert.wy-alert-neutral{background:#f3f6f6}.rst-content .wy-alert-neutral.admonition-todo .admonition-title,.rst-content .wy-alert-neutral.admonition-todo .wy-alert-title,.rst-content .wy-alert-neutral.admonition .admonition-title,.rst-content .wy-alert-neutral.admonition .wy-alert-title,.rst-content .wy-alert-neutral.attention .admonition-title,.rst-content .wy-alert-neutral.attention .wy-alert-title,.rst-content .wy-alert-neutral.caution .admonition-title,.rst-content .wy-alert-neutral.caution .wy-alert-title,.rst-content .wy-alert-neutral.danger .admonition-title,.rst-content .wy-alert-neutral.danger .wy-alert-title,.rst-content .wy-alert-neutral.error .admonition-title,.rst-content .wy-alert-neutral.error .wy-alert-title,.rst-content .wy-alert-neutral.hint .admonition-title,.rst-content .wy-alert-neutral.hint .wy-alert-title,.rst-content .wy-alert-neutral.important .admonition-title,.rst-content .wy-alert-neutral.important .wy-alert-title,.rst-content .wy-alert-neutral.note .admonition-title,.rst-content .wy-alert-neutral.note .wy-alert-title,.rst-content .wy-alert-neutral.seealso .admonition-title,.rst-content .wy-alert-neutral.seealso 
.wy-alert-title,.rst-content .wy-alert-neutral.tip .admonition-title,.rst-content .wy-alert-neutral.tip .wy-alert-title,.rst-content .wy-alert-neutral.warning .admonition-title,.rst-content .wy-alert-neutral.warning .wy-alert-title,.rst-content .wy-alert.wy-alert-neutral .admonition-title,.wy-alert.wy-alert-neutral .rst-content .admonition-title,.wy-alert.wy-alert-neutral .wy-alert-title{color:#404040;background:#e1e4e5}.rst-content .wy-alert-neutral.admonition-todo a,.rst-content .wy-alert-neutral.admonition a,.rst-content .wy-alert-neutral.attention a,.rst-content .wy-alert-neutral.caution a,.rst-content .wy-alert-neutral.danger a,.rst-content .wy-alert-neutral.error a,.rst-content .wy-alert-neutral.hint a,.rst-content .wy-alert-neutral.important a,.rst-content .wy-alert-neutral.note a,.rst-content .wy-alert-neutral.seealso a,.rst-content .wy-alert-neutral.tip a,.rst-content .wy-alert-neutral.warning a,.wy-alert.wy-alert-neutral a{color:#2980b9}.rst-content .admonition-todo p:last-child,.rst-content .admonition p:last-child,.rst-content .attention p:last-child,.rst-content .caution p:last-child,.rst-content .danger p:last-child,.rst-content .error p:last-child,.rst-content .hint p:last-child,.rst-content .important p:last-child,.rst-content .note p:last-child,.rst-content .seealso p:last-child,.rst-content .tip p:last-child,.rst-content .warning p:last-child,.wy-alert p:last-child{margin-bottom:0}.wy-tray-container{position:fixed;bottom:0;left:0;z-index:600}.wy-tray-container li{display:block;width:300px;background:transparent;color:#fff;text-align:center;box-shadow:0 5px 5px 0 rgba(0,0,0,.1);padding:0 24px;min-width:20%;opacity:0;height:0;line-height:56px;overflow:hidden;-webkit-transition:all .3s ease-in;-moz-transition:all .3s ease-in;transition:all .3s ease-in}.wy-tray-container li.wy-tray-item-success{background:#27ae60}.wy-tray-container li.wy-tray-item-info{background:#2980b9}.wy-tray-container li.wy-tray-item-warning{background:#e67e22}.wy-tray-container 
li.wy-tray-item-danger{background:#e74c3c}.wy-tray-container li.on{opacity:1;height:56px}@media screen and (max-width:768px){.wy-tray-container{bottom:auto;top:0;width:100%}.wy-tray-container li{width:100%}}button{font-size:100%;margin:0;vertical-align:baseline;*vertical-align:middle;cursor:pointer;line-height:normal;-webkit-appearance:button;*overflow:visible}button::-moz-focus-inner,input::-moz-focus-inner{border:0;padding:0}button[disabled]{cursor:default}.btn{display:inline-block;border-radius:2px;line-height:normal;white-space:nowrap;text-align:center;cursor:pointer;font-size:100%;padding:6px 12px 8px;color:#fff;border:1px solid rgba(0,0,0,.1);background-color:#27ae60;text-decoration:none;font-weight:400;font-family:Lato,proxima-nova,Helvetica Neue,Arial,sans-serif;box-shadow:inset 0 1px 2px -1px hsla(0,0%,100%,.5),inset 0 -2px 0 0 rgba(0,0,0,.1);outline-none:false;vertical-align:middle;*display:inline;zoom:1;-webkit-user-drag:none;-webkit-user-select:none;-moz-user-select:none;-ms-user-select:none;user-select:none;-webkit-transition:all .1s linear;-moz-transition:all .1s linear;transition:all .1s linear}.btn-hover{background:#2e8ece;color:#fff}.btn:hover{background:#2cc36b;color:#fff}.btn:focus{background:#2cc36b;outline:0}.btn:active{box-shadow:inset 0 -1px 0 0 rgba(0,0,0,.05),inset 0 2px 0 0 rgba(0,0,0,.1);padding:8px 12px 6px}.btn:visited{color:#fff}.btn-disabled,.btn-disabled:active,.btn-disabled:focus,.btn-disabled:hover,.btn:disabled{background-image:none;filter:progid:DXImageTransform.Microsoft.gradient(enabled = 
false);filter:alpha(opacity=40);opacity:.4;cursor:not-allowed;box-shadow:none}.btn::-moz-focus-inner{padding:0;border:0}.btn-small{font-size:80%}.btn-info{background-color:#2980b9!important}.btn-info:hover{background-color:#2e8ece!important}.btn-neutral{background-color:#f3f6f6!important;color:#404040!important}.btn-neutral:hover{background-color:#e5ebeb!important;color:#404040}.btn-neutral:visited{color:#404040!important}.btn-success{background-color:#27ae60!important}.btn-success:hover{background-color:#295!important}.btn-danger{background-color:#e74c3c!important}.btn-danger:hover{background-color:#ea6153!important}.btn-warning{background-color:#e67e22!important}.btn-warning:hover{background-color:#e98b39!important}.btn-invert{background-color:#222}.btn-invert:hover{background-color:#2f2f2f!important}.btn-link{background-color:transparent!important;color:#2980b9;box-shadow:none;border-color:transparent!important}.btn-link:active,.btn-link:hover{background-color:transparent!important;color:#409ad5!important;box-shadow:none}.btn-link:visited{color:#9b59b6}.wy-btn-group .btn,.wy-control .btn{vertical-align:middle}.wy-btn-group{margin-bottom:24px;*zoom:1}.wy-btn-group:after,.wy-btn-group:before{display:table;content:""}.wy-btn-group:after{clear:both}.wy-dropdown{position:relative;display:inline-block}.wy-dropdown-active .wy-dropdown-menu{display:block}.wy-dropdown-menu{position:absolute;left:0;display:none;float:left;top:100%;min-width:100%;background:#fcfcfc;z-index:100;border:1px solid #cfd7dd;box-shadow:0 2px 2px 0 rgba(0,0,0,.1);padding:12px}.wy-dropdown-menu>dd>a{display:block;clear:both;color:#404040;white-space:nowrap;font-size:90%;padding:0 12px;cursor:pointer}.wy-dropdown-menu>dd>a:hover{background:#2980b9;color:#fff}.wy-dropdown-menu>dd.divider{border-top:1px solid #cfd7dd;margin:6px 0}.wy-dropdown-menu>dd.search{padding-bottom:12px}.wy-dropdown-menu>dd.search 
input[type=search]{width:100%}.wy-dropdown-menu>dd.call-to-action{background:#e3e3e3;text-transform:uppercase;font-weight:500;font-size:80%}.wy-dropdown-menu>dd.call-to-action:hover{background:#e3e3e3}.wy-dropdown-menu>dd.call-to-action .btn{color:#fff}.wy-dropdown.wy-dropdown-up .wy-dropdown-menu{bottom:100%;top:auto;left:auto;right:0}.wy-dropdown.wy-dropdown-bubble .wy-dropdown-menu{background:#fcfcfc;margin-top:2px}.wy-dropdown.wy-dropdown-bubble .wy-dropdown-menu a{padding:6px 12px}.wy-dropdown.wy-dropdown-bubble .wy-dropdown-menu a:hover{background:#2980b9;color:#fff}.wy-dropdown.wy-dropdown-left .wy-dropdown-menu{right:0;left:auto;text-align:right}.wy-dropdown-arrow:before{content:" ";border-bottom:5px solid #f5f5f5;border-left:5px solid transparent;border-right:5px solid transparent;position:absolute;display:block;top:-4px;left:50%;margin-left:-3px}.wy-dropdown-arrow.wy-dropdown-arrow-left:before{left:11px}.wy-form-stacked select{display:block}.wy-form-aligned .wy-help-inline,.wy-form-aligned input,.wy-form-aligned label,.wy-form-aligned select,.wy-form-aligned textarea{display:inline-block;*display:inline;*zoom:1;vertical-align:middle}.wy-form-aligned .wy-control-group>label{display:inline-block;vertical-align:middle;width:10em;margin:6px 12px 0 0;float:left}.wy-form-aligned .wy-control{float:left}.wy-form-aligned .wy-control label{display:block}.wy-form-aligned .wy-control select{margin-top:6px}fieldset{margin:0}fieldset,legend{border:0;padding:0}legend{width:100%;white-space:normal;margin-bottom:24px;font-size:150%;*margin-left:-7px}label,legend{display:block}label{margin:0 0 
.3125em;color:#333;font-size:90%}input,select,textarea{font-size:100%;margin:0;vertical-align:baseline;*vertical-align:middle}.wy-control-group{margin-bottom:24px;max-width:1200px;margin-left:auto;margin-right:auto;*zoom:1}.wy-control-group:after,.wy-control-group:before{display:table;content:""}.wy-control-group:after{clear:both}.wy-control-group.wy-control-group-required>label:after{content:" *";color:#e74c3c}.wy-control-group .wy-form-full,.wy-control-group .wy-form-halves,.wy-control-group .wy-form-thirds{padding-bottom:12px}.wy-control-group .wy-form-full input[type=color],.wy-control-group .wy-form-full input[type=date],.wy-control-group .wy-form-full input[type=datetime-local],.wy-control-group .wy-form-full input[type=datetime],.wy-control-group .wy-form-full input[type=email],.wy-control-group .wy-form-full input[type=month],.wy-control-group .wy-form-full input[type=number],.wy-control-group .wy-form-full input[type=password],.wy-control-group .wy-form-full input[type=search],.wy-control-group .wy-form-full input[type=tel],.wy-control-group .wy-form-full input[type=text],.wy-control-group .wy-form-full input[type=time],.wy-control-group .wy-form-full input[type=url],.wy-control-group .wy-form-full input[type=week],.wy-control-group .wy-form-full select,.wy-control-group .wy-form-halves input[type=color],.wy-control-group .wy-form-halves input[type=date],.wy-control-group .wy-form-halves input[type=datetime-local],.wy-control-group .wy-form-halves input[type=datetime],.wy-control-group .wy-form-halves input[type=email],.wy-control-group .wy-form-halves input[type=month],.wy-control-group .wy-form-halves input[type=number],.wy-control-group .wy-form-halves input[type=password],.wy-control-group .wy-form-halves input[type=search],.wy-control-group .wy-form-halves input[type=tel],.wy-control-group .wy-form-halves input[type=text],.wy-control-group .wy-form-halves input[type=time],.wy-control-group .wy-form-halves input[type=url],.wy-control-group 
.wy-form-halves input[type=week],.wy-control-group .wy-form-halves select,.wy-control-group .wy-form-thirds input[type=color],.wy-control-group .wy-form-thirds input[type=date],.wy-control-group .wy-form-thirds input[type=datetime-local],.wy-control-group .wy-form-thirds input[type=datetime],.wy-control-group .wy-form-thirds input[type=email],.wy-control-group .wy-form-thirds input[type=month],.wy-control-group .wy-form-thirds input[type=number],.wy-control-group .wy-form-thirds input[type=password],.wy-control-group .wy-form-thirds input[type=search],.wy-control-group .wy-form-thirds input[type=tel],.wy-control-group .wy-form-thirds input[type=text],.wy-control-group .wy-form-thirds input[type=time],.wy-control-group .wy-form-thirds input[type=url],.wy-control-group .wy-form-thirds input[type=week],.wy-control-group .wy-form-thirds select{width:100%}.wy-control-group .wy-form-full{float:left;display:block;width:100%;margin-right:0}.wy-control-group .wy-form-full:last-child{margin-right:0}.wy-control-group .wy-form-halves{float:left;display:block;margin-right:2.35765%;width:48.82117%}.wy-control-group .wy-form-halves:last-child,.wy-control-group .wy-form-halves:nth-of-type(2n){margin-right:0}.wy-control-group .wy-form-halves:nth-of-type(odd){clear:left}.wy-control-group .wy-form-thirds{float:left;display:block;margin-right:2.35765%;width:31.76157%}.wy-control-group .wy-form-thirds:last-child,.wy-control-group .wy-form-thirds:nth-of-type(3n){margin-right:0}.wy-control-group .wy-form-thirds:nth-of-type(3n+1){clear:left}.wy-control-group.wy-control-group-no-input .wy-control,.wy-control-no-input{margin:6px 0 0;font-size:90%}.wy-control-no-input{display:inline-block}.wy-control-group.fluid-input input[type=color],.wy-control-group.fluid-input input[type=date],.wy-control-group.fluid-input input[type=datetime-local],.wy-control-group.fluid-input input[type=datetime],.wy-control-group.fluid-input input[type=email],.wy-control-group.fluid-input 
input[type=month],.wy-control-group.fluid-input input[type=number],.wy-control-group.fluid-input input[type=password],.wy-control-group.fluid-input input[type=search],.wy-control-group.fluid-input input[type=tel],.wy-control-group.fluid-input input[type=text],.wy-control-group.fluid-input input[type=time],.wy-control-group.fluid-input input[type=url],.wy-control-group.fluid-input input[type=week]{width:100%}.wy-form-message-inline{padding-left:.3em;color:#666;font-size:90%}.wy-form-message{display:block;color:#999;font-size:70%;margin-top:.3125em;font-style:italic}.wy-form-message p{font-size:inherit;font-style:italic;margin-bottom:6px}.wy-form-message p:last-child{margin-bottom:0}input{line-height:normal}input[type=button],input[type=reset],input[type=submit]{-webkit-appearance:button;cursor:pointer;font-family:Lato,proxima-nova,Helvetica Neue,Arial,sans-serif;*overflow:visible}input[type=color],input[type=date],input[type=datetime-local],input[type=datetime],input[type=email],input[type=month],input[type=number],input[type=password],input[type=search],input[type=tel],input[type=text],input[type=time],input[type=url],input[type=week]{-webkit-appearance:none;padding:6px;display:inline-block;border:1px solid #ccc;font-size:80%;font-family:Lato,proxima-nova,Helvetica Neue,Arial,sans-serif;box-shadow:inset 0 1px 3px #ddd;border-radius:0;-webkit-transition:border .3s linear;-moz-transition:border .3s linear;transition:border .3s linear}input[type=datetime-local]{padding:.34375em 
.625em}input[disabled]{cursor:default}input[type=checkbox],input[type=radio]{padding:0;margin-right:.3125em;*height:13px;*width:13px}input[type=checkbox],input[type=radio],input[type=search]{-webkit-box-sizing:border-box;-moz-box-sizing:border-box;box-sizing:border-box}input[type=search]::-webkit-search-cancel-button,input[type=search]::-webkit-search-decoration{-webkit-appearance:none}input[type=color]:focus,input[type=date]:focus,input[type=datetime-local]:focus,input[type=datetime]:focus,input[type=email]:focus,input[type=month]:focus,input[type=number]:focus,input[type=password]:focus,input[type=search]:focus,input[type=tel]:focus,input[type=text]:focus,input[type=time]:focus,input[type=url]:focus,input[type=week]:focus{outline:0;outline:thin dotted\9;border-color:#333}input.no-focus:focus{border-color:#ccc!important}input[type=checkbox]:focus,input[type=file]:focus,input[type=radio]:focus{outline:thin dotted #333;outline:1px auto #129fea}input[type=color][disabled],input[type=date][disabled],input[type=datetime-local][disabled],input[type=datetime][disabled],input[type=email][disabled],input[type=month][disabled],input[type=number][disabled],input[type=password][disabled],input[type=search][disabled],input[type=tel][disabled],input[type=text][disabled],input[type=time][disabled],input[type=url][disabled],input[type=week][disabled]{cursor:not-allowed;background-color:#fafafa}input:focus:invalid,select:focus:invalid,textarea:focus:invalid{color:#e74c3c;border:1px solid #e74c3c}input:focus:invalid:focus,select:focus:invalid:focus,textarea:focus:invalid:focus{border-color:#e74c3c}input[type=checkbox]:focus:invalid:focus,input[type=file]:focus:invalid:focus,input[type=radio]:focus:invalid:focus{outline-color:#e74c3c}input.wy-input-large{padding:12px;font-size:100%}textarea{overflow:auto;vertical-align:top;width:100%;font-family:Lato,proxima-nova,Helvetica Neue,Arial,sans-serif}select,textarea{padding:.5em .625em;display:inline-block;border:1px solid 
#ccc;font-size:80%;box-shadow:inset 0 1px 3px #ddd;-webkit-transition:border .3s linear;-moz-transition:border .3s linear;transition:border .3s linear}select{border:1px solid #ccc;background-color:#fff}select[multiple]{height:auto}select:focus,textarea:focus{outline:0}input[readonly],select[disabled],select[readonly],textarea[disabled],textarea[readonly]{cursor:not-allowed;background-color:#fafafa}input[type=checkbox][disabled],input[type=radio][disabled]{cursor:not-allowed}.wy-checkbox,.wy-radio{margin:6px 0;color:#404040;display:block}.wy-checkbox input,.wy-radio input{vertical-align:baseline}.wy-form-message-inline{display:inline-block;*display:inline;*zoom:1;vertical-align:middle}.wy-input-prefix,.wy-input-suffix{white-space:nowrap;padding:6px}.wy-input-prefix .wy-input-context,.wy-input-suffix .wy-input-context{line-height:27px;padding:0 8px;display:inline-block;font-size:80%;background-color:#f3f6f6;border:1px solid #ccc;color:#999}.wy-input-suffix .wy-input-context{border-left:0}.wy-input-prefix .wy-input-context{border-right:0}.wy-switch{position:relative;display:block;height:24px;margin-top:12px;cursor:pointer}.wy-switch:before{left:0;top:0;width:36px;height:12px;background:#ccc}.wy-switch:after,.wy-switch:before{position:absolute;content:"";display:block;border-radius:4px;-webkit-transition:all .2s ease-in-out;-moz-transition:all .2s ease-in-out;transition:all .2s ease-in-out}.wy-switch:after{width:18px;height:18px;background:#999;left:-3px;top:-3px}.wy-switch span{position:absolute;left:48px;display:block;font-size:12px;color:#ccc;line-height:1}.wy-switch.active:before{background:#1e8449}.wy-switch.active:after{left:24px;background:#27ae60}.wy-switch.disabled{cursor:not-allowed;opacity:.8}.wy-control-group.wy-control-group-error .wy-form-message,.wy-control-group.wy-control-group-error>label{color:#e74c3c}.wy-control-group.wy-control-group-error input[type=color],.wy-control-group.wy-control-group-error 
input[type=date],.wy-control-group.wy-control-group-error input[type=datetime-local],.wy-control-group.wy-control-group-error input[type=datetime],.wy-control-group.wy-control-group-error input[type=email],.wy-control-group.wy-control-group-error input[type=month],.wy-control-group.wy-control-group-error input[type=number],.wy-control-group.wy-control-group-error input[type=password],.wy-control-group.wy-control-group-error input[type=search],.wy-control-group.wy-control-group-error input[type=tel],.wy-control-group.wy-control-group-error input[type=text],.wy-control-group.wy-control-group-error input[type=time],.wy-control-group.wy-control-group-error input[type=url],.wy-control-group.wy-control-group-error input[type=week],.wy-control-group.wy-control-group-error textarea{border:1px solid #e74c3c}.wy-inline-validate{white-space:nowrap}.wy-inline-validate .wy-input-context{padding:.5em .625em;display:inline-block;font-size:80%}.wy-inline-validate.wy-inline-validate-success .wy-input-context{color:#27ae60}.wy-inline-validate.wy-inline-validate-danger .wy-input-context{color:#e74c3c}.wy-inline-validate.wy-inline-validate-warning .wy-input-context{color:#e67e22}.wy-inline-validate.wy-inline-validate-info .wy-input-context{color:#2980b9}.rotate-90{-webkit-transform:rotate(90deg);-moz-transform:rotate(90deg);-ms-transform:rotate(90deg);-o-transform:rotate(90deg);transform:rotate(90deg)}.rotate-180{-webkit-transform:rotate(180deg);-moz-transform:rotate(180deg);-ms-transform:rotate(180deg);-o-transform:rotate(180deg);transform:rotate(180deg)}.rotate-270{-webkit-transform:rotate(270deg);-moz-transform:rotate(270deg);-ms-transform:rotate(270deg);-o-transform:rotate(270deg);transform:rotate(270deg)}.mirror{-webkit-transform:scaleX(-1);-moz-transform:scaleX(-1);-ms-transform:scaleX(-1);-o-transform:scaleX(-1);transform:scaleX(-1)}.mirror.rotate-90{-webkit-transform:scaleX(-1) rotate(90deg);-moz-transform:scaleX(-1) rotate(90deg);-ms-transform:scaleX(-1) 
rotate(90deg);-o-transform:scaleX(-1) rotate(90deg);transform:scaleX(-1) rotate(90deg)}.mirror.rotate-180{-webkit-transform:scaleX(-1) rotate(180deg);-moz-transform:scaleX(-1) rotate(180deg);-ms-transform:scaleX(-1) rotate(180deg);-o-transform:scaleX(-1) rotate(180deg);transform:scaleX(-1) rotate(180deg)}.mirror.rotate-270{-webkit-transform:scaleX(-1) rotate(270deg);-moz-transform:scaleX(-1) rotate(270deg);-ms-transform:scaleX(-1) rotate(270deg);-o-transform:scaleX(-1) rotate(270deg);transform:scaleX(-1) rotate(270deg)}@media only screen and (max-width:480px){.wy-form button[type=submit]{margin:.7em 0 0}.wy-form input[type=color],.wy-form input[type=date],.wy-form input[type=datetime-local],.wy-form input[type=datetime],.wy-form input[type=email],.wy-form input[type=month],.wy-form input[type=number],.wy-form input[type=password],.wy-form input[type=search],.wy-form input[type=tel],.wy-form input[type=text],.wy-form input[type=time],.wy-form input[type=url],.wy-form input[type=week],.wy-form label{margin-bottom:.3em;display:block}.wy-form input[type=color],.wy-form input[type=date],.wy-form input[type=datetime-local],.wy-form input[type=datetime],.wy-form input[type=email],.wy-form input[type=month],.wy-form input[type=number],.wy-form input[type=password],.wy-form input[type=search],.wy-form input[type=tel],.wy-form input[type=time],.wy-form input[type=url],.wy-form input[type=week]{margin-bottom:0}.wy-form-aligned .wy-control-group label{margin-bottom:.3em;text-align:left;display:block;width:100%}.wy-form-aligned .wy-control{margin:1.5em 0 0}.wy-form-message,.wy-form-message-inline,.wy-form .wy-help-inline{display:block;font-size:80%;padding:6px 0}}@media screen and (max-width:768px){.tablet-hide{display:none}}@media screen and (max-width:480px){.mobile-hide{display:none}}.float-left{float:left}.float-right{float:right}.full-width{width:100%}.rst-content table.docutils,.rst-content 
table.field-list,.wy-table{border-collapse:collapse;border-spacing:0;empty-cells:show;margin-bottom:24px}.rst-content table.docutils caption,.rst-content table.field-list caption,.wy-table caption{color:#000;font:italic 85%/1 arial,sans-serif;padding:1em 0;text-align:center}.rst-content table.docutils td,.rst-content table.docutils th,.rst-content table.field-list td,.rst-content table.field-list th,.wy-table td,.wy-table th{font-size:90%;margin:0;overflow:visible;padding:8px 16px}.rst-content table.docutils td:first-child,.rst-content table.docutils th:first-child,.rst-content table.field-list td:first-child,.rst-content table.field-list th:first-child,.wy-table td:first-child,.wy-table th:first-child{border-left-width:0}.rst-content table.docutils thead,.rst-content table.field-list thead,.wy-table thead{color:#000;text-align:left;vertical-align:bottom;white-space:nowrap}.rst-content table.docutils thead th,.rst-content table.field-list thead th,.wy-table thead th{font-weight:700;border-bottom:2px solid #e1e4e5}.rst-content table.docutils td,.rst-content table.field-list td,.wy-table td{background-color:transparent;vertical-align:middle}.rst-content table.docutils td p,.rst-content table.field-list td p,.wy-table td p{line-height:18px}.rst-content table.docutils td p:last-child,.rst-content table.field-list td p:last-child,.wy-table td p:last-child{margin-bottom:0}.rst-content table.docutils .wy-table-cell-min,.rst-content table.field-list .wy-table-cell-min,.wy-table .wy-table-cell-min{width:1%;padding-right:0}.rst-content table.docutils .wy-table-cell-min input[type=checkbox],.rst-content table.field-list .wy-table-cell-min input[type=checkbox],.wy-table .wy-table-cell-min input[type=checkbox]{margin:0}.wy-table-secondary{color:grey;font-size:90%}.wy-table-tertiary{color:grey;font-size:80%}.rst-content table.docutils:not(.field-list) tr:nth-child(2n-1) td,.wy-table-backed,.wy-table-odd td,.wy-table-striped tr:nth-child(2n-1) 
td{background-color:#f3f6f6}.rst-content table.docutils,.wy-table-bordered-all{border:1px solid #e1e4e5}.rst-content table.docutils td,.wy-table-bordered-all td{border-bottom:1px solid #e1e4e5;border-left:1px solid #e1e4e5}.rst-content table.docutils tbody>tr:last-child td,.wy-table-bordered-all tbody>tr:last-child td{border-bottom-width:0}.wy-table-bordered{border:1px solid #e1e4e5}.wy-table-bordered-rows td{border-bottom:1px solid #e1e4e5}.wy-table-bordered-rows tbody>tr:last-child td{border-bottom-width:0}.wy-table-horizontal td,.wy-table-horizontal th{border-width:0 0 1px;border-bottom:1px solid #e1e4e5}.wy-table-horizontal tbody>tr:last-child td{border-bottom-width:0}.wy-table-responsive{margin-bottom:24px;max-width:100%;overflow:auto}.wy-table-responsive table{margin-bottom:0!important}.wy-table-responsive table td,.wy-table-responsive table th{white-space:nowrap}a{color:#2980b9;text-decoration:none;cursor:pointer}a:hover{color:#3091d1}a:visited{color:#9b59b6}html{height:100%}body,html{overflow-x:hidden}body{font-family:Lato,proxima-nova,Helvetica Neue,Arial,sans-serif;font-weight:400;color:#404040;min-height:100%;background:#edf0f2}.wy-text-left{text-align:left}.wy-text-center{text-align:center}.wy-text-right{text-align:right}.wy-text-large{font-size:120%}.wy-text-normal{font-size:100%}.wy-text-small,small{font-size:80%}.wy-text-strike{text-decoration:line-through}.wy-text-warning{color:#e67e22!important}a.wy-text-warning:hover{color:#eb9950!important}.wy-text-info{color:#2980b9!important}a.wy-text-info:hover{color:#409ad5!important}.wy-text-success{color:#27ae60!important}a.wy-text-success:hover{color:#36d278!important}.wy-text-danger{color:#e74c3c!important}a.wy-text-danger:hover{color:#ed7669!important}.wy-text-neutral{color:#404040!important}a.wy-text-neutral:hover{color:#595959!important}.rst-content .toctree-wrapper>p.caption,h1,h2,h3,h4,h5,h6,legend{margin-top:0;font-weight:700;font-family:Roboto 
Slab,ff-tisa-web-pro,Georgia,Arial,sans-serif}p{line-height:24px;font-size:16px;margin:0 0 24px}h1{font-size:175%}.rst-content .toctree-wrapper>p.caption,h2{font-size:150%}h3{font-size:125%}h4{font-size:115%}h5{font-size:110%}h6{font-size:100%}hr{display:block;height:1px;border:0;border-top:1px solid #e1e4e5;margin:24px 0;padding:0}.rst-content code,.rst-content tt,code{white-space:nowrap;max-width:100%;background:#fff;border:1px solid #e1e4e5;font-size:75%;padding:0 5px;font-family:SFMono-Regular,Menlo,Monaco,Consolas,Liberation Mono,Courier New,Courier,monospace;color:#e74c3c;overflow-x:auto}.rst-content tt.code-large,code.code-large{font-size:90%}.rst-content .section ul,.rst-content .toctree-wrapper ul,.rst-content section ul,.wy-plain-list-disc,article ul{list-style:disc;line-height:24px;margin-bottom:24px}.rst-content .section ul li,.rst-content .toctree-wrapper ul li,.rst-content section ul li,.wy-plain-list-disc li,article ul li{list-style:disc;margin-left:24px}.rst-content .section ul li p:last-child,.rst-content .section ul li ul,.rst-content .toctree-wrapper ul li p:last-child,.rst-content .toctree-wrapper ul li ul,.rst-content section ul li p:last-child,.rst-content section ul li ul,.wy-plain-list-disc li p:last-child,.wy-plain-list-disc li ul,article ul li p:last-child,article ul li ul{margin-bottom:0}.rst-content .section ul li li,.rst-content .toctree-wrapper ul li li,.rst-content section ul li li,.wy-plain-list-disc li li,article ul li li{list-style:circle}.rst-content .section ul li li li,.rst-content .toctree-wrapper ul li li li,.rst-content section ul li li li,.wy-plain-list-disc li li li,article ul li li li{list-style:square}.rst-content .section ul li ol li,.rst-content .toctree-wrapper ul li ol li,.rst-content section ul li ol li,.wy-plain-list-disc li ol li,article ul li ol li{list-style:decimal}.rst-content .section ol,.rst-content .section ol.arabic,.rst-content .toctree-wrapper ol,.rst-content .toctree-wrapper ol.arabic,.rst-content 
section ol,.rst-content section ol.arabic,.wy-plain-list-decimal,article ol{list-style:decimal;line-height:24px;margin-bottom:24px}.rst-content .section ol.arabic li,.rst-content .section ol li,.rst-content .toctree-wrapper ol.arabic li,.rst-content .toctree-wrapper ol li,.rst-content section ol.arabic li,.rst-content section ol li,.wy-plain-list-decimal li,article ol li{list-style:decimal;margin-left:24px}.rst-content .section ol.arabic li ul,.rst-content .section ol li p:last-child,.rst-content .section ol li ul,.rst-content .toctree-wrapper ol.arabic li ul,.rst-content .toctree-wrapper ol li p:last-child,.rst-content .toctree-wrapper ol li ul,.rst-content section ol.arabic li ul,.rst-content section ol li p:last-child,.rst-content section ol li ul,.wy-plain-list-decimal li p:last-child,.wy-plain-list-decimal li ul,article ol li p:last-child,article ol li ul{margin-bottom:0}.rst-content .section ol.arabic li ul li,.rst-content .section ol li ul li,.rst-content .toctree-wrapper ol.arabic li ul li,.rst-content .toctree-wrapper ol li ul li,.rst-content section ol.arabic li ul li,.rst-content section ol li ul li,.wy-plain-list-decimal li ul li,article ol li ul li{list-style:disc}.wy-breadcrumbs{*zoom:1}.wy-breadcrumbs:after,.wy-breadcrumbs:before{display:table;content:""}.wy-breadcrumbs:after{clear:both}.wy-breadcrumbs>li{display:inline-block;padding-top:5px}.wy-breadcrumbs>li.wy-breadcrumbs-aside{float:right}.rst-content .wy-breadcrumbs>li code,.rst-content .wy-breadcrumbs>li tt,.wy-breadcrumbs>li .rst-content tt,.wy-breadcrumbs>li code{all:inherit;color:inherit}.breadcrumb-item:before{content:"/";color:#bbb;font-size:13px;padding:0 6px 0 3px}.wy-breadcrumbs-extra{margin-bottom:0;color:#b3b3b3;font-size:80%;display:inline-block}@media screen and (max-width:480px){.wy-breadcrumbs-extra,.wy-breadcrumbs li.wy-breadcrumbs-aside{display:none}}@media print{.wy-breadcrumbs 
li.wy-breadcrumbs-aside{display:none}}html{font-size:16px}.wy-affix{position:fixed;top:1.618em}.wy-menu a:hover{text-decoration:none}.wy-menu-horiz{*zoom:1}.wy-menu-horiz:after,.wy-menu-horiz:before{display:table;content:""}.wy-menu-horiz:after{clear:both}.wy-menu-horiz li,.wy-menu-horiz ul{display:inline-block}.wy-menu-horiz li:hover{background:hsla(0,0%,100%,.1)}.wy-menu-horiz li.divide-left{border-left:1px solid #404040}.wy-menu-horiz li.divide-right{border-right:1px solid #404040}.wy-menu-horiz a{height:32px;display:inline-block;line-height:32px;padding:0 16px}.wy-menu-vertical{width:300px}.wy-menu-vertical header,.wy-menu-vertical p.caption{color:#55a5d9;height:32px;line-height:32px;padding:0 1.618em;margin:12px 0 0;display:block;font-weight:700;text-transform:uppercase;font-size:85%;white-space:nowrap}.wy-menu-vertical ul{margin-bottom:0}.wy-menu-vertical li.divide-top{border-top:1px solid #404040}.wy-menu-vertical li.divide-bottom{border-bottom:1px solid #404040}.wy-menu-vertical li.current{background:#e3e3e3}.wy-menu-vertical li.current a{color:grey;border-right:1px solid #c9c9c9;padding:.4045em 2.427em}.wy-menu-vertical li.current a:hover{background:#d6d6d6}.rst-content .wy-menu-vertical li tt,.wy-menu-vertical li .rst-content tt,.wy-menu-vertical li code{border:none;background:inherit;color:inherit;padding-left:0;padding-right:0}.wy-menu-vertical li button.toctree-expand{display:block;float:left;margin-left:-1.2em;line-height:18px;color:#4d4d4d;border:none;background:none;padding:0}.wy-menu-vertical li.current>a,.wy-menu-vertical li.on a{color:#404040;font-weight:700;position:relative;background:#fcfcfc;border:none;padding:.4045em 1.618em}.wy-menu-vertical li.current>a:hover,.wy-menu-vertical li.on a:hover{background:#fcfcfc}.wy-menu-vertical li.current>a:hover button.toctree-expand,.wy-menu-vertical li.on a:hover button.toctree-expand{color:grey}.wy-menu-vertical li.current>a button.toctree-expand,.wy-menu-vertical li.on a 
button.toctree-expand{display:block;line-height:18px;color:#333}.wy-menu-vertical li.toctree-l1.current>a{border-bottom:1px solid #c9c9c9;border-top:1px solid #c9c9c9}.wy-menu-vertical .toctree-l1.current .toctree-l2>ul,.wy-menu-vertical .toctree-l2.current .toctree-l3>ul,.wy-menu-vertical .toctree-l3.current .toctree-l4>ul,.wy-menu-vertical .toctree-l4.current .toctree-l5>ul,.wy-menu-vertical .toctree-l5.current .toctree-l6>ul,.wy-menu-vertical .toctree-l6.current .toctree-l7>ul,.wy-menu-vertical .toctree-l7.current .toctree-l8>ul,.wy-menu-vertical .toctree-l8.current .toctree-l9>ul,.wy-menu-vertical .toctree-l9.current .toctree-l10>ul,.wy-menu-vertical .toctree-l10.current .toctree-l11>ul{display:none}.wy-menu-vertical .toctree-l1.current .current.toctree-l2>ul,.wy-menu-vertical .toctree-l2.current .current.toctree-l3>ul,.wy-menu-vertical .toctree-l3.current .current.toctree-l4>ul,.wy-menu-vertical .toctree-l4.current .current.toctree-l5>ul,.wy-menu-vertical .toctree-l5.current .current.toctree-l6>ul,.wy-menu-vertical .toctree-l6.current .current.toctree-l7>ul,.wy-menu-vertical .toctree-l7.current .current.toctree-l8>ul,.wy-menu-vertical .toctree-l8.current .current.toctree-l9>ul,.wy-menu-vertical .toctree-l9.current .current.toctree-l10>ul,.wy-menu-vertical .toctree-l10.current .current.toctree-l11>ul{display:block}.wy-menu-vertical li.toctree-l3,.wy-menu-vertical li.toctree-l4{font-size:.9em}.wy-menu-vertical li.toctree-l2 a,.wy-menu-vertical li.toctree-l3 a,.wy-menu-vertical li.toctree-l4 a,.wy-menu-vertical li.toctree-l5 a,.wy-menu-vertical li.toctree-l6 a,.wy-menu-vertical li.toctree-l7 a,.wy-menu-vertical li.toctree-l8 a,.wy-menu-vertical li.toctree-l9 a,.wy-menu-vertical li.toctree-l10 a{color:#404040}.wy-menu-vertical li.toctree-l2 a:hover button.toctree-expand,.wy-menu-vertical li.toctree-l3 a:hover button.toctree-expand,.wy-menu-vertical li.toctree-l4 a:hover button.toctree-expand,.wy-menu-vertical li.toctree-l5 a:hover 
button.toctree-expand,.wy-menu-vertical li.toctree-l6 a:hover button.toctree-expand,.wy-menu-vertical li.toctree-l7 a:hover button.toctree-expand,.wy-menu-vertical li.toctree-l8 a:hover button.toctree-expand,.wy-menu-vertical li.toctree-l9 a:hover button.toctree-expand,.wy-menu-vertical li.toctree-l10 a:hover button.toctree-expand{color:grey}.wy-menu-vertical li.toctree-l2.current li.toctree-l3>a,.wy-menu-vertical li.toctree-l3.current li.toctree-l4>a,.wy-menu-vertical li.toctree-l4.current li.toctree-l5>a,.wy-menu-vertical li.toctree-l5.current li.toctree-l6>a,.wy-menu-vertical li.toctree-l6.current li.toctree-l7>a,.wy-menu-vertical li.toctree-l7.current li.toctree-l8>a,.wy-menu-vertical li.toctree-l8.current li.toctree-l9>a,.wy-menu-vertical li.toctree-l9.current li.toctree-l10>a,.wy-menu-vertical li.toctree-l10.current li.toctree-l11>a{display:block}.wy-menu-vertical li.toctree-l2.current>a{padding:.4045em 2.427em}.wy-menu-vertical li.toctree-l2.current li.toctree-l3>a{padding:.4045em 1.618em .4045em 4.045em}.wy-menu-vertical li.toctree-l3.current>a{padding:.4045em 4.045em}.wy-menu-vertical li.toctree-l3.current li.toctree-l4>a{padding:.4045em 1.618em .4045em 5.663em}.wy-menu-vertical li.toctree-l4.current>a{padding:.4045em 5.663em}.wy-menu-vertical li.toctree-l4.current li.toctree-l5>a{padding:.4045em 1.618em .4045em 7.281em}.wy-menu-vertical li.toctree-l5.current>a{padding:.4045em 7.281em}.wy-menu-vertical li.toctree-l5.current li.toctree-l6>a{padding:.4045em 1.618em .4045em 8.899em}.wy-menu-vertical li.toctree-l6.current>a{padding:.4045em 8.899em}.wy-menu-vertical li.toctree-l6.current li.toctree-l7>a{padding:.4045em 1.618em .4045em 10.517em}.wy-menu-vertical li.toctree-l7.current>a{padding:.4045em 10.517em}.wy-menu-vertical li.toctree-l7.current li.toctree-l8>a{padding:.4045em 1.618em .4045em 12.135em}.wy-menu-vertical li.toctree-l8.current>a{padding:.4045em 12.135em}.wy-menu-vertical li.toctree-l8.current li.toctree-l9>a{padding:.4045em 1.618em .4045em 
13.753em}.wy-menu-vertical li.toctree-l9.current>a{padding:.4045em 13.753em}.wy-menu-vertical li.toctree-l9.current li.toctree-l10>a{padding:.4045em 1.618em .4045em 15.371em}.wy-menu-vertical li.toctree-l10.current>a{padding:.4045em 15.371em}.wy-menu-vertical li.toctree-l10.current li.toctree-l11>a{padding:.4045em 1.618em .4045em 16.989em}.wy-menu-vertical li.toctree-l2.current>a,.wy-menu-vertical li.toctree-l2.current li.toctree-l3>a{background:#c9c9c9}.wy-menu-vertical li.toctree-l2 button.toctree-expand{color:#a3a3a3}.wy-menu-vertical li.toctree-l3.current>a,.wy-menu-vertical li.toctree-l3.current li.toctree-l4>a{background:#bdbdbd}.wy-menu-vertical li.toctree-l3 button.toctree-expand{color:#969696}.wy-menu-vertical li.current ul{display:block}.wy-menu-vertical li ul{margin-bottom:0;display:none}.wy-menu-vertical li ul li a{margin-bottom:0;color:#d9d9d9;font-weight:400}.wy-menu-vertical a{line-height:18px;padding:.4045em 1.618em;display:block;position:relative;font-size:90%;color:#d9d9d9}.wy-menu-vertical a:hover{background-color:#4e4a4a;cursor:pointer}.wy-menu-vertical a:hover button.toctree-expand{color:#d9d9d9}.wy-menu-vertical a:active{background-color:#2980b9;cursor:pointer;color:#fff}.wy-menu-vertical a:active button.toctree-expand{color:#fff}.wy-side-nav-search{display:block;width:300px;padding:.809em;margin-bottom:.809em;z-index:200;background-color:#2980b9;text-align:center;color:#fcfcfc}.wy-side-nav-search input[type=text]{width:100%;border-radius:50px;padding:6px 12px;border-color:#2472a4}.wy-side-nav-search img{display:block;margin:auto auto .809em;height:45px;width:45px;background-color:#2980b9;padding:5px;border-radius:100%}.wy-side-nav-search .wy-dropdown>a,.wy-side-nav-search>a{color:#fcfcfc;font-size:100%;font-weight:700;display:inline-block;padding:4px 6px;margin-bottom:.809em;max-width:100%}.wy-side-nav-search .wy-dropdown>a:hover,.wy-side-nav-search>a:hover{background:hsla(0,0%,100%,.1)}.wy-side-nav-search .wy-dropdown>a 
img.logo,.wy-side-nav-search>a img.logo{display:block;margin:0 auto;height:auto;width:auto;border-radius:0;max-width:100%;background:transparent}.wy-side-nav-search .wy-dropdown>a.icon img.logo,.wy-side-nav-search>a.icon img.logo{margin-top:.85em}.wy-side-nav-search>div.version{margin-top:-.4045em;margin-bottom:.809em;font-weight:400;color:hsla(0,0%,100%,.3)}.wy-nav .wy-menu-vertical header{color:#2980b9}.wy-nav .wy-menu-vertical a{color:#b3b3b3}.wy-nav .wy-menu-vertical a:hover{background-color:#2980b9;color:#fff}[data-menu-wrap]{-webkit-transition:all .2s ease-in;-moz-transition:all .2s ease-in;transition:all .2s ease-in;position:absolute;opacity:1;width:100%;opacity:0}[data-menu-wrap].move-center{left:0;right:auto;opacity:1}[data-menu-wrap].move-left{right:auto;left:-100%;opacity:0}[data-menu-wrap].move-right{right:-100%;left:auto;opacity:0}.wy-body-for-nav{background:#fcfcfc}.wy-grid-for-nav{position:absolute;width:100%;height:100%}.wy-nav-side{position:fixed;top:0;bottom:0;left:0;padding-bottom:2em;width:300px;overflow-x:hidden;overflow-y:hidden;min-height:100%;color:#9b9b9b;background:#343131;z-index:200}.wy-side-scroll{width:320px;position:relative;overflow-x:hidden;overflow-y:scroll;height:100%}.wy-nav-top{display:none;background:#2980b9;color:#fff;padding:.4045em .809em;position:relative;line-height:50px;text-align:center;font-size:100%;*zoom:1}.wy-nav-top:after,.wy-nav-top:before{display:table;content:""}.wy-nav-top:after{clear:both}.wy-nav-top a{color:#fff;font-weight:700}.wy-nav-top img{margin-right:12px;height:45px;width:45px;background-color:#2980b9;padding:5px;border-radius:100%}.wy-nav-top i{font-size:30px;float:left;cursor:pointer;padding-top:inherit}.wy-nav-content-wrap{margin-left:300px;background:#fcfcfc;min-height:100%}.wy-nav-content{padding:1.618em 
3.236em;height:100%;max-width:800px;margin:auto}.wy-body-mask{position:fixed;width:100%;height:100%;background:rgba(0,0,0,.2);display:none;z-index:499}.wy-body-mask.on{display:block}footer{color:grey}footer p{margin-bottom:12px}.rst-content footer span.commit tt,footer span.commit .rst-content tt,footer span.commit code{padding:0;font-family:SFMono-Regular,Menlo,Monaco,Consolas,Liberation Mono,Courier New,Courier,monospace;font-size:1em;background:none;border:none;color:grey}.rst-footer-buttons{*zoom:1}.rst-footer-buttons:after,.rst-footer-buttons:before{width:100%;display:table;content:""}.rst-footer-buttons:after{clear:both}.rst-breadcrumbs-buttons{margin-top:12px;*zoom:1}.rst-breadcrumbs-buttons:after,.rst-breadcrumbs-buttons:before{display:table;content:""}.rst-breadcrumbs-buttons:after{clear:both}#search-results .search li{margin-bottom:24px;border-bottom:1px solid #e1e4e5;padding-bottom:24px}#search-results .search li:first-child{border-top:1px solid #e1e4e5;padding-top:24px}#search-results .search li a{font-size:120%;margin-bottom:12px;display:inline-block}#search-results .context{color:grey;font-size:90%}.genindextable li>ul{margin-left:24px}@media screen and (max-width:768px){.wy-body-for-nav{background:#fcfcfc}.wy-nav-top{display:block}.wy-nav-side{left:-300px}.wy-nav-side.shift{width:85%;left:0}.wy-menu.wy-menu-vertical,.wy-side-nav-search,.wy-side-scroll{width:auto}.wy-nav-content-wrap{margin-left:0}.wy-nav-content-wrap .wy-nav-content{padding:1.618em}.wy-nav-content-wrap.shift{position:fixed;min-width:100%;left:85%;top:0;height:100%;overflow:hidden}}@media screen and (min-width:1100px){.wy-nav-content-wrap{background:rgba(0,0,0,.05)}.wy-nav-content{margin:0;background:#fcfcfc}}@media print{.rst-versions,.wy-nav-side,footer{display:none}.wy-nav-content-wrap{margin-left:0}}.rst-versions{position:fixed;bottom:0;left:0;width:300px;color:#fcfcfc;background:#1f1d1d;font-family:Lato,proxima-nova,Helvetica Neue,Arial,sans-serif;z-index:400}.rst-versions 
a{color:#2980b9;text-decoration:none}.rst-versions .rst-badge-small{display:none}.rst-versions .rst-current-version{padding:12px;background-color:#272525;display:block;text-align:right;font-size:90%;cursor:pointer;color:#27ae60;*zoom:1}.rst-versions .rst-current-version:after,.rst-versions .rst-current-version:before{display:table;content:""}.rst-versions .rst-current-version:after{clear:both}.rst-content .code-block-caption .rst-versions .rst-current-version .headerlink,.rst-content .eqno .rst-versions .rst-current-version .headerlink,.rst-content .rst-versions .rst-current-version .admonition-title,.rst-content code.download .rst-versions .rst-current-version span:first-child,.rst-content dl dt .rst-versions .rst-current-version .headerlink,.rst-content h1 .rst-versions .rst-current-version .headerlink,.rst-content h2 .rst-versions .rst-current-version .headerlink,.rst-content h3 .rst-versions .rst-current-version .headerlink,.rst-content h4 .rst-versions .rst-current-version .headerlink,.rst-content h5 .rst-versions .rst-current-version .headerlink,.rst-content h6 .rst-versions .rst-current-version .headerlink,.rst-content p .rst-versions .rst-current-version .headerlink,.rst-content table>caption .rst-versions .rst-current-version .headerlink,.rst-content tt.download .rst-versions .rst-current-version span:first-child,.rst-versions .rst-current-version .fa,.rst-versions .rst-current-version .icon,.rst-versions .rst-current-version .rst-content .admonition-title,.rst-versions .rst-current-version .rst-content .code-block-caption .headerlink,.rst-versions .rst-current-version .rst-content .eqno .headerlink,.rst-versions .rst-current-version .rst-content code.download span:first-child,.rst-versions .rst-current-version .rst-content dl dt .headerlink,.rst-versions .rst-current-version .rst-content h1 .headerlink,.rst-versions .rst-current-version .rst-content h2 .headerlink,.rst-versions .rst-current-version .rst-content h3 .headerlink,.rst-versions 
.rst-current-version .rst-content h4 .headerlink,.rst-versions .rst-current-version .rst-content h5 .headerlink,.rst-versions .rst-current-version .rst-content h6 .headerlink,.rst-versions .rst-current-version .rst-content p .headerlink,.rst-versions .rst-current-version .rst-content table>caption .headerlink,.rst-versions .rst-current-version .rst-content tt.download span:first-child,.rst-versions .rst-current-version .wy-menu-vertical li button.toctree-expand,.wy-menu-vertical li .rst-versions .rst-current-version button.toctree-expand{color:#fcfcfc}.rst-versions .rst-current-version .fa-book,.rst-versions .rst-current-version .icon-book{float:left}.rst-versions .rst-current-version.rst-out-of-date{background-color:#e74c3c;color:#fff}.rst-versions .rst-current-version.rst-active-old-version{background-color:#f1c40f;color:#000}.rst-versions.shift-up{height:auto;max-height:100%;overflow-y:scroll}.rst-versions.shift-up .rst-other-versions{display:block}.rst-versions .rst-other-versions{font-size:90%;padding:12px;color:grey;display:none}.rst-versions .rst-other-versions hr{display:block;height:1px;border:0;margin:20px 0;padding:0;border-top:1px solid #413d3d}.rst-versions .rst-other-versions dd{display:inline-block;margin:0}.rst-versions .rst-other-versions dd a{display:inline-block;padding:6px;color:#fcfcfc}.rst-versions.rst-badge{width:auto;bottom:20px;right:20px;left:auto;border:none;max-width:300px;max-height:90%}.rst-versions.rst-badge .fa-book,.rst-versions.rst-badge .icon-book{float:none;line-height:30px}.rst-versions.rst-badge.shift-up .rst-current-version{text-align:right}.rst-versions.rst-badge.shift-up .rst-current-version .fa-book,.rst-versions.rst-badge.shift-up .rst-current-version .icon-book{float:left}.rst-versions.rst-badge>.rst-current-version{width:auto;height:30px;line-height:30px;padding:0 6px;display:block;text-align:center}@media screen and (max-width:768px){.rst-versions{width:85%;display:none}.rst-versions.shift{display:block}}.rst-content 
.toctree-wrapper>p.caption,.rst-content h1,.rst-content h2,.rst-content h3,.rst-content h4,.rst-content h5,.rst-content h6{margin-bottom:24px}.rst-content img{max-width:100%;height:auto}.rst-content div.figure,.rst-content figure{margin-bottom:24px}.rst-content div.figure .caption-text,.rst-content figure .caption-text{font-style:italic}.rst-content div.figure p:last-child.caption,.rst-content figure p:last-child.caption{margin-bottom:0}.rst-content div.figure.align-center,.rst-content figure.align-center{text-align:center}.rst-content .section>a>img,.rst-content .section>img,.rst-content section>a>img,.rst-content section>img{margin-bottom:24px}.rst-content abbr[title]{text-decoration:none}.rst-content.style-external-links a.reference.external:after{font-family:FontAwesome;content:"\f08e";color:#b3b3b3;vertical-align:super;font-size:60%;margin:0 .2em}.rst-content blockquote{margin-left:24px;line-height:24px;margin-bottom:24px}.rst-content pre.literal-block{white-space:pre;margin:0;padding:12px;font-family:SFMono-Regular,Menlo,Monaco,Consolas,Liberation Mono,Courier New,Courier,monospace;display:block;overflow:auto}.rst-content div[class^=highlight],.rst-content pre.literal-block{border:1px solid #e1e4e5;overflow-x:auto;margin:1px 0 24px}.rst-content div[class^=highlight] div[class^=highlight],.rst-content pre.literal-block div[class^=highlight]{padding:0;border:none;margin:0}.rst-content div[class^=highlight] td.code{width:100%}.rst-content .linenodiv pre{border-right:1px solid #e6e9ea;margin:0;padding:12px;font-family:SFMono-Regular,Menlo,Monaco,Consolas,Liberation Mono,Courier New,Courier,monospace;user-select:none;pointer-events:none}.rst-content div[class^=highlight] pre{white-space:pre;margin:0;padding:12px;display:block;overflow:auto}.rst-content div[class^=highlight] pre .hll{display:block;margin:0 -12px;padding:0 12px}.rst-content .linenodiv pre,.rst-content div[class^=highlight] pre,.rst-content 
pre.literal-block{font-family:SFMono-Regular,Menlo,Monaco,Consolas,Liberation Mono,Courier New,Courier,monospace;font-size:12px;line-height:1.4}.rst-content div.highlight .gp,.rst-content div.highlight span.linenos{user-select:none;pointer-events:none}.rst-content div.highlight span.linenos{display:inline-block;padding-left:0;padding-right:12px;margin-right:12px;border-right:1px solid #e6e9ea}.rst-content .code-block-caption{font-style:italic;font-size:85%;line-height:1;padding:1em 0;text-align:center}@media print{.rst-content .codeblock,.rst-content div[class^=highlight],.rst-content div[class^=highlight] pre{white-space:pre-wrap}}.rst-content .admonition,.rst-content .admonition-todo,.rst-content .attention,.rst-content .caution,.rst-content .danger,.rst-content .error,.rst-content .hint,.rst-content .important,.rst-content .note,.rst-content .seealso,.rst-content .tip,.rst-content .warning{clear:both}.rst-content .admonition-todo .last,.rst-content .admonition-todo>:last-child,.rst-content .admonition .last,.rst-content .admonition>:last-child,.rst-content .attention .last,.rst-content .attention>:last-child,.rst-content .caution .last,.rst-content .caution>:last-child,.rst-content .danger .last,.rst-content .danger>:last-child,.rst-content .error .last,.rst-content .error>:last-child,.rst-content .hint .last,.rst-content .hint>:last-child,.rst-content .important .last,.rst-content .important>:last-child,.rst-content .note .last,.rst-content .note>:last-child,.rst-content .seealso .last,.rst-content .seealso>:last-child,.rst-content .tip .last,.rst-content .tip>:last-child,.rst-content .warning .last,.rst-content .warning>:last-child{margin-bottom:0}.rst-content .admonition-title:before{margin-right:4px}.rst-content .admonition table{border-color:rgba(0,0,0,.1)}.rst-content .admonition table td,.rst-content .admonition table th{background:transparent!important;border-color:rgba(0,0,0,.1)!important}.rst-content .section ol.loweralpha,.rst-content .section 
ol.loweralpha>li,.rst-content .toctree-wrapper ol.loweralpha,.rst-content .toctree-wrapper ol.loweralpha>li,.rst-content section ol.loweralpha,.rst-content section ol.loweralpha>li{list-style:lower-alpha}.rst-content .section ol.upperalpha,.rst-content .section ol.upperalpha>li,.rst-content .toctree-wrapper ol.upperalpha,.rst-content .toctree-wrapper ol.upperalpha>li,.rst-content section ol.upperalpha,.rst-content section ol.upperalpha>li{list-style:upper-alpha}.rst-content .section ol li>*,.rst-content .section ul li>*,.rst-content .toctree-wrapper ol li>*,.rst-content .toctree-wrapper ul li>*,.rst-content section ol li>*,.rst-content section ul li>*{margin-top:12px;margin-bottom:12px}.rst-content .section ol li>:first-child,.rst-content .section ul li>:first-child,.rst-content .toctree-wrapper ol li>:first-child,.rst-content .toctree-wrapper ul li>:first-child,.rst-content section ol li>:first-child,.rst-content section ul li>:first-child{margin-top:0}.rst-content .section ol li>p,.rst-content .section ol li>p:last-child,.rst-content .section ul li>p,.rst-content .section ul li>p:last-child,.rst-content .toctree-wrapper ol li>p,.rst-content .toctree-wrapper ol li>p:last-child,.rst-content .toctree-wrapper ul li>p,.rst-content .toctree-wrapper ul li>p:last-child,.rst-content section ol li>p,.rst-content section ol li>p:last-child,.rst-content section ul li>p,.rst-content section ul li>p:last-child{margin-bottom:12px}.rst-content .section ol li>p:only-child,.rst-content .section ol li>p:only-child:last-child,.rst-content .section ul li>p:only-child,.rst-content .section ul li>p:only-child:last-child,.rst-content .toctree-wrapper ol li>p:only-child,.rst-content .toctree-wrapper ol li>p:only-child:last-child,.rst-content .toctree-wrapper ul li>p:only-child,.rst-content .toctree-wrapper ul li>p:only-child:last-child,.rst-content section ol li>p:only-child,.rst-content section ol li>p:only-child:last-child,.rst-content section ul li>p:only-child,.rst-content section ul 
li>p:only-child:last-child{margin-bottom:0}.rst-content .section ol li>ol,.rst-content .section ol li>ul,.rst-content .section ul li>ol,.rst-content .section ul li>ul,.rst-content .toctree-wrapper ol li>ol,.rst-content .toctree-wrapper ol li>ul,.rst-content .toctree-wrapper ul li>ol,.rst-content .toctree-wrapper ul li>ul,.rst-content section ol li>ol,.rst-content section ol li>ul,.rst-content section ul li>ol,.rst-content section ul li>ul{margin-bottom:12px}.rst-content .section ol.simple li>*,.rst-content .section ol.simple li ol,.rst-content .section ol.simple li ul,.rst-content .section ul.simple li>*,.rst-content .section ul.simple li ol,.rst-content .section ul.simple li ul,.rst-content .toctree-wrapper ol.simple li>*,.rst-content .toctree-wrapper ol.simple li ol,.rst-content .toctree-wrapper ol.simple li ul,.rst-content .toctree-wrapper ul.simple li>*,.rst-content .toctree-wrapper ul.simple li ol,.rst-content .toctree-wrapper ul.simple li ul,.rst-content section ol.simple li>*,.rst-content section ol.simple li ol,.rst-content section ol.simple li ul,.rst-content section ul.simple li>*,.rst-content section ul.simple li ol,.rst-content section ul.simple li ul{margin-top:0;margin-bottom:0}.rst-content .line-block{margin-left:0;margin-bottom:24px;line-height:24px}.rst-content .line-block .line-block{margin-left:24px;margin-bottom:0}.rst-content .topic-title{font-weight:700;margin-bottom:12px}.rst-content .toc-backref{color:#404040}.rst-content .align-right{float:right;margin:0 0 24px 24px}.rst-content .align-left{float:left;margin:0 24px 24px 0}.rst-content .align-center{margin:auto}.rst-content .align-center:not(table){display:block}.rst-content .code-block-caption .headerlink,.rst-content .eqno .headerlink,.rst-content .toctree-wrapper>p.caption .headerlink,.rst-content dl dt .headerlink,.rst-content h1 .headerlink,.rst-content h2 .headerlink,.rst-content h3 .headerlink,.rst-content h4 .headerlink,.rst-content h5 .headerlink,.rst-content h6 
.headerlink,.rst-content p.caption .headerlink,.rst-content p .headerlink,.rst-content table>caption .headerlink{opacity:0;font-size:14px;font-family:FontAwesome;margin-left:.5em}.rst-content .code-block-caption .headerlink:focus,.rst-content .code-block-caption:hover .headerlink,.rst-content .eqno .headerlink:focus,.rst-content .eqno:hover .headerlink,.rst-content .toctree-wrapper>p.caption .headerlink:focus,.rst-content .toctree-wrapper>p.caption:hover .headerlink,.rst-content dl dt .headerlink:focus,.rst-content dl dt:hover .headerlink,.rst-content h1 .headerlink:focus,.rst-content h1:hover .headerlink,.rst-content h2 .headerlink:focus,.rst-content h2:hover .headerlink,.rst-content h3 .headerlink:focus,.rst-content h3:hover .headerlink,.rst-content h4 .headerlink:focus,.rst-content h4:hover .headerlink,.rst-content h5 .headerlink:focus,.rst-content h5:hover .headerlink,.rst-content h6 .headerlink:focus,.rst-content h6:hover .headerlink,.rst-content p.caption .headerlink:focus,.rst-content p.caption:hover .headerlink,.rst-content p .headerlink:focus,.rst-content p:hover .headerlink,.rst-content table>caption .headerlink:focus,.rst-content table>caption:hover .headerlink{opacity:1}.rst-content p a{overflow-wrap:anywhere}.rst-content .wy-table td p,.rst-content .wy-table td ul,.rst-content .wy-table th p,.rst-content .wy-table th ul,.rst-content table.docutils td p,.rst-content table.docutils td ul,.rst-content table.docutils th p,.rst-content table.docutils th ul,.rst-content table.field-list td p,.rst-content table.field-list td ul,.rst-content table.field-list th p,.rst-content table.field-list th ul{font-size:inherit}.rst-content .btn:focus{outline:2px solid}.rst-content table>caption .headerlink:after{font-size:12px}.rst-content .centered{text-align:center}.rst-content .sidebar{float:right;width:40%;display:block;margin:0 0 24px 24px;padding:24px;background:#f3f6f6;border:1px solid #e1e4e5}.rst-content .sidebar dl,.rst-content .sidebar p,.rst-content .sidebar 
ul{font-size:90%}.rst-content .sidebar .last,.rst-content .sidebar>:last-child{margin-bottom:0}.rst-content .sidebar .sidebar-title{display:block;font-family:Roboto Slab,ff-tisa-web-pro,Georgia,Arial,sans-serif;font-weight:700;background:#e1e4e5;padding:6px 12px;margin:-24px -24px 24px;font-size:100%}.rst-content .highlighted{background:#f1c40f;box-shadow:0 0 0 2px #f1c40f;display:inline;font-weight:700}.rst-content .citation-reference,.rst-content .footnote-reference{vertical-align:baseline;position:relative;top:-.4em;line-height:0;font-size:90%}.rst-content .citation-reference>span.fn-bracket,.rst-content .footnote-reference>span.fn-bracket{display:none}.rst-content .hlist{width:100%}.rst-content dl dt span.classifier:before{content:" : "}.rst-content dl dt span.classifier-delimiter{display:none!important}html.writer-html4 .rst-content table.docutils.citation,html.writer-html4 .rst-content table.docutils.footnote{background:none;border:none}html.writer-html4 .rst-content table.docutils.citation td,html.writer-html4 .rst-content table.docutils.citation tr,html.writer-html4 .rst-content table.docutils.footnote td,html.writer-html4 .rst-content table.docutils.footnote tr{border:none;background-color:transparent!important;white-space:normal}html.writer-html4 .rst-content table.docutils.citation td.label,html.writer-html4 .rst-content table.docutils.footnote td.label{padding-left:0;padding-right:0;vertical-align:top}html.writer-html5 .rst-content dl.citation,html.writer-html5 .rst-content dl.field-list,html.writer-html5 .rst-content dl.footnote{display:grid;grid-template-columns:auto minmax(80%,95%)}html.writer-html5 .rst-content dl.citation>dt,html.writer-html5 .rst-content dl.field-list>dt,html.writer-html5 .rst-content dl.footnote>dt{display:inline-grid;grid-template-columns:max-content auto}html.writer-html5 .rst-content aside.citation,html.writer-html5 .rst-content aside.footnote,html.writer-html5 .rst-content div.citation{display:grid;grid-template-columns:auto 
auto minmax(.65rem,auto) minmax(40%,95%)}html.writer-html5 .rst-content aside.citation>span.label,html.writer-html5 .rst-content aside.footnote>span.label,html.writer-html5 .rst-content div.citation>span.label{grid-column-start:1;grid-column-end:2}html.writer-html5 .rst-content aside.citation>span.backrefs,html.writer-html5 .rst-content aside.footnote>span.backrefs,html.writer-html5 .rst-content div.citation>span.backrefs{grid-column-start:2;grid-column-end:3;grid-row-start:1;grid-row-end:3}html.writer-html5 .rst-content aside.citation>p,html.writer-html5 .rst-content aside.footnote>p,html.writer-html5 .rst-content div.citation>p{grid-column-start:4;grid-column-end:5}html.writer-html5 .rst-content dl.citation,html.writer-html5 .rst-content dl.field-list,html.writer-html5 .rst-content dl.footnote{margin-bottom:24px}html.writer-html5 .rst-content dl.citation>dt,html.writer-html5 .rst-content dl.field-list>dt,html.writer-html5 .rst-content dl.footnote>dt{padding-left:1rem}html.writer-html5 .rst-content dl.citation>dd,html.writer-html5 .rst-content dl.citation>dt,html.writer-html5 .rst-content dl.field-list>dd,html.writer-html5 .rst-content dl.field-list>dt,html.writer-html5 .rst-content dl.footnote>dd,html.writer-html5 .rst-content dl.footnote>dt{margin-bottom:0}html.writer-html5 .rst-content dl.citation,html.writer-html5 .rst-content dl.footnote{font-size:.9rem}html.writer-html5 .rst-content dl.citation>dt,html.writer-html5 .rst-content dl.footnote>dt{margin:0 .5rem .5rem 0;line-height:1.2rem;word-break:break-all;font-weight:400}html.writer-html5 .rst-content dl.citation>dt>span.brackets:before,html.writer-html5 .rst-content dl.footnote>dt>span.brackets:before{content:"["}html.writer-html5 .rst-content dl.citation>dt>span.brackets:after,html.writer-html5 .rst-content dl.footnote>dt>span.brackets:after{content:"]"}html.writer-html5 .rst-content dl.citation>dt>span.fn-backref,html.writer-html5 .rst-content 
dl.footnote>dt>span.fn-backref{text-align:left;font-style:italic;margin-left:.65rem;word-break:break-word;word-spacing:-.1rem;max-width:5rem}html.writer-html5 .rst-content dl.citation>dt>span.fn-backref>a,html.writer-html5 .rst-content dl.footnote>dt>span.fn-backref>a{word-break:keep-all}html.writer-html5 .rst-content dl.citation>dt>span.fn-backref>a:not(:first-child):before,html.writer-html5 .rst-content dl.footnote>dt>span.fn-backref>a:not(:first-child):before{content:" "}html.writer-html5 .rst-content dl.citation>dd,html.writer-html5 .rst-content dl.footnote>dd{margin:0 0 .5rem;line-height:1.2rem}html.writer-html5 .rst-content dl.citation>dd p,html.writer-html5 .rst-content dl.footnote>dd p{font-size:.9rem}html.writer-html5 .rst-content aside.citation,html.writer-html5 .rst-content aside.footnote,html.writer-html5 .rst-content div.citation{padding-left:1rem;padding-right:1rem;font-size:.9rem;line-height:1.2rem}html.writer-html5 .rst-content aside.citation p,html.writer-html5 .rst-content aside.footnote p,html.writer-html5 .rst-content div.citation p{font-size:.9rem;line-height:1.2rem;margin-bottom:12px}html.writer-html5 .rst-content aside.citation span.backrefs,html.writer-html5 .rst-content aside.footnote span.backrefs,html.writer-html5 .rst-content div.citation span.backrefs{text-align:left;font-style:italic;margin-left:.65rem;word-break:break-word;word-spacing:-.1rem;max-width:5rem}html.writer-html5 .rst-content aside.citation span.backrefs>a,html.writer-html5 .rst-content aside.footnote span.backrefs>a,html.writer-html5 .rst-content div.citation span.backrefs>a{word-break:keep-all}html.writer-html5 .rst-content aside.citation span.backrefs>a:not(:first-child):before,html.writer-html5 .rst-content aside.footnote span.backrefs>a:not(:first-child):before,html.writer-html5 .rst-content div.citation span.backrefs>a:not(:first-child):before{content:" "}html.writer-html5 .rst-content aside.citation span.label,html.writer-html5 .rst-content aside.footnote 
span.label,html.writer-html5 .rst-content div.citation span.label{line-height:1.2rem}html.writer-html5 .rst-content aside.citation-list,html.writer-html5 .rst-content aside.footnote-list,html.writer-html5 .rst-content div.citation-list{margin-bottom:24px}html.writer-html5 .rst-content dl.option-list kbd{font-size:.9rem}.rst-content table.docutils.footnote,html.writer-html4 .rst-content table.docutils.citation,html.writer-html5 .rst-content aside.footnote,html.writer-html5 .rst-content aside.footnote-list aside.footnote,html.writer-html5 .rst-content div.citation-list>div.citation,html.writer-html5 .rst-content dl.citation,html.writer-html5 .rst-content dl.footnote{color:grey}.rst-content table.docutils.footnote code,.rst-content table.docutils.footnote tt,html.writer-html4 .rst-content table.docutils.citation code,html.writer-html4 .rst-content table.docutils.citation tt,html.writer-html5 .rst-content aside.footnote-list aside.footnote code,html.writer-html5 .rst-content aside.footnote-list aside.footnote tt,html.writer-html5 .rst-content aside.footnote code,html.writer-html5 .rst-content aside.footnote tt,html.writer-html5 .rst-content div.citation-list>div.citation code,html.writer-html5 .rst-content div.citation-list>div.citation tt,html.writer-html5 .rst-content dl.citation code,html.writer-html5 .rst-content dl.citation tt,html.writer-html5 .rst-content dl.footnote code,html.writer-html5 .rst-content dl.footnote tt{color:#555}.rst-content .wy-table-responsive.citation,.rst-content .wy-table-responsive.footnote{margin-bottom:0}.rst-content .wy-table-responsive.citation+:not(.citation),.rst-content .wy-table-responsive.footnote+:not(.footnote){margin-top:24px}.rst-content .wy-table-responsive.citation:last-child,.rst-content .wy-table-responsive.footnote:last-child{margin-bottom:24px}.rst-content table.docutils th{border-color:#e1e4e5}html.writer-html5 .rst-content table.docutils th{border:1px solid #e1e4e5}html.writer-html5 .rst-content table.docutils 
td>p,html.writer-html5 .rst-content table.docutils th>p{line-height:1rem;margin-bottom:0;font-size:.9rem}.rst-content table.docutils td .last,.rst-content table.docutils td .last>:last-child{margin-bottom:0}.rst-content table.field-list,.rst-content table.field-list td{border:none}.rst-content table.field-list td p{line-height:inherit}.rst-content table.field-list td>strong{display:inline-block}.rst-content table.field-list .field-name{padding-right:10px;text-align:left;white-space:nowrap}.rst-content table.field-list .field-body{text-align:left}.rst-content code,.rst-content tt{color:#000;font-family:SFMono-Regular,Menlo,Monaco,Consolas,Liberation Mono,Courier New,Courier,monospace;padding:2px 5px}.rst-content code big,.rst-content code em,.rst-content tt big,.rst-content tt em{font-size:100%!important;line-height:normal}.rst-content code.literal,.rst-content tt.literal{color:#e74c3c;white-space:normal}.rst-content code.xref,.rst-content tt.xref,a .rst-content code,a .rst-content tt{font-weight:700;color:#404040;overflow-wrap:normal}.rst-content kbd,.rst-content pre,.rst-content samp{font-family:SFMono-Regular,Menlo,Monaco,Consolas,Liberation Mono,Courier New,Courier,monospace}.rst-content a code,.rst-content a tt{color:#2980b9}.rst-content dl{margin-bottom:24px}.rst-content dl dt{font-weight:700;margin-bottom:12px}.rst-content dl ol,.rst-content dl p,.rst-content dl table,.rst-content dl ul{margin-bottom:12px}.rst-content dl dd{margin:0 0 12px 24px;line-height:24px}.rst-content dl dd>ol:last-child,.rst-content dl dd>p:last-child,.rst-content dl dd>table:last-child,.rst-content dl dd>ul:last-child{margin-bottom:0}html.writer-html4 .rst-content dl:not(.docutils),html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple){margin-bottom:24px}html.writer-html4 .rst-content dl:not(.docutils)>dt,html.writer-html5 .rst-content 
dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple)>dt{display:table;margin:6px 0;font-size:90%;line-height:normal;background:#e7f2fa;color:#2980b9;border-top:3px solid #6ab0de;padding:6px;position:relative}html.writer-html4 .rst-content dl:not(.docutils)>dt:before,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple)>dt:before{color:#6ab0de}html.writer-html4 .rst-content dl:not(.docutils)>dt .headerlink,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple)>dt .headerlink{color:#404040;font-size:100%!important}html.writer-html4 .rst-content dl:not(.docutils) dl:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple)>dt,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple) dl:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple)>dt{margin-bottom:6px;border:none;border-left:3px solid #ccc;background:#f0f0f0;color:#555}html.writer-html4 .rst-content dl:not(.docutils) dl:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple)>dt .headerlink,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple) dl:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple)>dt .headerlink{color:#404040;font-size:100%!important}html.writer-html4 .rst-content dl:not(.docutils)>dt:first-child,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple)>dt:first-child{margin-top:0}html.writer-html4 .rst-content dl:not(.docutils) code.descclassname,html.writer-html4 .rst-content dl:not(.docutils) 
code.descname,html.writer-html4 .rst-content dl:not(.docutils) tt.descclassname,html.writer-html4 .rst-content dl:not(.docutils) tt.descname,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple) code.descclassname,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple) code.descname,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple) tt.descclassname,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple) tt.descname{background-color:transparent;border:none;padding:0;font-size:100%!important}html.writer-html4 .rst-content dl:not(.docutils) code.descname,html.writer-html4 .rst-content dl:not(.docutils) tt.descname,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple) code.descname,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple) tt.descname{font-weight:700}html.writer-html4 .rst-content dl:not(.docutils) .optional,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple) .optional{display:inline-block;padding:0 4px;color:#000;font-weight:700}html.writer-html4 .rst-content dl:not(.docutils) .property,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple) .property{display:inline-block;padding-right:8px;max-width:100%}html.writer-html4 .rst-content dl:not(.docutils) .k,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple) .k{font-style:italic}html.writer-html4 
.rst-content dl:not(.docutils) .descclassname,html.writer-html4 .rst-content dl:not(.docutils) .descname,html.writer-html4 .rst-content dl:not(.docutils) .sig-name,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple) .descclassname,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple) .descname,html.writer-html5 .rst-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.citation):not(.glossary):not(.simple) .sig-name{font-family:SFMono-Regular,Menlo,Monaco,Consolas,Liberation Mono,Courier New,Courier,monospace;color:#000}.rst-content .viewcode-back,.rst-content .viewcode-link{display:inline-block;color:#27ae60;font-size:80%;padding-left:24px}.rst-content .viewcode-back{display:block;float:right}.rst-content p.rubric{margin-bottom:12px;font-weight:700}.rst-content code.download,.rst-content tt.download{background:inherit;padding:inherit;font-weight:400;font-family:inherit;font-size:inherit;color:inherit;border:inherit;white-space:inherit}.rst-content code.download span:first-child,.rst-content tt.download span:first-child{-webkit-font-smoothing:subpixel-antialiased}.rst-content code.download span:first-child:before,.rst-content tt.download span:first-child:before{margin-right:4px}.rst-content .guilabel{border:1px solid #7fbbe3;background:#e7f2fa;font-size:80%;font-weight:700;border-radius:4px;padding:2.4px 6px;margin:auto 2px}.rst-content :not(dl.option-list)>:not(dt):not(kbd):not(.kbd)>.kbd,.rst-content :not(dl.option-list)>:not(dt):not(kbd):not(.kbd)>kbd{color:inherit;font-size:80%;background-color:#fff;border:1px solid #a6a6a6;border-radius:4px;box-shadow:0 2px grey;padding:2.4px 6px;margin:auto 0}.rst-content .versionmodified{font-style:italic}@media screen and (max-width:480px){.rst-content 
.sidebar{width:100%}}span[id*=MathJax-Span]{color:#404040}.math{text-align:center}@font-face{font-family:Lato;src:url(fonts/lato-normal.woff2?bd03a2cc277bbbc338d464e679fe9942) format("woff2"),url(fonts/lato-normal.woff?27bd77b9162d388cb8d4c4217c7c5e2a) format("woff");font-weight:400;font-style:normal;font-display:block}@font-face{font-family:Lato;src:url(fonts/lato-bold.woff2?cccb897485813c7c256901dbca54ecf2) format("woff2"),url(fonts/lato-bold.woff?d878b6c29b10beca227e9eef4246111b) format("woff");font-weight:700;font-style:normal;font-display:block}@font-face{font-family:Lato;src:url(fonts/lato-bold-italic.woff2?0b6bb6725576b072c5d0b02ecdd1900d) format("woff2"),url(fonts/lato-bold-italic.woff?9c7e4e9eb485b4a121c760e61bc3707c) format("woff");font-weight:700;font-style:italic;font-display:block}@font-face{font-family:Lato;src:url(fonts/lato-normal-italic.woff2?4eb103b4d12be57cb1d040ed5e162e9d) format("woff2"),url(fonts/lato-normal-italic.woff?f28f2d6482446544ef1ea1ccc6dd5892) format("woff");font-weight:400;font-style:italic;font-display:block}@font-face{font-family:Roboto Slab;font-style:normal;font-weight:400;src:url(fonts/Roboto-Slab-Regular.woff2?7abf5b8d04d26a2cafea937019bca958) format("woff2"),url(fonts/Roboto-Slab-Regular.woff?c1be9284088d487c5e3ff0a10a92e58c) format("woff");font-display:block}@font-face{font-family:Roboto Slab;font-style:normal;font-weight:700;src:url(fonts/Roboto-Slab-Bold.woff2?9984f4a9bda09be08e83f2506954adbe) format("woff2"),url(fonts/Roboto-Slab-Bold.woff?bed5564a116b05148e3b3bea6fb1162a) format("woff");font-display:block} diff --git a/css/theme_extra.css b/css/theme_extra.css new file mode 100644 index 00000000..ab0631a1 --- /dev/null +++ b/css/theme_extra.css @@ -0,0 +1,197 @@ +/* + * Wrap inline code samples otherwise they shoot of the side and + * can't be read at all. 
+ * + * https://github.com/mkdocs/mkdocs/issues/313 + * https://github.com/mkdocs/mkdocs/issues/233 + * https://github.com/mkdocs/mkdocs/issues/834 + */ +.rst-content code { + white-space: pre-wrap; + word-wrap: break-word; + padding: 2px 5px; +} + +/** + * Make code blocks display as blocks and give them the appropriate + * font size and padding. + * + * https://github.com/mkdocs/mkdocs/issues/855 + * https://github.com/mkdocs/mkdocs/issues/834 + * https://github.com/mkdocs/mkdocs/issues/233 + */ +.rst-content pre code { + white-space: pre; + word-wrap: normal; + display: block; + padding: 12px; + font-size: 12px; +} + +/** + * Fix code colors + * + * https://github.com/mkdocs/mkdocs/issues/2027 + */ +.rst-content code { + color: #E74C3C; +} + +.rst-content pre code { + color: #000; + background: #f8f8f8; +} + +/* + * Fix link colors when the link text is inline code. + * + * https://github.com/mkdocs/mkdocs/issues/718 + */ +a code { + color: #2980B9; +} +a:hover code { + color: #3091d1; +} +a:visited code { + color: #9B59B6; +} + +/* + * The CSS classes from highlight.js seem to clash with the + * ReadTheDocs theme causing some code to be incorrectly made + * bold and italic. + * + * https://github.com/mkdocs/mkdocs/issues/411 + */ +pre .cs, pre .c { + font-weight: inherit; + font-style: inherit; +} + +/* + * Fix some issues with the theme and non-highlighted code + * samples. Without and highlighting styles attached the + * formatting is broken. 
+ * + * https://github.com/mkdocs/mkdocs/issues/319 + */ +.rst-content .no-highlight { + display: block; + padding: 0.5em; + color: #333; +} + + +/* + * Additions specific to the search functionality provided by MkDocs + */ + +.search-results { + margin-top: 23px; +} + +.search-results article { + border-top: 1px solid #E1E4E5; + padding-top: 24px; +} + +.search-results article:first-child { + border-top: none; +} + +form .search-query { + width: 100%; + border-radius: 50px; + padding: 6px 12px; + border-color: #D1D4D5; +} + +/* + * Improve inline code blocks within admonitions. + * + * https://github.com/mkdocs/mkdocs/issues/656 + */ + .rst-content .admonition code { + color: #404040; + border: 1px solid #c7c9cb; + border: 1px solid rgba(0, 0, 0, 0.2); + background: #f8fbfd; + background: rgba(255, 255, 255, 0.7); +} + +/* + * Account for wide tables which go off the side. + * Override borders to avoid weirdness on narrow tables. + * + * https://github.com/mkdocs/mkdocs/issues/834 + * https://github.com/mkdocs/mkdocs/pull/1034 + */ +.rst-content .section .docutils { + width: 100%; + overflow: auto; + display: block; + border: none; +} + +td, th { + border: 1px solid #e1e4e5 !important; + border-collapse: collapse; +} + +/* + * Without the following amendments, the navigation in the theme will be + * slightly cut off. This is due to the fact that the .wy-nav-side has a + * padding-bottom of 2em, which must not necessarily align with the font-size of + * 90 % on the .rst-current-version container, combined with the padding of 12px + * above and below. These amendments fix this in two steps: First, make sure the + * .rst-current-version container has a fixed height of 40px, achieved using + * line-height, and then applying a padding-bottom of 40px to this container. In + * a second step, the items within that container are re-aligned using flexbox. 
+ * + * https://github.com/mkdocs/mkdocs/issues/2012 + */ + .wy-nav-side { + padding-bottom: 40px; +} + +/* For section-index only */ +.wy-menu-vertical .current-section p { + background-color: #e3e3e3; + color: #404040; +} + +/* + * The second step of above amendment: Here we make sure the items are aligned + * correctly within the .rst-current-version container. Using flexbox, we + * achieve it in such a way that it will look like the following: + * + * [No repo_name] + * Next >> // On the first page + * << Previous Next >> // On all subsequent pages + * + * [With repo_name] + * Next >> // On the first page + * << Previous Next >> // On all subsequent pages + * + * https://github.com/mkdocs/mkdocs/issues/2012 + */ +.rst-versions .rst-current-version { + padding: 0 12px; + display: flex; + font-size: initial; + justify-content: space-between; + align-items: center; + line-height: 40px; +} + +/* + * Please note that this amendment also involves removing certain inline-styles + * from the file ./mkdocs/themes/readthedocs/versions.html. + * + * https://github.com/mkdocs/mkdocs/issues/2012 + */ +.rst-current-version span { + flex: 1; + text-align: center; +} diff --git a/dataflow/execution/index.html b/dataflow/execution/index.html new file mode 100644 index 00000000..aad47198 --- /dev/null +++ b/dataflow/execution/index.html @@ -0,0 +1,1756 @@ + + + + + + + + Execution - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

Execution

+

The instantiated plans are executed by an execute task. After execution, an evaluation agent assesses the quality of the entire execution process.

+

In this phase, given the task-action data, the execution process matches the real controls in the Windows application environment and executes the plan step by step.

+

ExecuteFlow

+

The ExecuteFlow class is designed to facilitate the execution and evaluation of tasks in a Windows application environment. It provides functionality to interact with the application's UI, execute predefined tasks, capture screenshots, and evaluate the results of the execution. The class also handles logging and error management for the tasks.

+

Task Execution

+

The task execution in the ExecuteFlow class follows a structured sequence to ensure accurate and traceable task performance:

+
    +
  1. Initialization:
  2. +
  3. Load configuration settings and log paths.
  4. +
  5. Find the application window matching the task.
  6. +
  7. +

    Retrieve or create an ExecuteAgent for executing the task.

    +
  8. +
  9. +

    Plan Execution:

    +
  10. +
  11. Loop through each step in the instantiated_plan.
  12. +
  13. +

    Parse the step to extract information like subtasks, control text, and the required operation.

    +
  14. +
  15. +

    Action Execution:

    +
  16. +
  17. Find the control in the application window that matches the specified control text.
  18. +
  19. If no matching control is found, raise an error.
  20. +
  21. Perform the specified action (e.g., click, input text) using the agent's Puppeteer framework.
  22. +
  23. +

    Capture screenshots of the application window and selected controls for logging and debugging.

    +
  24. +
  25. +

    Result Logging:

    +
  26. +
  27. +

    Log details of the step execution, including control information, performed action, and results.

    +
  28. +
  29. +

    Finalization:

    +
  30. +
  31. Save the final state of the application window.
  32. +
  33. Quit the application client gracefully.
  34. +
+
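The sequence above can be sketched as a plain Python loop. The function and field names below (run_plan, controls) are illustrative stand-ins rather than the actual UFO API, though the Success and MatchedControlText fields mirror what ExecuteFlow records per step, and the empty-control-text and missing-control behaviors follow the source shown later on this page.

```python
from typing import Any, Dict, List


def run_plan(instantiated_plan: List[Dict[str, Any]],
             controls: Dict[str, str],
             max_steps: int = 20) -> List[Dict[str, Any]]:
    """Walk the instantiated plan step by step, mirroring the
    ExecuteFlow sequence: parse -> match control -> act -> log."""
    for index, step in enumerate(instantiated_plan):
        if index + 1 > max_steps:
            raise RuntimeError("Maximum steps exceeded.")
        step["Success"] = None
        step["MatchedControlText"] = None
        control_text = step.get("ControlText", "")
        # An empty control text targets the application window itself.
        matched = controls.get(control_text) if control_text else "<window>"
        if matched is None:
            step["Success"] = False
            raise RuntimeError(f"Control with text '{control_text}' not found.")
        # A real run would perform the action via the agent's Puppeteer
        # framework and capture screenshots here.
        step["MatchedControlText"] = matched
        step["Success"] = True
    return instantiated_plan


plan = [{"Step": 1, "ControlText": "Bold"}, {"Step": 2, "ControlText": ""}]
executed = run_plan(plan, controls={"Bold": "Bold"})
```

A step that names a control absent from the window raises immediately with Success set to False, matching the error handling described above.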

Input of ExecuteAgent

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
ParameterTypeDescription
namestrThe name of the agent. Used for identification and logging purposes.
process_namestrThe name of the application process that the agent interacts with.
app_root_namestrThe name of the root application window or main UI component being targeted.
---
+

Evaluation

+

The evaluation process in the ExecuteFlow class is designed to assess the performance of the executed task based on predefined prompts:

+
    +
  1. Start Evaluation:
  2. +
  3. Evaluation begins immediately after task execution.
  4. +
  5. +

    It uses an ExecuteEvalAgent initialized during class construction.

    +
  6. +
  7. +

    Perform Evaluation:

    +
  8. +
  9. The ExecuteEvalAgent evaluates the task using a combination of input prompts (e.g., main prompt and API prompt) and logs generated during task execution.
  10. +
  11. +

    The evaluation process outputs a result summary (e.g., quality flag, comments, and task type).

    +
  12. +
  13. +

    Log and Output Results:

    +
  14. +
  15. Display the evaluation results in the console.
  16. +
  17. Return the evaluation summary alongside the executed plan for further analysis or reporting.
  18. +
+
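The result summary described above can be rendered for the console with a small helper. The dictionary keys here (quality_flag, task_type, comment) are illustrative; the real ExecuteEvalAgent defines its own output schema.

```python
def summarize_evaluation(result: dict) -> str:
    """Render an evaluation summary (quality flag, comment, task
    type) as a one-line console message. Key names are assumed,
    not taken from the ExecuteEvalAgent implementation."""
    flag = "PASS" if result.get("quality_flag") else "FAIL"
    return f"[{flag}] type={result.get('task_type')} - {result.get('comment')}"


summary = summarize_evaluation(
    {"quality_flag": True, "task_type": "edit", "comment": "All steps completed."}
)
```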

Reference

+

ExecuteFlow

+ + +
+ + + + +
+

+ Bases: AppAgentProcessor

+ + +

ExecuteFlow class for executing the task and saving the result.

+ +

Initialize the execute flow for a task.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + task_file_name + (str) + – +
    +

    Name of the task file being processed.

    +
    +
  • +
  • + context + (Context) + – +
    +

    Context object for the current session.

    +
    +
  • +
  • + environment + (WindowsAppEnv) + – +
    +

    Environment object for the application being processed.

    +
    +
  • +
+
+ + + + + +
+ Source code in execution/workflow/execute_flow.py +
30
+31
+32
+33
+34
+35
+36
+37
+38
+39
+40
+41
+42
+43
+44
+45
+46
+47
+48
+49
+50
+51
+52
+53
+54
+55
def __init__(
+    self, task_file_name: str, context: Context, environment: WindowsAppEnv
+) -> None:
+    """
+    Initialize the execute flow for a task.
+    :param task_file_name: Name of the task file being processed.
+    :param context: Context object for the current session.
+    :param environment: Environment object for the application being processed.
+    """
+
+    super().__init__(agent=ExecuteAgent, context=context)
+
+    self.execution_time = None
+    self.eval_time = None
+    self._app_env = environment
+    self._task_file_name = task_file_name
+    self._app_name = self._app_env.app_name
+
+    log_path = _configs["EXECUTE_LOG_PATH"].format(task=task_file_name)
+    self._initialize_logs(log_path)
+
+    self.application_window = self._app_env.find_matching_window(task_file_name)
+    self.app_agent = self._get_or_create_execute_agent()
+    self.eval_agent = self._get_or_create_evaluation_agent()
+
+    self._matched_control = None  # Matched control for the current step.
+
+
+ + + +
+ + + + + + + + + +
+ + +

+ execute(request, instantiated_plan) + +

+ + +
+ +

Execute the execute flow: Execute the task and save the result.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + request + (str) + – +
    +

    Original request to be executed.

    +
    +
  • +
  • + instantiated_plan + (List[Dict[str, Any]]) + – +
    +

    Instantiated plan containing steps to execute.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + Tuple[List[Dict[str, Any]], Dict[str, str]] + – +
    +

    Tuple containing task quality flag, comment, and task type.

    +
    +
  • +
+
+
+ Source code in execution/workflow/execute_flow.py +
101
+102
+103
+104
+105
+106
+107
+108
+109
+110
+111
+112
+113
+114
+115
+116
+117
+118
+119
+120
+121
+122
+123
+124
+125
+126
+127
+128
+129
+130
def execute(
+    self, request: str, instantiated_plan: List[Dict[str, Any]]
+) -> Tuple[List[Dict[str, Any]], Dict[str, str]]:
+    """
+    Execute the execute flow: Execute the task and save the result.
+    :param request: Original request to be executed.
+    :param instantiated_plan: Instantiated plan containing steps to execute.
+    :return: Tuple containing task quality flag, comment, and task type.
+    """
+
+    start_time = time.time()
+    try:
+        executed_plan = self.execute_plan(instantiated_plan)
+    except Exception as error:
+        raise RuntimeError(f"Execution failed. {error}")
+    finally:
+        self.execution_time = round(time.time() - start_time, 3)
+
+    start_time = time.time()
+    try:
+        result, _ = self.eval_agent.evaluate(
+            request=request, log_path=self.log_path
+        )
+        utils.print_with_color(f"Result: {result}", "green")
+    except Exception as error:
+        raise RuntimeError(f"Evaluation failed. {error}")
+    finally:
+        self.eval_time = round(time.time() - start_time, 3)
+
+    return executed_plan, result
+
+
+
+ +
+ +
+ + +

+ execute_action() + +

+ + +
+ +

Execute the action.

+ +
+ Source code in execution/workflow/execute_flow.py +
306
+307
+308
+309
+310
+311
+312
+313
+314
+315
+316
+317
+318
+319
+320
+321
+322
+323
+324
+325
+326
+327
+328
+329
+330
+331
+332
+333
+334
+335
+336
+337
+338
+339
+340
+341
+342
+343
+344
+345
+346
+347
+348
+349
+350
+351
+352
+353
+354
+355
+356
+357
+358
+359
+360
+361
+362
+363
+364
+365
+366
+367
+368
+369
+370
+371
+372
+373
def execute_action(self) -> None:
+    """
+    Execute the action.
+    """
+
+    control_selected = None
+    # Find the matching window and control.
+    self.application_window = self._app_env.find_matching_window(
+        self._task_file_name
+    )
+    if self.control_text == "":
+        control_selected = self.application_window
+    else:
+        self._control_label, control_selected = (
+            self._app_env.find_matching_controller(
+                self.filtered_annotation_dict, self.control_text
+            )
+        )
+        self._matched_control = control_selected.window_text()
+
+    if not control_selected:
+        # If the control is not found, raise an error.
+        raise RuntimeError(f"Control with text '{self.control_text}' not found.")
+
+    try:
+        # Get the selected control item from the annotation dictionary and LLM response.
+        # The LLM response is a number index corresponding to the key in the annotation dictionary.
+        if control_selected:
+
+            if _ufo_configs.get("SHOW_VISUAL_OUTLINE_ON_SCREEN", True):
+                control_selected.draw_outline(colour="red", thickness=3)
+                time.sleep(_ufo_configs.get("RECTANGLE_TIME", 0))
+
+            control_coordinates = PhotographerDecorator.coordinate_adjusted(
+                self.application_window.rectangle(), control_selected.rectangle()
+            )
+
+            self._control_log = {
+                "control_class": control_selected.element_info.class_name,
+                "control_type": control_selected.element_info.control_type,
+                "control_automation_id": control_selected.element_info.automation_id,
+                "control_friendly_class_name": control_selected.friendly_class_name(),
+                "control_coordinates": {
+                    "left": control_coordinates[0],
+                    "top": control_coordinates[1],
+                    "right": control_coordinates[2],
+                    "bottom": control_coordinates[3],
+                },
+            }
+
+            self.app_agent.Puppeteer.receiver_manager.create_ui_control_receiver(
+                control_selected, self.application_window
+            )
+
+            # Save the screenshot of the tagged selected control.
+            self.capture_control_screenshot(control_selected)
+
+            self._results = self.app_agent.Puppeteer.execute_command(
+                self._operation, self._args
+            )
+            self.control_reannotate = None
+            if not utils.is_json_serializable(self._results):
+                self._results = ""
+
+                return
+
+    except Exception:
+        self.general_error_handler()
+
+
+
+ +
+ +
+ + +

+ execute_plan(instantiated_plan) + +

+ + +
+ +

Get the executed result from the execute agent.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + instantiated_plan + (List[Dict[str, Any]]) + – +
    +

    Plan containing steps to execute.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + List[Dict[str, Any]] + – +
    +

    List of executed steps.

    +
    +
  • +
+
+
+ Source code in execution/workflow/execute_flow.py +
132
+133
+134
+135
+136
+137
+138
+139
+140
+141
+142
+143
+144
+145
+146
+147
+148
+149
+150
+151
+152
+153
+154
+155
+156
+157
+158
+159
+160
+161
+162
+163
+164
+165
+166
+167
+168
+169
+170
+171
+172
+173
+174
+175
+176
+177
+178
+179
+180
+181
+182
+183
+184
+185
+186
+187
+188
+189
+190
+191
+192
+193
+194
+195
+196
+197
+198
+199
+200
+201
+202
+203
+204
+205
+206
+207
+208
+209
+210
+211
+212
+213
+214
+215
+216
+217
+218
+219
def execute_plan(
+    self, instantiated_plan: List[Dict[str, Any]]
+) -> List[Dict[str, Any]]:
+    """
+    Get the executed result from the execute agent.
+    :param instantiated_plan: Plan containing steps to execute.
+    :return: List of executed steps.
+    """
+
+    # Initialize the step counter and capture the initial screenshot.
+    self.session_step = 0
+    try:
+        time.sleep(1)
+        # Initialize the API receiver
+        self.app_agent.Puppeteer.receiver_manager.create_api_receiver(
+            self.app_agent._app_root_name, self.app_agent._process_name
+        )
+        # Initialize the control receiver
+        current_receiver = self.app_agent.Puppeteer.receiver_manager.receiver_list[
+            -1
+        ]
+
+        if current_receiver is not None:
+            self.application_window = self._app_env.find_matching_window(
+                self._task_file_name
+            )
+            current_receiver.com_object = (
+                current_receiver.get_object_from_process_name()
+            )
+
+        self.init_and_final_capture_screenshot()
+    except Exception as error:
+        raise RuntimeError(f"Execution initialization failed. {error}")
+
+    # Initialize the success flag for each step.
+    for index, step_plan in enumerate(instantiated_plan):
+        instantiated_plan[index]["Success"] = None
+        instantiated_plan[index]["MatchedControlText"] = None
+
+    for index, step_plan in enumerate(instantiated_plan):
+        try:
+            self.session_step += 1
+
+            # Check if the maximum steps have been exceeded.
+            if self.session_step > _configs["MAX_STEPS"]:
+                raise RuntimeError("Maximum steps exceeded.")
+
+            self._parse_step_plan(step_plan)
+
+            try:
+                self.process()
+                instantiated_plan[index]["Success"] = True
+                instantiated_plan[index]["ControlLabel"] = self._control_label
+                instantiated_plan[index][
+                    "MatchedControlText"
+                ] = self._matched_control
+            except Exception as ControllerNotFoundError:
+                instantiated_plan[index]["Success"] = False
+                raise ControllerNotFoundError
+
+        except Exception as error:
+            err_info = RuntimeError(
+                f"Step {self.session_step} execution failed. {error}"
+            )
+            raise err_info
+    # capture the final screenshot
+    self.session_step += 1
+    time.sleep(1)
+    self.init_and_final_capture_screenshot()
+    # save the final state of the app
+
+    win_com_receiver = None
+    for receiver in reversed(
+        self.app_agent.Puppeteer.receiver_manager.receiver_list
+    ):
+        if isinstance(receiver, WinCOMReceiverBasic):
+            if receiver.client is not None:
+                win_com_receiver = receiver
+                break
+
+    if win_com_receiver is not None:
+        win_com_receiver.save()
+        time.sleep(1)
+        win_com_receiver.client.Quit()
+
+    print("Execution complete.")
+
+    return instantiated_plan
+
+
+
+ +
+ +
+ + +

+ general_error_handler() + +

+ + +
+ +

Handle general errors.

+ +
+ Source code in execution/workflow/execute_flow.py +
375
+376
+377
+378
+379
+380
def general_error_handler(self) -> None:
+    """
+    Handle general errors.
+    """
+
+    pass
+
+
+
+ +
+ +
+ + +

+ init_and_final_capture_screenshot() + +

+ + +
+ +

Capture the screenshot.

+ +
+ Source code in execution/workflow/execute_flow.py +
285
+286
+287
+288
+289
+290
+291
+292
+293
+294
+295
+296
+297
+298
+299
+300
+301
+302
+303
+304
def init_and_final_capture_screenshot(self) -> None:
+    """
+    Capture the screenshot.
+    """
+
+    # Define the paths for the screenshots saved.
+    screenshot_save_path = self.log_path + f"action_step{self.session_step}.png"
+
+    self._memory_data.add_values_from_dict(
+        {
+            "CleanScreenshot": screenshot_save_path,
+        }
+    )
+
+    self.photographer.capture_app_window_screenshot(
+        self.application_window, save_path=screenshot_save_path
+    )
+    # Capture the control screenshot.
+    control_selected = self._app_env.app_window
+    self.capture_control_screenshot(control_selected)
+
+
+
+ +
+ +
+ + +

+ log_save() + +

+ + +
+ +

Log the constructed prompt message for the PrefillAgent.

+ +
+ Source code in execution/workflow/execute_flow.py +
246
+247
+248
+249
+250
+251
+252
+253
+254
+255
+256
+257
+258
+259
+260
+261
+262
+263
def log_save(self) -> None:
+    """
+    Log the constructed prompt message for the PrefillAgent.
+    """
+
+    step_memory = {
+        "Step": self.session_step,
+        "Subtask": self.subtask,
+        "ControlLabel": self._control_label,
+        "ControlText": self.control_text,
+        "Action": self.action,
+        "ActionType": self.app_agent.Puppeteer.get_command_types(self._operation),
+        "Results": self._results,
+        "Application": self.app_agent._app_root_name,
+        "TimeCost": self.time_cost,
+    }
+    self._memory_data.add_values_from_dict(step_memory)
+    self.log(self._memory_data.to_dict())
+
+
+
+ +
+ +
+ + +

+ print_step_info() + +

+ + +
+ +

Print the step information.

+ +
+ Source code in execution/workflow/execute_flow.py +
233
+234
+235
+236
+237
+238
+239
+240
+241
+242
+243
+244
def print_step_info(self) -> None:
+    """
+    Print the step information.
+    """
+
+    utils.print_with_color(
+        "Step {step}: {subtask}".format(
+            step=self.session_step,
+            subtask=self.subtask,
+        ),
+        "magenta",
+    )
+
+
+
+ +
+ +
+ + +

+ process() + +

+ + +
+ +

Process the current step.

+ +
+ Source code in execution/workflow/execute_flow.py +
221
+222
+223
+224
+225
+226
+227
+228
+229
+230
+231
def process(self) -> None:
+    """
+    Process the current step.
+    """
+
+    step_start_time = time.time()
+    self.print_step_info()
+    self.capture_screenshot()
+    self.execute_action()
+    self.time_cost = round(time.time() - step_start_time, 3)
+    self.log_save()
+
+
+
+ +
+ + + +
+ +
+ +

ExecuteAgent

+ + +
+ + + + +
+

+ Bases: AppAgent

+ + +

The Agent for task execution.

+ +

Initialize the ExecuteAgent.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + name + (str) + – +
    +

    The name of the agent.

    +
    +
  • +
  • + process_name + (str) + – +
    +

    The name of the process.

    +
    +
  • +
  • + app_root_name + (str) + – +
    +

    The name of the app root.

    +
    +
  • +
+
+ + + + + +
+ Source code in execution/agent/execute_agent.py +
12
+13
+14
+15
+16
+17
+18
+19
+20
+21
+22
+23
+24
+25
+26
+27
+28
+29
+30
+31
def __init__(
+    self,
+    name: str,
+    process_name: str,
+    app_root_name: str,
+):
+    """
+    Initialize the ExecuteAgent.
+    :param name: The name of the agent.
+    :param process_name: The name of the process.
+    :param app_root_name: The name of the app root.
+    """
+
+    self._step = 0
+    self._complete = False
+    self._name = name
+    self._status = None
+    self._process_name = process_name
+    self._app_root_name = app_root_name
+    self.Puppeteer = self.create_puppeteer_interface()
+
+
+ + + +
+ + + + + + + + + + + +
+ +
+ +

ExecuteEvalAgent

+ + +
+ + + + +
+

+ Bases: EvaluationAgent

+ + +

The Agent for task execution evaluation.

+ +

Initialize the ExecuteEvalAgent.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + name + (str) + – +
    +

    The name of the agent.

    +
    +
  • +
  • + app_root_name + (str) + – +
    +

    The name of the app root.

    +
    +
  • +
  • + is_visual + (bool) + – +
    +

    The flag indicating whether the agent is visual or not.

    +
    +
  • +
  • + main_prompt + (str) + – +
    +

    The main prompt.

    +
    +
  • +
  • + example_prompt + (str) + – +
    +

    The example prompt.

    +
    +
  • +
  • + api_prompt + (str) + – +
    +

    The API prompt.

    +
    +
  • +
+
+ + + + + +
+ Source code in execution/agent/execute_eval_agent.py +
14
+15
+16
+17
+18
+19
+20
+21
+22
+23
+24
+25
+26
+27
+28
+29
+30
+31
+32
+33
+34
+35
+36
+37
+38
+39
+40
def __init__(
+    self,
+    name: str,
+    app_root_name: str,
+    is_visual: bool,
+    main_prompt: str,
+    example_prompt: str,
+    api_prompt: str,
+):
+    """
+    Initialize the ExecuteEvalAgent.
+    :param name: The name of the agent.
+    :param app_root_name: The name of the app root.
+    :param is_visual: The flag indicating whether the agent is visual or not.
+    :param main_prompt: The main prompt.
+    :param example_prompt: The example prompt.
+    :param api_prompt: The API prompt.
+    """
+
+    super().__init__(
+        name=name,
+        app_root_name=app_root_name,
+        is_visual=is_visual,
+        main_prompt=main_prompt,
+        example_prompt=example_prompt,
+        api_prompt=api_prompt,
+    )
+
+
+ + + +
+ + + + + + + + + +
+ + +

+ get_prompter(is_visual, prompt_template, example_prompt_template, api_prompt_template, root_name=None) + +

+ + +
+ +

Get the prompter for the agent.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + is_visual + (bool) + – +
    +

    The flag indicating whether the agent is visual or not.

    +
    +
  • +
  • + prompt_template + (str) + – +
    +

    The prompt template.

    +
    +
  • +
  • + example_prompt_template + (str) + – +
    +

    The example prompt template.

    +
    +
  • +
  • + api_prompt_template + (str) + – +
    +

    The API prompt template.

    +
    +
  • +
  • + root_name + (Optional[str], default: + None +) + – +
    +

    The name of the root.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + ExecuteEvalAgentPrompter + – +
    +

    The prompter.

    +
    +
  • +
+
+
+ Source code in execution/agent/execute_eval_agent.py +
42
+43
+44
+45
+46
+47
+48
+49
+50
+51
+52
+53
+54
+55
+56
+57
+58
+59
+60
+61
+62
+63
+64
+65
+66
def get_prompter(
+    self,
+    is_visual: bool,
+    prompt_template: str,
+    example_prompt_template: str,
+    api_prompt_template: str,
+    root_name: Optional[str] = None,
+) -> ExecuteEvalAgentPrompter:
+    """
+    Get the prompter for the agent.
+    :param is_visual: The flag indicating whether the agent is visual or not.
+    :param prompt_template: The prompt template.
+    :param example_prompt_template: The example prompt template.
+    :param api_prompt_template: The API prompt template.
+    :param root_name: The name of the root.
+    :return: The prompter.
+    """
+
+    return ExecuteEvalAgentPrompter(
+        is_visual=is_visual,
+        prompt_template=prompt_template,
+        example_prompt_template=example_prompt_template,
+        api_prompt_template=api_prompt_template,
+        root_name=root_name,
+    )
+
+
+
+ +
+ + + +
+ +
+ +
+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/dataflow/instantiation/index.html b/dataflow/instantiation/index.html new file mode 100644 index 00000000..7ad9760a --- /dev/null +++ b/dataflow/instantiation/index.html @@ -0,0 +1,2003 @@ + + + + + + + + Instantiation - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

Instantiation

+

There are three key steps in the instantiation process:

+
    +
  1. Choose a template file according to the specified app and instruction.
  2. +
  3. Prefill the task using the current screenshot.
  4. +
  5. Filter the established task.
  6. +
+

Given the initial task, the dataflow first chooses a template (Phase 1), then prefills the initial task based on the Windows application environment to obtain task-action data (Phase 2). Finally, it filters the established task to evaluate the quality of the task-action data.

+
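The three phases can be sketched as one small pipeline. Every function below is a placeholder standing in for ChooseTemplateFlow, PrefillFlow, and the filter flow respectively — none of this is the real implementation.

```python
def choose_template(app: str, extension: str) -> str:
    # Phase 1: pick the template whose description best matches the task.
    return f"templates/{app}/template1{extension}"


def prefill(instruction: str, template: str) -> dict:
    # Phase 2: turn the instruction into concrete task-action steps
    # against a copy of the chosen template.
    return {"task": instruction, "template": template,
            "instantiated_plan": [{"Step": 1, "Subtask": instruction}]}


def filter_task(task_action: dict) -> dict:
    # Phase 3: judge whether the task-action pair is good enough to keep.
    task_action["quality"] = "good" if task_action["instantiated_plan"] else "bad"
    return task_action


template = choose_template("word", ".docx")
task_action = filter_task(prefill("Bold the title", template))
```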

+ +

+ +

1. Choose Template File

+

Templates for your app must be defined and described in dataflow/templates/app. For instance, if you want to instantiate tasks for the Word application, place the relevant .docx files in dataflow/templates/word, along with a description.json file. The appropriate template will be selected based on how well its description matches the instruction.

+

The ChooseTemplateFlow uses semantic matching, where task descriptions are compared with template descriptions using embeddings and FAISS for efficient nearest neighbor search. If semantic matching fails, a random template is chosen from the available files.

+
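The matching idea can be illustrated without FAISS: embed both descriptions, then pick the nearest neighbor by cosine similarity. This is a pure-Python sketch of the principle, not the flow's actual embedding model or index.

```python
import math


def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def pick_template(task_vec, template_vecs):
    """Return the template whose description embedding is closest
    to the task embedding, or None when no templates exist."""
    if not template_vecs:
        return None  # caller would fall back to a random template
    return max(template_vecs,
               key=lambda name: cosine(task_vec, template_vecs[name]))


chosen = pick_template([1.0, 0.1],
                       {"invoice": [0.9, 0.2], "letter": [0.1, 1.0]})
```

FAISS does the same nearest-neighbor lookup at scale; the fallback branch mirrors the random-template behavior described above.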

ChooseTemplateFlow

+ + +
+ + + + +
+ + +

Class to select and copy the most relevant template file based on the given task context.

+ +

Initialize the flow with the given task context.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + app_name + (str) + – +
    +

    The name of the application.

    +
    +
  • +
  • + file_extension + (str) + – +
    +

    The file extension of the template.

    +
    +
  • +
  • + task_file_name + (str) + – +
    +

    The name of the task file.

    +
    +
  • +
+
+ + + + + +
+ Source code in instantiation/workflow/choose_template_flow.py +
27
+28
+29
+30
+31
+32
+33
+34
+35
+36
+37
+38
+39
+40
+41
def __init__(self, app_name: str, task_file_name: str, file_extension: str):
+    """
+    Initialize the flow with the given task context.
+    :param app_name: The name of the application.
+    :param file_extension: The file extension of the template.
+    :param task_file_name: The name of the task file.
+    """
+
+    self._app_name = app_name
+    self._file_extension = file_extension
+    self._task_file_name = task_file_name
+    self.execution_time = None
+    self._embedding_model = self._load_embedding_model(
+        model_name=_configs["CONTROL_FILTER_MODEL_SEMANTIC_NAME"]
+    )
+
+
+ + + +
+ + + + + + + + + +
+ + +

+ execute() + +

+ + +
+ +

Execute the flow and return the copied template path.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The path to the copied template file.

    +
    +
  • +
+
+
+ Source code in instantiation/workflow/choose_template_flow.py +
43
+44
+45
+46
+47
+48
+49
+50
+51
+52
+53
+54
+55
+56
def execute(self) -> str:
+    """
+    Execute the flow and return the copied template path.
+    :return: The path to the copied template file.
+    """
+
+    start_time = time.time()
+    try:
+        template_copied_path = self._choose_template_and_copy()
+    except Exception as e:
+        raise e
+    finally:
+        self.execution_time = round(time.time() - start_time, 3)
+    return template_copied_path
+
+
+
+ +
+ + + +
+ +
+ +


+

2. Prefill the Task

+

The PrefillFlow class orchestrates the refinement of task plans and UI interactions by leveraging PrefillAgent for task planning and action generation. It automates UI control updates, captures screenshots, and manages logs for messages and responses during execution.

+

PrefillFlow

+ + +
+ + + + +
+

+ Bases: AppAgentProcessor

+ + +

Class to manage the prefill process by refining planning steps and automating UI interactions

+ +

Initialize the prefill flow with the application context.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + app_name + (str) + – +
    +

    The name of the application.

    +
    +
  • +
  • + task_file_name + (str) + – +
    +

    The name of the task file for logging and tracking.

    +
    +
  • +
  • + environment + (WindowsAppEnv) + – +
    +

    The environment of the app.

    +
    +
  • +
+
+ + + + + +
+ Source code in instantiation/workflow/prefill_flow.py +
29
+30
+31
+32
+33
+34
+35
+36
+37
+38
+39
+40
+41
+42
+43
+44
+45
+46
+47
+48
+49
+50
+51
+52
+53
+54
+55
+56
+57
+58
+59
+60
+61
+62
+63
+64
+65
+66
+67
+68
+69
+70
+71
+72
+73
+74
+75
+76
+77
+78
def __init__(
+    self,
+    app_name: str,
+    task_file_name: str,
+    environment: WindowsAppEnv,
+) -> None:
+    """
+    Initialize the prefill flow with the application context.
+    :param app_name: The name of the application.
+    :param task_file_name: The name of the task file for logging and tracking.
+    :param environment: The environment of the app.
+    """
+
+    self.execution_time = None
+    self._app_name = app_name
+    self._task_file_name = task_file_name
+    self._app_env = environment
+    # Create or reuse a PrefillAgent for the app
+    if self._app_name not in PrefillFlow._app_prefill_agent_dict:
+        PrefillFlow._app_prefill_agent_dict[self._app_name] = PrefillAgent(
+            "prefill",
+            self._app_name,
+            is_visual=True,
+            main_prompt=_configs["PREFILL_PROMPT"],
+            example_prompt=_configs["PREFILL_EXAMPLE_PROMPT"],
+            api_prompt=_configs["API_PROMPT"],
+        )
+    self._prefill_agent = PrefillFlow._app_prefill_agent_dict[self._app_name]
+
+    # Initialize execution step and UI control tools
+    self._execute_step = 0
+    self._control_inspector = ControlInspectorFacade(_BACKEND)
+    self._photographer = PhotographerFacade()
+
+    # Set default states
+    self._status = ""
+
+    # Initialize loggers for messages and responses
+    self._log_path_configs = _configs["PREFILL_LOG_PATH"].format(
+        task=self._task_file_name
+    )
+    os.makedirs(self._log_path_configs, exist_ok=True)
+
+    # Set up loggers
+    self._message_logger = BaseSession.initialize_logger(
+        self._log_path_configs, "prefill_messages.json", "w", _configs
+    )
+    self._response_logger = BaseSession.initialize_logger(
+        self._log_path_configs, "prefill_responses.json", "w", _configs
+    )
+
+
+ + + +
+ + + + + + + + + +
+ + +

+ execute(template_copied_path, original_task, refined_steps) + +

+ + +
+ +

Start the execution by retrieving the instantiated result.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + template_copied_path + (str) + – +
    +

    The path of the copied template to use.

    +
    +
  • +
  • + original_task + (str) + – +
    +

    The original task to refine.

    +
    +
  • +
  • + refined_steps + (List[str]) + – +
    +

    The steps to guide the refinement process.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + Dict[str, Any] + – +
    +

    The refined task and corresponding action plans.

    +
    +
  • +
+
+
+ Source code in instantiation/workflow/prefill_flow.py +
 80
+ 81
+ 82
+ 83
+ 84
+ 85
+ 86
+ 87
+ 88
+ 89
+ 90
+ 91
+ 92
+ 93
+ 94
+ 95
+ 96
+ 97
+ 98
+ 99
+100
+101
+102
+103
+104
def execute(
+    self, template_copied_path: str, original_task: str, refined_steps: List[str]
+) -> Dict[str, Any]:
+    """
+    Start the execution by retrieving the instantiated result.
+    :param template_copied_path: The path of the copied template to use.
+    :param original_task: The original task to refine.
+    :param refined_steps: The steps to guide the refinement process.
+    :return: The refined task and corresponding action plans.
+    """
+
+    start_time = time.time()
+    try:
+        instantiated_request, instantiated_plan = self._instantiate_task(
+            template_copied_path, original_task, refined_steps
+        )
+    except Exception as e:
+        raise e
+    finally:
+        self.execution_time = round(time.time() - start_time, 3)
+
+    return  {
+        "instantiated_request": instantiated_request,
+        "instantiated_plan": instantiated_plan,
+    }   
+
+
+
+ +
+ + + +
+ +
+ +

PrefillAgent

+

The PrefillAgent class facilitates task instantiation and action sequence generation by constructing tailored prompt messages using the PrefillPrompter. It integrates system, user, and dynamic context to generate actionable inputs for automation workflows.

+ + +
+ + + + +
+

+ Bases: BasicAgent

+ + +

The agent for task instantiation and action sequence generation.

+ +

Initialize the PrefillAgent.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + name + (str) + – +
    +

    The name of the agent.

    +
    +
  • +
  • + process_name + (str) + – +
    +

    The name of the process.

    +
    +
  • +
  • + is_visual + (bool) + – +
    +

    The flag indicating whether the agent is visual or not.

    +
    +
  • +
  • + main_prompt + (str) + – +
    +

    The main prompt.

    +
    +
  • +
  • + example_prompt + (str) + – +
    +

    The example prompt.

    +
    +
  • +
  • + api_prompt + (str) + – +
    +

    The API prompt.

    +
    +
  • +
+
+ + + + + +
+ Source code in instantiation/agent/prefill_agent.py +
16
+17
+18
+19
+20
+21
+22
+23
+24
+25
+26
+27
+28
+29
+30
+31
+32
+33
+34
+35
+36
+37
+38
+39
+40
+41
+42
def __init__(
+    self,
+    name: str,
+    process_name: str,
+    is_visual: bool,
+    main_prompt: str,
+    example_prompt: str,
+    api_prompt: str,
+):
+    """
+    Initialize the PrefillAgent.
+    :param name: The name of the agent.
+    :param process_name: The name of the process.
+    :param is_visual: The flag indicating whether the agent is visual or not.
+    :param main_prompt: The main prompt.
+    :param example_prompt: The example prompt.
+    :param api_prompt: The API prompt.
+    """
+
+    self._step = 0
+    self._complete = False
+    self._name = name
+    self._status = None
+    self.prompter: PrefillPrompter = self.get_prompter(
+        is_visual, main_prompt, example_prompt, api_prompt
+    )
+    self._process_name = process_name
+
+
+ + + +
+ + + + + + + + + +
+ + +

+ get_prompter(is_visual, main_prompt, example_prompt, api_prompt) + +

+ + +
+ +

Get the prompt for the agent. +This is the abstract method from BasicAgent that needs to be implemented.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + is_visual + (bool) + – +
    +

    The flag indicating whether the agent is visual or not.

    +
    +
  • +
  • + main_prompt + (str) + – +
    +

    The main prompt.

    +
    +
  • +
  • + example_prompt + (str) + – +
    +

    The example prompt.

    +
    +
  • +
  • + api_prompt + (str) + – +
    +

    The API prompt.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The prompt string.

    +
    +
  • +
+
+
+ Source code in instantiation/agent/prefill_agent.py +
44
+45
+46
+47
+48
+49
+50
+51
+52
+53
+54
+55
def get_prompter(self, is_visual: bool, main_prompt: str, example_prompt: str, api_prompt: str) -> str:
+    """
+    Get the prompt for the agent.
+    This is the abstract method from BasicAgent that needs to be implemented.
+    :param is_visual: The flag indicating whether the agent is visual or not.
+    :param main_prompt: The main prompt.
+    :param example_prompt: The example prompt.
+    :param api_prompt: The API prompt.
+    :return: The prompt string.
+    """
+
+    return PrefillPrompter(is_visual, main_prompt, example_prompt, api_prompt)
+
+
+
+ +
+ +
+ + +

+ message_constructor(dynamic_examples, given_task, reference_steps, doc_control_state, log_path) + +

+ + +
+ +

Construct the prompt message for the PrefillAgent.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + dynamic_examples + (str) + – +
    +

    The dynamic examples retrieved from the self-demonstration and human demonstration.

    +
    +
  • +
  • + given_task + (str) + – +
    +

    The given task.

    +
    +
  • +
  • + reference_steps + (List[str]) + – +
    +

    The reference steps.

    +
    +
  • +
  • + doc_control_state + (Dict[str, str]) + – +
    +

    The document control state.

    +
    +
  • +
  • + log_path + (str) + – +
    +

    The path of the log.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + List[str] + – +
    +

    The prompt message.

    +
    +
  • +
+
+
+ Source code in instantiation/agent/prefill_agent.py +
57
+58
+59
+60
+61
+62
+63
+64
+65
+66
+67
+68
+69
+70
+71
+72
+73
+74
+75
+76
+77
+78
+79
+80
+81
+82
+83
+84
+85
+86
def message_constructor(
+    self,
+    dynamic_examples: str,
+    given_task: str,
+    reference_steps: List[str],
+    doc_control_state: Dict[str, str],
+    log_path: str,
+) -> List[str]:
+    """
+    Construct the prompt message for the PrefillAgent.
+    :param dynamic_examples: The dynamic examples retrieved from the self-demonstration and human demonstration.
+    :param given_task: The given task.
+    :param reference_steps: The reference steps.
+    :param doc_control_state: The document control state.
+    :param log_path: The path of the log.
+    :return: The prompt message.
+    """
+
+    prefill_agent_prompt_system_message = self.prompter.system_prompt_construction(
+        dynamic_examples
+    )
+    prefill_agent_prompt_user_message = self.prompter.user_content_construction(
+        given_task, reference_steps, doc_control_state, log_path
+    )
+    appagent_prompt_message = self.prompter.prompt_construction(
+        prefill_agent_prompt_system_message,
+        prefill_agent_prompt_user_message,
+    )
+
+    return appagent_prompt_message
+
+
+
+ +
+ +
+ + +

+ process_comfirmation() + +

+ + +
+ +

Confirm the process. +This is the abstract method from BasicAgent that needs to be implemented.

+ +
+ Source code in instantiation/agent/prefill_agent.py +
88
+89
+90
+91
+92
+93
+94
def process_comfirmation(self) -> None:
+    """
+    Confirm the process.
+    This is the abstract method from BasicAgent that needs to be implemented.
+    """
+
+    pass
+
+
+
+ +
+ + + +
+ +
+ +


+

3. Filter Task

+

The FilterFlow class is designed to process and refine task plans by leveraging a FilterAgent.

+

FilterFlow

+ + +
+ + + + +
+ + +

Class to refine the plan steps and prefill the file based on filtering criteria.

+ +

Initialize the filter flow for a task.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + app_name + (str) + – +
    +

    Name of the application being processed.

    +
    +
  • +
  • + task_file_name + (str) + – +
    +

    Name of the task file being processed.

    +
    +
  • +
+
+ + + + + +
+ Source code in instantiation/workflow/filter_flow.py +
21
+22
+23
+24
+25
+26
+27
+28
+29
+30
+31
+32
def __init__(self, app_name: str, task_file_name: str) -> None:
+    """
+    Initialize the filter flow for a task.
+    :param app_name: Name of the application being processed.
+    :param task_file_name: Name of the task file being processed.
+    """
+
+    self.execution_time = None
+    self._app_name = app_name
+    self._log_path_configs = _configs["FILTER_LOG_PATH"].format(task=task_file_name)
+    self._filter_agent = self._get_or_create_filter_agent()
+    self._initialize_logs()
+
+
+ + + +
+ + + + + + + + + +
+ + +

+ execute(instantiated_request) + +

+ + +
+ +

Execute the filter flow: Filter the task and save the result.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + instantiated_request + (str) + – +
    +

    Request object to be filtered.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + Dict[str, Any] + – +
    +

    Tuple containing task quality flag, comment, and task type.

    +
    +
  • +
+
+
+ Source code in instantiation/workflow/filter_flow.py +
51
+52
+53
+54
+55
+56
+57
+58
+59
+60
+61
+62
+63
+64
+65
+66
+67
+68
+69
+70
+71
def execute(self, instantiated_request: str) -> Dict[str, Any]:
+    """
+    Execute the filter flow: Filter the task and save the result.
+    :param instantiated_request: Request object to be filtered.
+    :return: Tuple containing task quality flag, comment, and task type.
+    """
+
+    start_time = time.time()
+    try:
+        judge, thought, request_type = self._get_filtered_result(
+            instantiated_request
+        )
+    except Exception as e:
+        raise e
+    finally:
+        self.execution_time = round(time.time() - start_time, 3)
+    return {
+        "judge": judge,
+        "thought": thought,
+        "request_type": request_type,
+    }
+
+
+
+ +
+ + + +
+ +
+ +

FilterAgent

+ + +
+ + + + +
+

+ Bases: BasicAgent

+ + +

The agent to evaluate whether the instantiated task is correct.

+ +

Initialize the FilterAgent.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + name + (str) + – +
    +

    The name of the agent.

    +
    +
  • +
  • + process_name + (str) + – +
    +

    The name of the process.

    +
    +
  • +
  • + is_visual + (bool) + – +
    +

    The flag indicating whether the agent is visual or not.

    +
    +
  • +
  • + main_prompt + (str) + – +
    +

    The main prompt.

    +
    +
  • +
  • + example_prompt + (str) + – +
    +

    The example prompt.

    +
    +
  • +
  • + api_prompt + (str) + – +
    +

    The API prompt.

    +
    +
  • +
+
+ + + + + +
+ Source code in instantiation/agent/filter_agent.py +
14
+15
+16
+17
+18
+19
+20
+21
+22
+23
+24
+25
+26
+27
+28
+29
+30
+31
+32
+33
+34
+35
+36
+37
+38
+39
+40
def __init__(
+    self,
+    name: str,
+    process_name: str,
+    is_visual: bool,
+    main_prompt: str,
+    example_prompt: str,
+    api_prompt: str,
+):
+    """
+    Initialize the FilterAgent.
+    :param name: The name of the agent.
+    :param process_name: The name of the process.
+    :param is_visual: The flag indicating whether the agent is visual or not.
+    :param main_prompt: The main prompt.
+    :param example_prompt: The example prompt.
+    :param api_prompt: The API prompt.
+    """
+
+    self._step = 0
+    self._complete = False
+    self._name = name
+    self._status = None
+    self.prompter: FilterPrompter = self.get_prompter(
+        is_visual, main_prompt, example_prompt, api_prompt
+    )
+    self._process_name = process_name
+
+
+ + + +
+ + + + + + + + + +
+ + +

+ get_prompter(is_visual, main_prompt, example_prompt, api_prompt) + +

+ + +
+ +

Get the prompt for the agent.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + is_visual + (bool) + – +
    +

    The flag indicating whether the agent is visual or not.

    +
    +
  • +
  • + main_prompt + (str) + – +
    +

    The main prompt.

    +
    +
  • +
  • + example_prompt + (str) + – +
    +

    The example prompt.

    +
    +
  • +
  • + api_prompt + (str) + – +
    +

    The API prompt.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + FilterPrompter + – +
    +

    The prompt string.

    +
    +
  • +
+
+
+ Source code in instantiation/agent/filter_agent.py +
42
+43
+44
+45
+46
+47
+48
+49
+50
+51
+52
+53
+54
+55
+56
+57
+58
def get_prompter(
+    self,
+    is_visual: bool,
+    main_prompt: str,
+    example_prompt: str,
+    api_prompt: str
+) -> FilterPrompter:
+    """
+    Get the prompt for the agent.
+    :param is_visual: The flag indicating whether the agent is visual or not.
+    :param main_prompt: The main prompt.
+    :param example_prompt: The example prompt.
+    :param api_prompt: The API prompt.
+    :return: The prompt string.
+    """
+
+    return FilterPrompter(is_visual, main_prompt, example_prompt, api_prompt)
+
+
+
+ +
+ +
+ + +

+ message_constructor(request, app) + +

+ + +
+ +

Construct the prompt message for the FilterAgent.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + request + (str) + – +
    +

    The request sentence.

    +
    +
  • +
  • + app + (str) + – +
    +

    The name of the operated app.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + List[str] + – +
    +

    The prompt message.

    +
    +
  • +
+
+
+ Source code in instantiation/agent/filter_agent.py +
60
+61
+62
+63
+64
+65
+66
+67
+68
+69
+70
+71
+72
+73
+74
+75
+76
+77
+78
def message_constructor(self, request: str, app: str) -> List[str]:
+    """
+    Construct the prompt message for the FilterAgent.
+    :param request: The request sentence.
+    :param app: The name of the operated app.
+    :return: The prompt message.
+    """
+
+    filter_agent_prompt_system_message = self.prompter.system_prompt_construction(
+        app=app
+    )
+    filter_agent_prompt_user_message = self.prompter.user_content_construction(
+        request
+    )
+    filter_agent_prompt_message = self.prompter.prompt_construction(
+        filter_agent_prompt_system_message, filter_agent_prompt_user_message
+    )
+
+    return filter_agent_prompt_message
+
+
+
+ +
+ +
+ + +

+ process_comfirmation() + +

+ + +
+ +

Confirm the process. +This is the abstract method from BasicAgent that needs to be implemented.

+ +
+ Source code in instantiation/agent/filter_agent.py +
80
+81
+82
+83
+84
+85
+86
def process_comfirmation(self) -> None:
+    """
+    Confirm the process.
+    This is the abstract method from BasicAgent that needs to be implemented.
+    """
+
+    pass
+
+
+
+ +
+ + + +
+ +
+ +
+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/dataflow/overview/index.html b/dataflow/overview/index.html new file mode 100644 index 00000000..ffa91823 --- /dev/null +++ b/dataflow/overview/index.html @@ -0,0 +1,1937 @@ + + + + + + + + Overview - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

Dataflow

+

Dataflow uses UFO to implement instantiation, execution, and dataflow for a given task, with options for single or batch processing.

+
    +
  1. Instantiation: Instantiation refers to the process of setting up and preparing a task for execution. This step typically involves choosing a template, prefilling it, and filtering the result.
  2. +
  3. Execution: Execution is the actual process of running the task. This step involves carrying out the actions or operations specified by the instantiation. After execution, an evaluation agent assesses the quality of the whole execution process.
  4. +
  5. Dataflow: Dataflow is the overarching process that combines instantiation and execution into a single pipeline. It provides an end-to-end solution for processing tasks, ensuring that all necessary steps (from initialization to execution) are seamlessly integrated.
  6. +
+

You can use instantiation and execution independently if you only need to perform one specific part of the process. When both steps are required for a task, the dataflow process streamlines them, allowing you to execute tasks from start to finish in a single pipeline.

+

The overall processing of dataflow is shown below. Given task-plan data, the LLM instantiates it into task-action data through template selection, prefill, and filtering.

+

+ +

+ +

How To Use

+

1. Install Packages

+

You should install the necessary packages in the UFO root folder:

+
pip install -r requirements.txt
+
+

2. Configure the LLMs

+

Before running dataflow, you need to provide your LLM configurations individually for PrefillAgent and FilterAgent. You can create your own config file dataflow/config/config.yaml, by copying the dataflow/config/config.yaml.template and editing config for PREFILL_AGENT and FILTER_AGENT as follows:

+

OpenAI

+
VISUAL_MODE: True, # Whether to use the visual mode
+API_TYPE: "openai" , # The API type, "openai" for the OpenAI API.  
+API_BASE: "https://api.openai.com/v1/chat/completions", # The OpenAI API endpoint.
+API_KEY: "sk-",  # The OpenAI API key, begin with sk-
+API_VERSION: "2024-02-15-preview", # "2024-02-15-preview" by default
+API_MODEL: "gpt-4-vision-preview",  # The only OpenAI model
+
+

Azure OpenAI (AOAI)

+
VISUAL_MODE: True, # Whether to use the visual mode
+API_TYPE: "aoai" , # The API type, "aoai" for the Azure OpenAI.  
+API_BASE: "YOUR_ENDPOINT", #  The AOAI API address. Format: https://{your-resource-name}.openai.azure.com
+API_KEY: "YOUR_KEY",  # The aoai API key
+API_VERSION: "2024-02-15-preview", # "2024-02-15-preview" by default
+API_MODEL: "gpt-4-vision-preview",  # The only OpenAI model
+API_DEPLOYMENT_ID: "YOUR_AOAI_DEPLOYMENT", # The deployment id for the AOAI API
+
+

You can also use a non-visual model (e.g., GPT-4) for each agent by setting VISUAL_MODE: False and a proper API_MODEL (OpenAI) and API_DEPLOYMENT_ID (AOAI).

+

Non-Visual Model Configuration

+

You can utilize non-visual models (e.g., GPT-4) for each agent by configuring the following settings in the config.yaml file:

+
    +
  • VISUAL_MODE: False # To enable non-visual mode.
  • +
  • Specify the appropriate API_MODEL (OpenAI) and API_DEPLOYMENT_ID (AOAI) for each agent.
  • +
+

Ensure you configure these settings accurately to leverage non-visual models effectively.

+

Other Configurations

+

config_dev.yaml specifies the paths of relevant files and contains default settings. The match strategy for window matching and control filtering supports the options 'contains', 'fuzzy', and 'regex', allowing users to choose a flexible matching strategy. MAX_STEPS sets the maximum number of steps for the execute_flow and can be configured by users.
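The three match strategies can be illustrated with a small sketch. The function name and the fuzzy threshold are illustrative assumptions, not the actual windows_app_env API; only the three strategy options come from the configuration described above.

```python
import re
from difflib import SequenceMatcher

def window_matches(title, target, strategy="contains", threshold=0.8):
    """Match a window title against a target using one of the
    configured strategies: 'contains', 'fuzzy', or 'regex'."""
    if strategy == "contains":
        return target.lower() in title.lower()
    if strategy == "fuzzy":
        # Similarity ratio between the two strings; threshold is illustrative.
        return SequenceMatcher(None, title.lower(), target.lower()).ratio() >= threshold
    if strategy == "regex":
        return re.search(target, title) is not None
    raise ValueError(f"Unknown match strategy: {strategy}")
```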

+
+

Note

+

For the specific implementation and invocation of the matching strategy, refer to windows_app_env.

+
+
+

Note

+

BE CAREFUL! If you are using GitHub or other open-source tools, do not expose your config.yaml online, as it contains your private keys.

+
+

3. Prepare Files

+

Certain files need to be prepared before running the task.

+

3.1. Tasks as JSON

+

The tasks that need to be instantiated should be organized in a folder of JSON files, with the default folder path set to dataflow/tasks. This path can be changed in the dataflow/config/config.yaml file, or you can specify it in the terminal, as mentioned in 4. Start Running. For example, a task stored in dataflow/tasks/prefill/ may look like this:

+
{
+    // The app you want to use
+    "app": "word",
+    // A unique ID to distinguish different tasks 
+    "unique_id": "1",
+    // The task and steps to be instantiated
+    "task": "Type 'hello' and set the font type to Arial",
+    "refined_steps": [
+        "Type 'hello'",
+        "Set the font to Arial"
+    ]
+}
+
+

3.2. Templates and Descriptions

+

You should place an app file as a reference for instantiation in a folder named after the app.

+

For example, if you have template1.docx for Word, it should be located at dataflow/templates/word/template1.docx.

+

Additionally, for each app folder, there should be a description.json file located at dataflow/templates/word/description.json, which describes each template file in detail. It may look like this:

+
{
+    "template1.docx": "A document with a rectangle shape",
+    "template2.docx": "A document with a line of text"
+}
+
+

If a description.json file is not present, one template file will be selected at random.

+

3.3. Final Structure

+

Ensure the following files are in place:

+
    +
  • JSON files to be instantiated
  • +
  • Templates as references for instantiation
  • +
  • Description file in JSON format
  • +
+

The structure of the files can be:

+
dataflow/
+|
+├── tasks
+│   └── prefill
+│       ├── bulleted.json
+│       ├── delete.json
+│       ├── draw.json
+│       ├── macro.json
+│       └── rotate.json
+├── templates
+│   └── word
+│       ├── description.json
+│       ├── template1.docx
+│       ├── template2.docx
+│       ├── template3.docx
+│       ├── template4.docx
+│       ├── template5.docx
+│       ├── template6.docx
+│       └── template7.docx
+└── ...
+
+

4. Start Running

+

After finishing the previous steps, you can use the following commands in the command line. We support both single and batch processing: provide a single file path or a folder path, and the type of path determines automatically whether a single task or a batch of tasks is processed.

+

Also, you can run the instantiation / execution sections individually, or run them together as a whole, which is named dataflow.

+

The default task hub is set to be "TASKS_HUB" in dataflow/config_dev.yaml.

+
    +
  • Dataflow Task:
  • +
+
python -m dataflow -dataflow --task_path path_to_task_file
+
+
    +
  • Instantiation Task:
  • +
+
python -m dataflow -instantiation --task_path path_to_task_file
+
+
    +
  • Execution Task:
  • +
+
python -m dataflow -execution --task_path path_to_task_file
+
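The single-versus-batch decision described above can be sketched as a small dispatch on the path type. The function name is illustrative, not the actual dataflow CLI internals.

```python
from pathlib import Path

def collect_tasks(task_path):
    """Return the task files to process for the given path.

    A file path selects a single task; a directory selects every JSON
    task inside it (batch processing).
    """
    p = Path(task_path)
    if p.is_file():
        return [p]
    if p.is_dir():
        return sorted(p.glob("*.json"))
    raise FileNotFoundError(f"No such task path: {task_path}")
```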
+

Workflow

+

Instantiation

+

There are three key steps in the instantiation process:

+
    +
  1. Choose a template file according to the specified app and instruction.
  2. +
  3. Prefill the task using the current screenshot.
  4. +
  5. Filter the established task.
  6. +
+

Given the initial task, the dataflow first chooses a template (Phase 1), then prefills the initial task based on the Word environment to obtain task-action data (Phase 2). Finally, it filters the established task to evaluate the quality of the task-action data (Phase 3).

+

+ +

+

1. Choose Template File

+

Templates for your app must be defined and described in dataflow/templates/app. For instance, if you want to instantiate tasks for the Word application, place the relevant .docx files in dataflow/templates/word, along with a description.json file.

+

The appropriate template will be selected based on how well its description matches the instruction.

+

2. Prefill the Task

+

After selecting the template file, it will be opened, and a screenshot will be taken. If the template file is currently in use, errors may occur.

+

The screenshot will be sent to the action prefill agent, which will return a modified task.

+

3. Filter Task

+

The completed task will be evaluated by a filter agent, which will assess it and provide feedback.

+

The more detailed code design documentation for instantiation can be found in instantiation.

+

Execution

+

The instantiated plans will be executed by an execution task. After execution, an evaluation agent will evaluate the quality of the entire execution process.

+

In this phase, given the task-action data, the execution process will match the real controller based on word environment and execute the plan step by step.
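Step-by-step execution of an instantiated plan can be sketched as follows. `plan` follows the "instantiated_plan" entry format shown in the results section ({"Step", "Function", "Args"}); the `dispatch` mapping is a hypothetical stand-in for the matched real controls, and each step records a "Success" flag as in the example data.

```python
def execute_plan(plan, dispatch):
    """Execute an instantiated plan step by step, recording per-step success."""
    for step in plan:
        try:
            # Look up the control/function for this step and invoke it.
            dispatch[step["Function"]](**step["Args"])
            step["Success"] = True
        except Exception:
            # Unmatched control or failed action: mark the step as failed.
            step["Success"] = False
    return plan
```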

+

+ +

+ +

The more detailed code design documentation for execution can be found in execution.

+

Result

+

The structure of the results of the task is as below:

+
UFO/
+├── dataflow/                       # Root folder for dataflow
+│   └── results/                    # Directory for storing task processing results
+│       ├── saved_document/         # Directory for final document results
+│       ├── instantiation/          # Directory for instantiation results
+│       │   ├── instantiation_pass/ # Tasks successfully instantiated
+│       │   └── instantiation_fail/ # Tasks that failed instantiation
+│       ├── execution/              # Directory for execution results
+│       │   ├── execution_pass/     # Tasks successfully executed
+│       │   ├── execution_fail/     # Tasks that failed execution
+│       │   └── execution_unsure/   # Tasks with uncertain execution results
+│       ├── dataflow/               # Directory for dataflow results
+│       │   ├── execution_pass/     # Tasks successfully executed
+│       │   ├── execution_fail/     # Tasks that failed execution
+│       │   └── execution_unsure/   # Tasks with uncertain execution results
+│       └── ...
+└── ...
+
+
    +
  1. General Description:
  2. +
+

This directory structure organizes the results of task processing into specific categories, including instantiation, execution, and dataflow outcomes.

2. Instantiation:

+

The instantiation directory contains subfolders for tasks that were successfully instantiated (instantiation_pass) and those that failed during instantiation (instantiation_fail).

3. Execution:

+

Results of task execution are stored under the execution directory, categorized into successful tasks (execution_pass), failed tasks (execution_fail), and tasks with uncertain outcomes (execution_unsure).

4. Dataflow Results:

+

The dataflow directory similarly holds results of tasks based on execution success, failure, or uncertainty, providing a comprehensive view of the data processing pipeline.

5. Saved Documents:

+

Instantiated results are separately stored in the saved_document directory for easy access and reference.

+

Description

+

This section illustrates the structure of the result of the task, organized in a hierarchical format to describe the various fields and their purposes. The result data include unique_id, app, original, execution_result, instantiation_result, and time_cost.

+

1. Field Descriptions

+
    +
  • Hierarchy: The data is presented in a hierarchical manner to allow for a clearer understanding of field relationships.
  • +
  • Type Description: The type of each field (e.g., string, array, object) clearly specifies the format of the data.
  • +
  • Field Purpose: Each field has a brief description outlining its function.
  • +
+

2. Execution Results and Errors

+
    +
  • execution_result: Contains the results of task execution, including subtask performance, completion status, and any encountered errors.
  • +
  • instantiation_result: Describes the process of task instantiation, including template selection, prefilled tasks, and instantiation evaluation.
  • +
  • error: If an error occurs during task execution, this field will contain the relevant error information.
  • +
+

3. Time Consumption

+
    +
  • time_cost: The time spent on each phase of the task, from template selection to task execution, is recorded to analyze task efficiency.
  • +
+

Example Data

+
{
+    "unique_id": "102",
+    "app": "word",
+    "original": {
+        "original_task": "Find which Compatibility Mode you are in for Word",
+        "original_steps": [
+            "1.Click the **File** tab.",
+            "2.Click **Info**.",
+            "3.Check the **Compatibility Mode** indicator at the bottom of the document preview pane."
+        ]
+    },
+    "execution_result": {
+        "result": {
+            "reason": "The agent successfully identified the compatibility mode of the Word document.",
+            "sub_scores": {
+                "correct identification of compatibility mode": "yes"
+            },
+            "complete": "yes"
+        },
+        "error": null
+    },
+    "instantiation_result": {
+        "choose_template": {
+            "result": "dataflow\\results\\saved_document\\102.docx",
+            "error": null
+        },
+        "prefill": {
+            "result": {
+                "instantiated_request": "Identify the Compatibility Mode of the Word document.",
+                "instantiated_plan": [
+                    {
+                        "Step": 1,
+                        "Subtask": "Identify the Compatibility Mode",
+                        "Function": "summary",
+                        "Args": {
+                            "text": "The document is in '102 - Compatibility Mode'."
+                        },
+                        "Success": true
+                    }
+                ]
+            },
+            "error": null
+        },
+        "instantiation_evaluation": {
+            "result": {
+                "judge": true,
+                "thought": "Identifying the Compatibility Mode of a Word document is a task that can be executed locally within Word."
+            },
+            "error": null
+        }
+    },
+    "time_cost": {
+        "choose_template": 0.017,
+        "prefill": 11.304,
+        "instantiation_evaluation": 2.38,
+        "total": 34.584,
+        "execute": 0.946,
+        "execute_eval": 10.381
+    }
+}
+
+
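As a quick sanity check, a record like the one above can be inspected with a few lines of Python. This is a minimal sketch: the JSON is abbreviated from the example, and summing the per-phase costs is our own illustration (the recorded total is measured separately and can exceed the per-phase sum):

```python
import json

# Abbreviated copy of the example record above.
record = json.loads("""
{
    "unique_id": "102",
    "app": "word",
    "execution_result": {"result": {"complete": "yes"}, "error": null},
    "time_cost": {"choose_template": 0.017, "prefill": 11.304,
                  "instantiation_evaluation": 2.38, "execute": 0.946,
                  "execute_eval": 10.381, "total": 34.584}
}
""")

# The evaluation verdict is the string "yes"/"no", not a boolean.
is_complete = record["execution_result"]["result"]["complete"] == "yes"
# Sum the per-phase costs (everything except the precomputed total).
phase_time = sum(v for k, v in record["time_cost"].items() if k != "total")
print(is_complete, round(phase_time, 3))
```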

Quick Start

+

We provide two example cases to demonstrate the dataflow, which can be found in dataflow\tasks\prefill. After installing the required packages, run the following command:

+
python -m dataflow -dataflow
+
+

You will see progress hints printed in the terminal, which indicates that the dataflow is running.

+ +

After the two tasks finish, the task and output files will appear as follows:

+
UFO/
+├── dataflow/
+│   └── results/
+│       ├── saved_document/         # Directory for saved documents
+│       │   ├── bulleted.docx       # Result of the "bulleted" task
+│       │   └── rotate.docx         # Result of the "rotate" task
+│       ├── dataflow/                    # Dataflow results directory
+│       │   ├── execution_pass/     # Successfully executed tasks
+│       │   │   ├── bulleted.json   # Execution result for the "bulleted" task
+│       │   │   ├── rotate.json      # Execution result for the "rotate" task
+│       │   │   └── ...
+└── ...
+
+
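The layout above can be traversed programmatically to list the tasks that passed execution. A minimal sketch, which rebuilds a throwaway copy of the tree so the walk is reproducible (in practice you would point `root` at dataflow/results):

```python
import pathlib
import tempfile

# Recreate the relevant part of the layout in a temporary directory;
# in real use, set root to the dataflow/results directory instead.
root = pathlib.Path(tempfile.mkdtemp())
pass_dir = root / "dataflow" / "execution_pass"
pass_dir.mkdir(parents=True)
for name in ("bulleted.json", "rotate.json"):
    (pass_dir / name).write_text("{}")

# Collect the names of tasks whose execution succeeded.
passed = sorted(p.stem for p in pass_dir.glob("*.json"))
print(passed)
```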

Result files

+

The result structure of the bulleted task is shown below. It provides a detailed breakdown of the task execution process for turning lines of text into a bulleted list in Word, including the original task description, execution results, and a time analysis for each step.

+
  • unique_id : The identifier for the task, in this case, "5".
  • app : The application being used, which is "word".
  • original : Contains the original task description and the steps.
      • original_task : Describes the task in simple terms (turning text into a bulleted list).
      • original_steps : Lists the steps required to perform the task.
  • execution_result : Provides the result of executing the task.
      • result : Describes the outcome of the execution, including a success message and sub-scores for each part of the task. complete: "yes" means the evaluation agent judges the execution process successful. Each entry in sub_scores evaluates one subtask, corresponding to the instantiated_plan produced by prefill.
      • error : Any error that occurred during execution would be reported here; it is null in this case.
  • instantiation_result : Details the instantiation of the task (setting up the task for execution).
      • choose_template : Path to the template or document created during the task (in this case, the bulleted-list document).
      • prefill : Contains the instantiated_request and instantiated_plan (the steps involved, such as selecting text and clicking buttons), produced by the prefill flow. The Success and MatchedControlText fields are added during execution: Success indicates whether the subtask executed successfully, and MatchedControlText is the control text matched during execution based on the plan.
      • instantiation_evaluation : Feedback on the task's feasibility and the evaluation of the request, produced by the filter flow. "judge": true indicates the task is considered valid, and thought gives the detailed reasoning.
  • time_cost : The time spent on different parts of the task, including template selection, prefill, instantiation evaluation, and execution. The total time is also given.

The full result file of the bulleted task is shown below.

+
{
+    "unique_id": "5",
+    "app": "word",
+    "original": {
+        "original_task": "Turning lines of text into a bulleted list in Word",
+        "original_steps": [
+            "1. Place the cursor at the beginning of the line of text you want to turn into a bulleted list",
+            "2. Click the Bullets button in the Paragraph group on the Home tab and choose a bullet style"
+        ]
+    },
+    "execution_result": {
+        "result": {
+            "reason": "The agent successfully selected the text 'text to edit' and then clicked on the 'Bullets' button in the Word application. The final screenshot shows that the text 'text to edit' has been converted into a bulleted list.",
+            "sub_scores": {
+                "text selection": "yes",
+                "bulleted list conversion": "yes"
+            },
+            "complete": "yes"
+        },
+        "error": null
+    },
+    "instantiation_result": {
+        "choose_template": {
+            "result": "dataflow\\results\\saved_document\\bulleted.docx",
+            "error": null
+        },
+        "prefill": {
+            "result": {
+                "instantiated_request": "Turn the line of text 'text to edit' into a bulleted list in Word.",
+                "instantiated_plan": [
+                    {
+                        "Step": 1,
+                        "Subtask": "Place the cursor at the beginning of the text 'text to edit'",
+                        "ControlLabel": null,
+                        "ControlText": "",
+                        "Function": "select_text",
+                        "Args": {
+                            "text": "text to edit"
+                        },
+                        "Success": true,
+                        "MatchedControlText": null
+                    },
+                    {
+                        "Step": 2,
+                        "Subtask": "Click the Bullets button in the Paragraph group on the Home tab",
+                        "ControlLabel": "61",
+                        "ControlText": "Bullets",
+                        "Function": "click_input",
+                        "Args": {
+                            "button": "left",
+                            "double": false
+                        },
+                        "Success": true,
+                        "MatchedControlText": "Bullets"
+                    }
+                ]
+            },
+            "error": null
+        },
+        "instantiation_evaluation": {
+            "result": {
+                "judge": true,
+                "thought": "The task is specific and involves a basic function in Word that can be executed locally without any external dependencies.",
+                "request_type": "None"
+            },
+            "error": null
+        }
+    },
+    "time_cost": {
+        "choose_template": 0.012,
+        "prefill": 15.649,
+        "instantiation_evaluation": 2.469,
+        "execute": 5.824,
+        "execute_eval": 8.702,
+        "total": 43.522
+    }
+}
+
+

Log files

+

The corresponding logs can be found in the directories logs/bulleted and logs/rotate, as shown below. Detailed logs for each workflow are recorded, capturing every step of the execution process.

+

+ +

+ +

Reference

+

AppEnum

+ + +
+ + + + +
+

+ Bases: Enum

+ + +

Enum class for applications.

+ +

Initialize the application enum.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + id + (int) + – +
    +

    The ID of the application.

    +
    +
  • +
  • + description + (str) + – +
    +

    The description of the application.

    +
    +
  • +
  • + file_extension + (str) + – +
    +

    The file extension of the application.

    +
    +
  • +
  • + win_app + (str) + – +
    +

    The Windows application name.

    +
    +
  • +
+
+ + + + + +
+ Source code in dataflow/data_flow_controller.py +
def __init__(self, id: int, description: str, file_extension: str, win_app: str):
+    """
+    Initialize the application enum.
+    :param id: The ID of the application.
+    :param description: The description of the application.
+    :param file_extension: The file extension of the application.
+    :param win_app: The Windows application name.
+    """
+
+    self.id = id
+    self.description = description
+    self.file_extension = file_extension
+    self.win_app = win_app
+    self.app_root_name = win_app.upper() + ".EXE"
+
+
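The constructor above implies that each enum member's value is an (id, description, file_extension, win_app) tuple. A minimal standalone sketch of that pattern; the WORD values here are illustrative assumptions, not necessarily the project's actual definitions:

```python
from enum import Enum


class AppEnum(Enum):
    # Member values are (id, description, file_extension, win_app);
    # the concrete values below are illustrative assumptions.
    WORD = 1, "Word", ".docx", "winword"

    def __init__(self, id: int, description: str, file_extension: str, win_app: str):
        self.id = id
        self.description = description
        self.file_extension = file_extension
        self.win_app = win_app
        # Derived root process name, as in the constructor above.
        self.app_root_name = win_app.upper() + ".EXE"


print(AppEnum.WORD.app_root_name)  # → WINWORD.EXE
```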
+ + + +
+ + + + + + + + + + + +
+ +
+ +

TaskObject

+ + +
+ + + + +
+ + +

Initialize the task object.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + task_file_path + (str) + – +
    +

    The path to the task file.

    +
    +
  • +
  • + task_type + (str) + – +
    +

    The task_type of the task object (dataflow, instantiation, or execution).

    +
    +
  • +
+
+ + + + + +
+ Source code in dataflow/data_flow_controller.py +
def __init__(self, task_file_path: str, task_type: str) -> None:
+    """
+    Initialize the task object.
+    :param task_file_path: The path to the task file.
+    :param task_type: The task_type of the task object (dataflow, instantiation, or execution).
+    """
+
+    self.task_file_path = task_file_path
+    self.task_file_base_name = os.path.basename(task_file_path)
+    self.task_file_name = self.task_file_base_name.split(".")[0]
+
+    task_json_file = load_json_file(task_file_path)
+    self.app_object = self._choose_app_from_json(task_json_file["app"])
+    # Initialize the task attributes based on the task_type
+    self._init_attr(task_type, task_json_file)
+
+
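The constructor above derives the task's name fields from its file path with os.path.basename and a split on the extension. A minimal sketch of just that naming logic (the path is hypothetical):

```python
import os

# Hypothetical task file path, as would be passed to TaskObject.
task_file_path = "dataflow/tasks/prefill/bulleted.json"
task_file_base_name = os.path.basename(task_file_path)  # file name with extension
task_file_name = task_file_base_name.split(".")[0]      # file name without extension
print(task_file_base_name, task_file_name)
```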
+ + + +
+ + + + + + + + + + + +
+ +
+ +

DataFlowController

+ + +
+ + + + +
+ + +

Flow controller class to manage the instantiation and execution process.

+ +

Initialize the flow controller.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + task_path + (str) + – +
    +

    The path to the task file.

    +
    +
  • +
  • + task_type + (str) + – +
    +

    The task_type of the flow controller (instantiation, execution, or dataflow).

    +
    +
  • +
+
+ + + + + +
+ Source code in dataflow/data_flow_controller.py +
def __init__(self, task_path: str, task_type: str) -> None:
+    """
+    Initialize the flow controller.
+    :param task_path: The path to the task file.
+    :param task_type: The task_type of the flow controller (instantiation, execution, or dataflow).
+    """
+
+    self.task_object = TaskObject(task_path, task_type)
+    self.app_env = None
+    self.app_name = self.task_object.app_object.description.lower()
+    self.task_file_name = self.task_object.task_file_name
+
+    self.schema = self._load_schema(task_type)
+
+    self.task_type = task_type
+    self.task_info = self.init_task_info()
+    self.result_hub = _configs["RESULT_HUB"].format(task_type=task_type)
+
+
+ + + +
+ + + + + + + +
+ + + +

+ instantiated_plan: List[Dict[str, Any]] + + + property + writable + + +

+ + +
+ +

Get the instantiated plan from the task information.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + List[Dict[str, Any]] + – +
    +

    The instantiated plan.

    +
    +
  • +
+
+ +
+ +
+ + + +

+ template_copied_path: str + + + property + + +

+ + +
+ +

Get the copied template path from the task information.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + str + – +
    +

    The copied template path.

    +
    +
  • +
+
+ +
+ + + +
+ + +

+ execute_execution(request, plan) + +

+ + +
+ +

Execute the execution process.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + request + (str) + – +
    +

    The task request to be executed.

    +
    +
  • +
  • + plan + (Dict[str, any]) + – +
    +

    The execution plan containing detailed steps.

    +
    +
  • +
+
+
+ Source code in dataflow/data_flow_controller.py +
def execute_execution(self, request: str, plan: Dict[str, any]) -> None:
+    """
+    Execute the execution process.
+    :param request: The task request to be executed.
+    :param plan: The execution plan containing detailed steps.
+    """
+
+    print_with_color("Executing the execution process...", "blue")
+    execute_flow = None
+
+    try:
+        self.app_env.start(self.template_copied_path)
+        # Initialize the execution context and flow
+        context = Context()
+        execute_flow = ExecuteFlow(self.task_file_name, context, self.app_env)
+
+        # Execute the plan
+        executed_plan, execute_result = execute_flow.execute(request, plan)
+
+        # Update the instantiated plan
+        self.instantiated_plan = executed_plan
+        # Record execution results and time metrics
+        self.task_info["execution_result"]["result"] = execute_result
+        self.task_info["time_cost"]["execute"] = execute_flow.execution_time
+        self.task_info["time_cost"]["execute_eval"] = execute_flow.eval_time
+
+    except Exception as e:
+        # Handle and log any exceptions that occur during execution
+        self.task_info["execution_result"]["error"] = {
+            "type": str(type(e).__name__),
+            "message": str(e),
+            "traceback": traceback.format_exc(),
+        }
+        print_with_color(f"Error in Execution: {e}", "red")
+        raise e
+    finally:
+        # Record the total time cost of the execution process
+        if execute_flow and hasattr(execute_flow, "execution_time"):
+            self.task_info["time_cost"]["execute"] = execute_flow.execution_time
+        else:
+            self.task_info["time_cost"]["execute"] = None
+        if execute_flow and hasattr(execute_flow, "eval_time"):
+            self.task_info["time_cost"]["execute_eval"] = execute_flow.eval_time
+        else:
+            self.task_info["time_cost"]["execute_eval"] = None
+
+
+
+ +
+ +
+ + +

+ execute_instantiation() + +

+ + +
+ +

Execute the instantiation process.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + Optional[List[Dict[str, Any]]] + – +
    +

    The instantiation plan if successful.

    +
    +
  • +
+
+
+ Source code in dataflow/data_flow_controller.py +
def execute_instantiation(self) -> Optional[List[Dict[str, Any]]]:
+    """
+    Execute the instantiation process.
+    :return: The instantiation plan if successful.
+    """
+
+    print_with_color(f"Instantiating task {self.task_object.task_file_name}...", "blue")
+
+    template_copied_path = self.instantiation_single_flow(
+        ChooseTemplateFlow, "choose_template", 
+        init_params=[self.task_object.app_object.file_extension],
+        execute_params=[]
+    )
+
+    if template_copied_path:
+        self.app_env.start(template_copied_path)
+
+        prefill_result = self.instantiation_single_flow(
+            PrefillFlow, "prefill", 
+            init_params=[self.app_env],
+            execute_params=[template_copied_path, self.task_object.task, self.task_object.refined_steps]
+        )
+        self.app_env.close()
+
+        if prefill_result:
+            self.instantiation_single_flow(
+                FilterFlow, "instantiation_evaluation",
+                init_params=[],
+                execute_params=[prefill_result["instantiated_request"]]
+            )
+            return prefill_result["instantiated_plan"]
+
+
+
+ +
+ +
+ + +

+ init_task_info() + +

+ + +
+ +

Initialize the task information.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + Dict[str, Any] + – +
    +

    The initialized task information.

    +
    +
  • +
+
+
+ Source code in dataflow/data_flow_controller.py +
def init_task_info(self) -> Dict[str, Any]: 
+    """
+    Initialize the task information.
+    :return: The initialized task information.
+    """
+    init_task_info = None
+    if self.task_type == "execution":
+        # read from the instantiated task file
+        init_task_info = load_json_file(self.task_object.task_file_path)
+    else:
+        init_task_info = {
+            "unique_id": self.task_object.unique_id,
+            "app": self.app_name,
+            "original": {
+                "original_task": self.task_object.task,
+                "original_steps": self.task_object.refined_steps,
+            },
+            "execution_result": {"result": None, "error": None},
+            "instantiation_result": {
+                "choose_template": {"result": None, "error": None},
+                "prefill": {"result": None, "error": None},
+                "instantiation_evaluation": {"result": None, "error": None},
+            },
+            "time_cost": {},
+        }
+    return init_task_info
+
+
+
+ +
+ +
+ + +

+ instantiation_single_flow(flow_class, flow_type, init_params=None, execute_params=None) + +

+ + +
+ +

Execute a single flow process in the instantiation phase.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + flow_class + (AppAgentProcessor) + – +
    +

    The flow class to instantiate.

    +
    +
  • +
  • + flow_type + (str) + – +
    +

    The type of the flow.

    +
    +
  • +
  • + init_params + – +
    +

    The initialization parameters for the flow.

    +
    +
  • +
  • + execute_params + – +
    +

    The execution parameters for the flow.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + Optional[Dict[str, Any]] + – +
    +

    The result of the flow process.

    +
    +
  • +
+
+
+ Source code in dataflow/data_flow_controller.py +
def instantiation_single_flow(
+        self, 
+        flow_class: AppAgentProcessor, 
+        flow_type: str, 
+        init_params=None, 
+        execute_params=None
+    ) -> Optional[Dict[str, Any]]:
+    """
+    Execute a single flow process in the instantiation phase.
+    :param flow_class: The flow class to instantiate.
+    :param flow_type: The type of the flow.
+    :param init_params: The initialization parameters for the flow.
+    :param execute_params: The execution parameters for the flow.
+    :return: The result of the flow process.
+    """
+
+    flow_instance = None
+    try:
+        flow_instance = flow_class(self.app_name, self.task_file_name, *init_params)
+        result = flow_instance.execute(*execute_params)
+        self.task_info["instantiation_result"][flow_type]["result"] = result
+        return result
+    except Exception as e:
+        self.task_info["instantiation_result"][flow_type]["error"] = {
+            "type": str(e.__class__),
+            "error_message": str(e),
+            "traceback": traceback.format_exc(),
+        }
+        print_with_color(f"Error in {flow_type}: {e} {traceback.format_exc()}")
+    finally:
+        if flow_instance and hasattr(flow_instance, "execution_time"):
+            self.task_info["time_cost"][flow_type] = flow_instance.execution_time
+        else:
+            self.task_info["time_cost"][flow_type] = None
+
+
+
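The method above applies one generic pattern to every flow: construct, execute, record the result or the error, and always record the execution time in finally. A standalone sketch of that pattern, with a hypothetical DummyFlow standing in for ChooseTemplateFlow/PrefillFlow/FilterFlow:

```python
import traceback

class DummyFlow:
    """Hypothetical stand-in for a flow class with an execution_time attribute."""

    def __init__(self, app_name, task_file_name):
        self.execution_time = 0.0

    def execute(self):
        self.execution_time = 0.5
        return {"ok": True}


task_info = {"instantiation_result": {"prefill": {"result": None, "error": None}},
             "time_cost": {}}

flow = None
try:
    flow = DummyFlow("word", "bulleted")
    # Record the flow result on success.
    task_info["instantiation_result"]["prefill"]["result"] = flow.execute()
except Exception as e:
    # Record structured error information on failure.
    task_info["instantiation_result"]["prefill"]["error"] = {
        "type": str(e.__class__),
        "error_message": str(e),
        "traceback": traceback.format_exc(),
    }
finally:
    # Always record the time cost, even if the flow failed early.
    task_info["time_cost"]["prefill"] = getattr(flow, "execution_time", None)

print(task_info["time_cost"]["prefill"])  # → 0.5
```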
+ +
+ +
+ + +

+ run() + +

+ + +
+ +

Run the instantiation and execution process.

+ +
+ Source code in dataflow/data_flow_controller.py +
def run(self) -> None:
+    """
+    Run the instantiation and execution process.
+    """
+
+    start_time = time.time()
+
+    try:
+        self.app_env = WindowsAppEnv(self.task_object.app_object)
+
+        if self.task_type == "dataflow":
+            plan = self.execute_instantiation()
+            self.execute_execution(self.task_object.task, plan)
+        elif self.task_type == "instantiation":
+            self.execute_instantiation()
+        elif self.task_type == "execution":
+            plan = self.instantiated_plan
+            self.execute_execution(self.task_object.task, plan)
+        else:
+            raise ValueError(f"Unsupported task_type: {self.task_type}")
+    except Exception as e:
+        raise e
+
+    finally:
+        # Update or record the total time cost of the process
+        total_time = round(time.time() - start_time, 3)
+        new_total_time = self.task_info.get("time_cost", {}).get("total", 0) + total_time
+        self.task_info["time_cost"]["total"] = round(new_total_time, 3)
+
+        self.save_result()
+
+
+
+ +
+ +
+ + +

+ save_result() + +

+ + +
+ +

Validate and save the instantiated task result.

+ +
+ Source code in dataflow/data_flow_controller.py +
def save_result(self) -> None:
+    """
+    Validate and save the instantiated task result.
+    """
+
+    validation_error = None
+
+    # Validate the result against the schema
+    try:
+        validate(instance=self.task_info, schema=self.schema)
+    except ValidationError as e:
+        # Record the validation error but allow the process to continue
+        validation_error = str(e.message)
+        print_with_color(f"Validation Error: {e.message}", "yellow")
+
+    # Determine the target directory based on task_type and quality/completeness
+    target_file = None
+
+    if self.task_type == "instantiation":
+        # Determine the quality of the instantiation
+        if not self.task_info["instantiation_result"]["instantiation_evaluation"]["result"]:
+            target_file = INSTANTIATION_RESULT_MAP[False]
+        else:
+            is_quality_good = self.task_info["instantiation_result"]["instantiation_evaluation"]["result"]["judge"]
+            target_file = INSTANTIATION_RESULT_MAP.get(is_quality_good, INSTANTIATION_RESULT_MAP[False])
+
+    else:
+        # Determine the completion status of the execution
+        if not self.task_info["execution_result"]["result"]:
+            target_file = EXECUTION_RESULT_MAP["no"]
+        else:
+            is_completed = self.task_info["execution_result"]["result"]["complete"]
+            target_file = EXECUTION_RESULT_MAP.get(is_completed, EXECUTION_RESULT_MAP["no"])
+
+    # Construct the full path to save the result
+    new_task_path = os.path.join(self.result_hub, target_file, self.task_object.task_file_base_name)
+    os.makedirs(os.path.dirname(new_task_path), exist_ok=True)
+    save_json_file(new_task_path, self.task_info)
+
+    print(f"Task saved to {new_task_path}")
+
+    # If validation failed, indicate that the saved result may need further inspection
+    if validation_error:
+        print("The saved task result does not conform to the expected schema and may require review.")
+
+
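save_result routes the record to a sub-directory based on the judge flag (for instantiation) or the complete status (for execution/dataflow). A sketch of that routing logic; the directory names in the maps below are assumptions (only execution_pass appears in the results tree earlier), and the real maps live in dataflow/data_flow_controller.py:

```python
# Assumed map contents; the real definitions are in data_flow_controller.py.
INSTANTIATION_RESULT_MAP = {True: "instantiation_pass", False: "instantiation_fail"}
EXECUTION_RESULT_MAP = {"yes": "execution_pass", "no": "execution_fail",
                        "unsure": "execution_unsure"}


def target_dir(task_type, task_info):
    """Mirror save_result's branch structure to pick the output directory."""
    if task_type == "instantiation":
        result = task_info["instantiation_result"]["instantiation_evaluation"]["result"]
        if not result:
            return INSTANTIATION_RESULT_MAP[False]
        return INSTANTIATION_RESULT_MAP.get(result["judge"], INSTANTIATION_RESULT_MAP[False])
    # execution / dataflow: route by the "complete" verdict.
    result = task_info["execution_result"]["result"]
    if not result:
        return EXECUTION_RESULT_MAP["no"]
    return EXECUTION_RESULT_MAP.get(result["complete"], EXECUTION_RESULT_MAP["no"])


print(target_dir("dataflow", {"execution_result": {"result": {"complete": "yes"}}}))
# → execution_pass
```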
+
+ +
+ + + +
+ +
+ +
+

Note

+
    +
  1. Users should be careful to save the original files while using this project; otherwise, the files will be closed when the app is shut down.
  2. After starting the project, users should not close the app window while the program is taking screenshots.
+
+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/dataflow/result_schema/index.html b/dataflow/result_schema/index.html new file mode 100644 index 00000000..f8344095 --- /dev/null +++ b/dataflow/result_schema/index.html @@ -0,0 +1,744 @@ + + + + + + + + Result Schema - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+
+
+
+
+ +

Result schema

+

Instantiation Result Schema

+

This schema defines the structure of the JSON object used to represent the results of task instantiation.

+

Root Structure

+
    +
The schema is an object with the following key fields:
  • unique_id: A string serving as the unique identifier for the task.
  • app: A string representing the application where the task is being executed.
  • original: An object containing details about the original task.
+

Field Descriptions

+
  1. unique_id
     • Type: string
     • Purpose: Provides a globally unique identifier for the task.
  2. app
     • Type: string
     • Purpose: Specifies the application associated with the task execution.
  3. original
     • Type: object
     • Contains the following fields:
         • original_task: A string describing the main task in textual form.
         • original_steps: An array of string listing the sequential steps required for the task.
     • Required fields: original_task, original_steps
  4. execution_result
     • Type: object or null
     • Contains fields describing the results of task execution:
         • result: Always null, indicating no execution results are included.
         • error: Always null, implying execution errors are not tracked in this schema.
     • Purpose: Simplifies the structure by omitting detailed execution results.
  5. instantiation_result
     • Type: object
     • Contains fields detailing the results of task instantiation:
         • choose_template:
             • Type: object
             • Fields:
                 • result: A string or null, representing the outcome of template selection.
                 • error: A string or null, detailing any errors during template selection.
             • Required fields: result, error
         • prefill:
             • Type: object or null
             • Contains results of pre-filling instantiation:
                 • result:
                     • Type: object or null
                     • Fields:
                         • instantiated_request: A string, representing the generated request.
                         • instantiated_plan: An array or null, listing instantiation steps:
                             • Step: An integer representing the sequence of the step.
                             • Subtask: A string describing the subtask.
                             • ControlLabel: A string or null, representing the control label.
                             • ControlText: A string, providing context for the step.
                             • Function: A string, specifying the function executed at this step.
                             • Args: An object, containing any arguments required by the function.
                             • Required fields: Step, Subtask, Function, Args
                     • Required fields: instantiated_request, instantiated_plan
                 • error: A string or null, describing errors encountered during prefill.
             • Required fields: result, error
         • instantiation_evaluation:
             • Type: object
             • Fields:
                 • result:
                     • Type: object or null
                     • Contains:
                         • judge: A boolean, indicating whether the instantiation is valid.
                         • thought: A string, providing reasoning or observations.
                         • request_type: A string, classifying the request type.
                     • Required fields: judge, thought, request_type
                 • error: A string or null, indicating errors during evaluation.
             • Required fields: result, error
  6. time_cost
     • Type: object
     • Tracks time metrics for various stages of task instantiation:
         • choose_template: A number or null, time spent selecting a template.
         • prefill: A number or null, time used for pre-filling.
         • instantiation_evaluation: A number or null, time spent on evaluation.
         • total: A number or null, total time cost for all processes.
     • Required fields: choose_template, prefill, instantiation_evaluation, total
+
+

Example Data

+
{
+    "unique_id": "5",
+    "app": "word",
+    "original": {
+        "original_task": "Turning lines of text into a bulleted list in Word",
+        "original_steps": [
+            "1. Place the cursor at the beginning of the line of text you want to turn into a bulleted list",
+            "2. Click the Bullets button in the Paragraph group on the Home tab and choose a bullet style"
+        ]
+    },
+    "execution_result": {
+        "result": null,
+        "error": null
+    },
+    "instantiation_result": {
+        "choose_template": {
+            "result": "dataflow\\results\\saved_document\\bulleted.docx",
+            "error": null
+        },
+        "prefill": {
+            "result": {
+                "instantiated_request": "Turn the line of text 'text to edit' into a bulleted list in Word.",
+                "instantiated_plan": [
+                    {
+                        "Step": 1,
+                        "Subtask": "Place the cursor at the beginning of the text 'text to edit'",
+                        "ControlLabel": null,
+                        "ControlText": "",
+                        "Function": "select_text",
+                        "Args": {
+                            "text": "text to edit"
+                        }
+                    },
+                    {
+                        "Step": 2,
+                        "Subtask": "Click the Bullets button in the Paragraph group on the Home tab",
+                        "ControlLabel": null,
+                        "ControlText": "Bullets",
+                        "Function": "click_input",
+                        "Args": {
+                            "button": "left",
+                            "double": false
+                        }
+                    }
+                ]
+            },
+            "error": null
+        },
+        "instantiation_evaluation": {
+            "result": {
+                "judge": true,
+                "thought": "The task is specific and involves a basic function in Word that can be executed locally without any external dependencies.",
+                "request_type": "None"
+            },
+            "error": null
+        }
+    },
+    "time_cost": {
+        "choose_template": 0.012,
+        "prefill": 15.649,
+        "instantiation_evaluation": 2.469,
+        "execute": null,
+        "execute_eval": null,
+        "total": 18.130
+    }
+}
+
+
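The required-field constraints above can be spot-checked without a full JSON Schema validator. A minimal stdlib sketch (the project itself validates with jsonschema.validate against the complete schema):

```python
def check_required(obj, required):
    """Return the required keys missing from a dict."""
    return [key for key in required if key not in obj]


# Abbreviated record shaped like the example above.
record = {
    "unique_id": "5",
    "app": "word",
    "original": {"original_task": "...", "original_steps": []},
    "execution_result": {"result": None, "error": None},
    "instantiation_result": {},
    "time_cost": {"choose_template": 0.012, "prefill": 15.649,
                  "instantiation_evaluation": 2.469, "total": 18.130},
}

missing = check_required(record["time_cost"],
                         ["choose_template", "prefill",
                          "instantiation_evaluation", "total"])
print(missing)  # → [] (all required time_cost fields present)
```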

Execution Result Schema

+

This schema defines the structure of a JSON object that might be used to represent the results of task execution or dataflow. Below are the main fields and their detailed descriptions.

+

Unlike the instantiation result, the execution result schema provides detailed feedback on execution, including success metrics (reason, sub_scores). Additionally, based on the original instantiated_plan, each step is enhanced with the fields Success and MatchedControlText, which record whether the step executed successfully (success means no errors occurred) and the name of the last matched control, respectively. The ControlLabel field is also updated to reflect the finally selected control.

+
+

Top-Level Fields

+
    +
  1. +

    unique_id

    +
  2. +
  3. +

    Type: string

    +
  4. +
  5. Description: A unique identifier for the task or record.
  6. +
  7. +

    app

    +
  8. +
  9. +

    Type: string

    +
  10. +
  11. Description: The name of the application associated with the task.
  12. +
  13. +

    original

    +
  14. +
  15. +

    Type: object

    +
  16. +
  17. Description: Contains the original definition of the task.
  18. +
  19. Properties:
      +
    • original_task:
    • +
    • Type: string
    • +
    • Description: The original description of the task.
    • +
    • original_steps:
    • +
    • Type: array
    • +
    • Description: An array of strings representing the steps of the task.
    • +
    +
  20. +
  21. +

    execution_result

    +
  22. +
  23. +

    Type: object or null

    +
  24. +
  25. Description: Represents the results of the task execution.
  26. +
  27. Properties:
      +
    • result:
    • +
    • Type: object or null
    • +
    • Description: Contains the details of the execution result.
    • +
    • Sub-properties:
        +
      • reason: The reason for the execution result, type string.
      • +
      • sub_scores: A set of sub-scores, represented as key-value pairs (.* allows any key pattern).
      • +
      • complete: Indicates the completion status, type string.
      • +
      +
    • +
    • error:
    • +
    • Type: object or null
    • +
    • Description: Represents any error information encountered during execution.
    • +
    • Sub-properties:
        +
      • type: The type of error, type string.
      • +
      • message: The error message, type string.
      • +
      • traceback: The error traceback, type string.
      • +
      +
    • +
    +
  28. +
  29. +

    instantiation_result

    +
  30. +
  31. +

    Type: object

    +
  32. +
  33. Description: Contains results related to task instantiation.
  34. +
  35. Properties:
      +
    • choose_template:
    • +
    • Type: object
    • +
    • Description: Results of template selection.
    • +
    • Sub-properties:
        +
      • result: The result of template selection, type string or null.
      • +
      • error: Error information, type null or string.
      • +
      +
    • +
    • prefill:
    • +
    • Type: object or null
    • +
    • Description: Results of the prefill phase.
    • +
    • Sub-properties:
        +
      • result:
      • +
      • Type: object or null
      • +
      • Description: Contains the instantiated request and plan.
      • +
      • Sub-properties:
          +
        • instantiated_request: The instantiated task request, type string.
        • +
        • instantiated_plan: The instantiated task plan, type array or null.
        • +
        • Each item in the array is an object with:
            +
          • Step: Step number, type integer.
          • +
          • Subtask: Description of the subtask, type string.
          • +
          • ControlLabel: Control label, type string or null.
          • +
          • ControlText: Control text, type string.
          • +
          • Function: Function name, type string.
          • +
          • Args: Arguments to the function, type object.
          • +
          • Success: Whether the step succeeded, type boolean or null.
          • +
          • MatchedControlText: Matched control text, type string or null.
          • +
          +
        • +
        +
      • +
      • error: Prefill error information, type null or string.
      • +
      +
    • +
    • instantiation_evaluation:
    • +
    • Type: object
    • +
    • Description: Results of task instantiation evaluation.
    • +
    • Sub-properties:
        +
      • result:
      • +
      • Type: object or null
      • +
      • Description: Contains evaluation information.
      • +
      • Sub-properties:
          +
        • judge: Whether the evaluation succeeded, type boolean.
        • +
        • thought: Evaluator's thoughts, type string.
        • +
        • request_type: The type of request, type string.
        • +
        +
      • +
      • error: Evaluation error information, type null or string.
      • +
      +
    • +
    +
  36. +
  37. +

    time_cost

    +
  38. +
  39. +

    Type: object

    +
  40. +
  41. Description: Represents the time costs for various phases.
  42. +
  43. Properties:
      +
    • choose_template: Time spent selecting the template, type number or null.
    • +
    • prefill: Time spent in the prefill phase, type number or null.
    • +
    • instantiation_evaluation: Time spent in instantiation evaluation, type number or null.
    • +
    • total: Total time cost, type number or null.
    • +
    • execute: Time spent in execution, type number or null.
    • +
    • execute_eval: Time spent in execution evaluation, type number or null.
    • +
    +
  44. +
+
+

Required Fields

+

The fields unique_id, app, original, execution_result, instantiation_result, and time_cost are required for the JSON object to be valid.
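A quick sanity check for these required top-level fields might look like this (an illustrative sketch; the function name is ours):

```python
REQUIRED_FIELDS = {
    "unique_id", "app", "original",
    "execution_result", "instantiation_result", "time_cost",
}

def missing_fields(record: dict) -> set:
    """Return the set of required top-level fields absent from the record."""
    return REQUIRED_FIELDS - record.keys()

# A record missing two required fields
partial = {"unique_id": "5", "app": "word", "original": {}, "time_cost": {}}
print(sorted(missing_fields(partial)))  # ['execution_result', 'instantiation_result']
```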

+

Example Data

+
{
+    "unique_id": "5",
+    "app": "word",
+    "original": {
+        "original_task": "Turning lines of text into a bulleted list in Word",
+        "original_steps": [
+            "1. Place the cursor at the beginning of the line of text you want to turn into a bulleted list",
+            "2. Click the Bullets button in the Paragraph group on the Home tab and choose a bullet style"
+        ]
+    },
+    "execution_result": {
+        "result": {
+            "reason": "The agent successfully selected the text 'text to edit' and then clicked on the 'Bullets' button in the Word application. The final screenshot shows that the text 'text to edit' has been converted into a bulleted list.",
+            "sub_scores": {
+                "text selection": "yes",
+                "bulleted list conversion": "yes"
+            },
+            "complete": "yes"
+        },
+        "error": null
+    },
+    "instantiation_result": {
+        "choose_template": {
+            "result": "dataflow\\results\\saved_document\\bulleted.docx",
+            "error": null
+        },
+        "prefill": {
+            "result": {
+                "instantiated_request": "Turn the line of text 'text to edit' into a bulleted list in Word.",
+                "instantiated_plan": [
+                    {
+                        "Step": 1,
+                        "Subtask": "Place the cursor at the beginning of the text 'text to edit'",
+                        "ControlLabel": null,
+                        "ControlText": "",
+                        "Function": "select_text",
+                        "Args": {
+                            "text": "text to edit"
+                        },
+                        "Success": true,
+                        "MatchedControlText": null
+                    },
+                    {
+                        "Step": 2,
+                        "Subtask": "Click the Bullets button in the Paragraph group on the Home tab",
+                        "ControlLabel": "61",
+                        "ControlText": "Bullets",
+                        "Function": "click_input",
+                        "Args": {
+                            "button": "left",
+                            "double": false
+                        },
+                        "Success": true,
+                        "MatchedControlText": "Bullets"
+                    }
+                ]
+            },
+            "error": null
+        },
+        "instantiation_evaluation": {
+            "result": {
+                "judge": true,
+                "thought": "The task is specific and involves a basic function in Word that can be executed locally without any external dependencies.",
+                "request_type": "None"
+            },
+            "error": null
+        }
+    },
+    "time_cost": {
+        "choose_template": 0.012,
+        "prefill": 15.649,
+        "instantiation_evaluation": 2.469,
+        "execute": 5.824,
+        "execute_eval": 8.702,
+        "total": 43.522
+    }
+}
+
+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/dataflow/windows_app_env/index.html b/dataflow/windows_app_env/index.html new file mode 100644 index 00000000..24569ebb --- /dev/null +++ b/dataflow/windows_app_env/index.html @@ -0,0 +1,839 @@ + + + + + + + + Windows App Environment - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

WindowsAppEnv

+

The WindowsAppEnv class represents the environment for controlling a Windows application. It provides methods for starting, stopping, and interacting with Windows applications, including window matching based on configurable strategies.

+

Matching Strategies

+

In the WindowsAppEnv class, matching strategies are rules that determine how to match window or control names with a given document name or target text. Based on the configuration file, three different matching strategies can be selected: contains, fuzzy, and regex.

+
    +
  • Contains Matching is the simplest strategy, suitable when the document name appears verbatim in the window title.
  • +
  • Fuzzy Matching is more flexible and can match even when there are spelling errors or partial matches between the window title and document name.
  • +
  • Regex Matching offers the most flexibility, ideal for complex matching patterns in window titles.
  • +
+
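The three strategies can be sketched roughly as follows (an illustrative approximation, not the actual env_manager implementation; the 0.6 fuzzy threshold is an assumption):

```python
import re
from difflib import SequenceMatcher

def matches(window_title: str, doc_name: str, strategy: str = "contains") -> bool:
    """Illustrative name matching under the three strategies described above."""
    window_title, doc_name = window_title.lower(), doc_name.lower()
    if strategy == "contains":
        # Exact substring containment
        return doc_name in window_title
    if strategy == "fuzzy":
        # Tolerates typos and partial matches; 0.6 threshold is an assumption
        return SequenceMatcher(None, window_title, doc_name).ratio() > 0.6
    if strategy == "regex":
        # doc_name is interpreted as a regular expression
        return re.search(doc_name, window_title) is not None
    raise ValueError(f"Unknown strategy: {strategy}")

print(matches("report.docx - Word", "report.docx"))            # True
print(matches("reprt.docx - Word", "report.docx", "fuzzy"))    # True
print(matches("Document1 - Word", r".* - word", "regex"))      # True
```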

1. Window Matching Example

+

The method find_matching_window is responsible for matching windows based on the configured matching strategy. Here's how you can use it to find a window by providing a document name:

+

Example:

+
# Initialize your application object (assuming app_object is already defined)
+app_env = WindowsAppEnv(app_object)
+
+# Define the document name you're looking for
+doc_name = "example_document_name"
+
+# Call find_matching_window to find the window that matches the document name
+matching_window = app_env.find_matching_window(doc_name)
+
+if matching_window:
+    print(f"Found matching window: {matching_window.element_info.name}")
+else:
+    print("No matching window found.")
+
+

Explanation:

+
    +
  • app_env.find_matching_window(doc_name) will search through all open windows and match the window title using the strategy defined in the configuration (contains, fuzzy, or regex).
  • +
  • If a match is found, the matching_window object will contain the matched window, and you can print the window's name.
  • +
  • If no match is found, it will return None.
  • +
+

2. Control Matching Example

+

To find a matching control within a window, you can use the find_matching_controller method. This method requires a dictionary of filtered controls and a control text to match against.

+

Example:

+
# Initialize your application object (assuming app_object is already defined)
+app_env = WindowsAppEnv(app_object)
+
+# Define a filtered annotation dictionary of controls (control_key, control_object)
+# Here, we assume you have a dictionary of UIAWrapper controls from a window.
+filtered_annotation_dict = {
+    1: some_control_1,  # Example control objects
+    2: some_control_2,  # Example control objects
+}
+
+# Define the control text you're searching for
+control_text = "submit_button"
+
+# Call find_matching_controller to find the best match
+controller_key, control_selected = app_env.find_matching_controller(filtered_annotation_dict, control_text)
+
+if control_selected:
+    print(f"Found matching control with key {controller_key}: {control_selected.window_text()}")
+else:
+    print("No matching control found.")
+
+

Explanation:

+
    +
  • filtered_annotation_dict is a dictionary where the key represents the control's ID and the value is the control object (UIAWrapper).
  • +
  • control_text is the text you're searching for within those controls.
  • +
  • app_env.find_matching_controller(filtered_annotation_dict, control_text) will calculate the matching score for each control based on the defined strategy and return the control with the highest match score.
  • +
  • If a match is found, it will return the control object (control_selected) and its key (controller_key), which can be used for further interaction.
  • +
+

Reference

+ + +
+ + + + +
+ + +

Represents the Windows Application Environment.

+ +

Initializes the Windows Application Environment.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + app_object + (object) + – +
    +

    The app object containing information about the application.

    +
    +
  • +
+
+ + + + + +
+ Source code in env/env_manager.py +
29
+30
+31
+32
+33
+34
+35
+36
+37
+38
def __init__(self, app_object: object) -> None:
+    """
+    Initializes the Windows Application Environment.
+    :param app_object: The app object containing information about the application.
+    """
+
+    self.app_window = None
+    self.app_root_name = app_object.app_root_name
+    self.app_name = app_object.description.lower()
+    self.win_app = app_object.win_app
+
+
+ + + +
+ + + + + + + + + +
+ + +

+ close() + +

+ + +
+ +

Tries to gracefully close the application; if it fails or is not closed, forcefully terminates the process.

+ +
+ Source code in env/env_manager.py +
57
+58
+59
+60
+61
+62
+63
+64
+65
+66
+67
+68
+69
+70
+71
+72
def close(self) -> None:
+    """
+    Tries to gracefully close the application; if it fails or is not closed, forcefully terminates the process.
+    """
+
+    try:
+        # Attempt to close gracefully
+        if self.app_window:
+            self.app_window.close()
+
+        self._check_and_kill_process()
+        sleep(1)
+    except Exception as e:
+        logging.warning(f"Graceful close failed: {e}. Attempting to forcefully terminate the process.")
+        self._check_and_kill_process()
+        raise e
+
+
+
+ +
+ +
+ + +

+ find_matching_controller(filtered_annotation_dict, control_text) + +

+ + +
+ +

" +Select the best matched controller.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + filtered_annotation_dict + (Dict[int, UIAWrapper]) + – +
    +

    The filtered annotation dictionary.

    +
    +
  • +
  • + control_text + (str) + – +
    +

    The text content of the control for additional context.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + Tuple[str, UIAWrapper] + – +
    +

    Tuple containing the key of the selected controller and the control object.

    +
    +
  • +
+
+
+ Source code in env/env_manager.py +
156
+157
+158
+159
+160
+161
+162
+163
+164
+165
+166
+167
+168
+169
+170
+171
+172
+173
+174
+175
+176
+177
+178
def find_matching_controller(self, filtered_annotation_dict: Dict[int, UIAWrapper], control_text: str) -> Tuple[str, UIAWrapper]:
+    """"
+    Select the best matched controller.
+    :param filtered_annotation_dict: The filtered annotation dictionary.
+    :param control_text: The text content of the control for additional context.
+    :return: Tuple containing the key of the selected controller and the control object.
+    """
+    control_selected = None
+    controller_key = None
+    highest_score = 0
+
+    # Iterate through the filtered annotation dictionary to find the best match
+    for key, control in filtered_annotation_dict.items():
+        # Calculate the matching score using the match function
+        score = self._calculate_match_score(control, control_text)
+
+        # Update the selected control if the score is higher
+        if score > highest_score:
+            highest_score = score
+            controller_key = key
+            control_selected = control
+
+    return controller_key, control_selected
+
+
+
+ +
+ +
+ + +

+ find_matching_window(doc_name) + +

+ + +
+ +

Finds a matching window based on the process name and the configured matching strategy.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + doc_name + (str) + – +
    +

    The document name associated with the application.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + Optional[UIAWrapper] + – +
    +

    The matched window or None if no match is found.

    +
    +
  • +
+
+
+ Source code in env/env_manager.py +
 90
+ 91
+ 92
+ 93
+ 94
+ 95
+ 96
+ 97
+ 98
+ 99
+100
+101
+102
+103
+104
def find_matching_window(self, doc_name: str) -> Optional[UIAWrapper]:
+    """
+    Finds a matching window based on the process name and the configured matching strategy.
+    :param doc_name: The document name associated with the application.
+    :return: The matched window or None if no match is found.
+    """
+
+    desktop = Desktop(backend=_BACKEND)
+    windows_list = desktop.windows()
+    for window in windows_list:
+        window_title = window.element_info.name.lower()
+        if self._match_window_name(window_title, doc_name):
+            self.app_window = window
+            return window
+    return None
+
+
+
+ +
+ +
+ + +

+ start(copied_template_path) + +

+ + +
+ +

Starts the Windows environment.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + copied_template_path + (str) + – +
    +

    The file path to the copied template to start the environment.

    +
    +
  • +
+
+
+ Source code in env/env_manager.py +
40
+41
+42
+43
+44
+45
+46
+47
+48
+49
+50
+51
+52
+53
+54
+55
def start(self, copied_template_path: str) -> None:
+    """
+    Starts the Windows environment.
+    :param copied_template_path: The file path to the copied template to start the environment.
+    """
+
+    from ufo.automator.ui_control import openfile
+
+    file_controller = openfile.FileController(_BACKEND)
+    try:
+        file_controller.execute_code(
+            {"APP": self.win_app, "file_path": copied_template_path}
+        )
+    except Exception as e:
+        logging.exception(f"Failed to start the application: {e}")
+        raise
+
+
+
+ +
+ + + +
+ +
+ +
+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/faq/index.html b/faq/index.html new file mode 100644 index 00000000..08dcb25b --- /dev/null +++ b/faq/index.html @@ -0,0 +1,344 @@ + + + + + + + + FAQ - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + +
  • +
  • +
+
+
+
+
+ +

FAQ

+

We provide answers to some frequently asked questions about UFO.

+

Q1: Why is it called UFO?

+

A: UFO stands for UI Focused agent. The name is inspired by the concept of an unidentified flying object (UFO) that is mysterious and futuristic.

+

Q2: Can I use UFO on Linux or macOS?

+

A: UFO is currently only supported on Windows OS.

+

Q3: Why is the latency of UFO high?

+

A: The latency of UFO depends on the response time of the LLMs and the network speed. If you are using GPT, it usually takes dozens of seconds to generate a response in one step. The workload of the GPT endpoint may also affect the latency.

+

Q4: What models does UFO support?

+

A: UFO supports various language models, including OpenAI and Azure OpenAI models, QWEN, Google Gemini, Ollama, and more. You can find the full list of supported models in the Supported Models section of the documentation.

+

Q5: Can I use non-vision models in UFO?

+

A: Yes, you can use non-vision models in UFO. You can set the VISUAL_MODE to False in the config.yaml file to disable the visual mode and use non-vision models. However, UFO is designed to work with vision models, and using non-vision models may affect the performance.

+

Q6: Can I host my own LLM endpoint?

+

A: Yes, you can host your custom LLM endpoint and configure UFO to use it. Check the documentation in the Supported Models section for more details.

+

Q7: Can I use non-English requests in UFO?

+

A: It depends on the language model you are using. Most LLMs support multiple languages, and you can specify the language in the request. However, the performance may vary for different languages.

+

Q8: Why it shows the error Error making API request: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))?

+

A: This means the LLM endpoint is not accessible. You can check the network connection (e.g. VPN) and the status of the LLM endpoint.

+
+

Info

+

To get more support, please submit an issue on the GitHub Issues, or send an email to ufo-agent@microsoft.com.

+
+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + +
+ + + + + + + + + diff --git a/getting_started/more_guidance/index.html b/getting_started/more_guidance/index.html new file mode 100644 index 00000000..42c1464e --- /dev/null +++ b/getting_started/more_guidance/index.html @@ -0,0 +1,322 @@ + + + + + + + + More Guidance - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

More Guidance

+

For Users

+

If you are a user of UFO and want to use it to automate your tasks on Windows, you can refer to User Configuration to set up your environment and start using UFO. +For instance, in addition to configuring the HOST_AGENT and APP_AGENT, you can also configure the LLM parameters and RAG parameters in the config.yaml file to enhance the UFO agent with additional knowledge sources.

+

For Developers

+

If you are a developer who wants to contribute to UFO, you can take a look at the Developer Configuration to explore the development environment setup and the development workflow.

+

You can also refer to the Project Structure to understand the project structure and the role of each component in UFO, and use the rest of the documentation to understand the architecture and design of UFO. Taking a look at the Session and Round can help you understand the core logic of UFO.

+

For debugging and testing, it is recommended to check the log files in the ufo/logs directory to track the execution of UFO and identify any issues that may arise.

+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/getting_started/quick_start/index.html b/getting_started/quick_start/index.html new file mode 100644 index 00000000..df3c32f6 --- /dev/null +++ b/getting_started/quick_start/index.html @@ -0,0 +1,429 @@ + + + + + + + + Quick Start - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

Quick Start

+

🛠️ Step 1: Installation

+

UFO requires Python >= 3.10 running on Windows OS >= 10. It can be installed by running the following command:

+
# [optional to create conda environment]
+# conda create -n ufo python=3.10
+# conda activate ufo
+
+# clone the repository
+git clone https://github.com/microsoft/UFO.git
+cd UFO
+# install the requirements
+pip install -r requirements.txt
+# If you want to use the Qwen as your LLMs, uncomment the related libs.
+
+

⚙️ Step 2: Configure the LLMs

+

Before running UFO, you need to provide your LLM configurations individually for HostAgent and AppAgent. You can create your own config file ufo/config/config.yaml by copying the ufo/config/config.yaml.template and editing the config for HOST_AGENT and APP_AGENT as follows:

+

OpenAI

+
VISUAL_MODE: True, # Whether to use the visual mode
+API_TYPE: "openai" , # The API type, "openai" for the OpenAI API.  
+API_BASE: "https://api.openai.com/v1/chat/completions", # The the OpenAI API endpoint.
+API_KEY: "sk-",  # The OpenAI API key, begin with sk-
+API_VERSION: "2024-02-15-preview", # "2024-02-15-preview" by default
+API_MODEL: "gpt-4-vision-preview",  # The OpenAI model
+
+

Azure OpenAI (AOAI)

+
VISUAL_MODE: True, # Whether to use the visual mode
+API_TYPE: "aoai" , # The API type, "aoai" for the Azure OpenAI.  
+API_BASE: "YOUR_ENDPOINT", #  The AOAI API address. Format: https://{your-resource-name}.openai.azure.com
+API_KEY: "YOUR_KEY",  # The aoai API key
+API_VERSION: "2024-02-15-preview", # "2024-02-15-preview" by default
+API_MODEL: "gpt-4-vision-preview",  # The OpenAI model
+API_DEPLOYMENT_ID: "YOUR_AOAI_DEPLOYMENT", # The deployment id for the AOAI API
+
+

You can also use a non-visual model (e.g., GPT-4) for each agent by setting VISUAL_MODE: False and a proper API_MODEL (OpenAI) and API_DEPLOYMENT_ID (AOAI). You can also optionally set a backup LLM engine in the BACKUP_AGENT field in case the above engines fail during inference. The API_MODEL can be any GPT model that accepts images as input.

+

Non-Visual Model Configuration

+

You can utilize non-visual models (e.g., GPT-4) for each agent by configuring the following settings in the config.yaml file:

+
+

Info

+
    +
  • VISUAL_MODE: False
  • +
  • Specify the appropriate API_MODEL (OpenAI) and API_DEPLOYMENT_ID (AOAI) for each agent.
  • +
+
+

Optionally, you can set a backup language model (LLM) engine in the BACKUP_AGENT field to handle cases where the primary engines fail during inference. Ensure you configure these settings accurately to leverage non-visual models effectively.
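The fallback behavior can be pictured with a generic sketch (illustrative only, not UFO's actual implementation; the function names are ours):

```python
def complete_with_backup(prompt, primary, backup):
    """Try the primary LLM engine first; fall back to the backup engine on failure."""
    try:
        return primary(prompt)
    except Exception:
        # Primary engine failed during inference; use the BACKUP_AGENT engine
        return backup(prompt)

def flaky_engine(prompt):
    """Stand-in for a primary engine whose endpoint is unavailable."""
    raise RuntimeError("primary endpoint unavailable")

print(complete_with_backup("hello", flaky_engine, lambda p: f"backup: {p}"))
# backup: hello
```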

+
+

Note

+

UFO also supports other LLMs and advanced configurations, such as customizing your own model; please check the documents for more details. Because of the limitations of model input, a lite version of the prompt is provided to allow users to experience it, which is configured in config_dev.yaml.

+
+

📔 Step 3: Additional Setting for RAG (optional).

+

If you want to enhance UFO's ability with external knowledge, you can optionally configure it with an external database for retrieval augmented generation (RAG) in the ufo/config/config.yaml file.

+

We provide the following options for RAG to enhance UFO's capabilities:

+ +
+

Tip

+

Consult their respective documentation for more information on how to configure these settings.

+
+

🎉 Step 4: Start UFO

+

⌨️ You can execute the following on your Windows Command Line (CLI):

+
# assume you are in the cloned UFO folder
+python -m ufo --task <your_task_name>
+
+

This will start the UFO process and you can interact with it through the command line interface. +If everything goes well, you will see the following message:

+
Welcome to use UFO🛸, A UI-focused Agent for Windows OS Interaction. 
+ _   _  _____   ___
+| | | ||  ___| / _ \
+| | | || |_   | | | |
+| |_| ||  _|  | |_| |
+ \___/ |_|     \___/
+Please enter your request to be completed🛸:
+
+

Step 5 🎥: Execution Logs

+

You can find the screenshots taken and request & response logs in the following folder:

+
./ufo/logs/<your_task_name>/
+
+

You may use them to debug, replay, or analyze the agent output.

+
+

Note

+

Before UFO executes your request, please make sure the targeted applications are active on the system.

+
+
+

Note

+

GPT-V accepts screenshots of your desktop and application GUI as input. Please ensure that no sensitive or confidential information is visible or captured during the execution process. For further information, refer to DISCLAIMER.md.

+
+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/img/action_step2.png b/img/action_step2.png new file mode 100644 index 00000000..0a8d93ab Binary files /dev/null and b/img/action_step2.png differ diff --git a/img/action_step2_annotated.png b/img/action_step2_annotated.png new file mode 100644 index 00000000..b3edb8da Binary files /dev/null and b/img/action_step2_annotated.png differ diff --git a/img/action_step2_concat.png b/img/action_step2_concat.png new file mode 100644 index 00000000..fba26a1b Binary files /dev/null and b/img/action_step2_concat.png differ diff --git a/img/action_step2_selected_controls.png b/img/action_step2_selected_controls.png new file mode 100644 index 00000000..f2d0e11d Binary files /dev/null and b/img/action_step2_selected_controls.png differ diff --git a/img/add_comment.png b/img/add_comment.png new file mode 100644 index 00000000..41104293 Binary files /dev/null and b/img/add_comment.png differ diff --git a/img/app_agent_creation.png b/img/app_agent_creation.png new file mode 100644 index 00000000..393d4556 Binary files /dev/null and b/img/app_agent_creation.png differ diff --git a/img/app_state_machine.png b/img/app_state_machine.png new file mode 100644 index 00000000..64f0a60f Binary files /dev/null and b/img/app_state_machine.png differ diff --git a/img/appagent.png b/img/appagent.png new file mode 100644 index 00000000..83b078e4 Binary files /dev/null and b/img/appagent.png differ diff --git a/img/blackboard.png b/img/blackboard.png new file mode 100644 index 00000000..60e7da1c Binary files /dev/null and b/img/blackboard.png differ diff --git a/img/desomposition.png b/img/desomposition.png new file mode 100644 index 00000000..6fa7b9d1 Binary files /dev/null and b/img/desomposition.png differ diff --git a/img/eva.png b/img/eva.png new file mode 100644 index 00000000..a1213603 Binary files /dev/null and b/img/eva.png differ diff --git a/img/execution.png b/img/execution.png new file mode 100644 index 00000000..1d962afb Binary files /dev/null and 
b/img/execution.png differ diff --git a/img/favicon.ico b/img/favicon.ico new file mode 100644 index 00000000..d26f6942 Binary files /dev/null and b/img/favicon.ico differ diff --git a/img/framework.png b/img/framework.png new file mode 100644 index 00000000..a788e4d8 Binary files /dev/null and b/img/framework.png differ diff --git a/img/framework_v2.png b/img/framework_v2.png new file mode 100644 index 00000000..4438d51d Binary files /dev/null and b/img/framework_v2.png differ diff --git a/img/gui_agent.png b/img/gui_agent.png new file mode 100644 index 00000000..9bd0599d Binary files /dev/null and b/img/gui_agent.png differ diff --git a/img/host_state_machine.png b/img/host_state_machine.png new file mode 100644 index 00000000..e4ec63be Binary files /dev/null and b/img/host_state_machine.png differ diff --git a/img/instantiation.png b/img/instantiation.png new file mode 100644 index 00000000..f975178f Binary files /dev/null and b/img/instantiation.png differ diff --git a/img/overview.png b/img/overview.png new file mode 100644 index 00000000..abc3c6eb Binary files /dev/null and b/img/overview.png differ diff --git a/img/overview_n.png b/img/overview_n.png new file mode 100644 index 00000000..48b85772 Binary files /dev/null and b/img/overview_n.png differ diff --git a/img/result_example.png b/img/result_example.png new file mode 100644 index 00000000..accaf5bf Binary files /dev/null and b/img/result_example.png differ diff --git a/img/save_ask.png b/img/save_ask.png new file mode 100644 index 00000000..e22ee59b Binary files /dev/null and b/img/save_ask.png differ diff --git a/img/screenshots.png b/img/screenshots.png new file mode 100644 index 00000000..18048e9d Binary files /dev/null and b/img/screenshots.png differ diff --git a/img/session.png b/img/session.png new file mode 100644 index 00000000..e2e00f26 Binary files /dev/null and b/img/session.png differ diff --git a/img/ufo.png b/img/ufo.png new file mode 100644 index 00000000..592b9a5a Binary files 
/dev/null and b/img/ufo.png differ diff --git a/img/ufo_blue.png b/img/ufo_blue.png new file mode 100644 index 00000000..1feba09d Binary files /dev/null and b/img/ufo_blue.png differ diff --git a/img/webpage.png b/img/webpage.png new file mode 100644 index 00000000..e7180629 Binary files /dev/null and b/img/webpage.png differ diff --git a/index.html b/index.html new file mode 100644 index 00000000..adfbab3f --- /dev/null +++ b/index.html @@ -0,0 +1,422 @@ + + + + + + + + UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + +
  • +
  • +
+
+
+
+
+ +

Welcome to UFO's Document!

+

arxiv  +Python Version  +License: MIT  +github  +YouTube

+

Introduction

+

UFO is a UI-Focused multi-agent framework that fulfills user requests on Windows OS by seamlessly navigating and operating within a single application or across multiple applications.

+

+ +

+ +

🕌 Framework

+

UFO operates as a multi-agent framework, encompassing:

+
    +
  • +

    HostAgent 🤖, tasked with choosing an application for fulfilling user requests. This agent may also switch to a different application when a request spans multiple applications, and the task is partially completed in the preceding application.

    +
  • +
  • +

    AppAgent 👾, responsible for iteratively executing actions on the selected applications until the task is successfully concluded within a specific application.

    +
  • +
  • +

    Application Automator 🎮, tasked with translating actions from the HostAgent and AppAgent into interactions with the application through UI controls, native APIs, or AI tools. Check out more details here.

    +
  • +
+

Both agents leverage the multi-modal capabilities of Visual Language Models (VLMs) to comprehend the application UI and fulfill the user's request. For more details, please consult our technical report.
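The division of labor above can be sketched as a simple control loop. This is an illustrative sketch only, not UFO's actual API; every class and method name here is hypothetical.

```python
# Illustrative sketch (hypothetical names, not UFO's real API): the HostAgent
# decomposes a request and picks an application; the AppAgent bound to that
# application then acts step by step until its sub-task is done.
def run_request(host_agent, app_agents, request):
    plan = host_agent.decompose(request)           # sub-tasks, possibly spanning apps
    for sub_task in plan:
        app = host_agent.select_application(sub_task)
        agent = app_agents[app]                    # AppAgent for the chosen application
        while not agent.finished(sub_task):
            action = agent.next_action(sub_task)   # grounded in a UI screenshot
            agent.execute(action)                  # via UI controls or native APIs
    return host_agent.summarize(plan)
```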

+

+ +

+ +

🚀 Quick Start

+

Please follow the Quick Start Guide to get started with UFO.

+

💥 Highlights

+
    +
  • First Windows Agent - UFO is the pioneering agent framework capable of translating user requests in natural language into actionable operations on Windows OS.
  • +
  • Agent as an Expert - UFO is enhanced by Retrieval Augmented Generation (RAG) from heterogeneous sources, including offline help documents, online search engines, and human demonstrations, making the agent an application "expert".
  • +
  • Rich Skill Set - UFO is equipped with a diverse set of skills to support comprehensive automation, such as mouse, keyboard, native API, and "Copilot".
  • +
  • Interactive Mode - UFO facilitates multiple sub-requests from users within the same session, enabling the seamless completion of complex tasks.
  • +
  • Agent Customization - UFO allows users to customize their own agents by providing additional information. The agent will proactively query users for details when necessary to better tailor its behavior.
  • +
  • Scalable AppAgent Creation - UFO offers extensibility, allowing users and app developers to create their own AppAgents in an easy and scalable way.
  • +
+

🌐 Media Coverage

+

Check out our official deep dive of UFO in this YouTube video.

+

UFO sightings have garnered attention from various media outlets, including:

+ +

❓Get help

+ +
+

 

+

📚 Citation

+

Our technical report can be found here. Note that the AppAgent and ActAgent in the paper have been renamed to HostAgent and AppAgent in the code base to better reflect their functions. +If you use UFO in your research, please cite our paper:

+
@article{ufo,
+  title={{UFO: A UI-Focused Agent for Windows OS Interaction}},
+  author={Zhang, Chaoyun and Li, Liqun and He, Shilin and Zhang, Xu and Qiao, Bo and  Qin, Si and Ma, Minghua and Kang, Yu and Lin, Qingwei and Rajmohan, Saravan and Zhang, Dongmei and Zhang, Qi},
+  journal={arXiv preprint arXiv:2402.07939},
+  year={2024}
+}
+
+ +

If you're interested in data analytics agent frameworks, check out TaskWeaver, a code-first LLM agent framework designed for seamlessly planning and executing data analytics tasks.

+

For more information on GUI agents, refer to our survey paper: Large Language Model-Brained GUI Agents: A Survey. You can also explore the survey through: +- GitHub Repository +- Searchable Website

+ + + + +
+
+ +
+
+ +
+ +
+ +
+ + + + + Next » + + +
+ + + + + + + + + + + diff --git a/js/html5shiv.min.js b/js/html5shiv.min.js new file mode 100644 index 00000000..1a01c94b --- /dev/null +++ b/js/html5shiv.min.js @@ -0,0 +1,4 @@ +/** +* @preserve HTML5 Shiv 3.7.3 | @afarkas @jdalton @jon_neal @rem | MIT/GPL2 Licensed +*/ +!function(a,b){function c(a,b){var c=a.createElement("p"),d=a.getElementsByTagName("head")[0]||a.documentElement;return c.innerHTML="x",d.insertBefore(c.lastChild,d.firstChild)}function d(){var a=t.elements;return"string"==typeof a?a.split(" "):a}function e(a,b){var c=t.elements;"string"!=typeof c&&(c=c.join(" ")),"string"!=typeof a&&(a=a.join(" ")),t.elements=c+" "+a,j(b)}function f(a){var b=s[a[q]];return b||(b={},r++,a[q]=r,s[r]=b),b}function g(a,c,d){if(c||(c=b),l)return c.createElement(a);d||(d=f(c));var e;return e=d.cache[a]?d.cache[a].cloneNode():p.test(a)?(d.cache[a]=d.createElem(a)).cloneNode():d.createElem(a),!e.canHaveChildren||o.test(a)||e.tagUrn?e:d.frag.appendChild(e)}function h(a,c){if(a||(a=b),l)return a.createDocumentFragment();c=c||f(a);for(var e=c.frag.cloneNode(),g=0,h=d(),i=h.length;i>g;g++)e.createElement(h[g]);return e}function i(a,b){b.cache||(b.cache={},b.createElem=a.createElement,b.createFrag=a.createDocumentFragment,b.frag=b.createFrag()),a.createElement=function(c){return t.shivMethods?g(c,a,b):b.createElem(c)},a.createDocumentFragment=Function("h,f","return function(){var n=f.cloneNode(),c=n.createElement;h.shivMethods&&("+d().join().replace(/[\w\-:]+/g,function(a){return b.createElem(a),b.frag.createElement(a),'c("'+a+'")'})+");return n}")(t,b.frag)}function j(a){a||(a=b);var d=f(a);return!t.shivCSS||k||d.hasCSS||(d.hasCSS=!!c(a,"article,aside,dialog,figcaption,figure,footer,header,hgroup,main,nav,section{display:block}mark{background:#FF0;color:#000}template{display:none}")),l||i(a,d),a}var 
k,l,m="3.7.3",n=a.html5||{},o=/^<|^(?:button|map|select|textarea|object|iframe|option|optgroup)$/i,p=/^(?:a|b|code|div|fieldset|h1|h2|h3|h4|h5|h6|i|label|li|ol|p|q|span|strong|style|table|tbody|td|th|tr|ul)$/i,q="_html5shiv",r=0,s={};!function(){try{var a=b.createElement("a");a.innerHTML="",k="hidden"in a,l=1==a.childNodes.length||function(){b.createElement("a");var a=b.createDocumentFragment();return"undefined"==typeof a.cloneNode||"undefined"==typeof a.createDocumentFragment||"undefined"==typeof a.createElement}()}catch(c){k=!0,l=!0}}();var t={elements:n.elements||"abbr article aside audio bdi canvas data datalist details dialog figcaption figure footer header hgroup main mark meter nav output picture progress section summary template time video",version:m,shivCSS:n.shivCSS!==!1,supportsUnknownElements:l,shivMethods:n.shivMethods!==!1,type:"default",shivDocument:j,createElement:g,createDocumentFragment:h,addElements:e};a.html5=t,j(b),"object"==typeof module&&module.exports&&(module.exports=t)}("undefined"!=typeof window?window:this,document); diff --git a/js/jquery-3.6.0.min.js b/js/jquery-3.6.0.min.js new file mode 100644 index 00000000..c4c6022f --- /dev/null +++ b/js/jquery-3.6.0.min.js @@ -0,0 +1,2 @@ +/*! 
jQuery v3.6.0 | (c) OpenJS Foundation and other contributors | jquery.org/license */ +!function(e,t){"use strict";"object"==typeof module&&"object"==typeof module.exports?module.exports=e.document?t(e,!0):function(e){if(!e.document)throw new Error("jQuery requires a window with a document");return t(e)}:t(e)}("undefined"!=typeof window?window:this,function(C,e){"use strict";var t=[],r=Object.getPrototypeOf,s=t.slice,g=t.flat?function(e){return t.flat.call(e)}:function(e){return t.concat.apply([],e)},u=t.push,i=t.indexOf,n={},o=n.toString,v=n.hasOwnProperty,a=v.toString,l=a.call(Object),y={},m=function(e){return"function"==typeof e&&"number"!=typeof e.nodeType&&"function"!=typeof e.item},x=function(e){return null!=e&&e===e.window},E=C.document,c={type:!0,src:!0,nonce:!0,noModule:!0};function b(e,t,n){var r,i,o=(n=n||E).createElement("script");if(o.text=e,t)for(r in c)(i=t[r]||t.getAttribute&&t.getAttribute(r))&&o.setAttribute(r,i);n.head.appendChild(o).parentNode.removeChild(o)}function w(e){return null==e?e+"":"object"==typeof e||"function"==typeof e?n[o.call(e)]||"object":typeof e}var f="3.6.0",S=function(e,t){return new S.fn.init(e,t)};function p(e){var t=!!e&&"length"in e&&e.length,n=w(e);return!m(e)&&!x(e)&&("array"===n||0===t||"number"==typeof t&&0+~]|"+M+")"+M+"*"),U=new RegExp(M+"|>"),X=new RegExp(F),V=new RegExp("^"+I+"$"),G={ID:new RegExp("^#("+I+")"),CLASS:new RegExp("^\\.("+I+")"),TAG:new RegExp("^("+I+"|[*])"),ATTR:new RegExp("^"+W),PSEUDO:new RegExp("^"+F),CHILD:new RegExp("^:(only|first|last|nth|nth-last)-(child|of-type)(?:\\("+M+"*(even|odd|(([+-]|)(\\d*)n|)"+M+"*(?:([+-]|)"+M+"*(\\d+)|))"+M+"*\\)|)","i"),bool:new RegExp("^(?:"+R+")$","i"),needsContext:new RegExp("^"+M+"*[>+~]|:(even|odd|eq|gt|lt|nth|first|last)(?:\\("+M+"*((?:-\\d)?\\d*)"+M+"*\\)|)(?=[^-]|$)","i")},Y=/HTML$/i,Q=/^(?:input|select|textarea|button)$/i,J=/^h\d$/i,K=/^[^{]+\{\s*\[native \w/,Z=/^(?:#([\w-]+)|(\w+)|\.([\w-]+))$/,ee=/[+~]/,te=new 
RegExp("\\\\[\\da-fA-F]{1,6}"+M+"?|\\\\([^\\r\\n\\f])","g"),ne=function(e,t){var n="0x"+e.slice(1)-65536;return t||(n<0?String.fromCharCode(n+65536):String.fromCharCode(n>>10|55296,1023&n|56320))},re=/([\0-\x1f\x7f]|^-?\d)|^-$|[^\0-\x1f\x7f-\uFFFF\w-]/g,ie=function(e,t){return t?"\0"===e?"\ufffd":e.slice(0,-1)+"\\"+e.charCodeAt(e.length-1).toString(16)+" ":"\\"+e},oe=function(){T()},ae=be(function(e){return!0===e.disabled&&"fieldset"===e.nodeName.toLowerCase()},{dir:"parentNode",next:"legend"});try{H.apply(t=O.call(p.childNodes),p.childNodes),t[p.childNodes.length].nodeType}catch(e){H={apply:t.length?function(e,t){L.apply(e,O.call(t))}:function(e,t){var n=e.length,r=0;while(e[n++]=t[r++]);e.length=n-1}}}function se(t,e,n,r){var i,o,a,s,u,l,c,f=e&&e.ownerDocument,p=e?e.nodeType:9;if(n=n||[],"string"!=typeof t||!t||1!==p&&9!==p&&11!==p)return n;if(!r&&(T(e),e=e||C,E)){if(11!==p&&(u=Z.exec(t)))if(i=u[1]){if(9===p){if(!(a=e.getElementById(i)))return n;if(a.id===i)return n.push(a),n}else if(f&&(a=f.getElementById(i))&&y(e,a)&&a.id===i)return n.push(a),n}else{if(u[2])return H.apply(n,e.getElementsByTagName(t)),n;if((i=u[3])&&d.getElementsByClassName&&e.getElementsByClassName)return H.apply(n,e.getElementsByClassName(i)),n}if(d.qsa&&!N[t+" "]&&(!v||!v.test(t))&&(1!==p||"object"!==e.nodeName.toLowerCase())){if(c=t,f=e,1===p&&(U.test(t)||z.test(t))){(f=ee.test(t)&&ye(e.parentNode)||e)===e&&d.scope||((s=e.getAttribute("id"))?s=s.replace(re,ie):e.setAttribute("id",s=S)),o=(l=h(t)).length;while(o--)l[o]=(s?"#"+s:":scope")+" "+xe(l[o]);c=l.join(",")}try{return H.apply(n,f.querySelectorAll(c)),n}catch(e){N(t,!0)}finally{s===S&&e.removeAttribute("id")}}}return g(t.replace($,"$1"),e,n,r)}function ue(){var r=[];return function e(t,n){return r.push(t+" ")>b.cacheLength&&delete e[r.shift()],e[t+" "]=n}}function le(e){return e[S]=!0,e}function ce(e){var t=C.createElement("fieldset");try{return!!e(t)}catch(e){return!1}finally{t.parentNode&&t.parentNode.removeChild(t),t=null}}function 
fe(e,t){var n=e.split("|"),r=n.length;while(r--)b.attrHandle[n[r]]=t}function pe(e,t){var n=t&&e,r=n&&1===e.nodeType&&1===t.nodeType&&e.sourceIndex-t.sourceIndex;if(r)return r;if(n)while(n=n.nextSibling)if(n===t)return-1;return e?1:-1}function de(t){return function(e){return"input"===e.nodeName.toLowerCase()&&e.type===t}}function he(n){return function(e){var t=e.nodeName.toLowerCase();return("input"===t||"button"===t)&&e.type===n}}function ge(t){return function(e){return"form"in e?e.parentNode&&!1===e.disabled?"label"in e?"label"in e.parentNode?e.parentNode.disabled===t:e.disabled===t:e.isDisabled===t||e.isDisabled!==!t&&ae(e)===t:e.disabled===t:"label"in e&&e.disabled===t}}function ve(a){return le(function(o){return o=+o,le(function(e,t){var n,r=a([],e.length,o),i=r.length;while(i--)e[n=r[i]]&&(e[n]=!(t[n]=e[n]))})})}function ye(e){return e&&"undefined"!=typeof e.getElementsByTagName&&e}for(e in d=se.support={},i=se.isXML=function(e){var t=e&&e.namespaceURI,n=e&&(e.ownerDocument||e).documentElement;return!Y.test(t||n&&n.nodeName||"HTML")},T=se.setDocument=function(e){var t,n,r=e?e.ownerDocument||e:p;return r!=C&&9===r.nodeType&&r.documentElement&&(a=(C=r).documentElement,E=!i(C),p!=C&&(n=C.defaultView)&&n.top!==n&&(n.addEventListener?n.addEventListener("unload",oe,!1):n.attachEvent&&n.attachEvent("onunload",oe)),d.scope=ce(function(e){return a.appendChild(e).appendChild(C.createElement("div")),"undefined"!=typeof e.querySelectorAll&&!e.querySelectorAll(":scope fieldset div").length}),d.attributes=ce(function(e){return e.className="i",!e.getAttribute("className")}),d.getElementsByTagName=ce(function(e){return e.appendChild(C.createComment("")),!e.getElementsByTagName("*").length}),d.getElementsByClassName=K.test(C.getElementsByClassName),d.getById=ce(function(e){return a.appendChild(e).id=S,!C.getElementsByName||!C.getElementsByName(S).length}),d.getById?(b.filter.ID=function(e){var t=e.replace(te,ne);return function(e){return 
e.getAttribute("id")===t}},b.find.ID=function(e,t){if("undefined"!=typeof t.getElementById&&E){var n=t.getElementById(e);return n?[n]:[]}}):(b.filter.ID=function(e){var n=e.replace(te,ne);return function(e){var t="undefined"!=typeof e.getAttributeNode&&e.getAttributeNode("id");return t&&t.value===n}},b.find.ID=function(e,t){if("undefined"!=typeof t.getElementById&&E){var n,r,i,o=t.getElementById(e);if(o){if((n=o.getAttributeNode("id"))&&n.value===e)return[o];i=t.getElementsByName(e),r=0;while(o=i[r++])if((n=o.getAttributeNode("id"))&&n.value===e)return[o]}return[]}}),b.find.TAG=d.getElementsByTagName?function(e,t){return"undefined"!=typeof t.getElementsByTagName?t.getElementsByTagName(e):d.qsa?t.querySelectorAll(e):void 0}:function(e,t){var n,r=[],i=0,o=t.getElementsByTagName(e);if("*"===e){while(n=o[i++])1===n.nodeType&&r.push(n);return r}return o},b.find.CLASS=d.getElementsByClassName&&function(e,t){if("undefined"!=typeof t.getElementsByClassName&&E)return t.getElementsByClassName(e)},s=[],v=[],(d.qsa=K.test(C.querySelectorAll))&&(ce(function(e){var t;a.appendChild(e).innerHTML="",e.querySelectorAll("[msallowcapture^='']").length&&v.push("[*^$]="+M+"*(?:''|\"\")"),e.querySelectorAll("[selected]").length||v.push("\\["+M+"*(?:value|"+R+")"),e.querySelectorAll("[id~="+S+"-]").length||v.push("~="),(t=C.createElement("input")).setAttribute("name",""),e.appendChild(t),e.querySelectorAll("[name='']").length||v.push("\\["+M+"*name"+M+"*="+M+"*(?:''|\"\")"),e.querySelectorAll(":checked").length||v.push(":checked"),e.querySelectorAll("a#"+S+"+*").length||v.push(".#.+[+~]"),e.querySelectorAll("\\\f"),v.push("[\\r\\n\\f]")}),ce(function(e){e.innerHTML="";var 
t=C.createElement("input");t.setAttribute("type","hidden"),e.appendChild(t).setAttribute("name","D"),e.querySelectorAll("[name=d]").length&&v.push("name"+M+"*[*^$|!~]?="),2!==e.querySelectorAll(":enabled").length&&v.push(":enabled",":disabled"),a.appendChild(e).disabled=!0,2!==e.querySelectorAll(":disabled").length&&v.push(":enabled",":disabled"),e.querySelectorAll("*,:x"),v.push(",.*:")})),(d.matchesSelector=K.test(c=a.matches||a.webkitMatchesSelector||a.mozMatchesSelector||a.oMatchesSelector||a.msMatchesSelector))&&ce(function(e){d.disconnectedMatch=c.call(e,"*"),c.call(e,"[s!='']:x"),s.push("!=",F)}),v=v.length&&new RegExp(v.join("|")),s=s.length&&new RegExp(s.join("|")),t=K.test(a.compareDocumentPosition),y=t||K.test(a.contains)?function(e,t){var n=9===e.nodeType?e.documentElement:e,r=t&&t.parentNode;return e===r||!(!r||1!==r.nodeType||!(n.contains?n.contains(r):e.compareDocumentPosition&&16&e.compareDocumentPosition(r)))}:function(e,t){if(t)while(t=t.parentNode)if(t===e)return!0;return!1},j=t?function(e,t){if(e===t)return l=!0,0;var n=!e.compareDocumentPosition-!t.compareDocumentPosition;return n||(1&(n=(e.ownerDocument||e)==(t.ownerDocument||t)?e.compareDocumentPosition(t):1)||!d.sortDetached&&t.compareDocumentPosition(e)===n?e==C||e.ownerDocument==p&&y(p,e)?-1:t==C||t.ownerDocument==p&&y(p,t)?1:u?P(u,e)-P(u,t):0:4&n?-1:1)}:function(e,t){if(e===t)return l=!0,0;var n,r=0,i=e.parentNode,o=t.parentNode,a=[e],s=[t];if(!i||!o)return e==C?-1:t==C?1:i?-1:o?1:u?P(u,e)-P(u,t):0;if(i===o)return pe(e,t);n=e;while(n=n.parentNode)a.unshift(n);n=t;while(n=n.parentNode)s.unshift(n);while(a[r]===s[r])r++;return r?pe(a[r],s[r]):a[r]==p?-1:s[r]==p?1:0}),C},se.matches=function(e,t){return se(e,null,null,t)},se.matchesSelector=function(e,t){if(T(e),d.matchesSelector&&E&&!N[t+" "]&&(!s||!s.test(t))&&(!v||!v.test(t)))try{var n=c.call(e,t);if(n||d.disconnectedMatch||e.document&&11!==e.document.nodeType)return n}catch(e){N(t,!0)}return 0":{dir:"parentNode",first:!0}," 
":{dir:"parentNode"},"+":{dir:"previousSibling",first:!0},"~":{dir:"previousSibling"}},preFilter:{ATTR:function(e){return e[1]=e[1].replace(te,ne),e[3]=(e[3]||e[4]||e[5]||"").replace(te,ne),"~="===e[2]&&(e[3]=" "+e[3]+" "),e.slice(0,4)},CHILD:function(e){return e[1]=e[1].toLowerCase(),"nth"===e[1].slice(0,3)?(e[3]||se.error(e[0]),e[4]=+(e[4]?e[5]+(e[6]||1):2*("even"===e[3]||"odd"===e[3])),e[5]=+(e[7]+e[8]||"odd"===e[3])):e[3]&&se.error(e[0]),e},PSEUDO:function(e){var t,n=!e[6]&&e[2];return G.CHILD.test(e[0])?null:(e[3]?e[2]=e[4]||e[5]||"":n&&X.test(n)&&(t=h(n,!0))&&(t=n.indexOf(")",n.length-t)-n.length)&&(e[0]=e[0].slice(0,t),e[2]=n.slice(0,t)),e.slice(0,3))}},filter:{TAG:function(e){var t=e.replace(te,ne).toLowerCase();return"*"===e?function(){return!0}:function(e){return e.nodeName&&e.nodeName.toLowerCase()===t}},CLASS:function(e){var t=m[e+" "];return t||(t=new RegExp("(^|"+M+")"+e+"("+M+"|$)"))&&m(e,function(e){return t.test("string"==typeof e.className&&e.className||"undefined"!=typeof e.getAttribute&&e.getAttribute("class")||"")})},ATTR:function(n,r,i){return function(e){var t=se.attr(e,n);return null==t?"!="===r:!r||(t+="","="===r?t===i:"!="===r?t!==i:"^="===r?i&&0===t.indexOf(i):"*="===r?i&&-1:\x20\t\r\n\f]*)[\x20\t\r\n\f]*\/?>(?:<\/\1>|)$/i;function j(e,n,r){return m(n)?S.grep(e,function(e,t){return!!n.call(e,t,e)!==r}):n.nodeType?S.grep(e,function(e){return e===n!==r}):"string"!=typeof n?S.grep(e,function(e){return-1)[^>]*|#([\w-]+))$/;(S.fn.init=function(e,t,n){var r,i;if(!e)return this;if(n=n||D,"string"==typeof e){if(!(r="<"===e[0]&&">"===e[e.length-1]&&3<=e.length?[null,e,null]:q.exec(e))||!r[1]&&t)return!t||t.jquery?(t||n).find(e):this.constructor(t).find(e);if(r[1]){if(t=t instanceof S?t[0]:t,S.merge(this,S.parseHTML(r[1],t&&t.nodeType?t.ownerDocument||t:E,!0)),N.test(r[1])&&S.isPlainObject(t))for(r in t)m(this[r])?this[r](t[r]):this.attr(r,t[r]);return this}return(i=E.getElementById(r[2]))&&(this[0]=i,this.length=1),this}return 
e.nodeType?(this[0]=e,this.length=1,this):m(e)?void 0!==n.ready?n.ready(e):e(S):S.makeArray(e,this)}).prototype=S.fn,D=S(E);var L=/^(?:parents|prev(?:Until|All))/,H={children:!0,contents:!0,next:!0,prev:!0};function O(e,t){while((e=e[t])&&1!==e.nodeType);return e}S.fn.extend({has:function(e){var t=S(e,this),n=t.length;return this.filter(function(){for(var e=0;e\x20\t\r\n\f]*)/i,he=/^$|^module$|\/(?:java|ecma)script/i;ce=E.createDocumentFragment().appendChild(E.createElement("div")),(fe=E.createElement("input")).setAttribute("type","radio"),fe.setAttribute("checked","checked"),fe.setAttribute("name","t"),ce.appendChild(fe),y.checkClone=ce.cloneNode(!0).cloneNode(!0).lastChild.checked,ce.innerHTML="",y.noCloneChecked=!!ce.cloneNode(!0).lastChild.defaultValue,ce.innerHTML="",y.option=!!ce.lastChild;var ge={thead:[1,"","
"],col:[2,"","
"],tr:[2,"","
"],td:[3,"","
"],_default:[0,"",""]};function ve(e,t){var n;return n="undefined"!=typeof e.getElementsByTagName?e.getElementsByTagName(t||"*"):"undefined"!=typeof e.querySelectorAll?e.querySelectorAll(t||"*"):[],void 0===t||t&&A(e,t)?S.merge([e],n):n}function ye(e,t){for(var n=0,r=e.length;n",""]);var me=/<|&#?\w+;/;function xe(e,t,n,r,i){for(var o,a,s,u,l,c,f=t.createDocumentFragment(),p=[],d=0,h=e.length;d\s*$/g;function je(e,t){return A(e,"table")&&A(11!==t.nodeType?t:t.firstChild,"tr")&&S(e).children("tbody")[0]||e}function De(e){return e.type=(null!==e.getAttribute("type"))+"/"+e.type,e}function qe(e){return"true/"===(e.type||"").slice(0,5)?e.type=e.type.slice(5):e.removeAttribute("type"),e}function Le(e,t){var n,r,i,o,a,s;if(1===t.nodeType){if(Y.hasData(e)&&(s=Y.get(e).events))for(i in Y.remove(t,"handle events"),s)for(n=0,r=s[i].length;n").attr(n.scriptAttrs||{}).prop({charset:n.scriptCharset,src:n.url}).on("load error",i=function(e){r.remove(),i=null,e&&t("error"===e.type?404:200,e.type)}),E.head.appendChild(r[0])},abort:function(){i&&i()}}});var _t,zt=[],Ut=/(=)\?(?=&|$)|\?\?/;S.ajaxSetup({jsonp:"callback",jsonpCallback:function(){var e=zt.pop()||S.expando+"_"+wt.guid++;return this[e]=!0,e}}),S.ajaxPrefilter("json jsonp",function(e,t,n){var r,i,o,a=!1!==e.jsonp&&(Ut.test(e.url)?"url":"string"==typeof e.data&&0===(e.contentType||"").indexOf("application/x-www-form-urlencoded")&&Ut.test(e.data)&&"data");if(a||"jsonp"===e.dataTypes[0])return r=e.jsonpCallback=m(e.jsonpCallback)?e.jsonpCallback():e.jsonpCallback,a?e[a]=e[a].replace(Ut,"$1"+r):!1!==e.jsonp&&(e.url+=(Tt.test(e.url)?"&":"?")+e.jsonp+"="+r),e.converters["script json"]=function(){return o||S.error(r+" was not called"),o[0]},e.dataTypes[0]="json",i=C[r],C[r]=function(){o=arguments},n.always(function(){void 0===i?S(C).removeProp(r):C[r]=i,e[r]&&(e.jsonpCallback=t.jsonpCallback,zt.push(r)),o&&m(i)&&i(o[0]),o=i=void 0}),"script"}),y.createHTMLDocument=((_t=E.implementation.createHTMLDocument("").body).innerHTML="
",2===_t.childNodes.length),S.parseHTML=function(e,t,n){return"string"!=typeof e?[]:("boolean"==typeof t&&(n=t,t=!1),t||(y.createHTMLDocument?((r=(t=E.implementation.createHTMLDocument("")).createElement("base")).href=E.location.href,t.head.appendChild(r)):t=E),o=!n&&[],(i=N.exec(e))?[t.createElement(i[1])]:(i=xe([e],t,o),o&&o.length&&S(o).remove(),S.merge([],i.childNodes)));var r,i,o},S.fn.load=function(e,t,n){var r,i,o,a=this,s=e.indexOf(" ");return-1").append(S.parseHTML(e)).find(r):e)}).always(n&&function(e,t){a.each(function(){n.apply(this,o||[e.responseText,t,e])})}),this},S.expr.pseudos.animated=function(t){return S.grep(S.timers,function(e){return t===e.elem}).length},S.offset={setOffset:function(e,t,n){var r,i,o,a,s,u,l=S.css(e,"position"),c=S(e),f={};"static"===l&&(e.style.position="relative"),s=c.offset(),o=S.css(e,"top"),u=S.css(e,"left"),("absolute"===l||"fixed"===l)&&-1<(o+u).indexOf("auto")?(a=(r=c.position()).top,i=r.left):(a=parseFloat(o)||0,i=parseFloat(u)||0),m(t)&&(t=t.call(e,n,S.extend({},s))),null!=t.top&&(f.top=t.top-s.top+a),null!=t.left&&(f.left=t.left-s.left+i),"using"in t?t.using.call(e,f):c.css(f)}},S.fn.extend({offset:function(t){if(arguments.length)return void 0===t?this:this.each(function(e){S.offset.setOffset(this,t,e)});var e,n,r=this[0];return r?r.getClientRects().length?(e=r.getBoundingClientRect(),n=r.ownerDocument.defaultView,{top:e.top+n.pageYOffset,left:e.left+n.pageXOffset}):{top:0,left:0}:void 0},position:function(){if(this[0]){var 
e,t,n,r=this[0],i={top:0,left:0};if("fixed"===S.css(r,"position"))t=r.getBoundingClientRect();else{t=this.offset(),n=r.ownerDocument,e=r.offsetParent||n.documentElement;while(e&&(e===n.body||e===n.documentElement)&&"static"===S.css(e,"position"))e=e.parentNode;e&&e!==r&&1===e.nodeType&&((i=S(e).offset()).top+=S.css(e,"borderTopWidth",!0),i.left+=S.css(e,"borderLeftWidth",!0))}return{top:t.top-i.top-S.css(r,"marginTop",!0),left:t.left-i.left-S.css(r,"marginLeft",!0)}}},offsetParent:function(){return this.map(function(){var e=this.offsetParent;while(e&&"static"===S.css(e,"position"))e=e.offsetParent;return e||re})}}),S.each({scrollLeft:"pageXOffset",scrollTop:"pageYOffset"},function(t,i){var o="pageYOffset"===i;S.fn[t]=function(e){return $(this,function(e,t,n){var r;if(x(e)?r=e:9===e.nodeType&&(r=e.defaultView),void 0===n)return r?r[i]:e[t];r?r.scrollTo(o?r.pageXOffset:n,o?n:r.pageYOffset):e[t]=n},t,e,arguments.length)}}),S.each(["top","left"],function(e,n){S.cssHooks[n]=Fe(y.pixelPosition,function(e,t){if(t)return t=We(e,n),Pe.test(t)?S(e).position()[n]+"px":t})}),S.each({Height:"height",Width:"width"},function(a,s){S.each({padding:"inner"+a,content:s,"":"outer"+a},function(r,o){S.fn[o]=function(e,t){var n=arguments.length&&(r||"boolean"!=typeof e),i=r||(!0===e||!0===t?"margin":"border");return $(this,function(e,t,n){var r;return x(e)?0===o.indexOf("outer")?e["inner"+a]:e.document.documentElement["client"+a]:9===e.nodeType?(r=e.documentElement,Math.max(e.body["scroll"+a],r["scroll"+a],e.body["offset"+a],r["offset"+a],r["client"+a])):void 0===n?S.css(e,t,i):S.style(e,t,n,i)},s,n?e:void 0,n)}})}),S.each(["ajaxStart","ajaxStop","ajaxComplete","ajaxError","ajaxSuccess","ajaxSend"],function(e,t){S.fn[t]=function(e){return this.on(t,e)}}),S.fn.extend({bind:function(e,t,n){return this.on(e,null,t,n)},unbind:function(e,t){return this.off(e,null,t)},delegate:function(e,t,n,r){return this.on(t,e,n,r)},undelegate:function(e,t,n){return 
1===arguments.length?this.off(e,"**"):this.off(t,e||"**",n)},hover:function(e,t){return this.mouseenter(e).mouseleave(t||e)}}),S.each("blur focus focusin focusout resize scroll click dblclick mousedown mouseup mousemove mouseover mouseout mouseenter mouseleave change select submit keydown keypress keyup contextmenu".split(" "),function(e,n){S.fn[n]=function(e,t){return 0"),n("table.docutils.footnote").wrap("
"),n("table.docutils.citation").wrap("
"),n(".wy-menu-vertical ul").not(".simple").siblings("a").each((function(){var t=n(this);expand=n(''),expand.on("click",(function(n){return e.toggleCurrent(t),n.stopPropagation(),!1})),t.prepend(expand)}))},reset:function(){var n=encodeURI(window.location.hash)||"#";try{var e=$(".wy-menu-vertical"),t=e.find('[href="'+n+'"]');if(0===t.length){var i=$('.document [id="'+n.substring(1)+'"]').closest("div.section");0===(t=e.find('[href="#'+i.attr("id")+'"]')).length&&(t=e.find('[href="#"]'))}if(t.length>0){$(".wy-menu-vertical .current").removeClass("current").attr("aria-expanded","false"),t.addClass("current").attr("aria-expanded","true"),t.closest("li.toctree-l1").parent().addClass("current").attr("aria-expanded","true");for(let n=1;n<=10;n++)t.closest("li.toctree-l"+n).addClass("current").attr("aria-expanded","true");t[0].scrollIntoView()}}catch(n){console.log("Error expanding nav for anchor",n)}},onScroll:function(){this.winScroll=!1;var n=this.win.scrollTop(),e=n+this.winHeight,t=this.navBar.scrollTop()+(n-this.winPosition);n<0||e>this.docHeight||(this.navBar.scrollTop(t),this.winPosition=n)},onResize:function(){this.winResize=!1,this.winHeight=this.win.height(),this.docHeight=$(document).height()},hashChange:function(){this.linkScroll=!0,this.win.one("hashchange",(function(){this.linkScroll=!1}))},toggleCurrent:function(n){var e=n.closest("li");e.siblings("li.current").removeClass("current").attr("aria-expanded","false"),e.siblings().find("li.current").removeClass("current").attr("aria-expanded","false");var t=e.find("> ul li");t.length&&(t.removeClass("current").attr("aria-expanded","false"),e.toggleClass("current").attr("aria-expanded",(function(n,e){return"true"==e?"false":"true"})))}},"undefined"!=typeof window&&(window.SphinxRtdTheme={Navigation:n.exports.ThemeNav,StickyNav:n.exports.ThemeNav}),function(){for(var n=0,e=["ms","moz","webkit","o"],t=0;t + + + + + + + Evaluation Logs - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

Evaluation Logs

+

The evaluation logs store the evaluation results from the EvaluationAgent. The evaluation log contains the following information:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
FieldDescriptionType
ReasonThe detailed reason for the judgment, based on the observed screenshot differences.String
Sub-scoreThe sub-score of the evaluation in decomposing the evaluation into multiple sub-goals.List of Dictionaries
CompleteThe completion status of the evaluation, can be yes, no, or unsure.String
levelThe level of the evaluation.String
requestThe request sent to the EvaluationAgent.Dictionary
idThe ID of the evaluation.Integer
+ +
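A small parser makes these fields easier to inspect. Note that the JSON key names below ("complete", "reason", "sub_scores") are assumptions based on the field table above, not confirmed key casing; check an actual evaluation.log before relying on them.

```python
import json

# Hypothetical parser for one evaluation.log record; key names are assumed
# to match the documented fields and may differ in casing in real logs.
def parse_evaluation(line: str) -> dict:
    record = json.loads(line)
    return {
        "complete": record.get("complete"),        # "yes", "no", or "unsure"
        "reason": record.get("reason"),
        "sub_scores": record.get("sub_scores", []),
    }

# Hypothetical sample record for illustration.
sample = '{"complete": "yes", "reason": "All sub-goals met.", "sub_scores": [{"open file": "yes"}]}'
print(parse_evaluation(sample)["complete"])  # yes
```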
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/logs/overview/index.html b/logs/overview/index.html new file mode 100644 index 00000000..8d43189f --- /dev/null +++ b/logs/overview/index.html @@ -0,0 +1,349 @@ + + + + + + + + Overview - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

UFO Logs

+

Logs are essential for debugging and understanding the behavior of the UFO framework. There are four types of logs generated by UFO:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Log TypeDescriptionLocationLevel
Request LogContains the prompt requests to LLMs.logs/{task_name}/request.logInfo
Step LogContains the agent's response to the user's request and additional information at every step.logs/{task_name}/response.logInfo
Evaluation LogContains the evaluation results from the EvaluationAgent.logs/{task_name}/evaluation.logInfo
ScreenshotsContains the screenshots of the application UI.logs/{task_name}/-
+

All logs are stored in the logs/{task_name} directory.
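Given the layout in the table above, a small helper can assemble the paths for a task's logs. The directory layout and file names come from this page; the helper itself is a sketch, and "word_demo" below is a hypothetical task name.

```python
from pathlib import Path

# Sketch: enumerate the log files UFO writes for a task, per the table above.
def log_paths(task_name: str) -> dict:
    base = Path("logs") / task_name
    return {
        "request": base / "request.log",
        "step": base / "response.log",
        "evaluation": base / "evaluation.log",
        "screenshots": base,  # screenshots live directly in the task folder
    }

paths = log_paths("word_demo")  # "word_demo" is a hypothetical task name
```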

+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/logs/request_logs/index.html b/logs/request_logs/index.html new file mode 100644 index 00000000..7d94aad8 --- /dev/null +++ b/logs/request_logs/index.html @@ -0,0 +1,341 @@ + + + + + + + + Request Logs - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

Request Logs

+

The request log records the prompt requests sent to the LLMs. It is stored in the request.log file and contains the following information for each step:

+ + + + + + + + + + + + + + + + + +
FieldDescription
stepThe step number of the session.
promptThe prompt message sent to the LLMs.
+

The request log is stored at the debug level. You can configure the logging level in the LOG_LEVEL field in the config_dev.yaml file.

+
+

Tip

+

You can use the following Python code to read the request log:

+
import json
+
+# Each line of request.log is a JSON object with the documented
+# "step" and "prompt" fields.
+with open('logs/{task_name}/request.log', 'r') as f:
+    for line in f:
+        log = json.loads(line)
+
+
+
+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/logs/screenshots_logs/index.html b/logs/screenshots_logs/index.html new file mode 100644 index 00000000..7e9756a5 --- /dev/null +++ b/logs/screenshots_logs/index.html @@ -0,0 +1,361 @@ + + + + + + + + Screenshots - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

Screenshot Logs

+

UFO also saves desktop or application screenshots for debugging and evaluation purposes. The screenshot logs are stored in the logs/{task_name}/ directory.

+

There are four types of screenshot logs generated by UFO, as detailed below.

+

Clean Screenshots

+

At each step, UFO saves a clean screenshot of the desktop or application. The clean screenshot is saved in the action_step{step_number}.png file. In addition, the clean screenshots are also saved when a sub-task, round or session is completed. The clean screenshots are saved in the action_round_{round_id}_sub_round_{sub_task_id}_final.png, action_round_{round_id}_final.png and action_step_final.png files, respectively. Below is an example of a clean screenshot.

+

+ AppAgent Image +

+ +

Annotation Screenshots

+

UFO also saves annotated screenshots of the application, with each control item annotated with a number, following the Set-of-Mark paradigm. The annotated screenshots are saved in the action_step{step_number}_annotated.png file. Below is an example of an annotated screenshot.

+

+ AppAgent Image +

+ +
+

Info

+

Only selected types of controls are annotated in the screenshots. They are configured in the config_dev.yaml file under the CONTROL_LIST field.

+
+
+

Tip

+

Different types of controls are annotated with different colors. You can configure the colors in the config_dev.yaml file under the ANNOTATION_COLORS field.

+
+

Concatenated Screenshots

+

UFO also saves concatenated screenshots of the application, with clean and annotated screenshots concatenated side by side. The concatenated screenshots are saved in the action_step{step_number}_concat.png file. Below is an example of a concatenated screenshot.

+

+ AppAgent Image +

+ +
+

Info

+

You can configure whether to feed the concatenated screenshots to the LLMs, or separate clean and annotated screenshots, in the config_dev.yaml file under the CONCAT_SCREENSHOT field.

+
+

Selected Control Screenshots

+

UFO saves screenshots of the selected control item for operation. The selected control screenshots are saved in the action_step{step_number}_selected_controls.png file. Below is an example of a selected control screenshot.

+

+ AppAgent Image +

+ +
+

Info

+

You can configure whether to feed the LLM with the selected control screenshot from the previous step to enhance the context, in the config_dev.yaml file under the INCLUDE_LAST_SCREENSHOT field.

+
+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/logs/step_logs/index.html b/logs/step_logs/index.html new file mode 100644 index 00000000..ec30ef01 --- /dev/null +++ b/logs/step_logs/index.html @@ -0,0 +1,674 @@ + + + + + + + + Step Logs - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

Step Logs

+

The step log contains the agent's response to the user's request, along with additional information, at every step. It is stored in the response.log file at the info level. The log fields differ between the HostAgent and the AppAgent.

+

HostAgent Logs

+

The HostAgent logs contain the following fields:

+

LLM Output

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
FieldDescriptionType
ObservationThe observation of current desktop screenshots.String
ThoughtThe logical reasoning process of the HostAgent.String
Current Sub-TaskThe current sub-task to be executed by the AppAgent.String
MessageThe message to be sent to the AppAgent for the completion of the sub-task.String
ControlLabelThe index of the selected application to execute the sub-task.String
ControlTextThe name of the selected application to execute the sub-task.String
PlanThe plan for the following sub-tasks after the current sub-task.List of Strings
StatusThe status of the agent, mapped to the AgentState.String
CommentAdditional comments or information provided to the user.String
QuestionsThe questions to be asked to the user for additional information.List of Strings
BashThe bash command to be executed by the HostAgent. It can be used to open applications or execute system commands.String
+

Additional Information

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
FieldDescriptionType
StepThe step number of the session.Integer
RoundStepThe step number of the current round.Integer
AgentStepThe step number of the HostAgent.Integer
RoundThe round number of the session.Integer
ControlLabelThe index of the selected application to execute the sub-task.Integer
ControlTextThe name of the selected application to execute the sub-task.String
RequestThe user request.String
AgentThe agent that executed the step, set to HostAgent.String
AgentNameThe name of the agent.String
ApplicationThe application process name.String
CostThe cost of the step.Float
ResultsThe results of the step, set to an empty string.String
CleanScreenshotThe image path of the desktop screenshot.String
AnnotatedScreenshotThe image path of the annotated application screenshot.String
ConcatScreenshotThe image path of the concatenated application screenshot.String
SelectedControlScreenshotThe image path of the selected control screenshot.String
time_costThe time cost of each step in the process.Dictionary
+

AppAgent Logs

+

The AppAgent logs contain the following fields:

+

LLM Output

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
FieldDescriptionType
ObservationThe observation of the current application screenshots.String
ThoughtThe logical reasoning process of the AppAgent.String
ControlLabelThe index of the selected control to interact with.String
ControlTextThe name of the selected control to interact with.String
FunctionThe function to be executed on the selected control.String
ArgsThe arguments required for the function execution.List of Strings
StatusThe status of the agent, mapped to the AgentState.String
PlanThe plan for the following steps after the current action.List of Strings
CommentAdditional comments or information provided to the user.String
SaveScreenshotThe flag to save the screenshot of the application to the blackboard for future reference.Boolean
+

Additional Information

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
FieldDescriptionType
StepThe step number of the session.Integer
RoundStepThe step number of the current round.Integer
AgentStepThe step number of the AppAgent.Integer
RoundThe round number of the session.Integer
SubtaskThe sub-task to be executed by the AppAgent.String
SubtaskIndexThe index of the sub-task in the current round.Integer
ActionThe action to be executed by the AppAgent.String
ActionTypeThe type of the action to be executed.String
RequestThe user request.String
AgentThe agent that executed the step, set to AppAgent.String
AgentNameThe name of the agent.String
ApplicationThe application process name.String
CostThe cost of the step.Float
ResultsThe results of the step.String
CleanScreenshotThe image path of the desktop screenshot.String
AnnotatedScreenshotThe image path of the annotated application screenshot.String
ConcatScreenshotThe image path of the concatenated application screenshot.String
time_costThe time cost of each step in the process.Dictionary
+
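Because each step log record is a JSON object with the fields listed above, per-agent statistics can be aggregated with a few lines of standard-library code. This is a sketch based on the field names in the tables ("Agent" and "Cost"); the helper name is illustrative and unparseable lines are skipped.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable


def cost_per_agent(log_lines: Iterable[str]) -> Dict[str, float]:
    """Sum the "Cost" field of each step record, grouped by the "Agent" field.

    Lines that are not valid JSON or carry no numeric cost are skipped.
    """
    totals: Dict[str, float] = defaultdict(float)
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        cost = record.get("Cost")
        if isinstance(cost, (int, float)):
            totals[record.get("Agent", "Unknown")] += cost
    return dict(totals)
```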
+

Tip

+

You can use the following Python code to read the request log:

+
import json
+
+with open('logs/{task_name}/request.log', 'r') as f:
+    for line in f:
+        log = json.loads(line)
+
+
+
+

Info

+

The FollowerAgent logs share the same fields as the AppAgent logs.

+
+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/logs/ui_tree_logs/index.html b/logs/ui_tree_logs/index.html new file mode 100644 index 00000000..abaf698c --- /dev/null +++ b/logs/ui_tree_logs/index.html @@ -0,0 +1,1196 @@ + + + + + + + + UI Tree - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

UI Tree Logs

+

UFO can save the entire UI tree of the application window at every step for data collection purposes. The UI tree represents the application's UI structure, including the window, its controls, and their properties. The UI tree logs are saved in the logs/{task_name}/ui_tree folder. You must set the SAVE_UI_TREE flag to True in the config_dev.yaml file to enable UI tree logging. Below is an example of the UI tree logs for an application:

+
{
+    "id": "node_0",
+    "name": "Mail - Chaoyun Zhang - Outlook",
+    "control_type": "Window",
+    "rectangle": {
+        "left": 628,
+        "top": 258,
+        "right": 3508,
+        "bottom": 1795
+    },
+    "adjusted_rectangle": {
+        "left": 0,
+        "top": 0,
+        "right": 2880,
+        "bottom": 1537
+    },
+    "relative_rectangle": {
+        "left": 0.0,
+        "top": 0.0,
+        "right": 1.0,
+        "bottom": 1.0
+    },
+    "level": 0,
+    "children": [
+        {
+            "id": "node_1",
+            "name": "",
+            "control_type": "Pane",
+            "rectangle": {
+                "left": 3282,
+                "top": 258,
+                "right": 3498,
+                "bottom": 330
+            },
+            "adjusted_rectangle": {
+                "left": 2654,
+                "top": 0,
+                "right": 2870,
+                "bottom": 72
+            },
+            "relative_rectangle": {
+                "left": 0.9215277777777777,
+                "top": 0.0,
+                "right": 0.9965277777777778,
+                "bottom": 0.0468445022771633
+            },
+            "level": 1,
+            "children": []
+        }
+    ]
+}
+
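A saved UI tree like the one above is plain JSON and can be traversed recursively. The sketch below counts nodes and finds the deepest level; the field names ("children", "level") come from the example and table on this page, while the function names are illustrative.

```python
from typing import Any, Dict


def count_nodes(node: Dict[str, Any]) -> int:
    """Count a UI tree node and all of its descendants."""
    return 1 + sum(count_nodes(child) for child in node.get("children", []))


def max_level(node: Dict[str, Any]) -> int:
    """Return the deepest "level" value present in the UI tree."""
    return max([node["level"]] + [max_level(child) for child in node.get("children", [])])
```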
+

Fields in the UI tree logs

+

Below is a table of the fields in the UI tree logs:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
FieldDescriptionType
idThe unique identifier of the UI tree node.String
nameThe name of the UI tree node.String
control_typeThe type of the UI tree node.String
rectangleThe absolute position of the UI tree node.Dictionary
adjusted_rectangleThe adjusted position of the UI tree node.Dictionary
relative_rectangleThe relative position of the UI tree node.Dictionary
levelThe level of the UI tree node.Integer
childrenThe children of the UI tree node.List of UI tree nodes
+
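The three rectangle fields are related: adjusted_rectangle shifts the absolute rectangle so the window origin is (0, 0), and relative_rectangle divides the adjusted coordinates by the window's width and height. This relationship is inferred from the example above (the Pane node in a 2880 x 1537 window) and can be sketched as:

```python
from typing import Dict


def to_relative(adjusted: Dict[str, int], window_width: int, window_height: int) -> Dict[str, float]:
    """Normalize an adjusted rectangle by the window size to get the relative rectangle."""
    return {
        "left": adjusted["left"] / window_width,
        "top": adjusted["top"] / window_height,
        "right": adjusted["right"] / window_width,
        "bottom": adjusted["bottom"] / window_height,
    }


# Values from the Pane node in the example above (window size 2880 x 1537).
rel = to_relative({"left": 2654, "top": 0, "right": 2870, "bottom": 72}, 2880, 1537)
```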

Reference

+ + +
+ + + + +
+ + +

A class to represent the UI tree.

+ +

Initialize the UI tree with the root element.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + root + (UIAWrapper) + – +
    +

    The root element of the UI tree.

    +
    +
  • +
+
+ + + + + +
+ Source code in automator/ui_control/ui_tree.py +
20
+21
+22
+23
+24
+25
+26
+27
+28
+29
+30
+31
+32
+33
def __init__(self, root: UIAWrapper):
+    """
+    Initialize the UI tree with the root element.
+    :param root: The root element of the UI tree.
+    """
+    self.root = root
+
+    # The node counter to count the number of nodes in the UI tree.
+    self.node_counter = 0
+
+    try:
+        self._ui_tree = self._get_ui_tree(self.root)
+    except Exception as e:
+        self._ui_tree = {"error": traceback.format_exc()}
+
+
+ + + +
+ + + + + + + +
+ + + +

+ ui_tree: Dict[str, Any] + + + property + + +

+ + +
+ +

The UI tree.

+
+ +
+ + + +
+ + +

+ apply_ui_tree_diff(ui_tree_1, diff) + + + staticmethod + + +

+ + +
+ +

Apply a UI tree diff to ui_tree_1 to get ui_tree_2.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + ui_tree_1 + (Dict[str, Any]) + – +
    +

    The original UI tree.

    +
    +
  • +
  • + diff + (Dict[str, Any]) + – +
    +

    The diff to apply.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + Dict[str, Any] + – +
    +

    The new UI tree after applying the diff.

    +
    +
  • +
+
+
+ Source code in automator/ui_control/ui_tree.py +
224
+225
+226
+227
+228
+229
+230
+231
+232
+233
+234
+235
+236
+237
+238
+239
+240
+241
+242
+243
+244
+245
+246
+247
+248
+249
+250
+251
+252
+253
+254
+255
+256
+257
+258
+259
+260
+261
+262
+263
+264
+265
+266
+267
+268
+269
+270
+271
+272
+273
+274
+275
+276
+277
+278
+279
+280
+281
+282
+283
+284
+285
+286
+287
+288
+289
+290
+291
+292
+293
+294
+295
+296
+297
+298
+299
+300
+301
+302
+303
+304
+305
+306
+307
+308
+309
+310
+311
+312
+313
+314
+315
+316
+317
+318
+319
+320
+321
+322
+323
@staticmethod
+def apply_ui_tree_diff(
+    ui_tree_1: Dict[str, Any], diff: Dict[str, Any]
+) -> Dict[str, Any]:
+    """
+    Apply a UI tree diff to ui_tree_1 to get ui_tree_2.
+    :param ui_tree_1: The original UI tree.
+    :param diff: The diff to apply.
+    :return: The new UI tree after applying the diff.
+    """
+
+    ui_tree_2 = copy.deepcopy(ui_tree_1)
+
+    # Build an ID map for quick node lookups
+    def build_id_map(node, id_map):
+        id_map[node["id"]] = node
+        for child in node.get("children", []):
+            build_id_map(child, id_map)
+
+    id_map = {}
+    if "id" in ui_tree_2:
+        build_id_map(ui_tree_2, id_map)
+
+    def remove_node_by_path(path):
+        # The path is a list of IDs from root to target node.
+        # The target node is the last element. Its parent is the second to last element.
+        if len(path) == 1:
+            # Removing the root
+            for k in list(ui_tree_2.keys()):
+                del ui_tree_2[k]
+            id_map.clear()
+            return
+
+        target_id = path[-1]
+        parent_id = path[-2]
+        parent_node = id_map[parent_id]
+        # Find and remove the child with target_id
+        for i, c in enumerate(parent_node.get("children", [])):
+            if c["id"] == target_id:
+                parent_node["children"].pop(i)
+                break
+
+        # Remove target_id from id_map
+        if target_id in id_map:
+            del id_map[target_id]
+
+    def add_node_by_path(path, node):
+        # Add the node at the specified path. The parent is path[-2], the node is path[-1].
+        # The path[-1] should be node["id"].
+        if len(path) == 1:
+            # Replacing the root node entirely
+            for k in list(ui_tree_2.keys()):
+                del ui_tree_2[k]
+            for k, v in node.items():
+                ui_tree_2[k] = v
+            # Rebuild id_map
+            id_map.clear()
+            if "id" in ui_tree_2:
+                build_id_map(ui_tree_2, id_map)
+            return
+
+        target_id = path[-1]
+        parent_id = path[-2]
+        parent_node = id_map[parent_id]
+        # Ensure children list exists
+        if "children" not in parent_node:
+            parent_node["children"] = []
+        # Insert or append the node
+        # We don't have a numeric index anymore, we just append, assuming order doesn't matter.
+        # If order matters, we must store ordering info or do some heuristic.
+        parent_node["children"].append(node)
+
+        # Update the id_map with the newly added subtree
+        build_id_map(node, id_map)
+
+    def modify_node_by_path(path, changes):
+        # Modify fields of the node at the given ID
+        target_id = path[-1]
+        node = id_map[target_id]
+        for field, (old_val, new_val) in changes.items():
+            node[field] = new_val
+
+    # Apply removals first
+    # Sort removals by length of path descending so we remove deeper nodes first.
+    # This ensures we don't remove parents before children.
+    for removal in sorted(
+        diff["removed"], key=lambda x: len(x["path"]), reverse=True
+    ):
+        remove_node_by_path(removal["path"])
+
+    # Apply additions
+    # Additions can be applied directly.
+    for addition in diff["added"]:
+        add_node_by_path(addition["path"], addition["node"])
+
+    # Apply modifications
+    for modification in diff["modified"]:
+        modify_node_by_path(modification["path"], modification["changes"])
+
+    return ui_tree_2
+
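The diff format shown above stores each change as a path of node IDs plus a mapping of field names to (old, new) pairs. The stand-alone sketch below applies only the "modified" entries of such a diff; it is a simplified illustration of the format, not UFO's actual implementation.

```python
import copy
from typing import Any, Dict


def apply_modifications(tree: Dict[str, Any], diff: Dict[str, Any]) -> Dict[str, Any]:
    """Apply only the "modified" entries of a UI-tree diff (simplified sketch)."""
    result = copy.deepcopy(tree)
    index: Dict[str, Dict[str, Any]] = {}

    def build(node: Dict[str, Any]) -> None:
        # Build an ID map for quick node lookups, as in the real implementation.
        index[node["id"]] = node
        for child in node.get("children", []):
            build(child)

    build(result)
    for mod in diff.get("modified", []):
        node = index[mod["path"][-1]]
        for field, (_old, new) in mod["changes"].items():
            node[field] = new
    return result
```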
+
+
+ +
+ +
+ + +

+ flatten_ui_tree() + +

+ + +
+ +

Flatten the UI tree into a list in width-first order.

+ +
+ Source code in automator/ui_control/ui_tree.py +
117
+118
+119
+120
+121
+122
+123
+124
+125
+126
+127
+128
+129
+130
+131
+132
+133
+134
+135
+136
+137
+138
+139
+140
+141
+142
+143
+144
def flatten_ui_tree(self) -> List[Dict[str, Any]]:
+    """
+    Flatten the UI tree into a list in width-first order.
+    """
+
+    def flatten_tree(tree: Dict[str, Any], result: List[Dict[str, Any]]):
+        """
+        Flatten the tree.
+        :param tree: The tree to flatten.
+        :param result: The result list.
+        """
+
+        tree_info = {
+            "name": tree["name"],
+            "control_type": tree["control_type"],
+            "rectangle": tree["rectangle"],
+            "adjusted_rectangle": tree["adjusted_rectangle"],
+            "relative_rectangle": tree["relative_rectangle"],
+            "level": tree["level"],
+        }
+
+        result.append(tree_info)
+        for child in tree.get("children", []):
+            flatten_tree(child, result)
+
+    result = []
+    flatten_tree(self.ui_tree, result)
+    return result
+
+
+
+ +
+ +
+ + +

+ save_ui_tree_to_json(file_path) + +

+ + +
+ +

Save the UI tree to a JSON file.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + file_path + (str) + – +
    +

    The file path to save the UI tree.

    +
    +
  • +
+
+
+ Source code in automator/ui_control/ui_tree.py +
103
+104
+105
+106
+107
+108
+109
+110
+111
+112
+113
+114
+115
def save_ui_tree_to_json(self, file_path: str) -> None:
+    """
+    Save the UI tree to a JSON file.
+    :param file_path: The file path to save the UI tree.
+    """
+
+    # Check if the file directory exists. If not, create it.
+    save_dir = os.path.dirname(file_path)
+    if not os.path.exists(save_dir):
+        os.makedirs(save_dir)
+
+    with open(file_path, "w") as file:
+        json.dump(self.ui_tree, file, indent=4)
+
+
+
+ +
+ +
+ + +

+ ui_tree_diff(ui_tree_1, ui_tree_2) + + + staticmethod + + +

+ + +
+ +

Compute the difference between two UI trees.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + ui_tree_1 + (Dict[str, Any]) + – +
    +

    The first UI tree.

    +
    +
  • +
  • + ui_tree_2 + (Dict[str, Any]) + – +
    +

    The second UI tree.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + – +
    +

    The difference between the two UI trees.

    +
    +
  • +
+
+
+ Source code in automator/ui_control/ui_tree.py +
146
+147
+148
+149
+150
+151
+152
+153
+154
+155
+156
+157
+158
+159
+160
+161
+162
+163
+164
+165
+166
+167
+168
+169
+170
+171
+172
+173
+174
+175
+176
+177
+178
+179
+180
+181
+182
+183
+184
+185
+186
+187
+188
+189
+190
+191
+192
+193
+194
+195
+196
+197
+198
+199
+200
+201
+202
+203
+204
+205
+206
+207
+208
+209
+210
+211
+212
+213
+214
+215
+216
+217
+218
+219
+220
+221
+222
@staticmethod
+def ui_tree_diff(ui_tree_1: Dict[str, Any], ui_tree_2: Dict[str, Any]):
+    """
+    Compute the difference between two UI trees.
+    :param ui_tree_1: The first UI tree.
+    :param ui_tree_2: The second UI tree.
+    :return: The difference between the two UI trees.
+    """
+
+    diff = {"added": [], "removed": [], "modified": []}
+
+    def compare_nodes(node1, node2, path):
+        # Note: `path` is a list of IDs. The last element corresponds to the current node.
+        # If node1 doesn't exist and node2 does, it's an addition.
+        if node1 is None and node2 is not None:
+            diff["added"].append({"path": path, "node": copy.deepcopy(node2)})
+            return
+
+        # If node1 exists and node2 doesn't, it's a removal.
+        if node1 is not None and node2 is None:
+            diff["removed"].append({"path": path, "node": copy.deepcopy(node1)})
+            return
+
+        # If both don't exist, nothing to do.
+        if node1 is None and node2 is None:
+            return
+
+        # Both nodes exist, check for modifications at this node
+        fields_to_compare = [
+            "name",
+            "control_type",
+            "rectangle",
+            "adjusted_rectangle",
+            "relative_rectangle",
+            "level",
+        ]
+
+        changes = {}
+        for field in fields_to_compare:
+            if node1[field] != node2[field]:
+                changes[field] = (node1[field], node2[field])
+
+        if changes:
+            diff["modified"].append({"path": path, "changes": changes})
+
+        # Compare children
+        children1 = node1.get("children", [])
+        children2 = node2.get("children", [])
+
+        # We'll assume children order is stable. If not, differences will appear as adds/removes.
+        max_len = max(len(children1), len(children2))
+        for i in range(max_len):
+            c1 = children1[i] if i < len(children1) else None
+            c2 = children2[i] if i < len(children2) else None
+            # Use the child's id if available from c2 (prefer new tree), else from c1
+            if c2 is not None:
+                child_id = c2["id"]
+            elif c1 is not None:
+                child_id = c1["id"]
+            else:
+                # Both None shouldn't happen since max_len ensures one must exist
+                child_id = "unknown_child_id"
+
+            compare_nodes(c1, c2, path + [child_id])
+
+    # Initialize the path with the root node id if it exists
+    if ui_tree_2 and "id" in ui_tree_2:
+        root_id = ui_tree_2["id"]
+    elif ui_tree_1 and "id" in ui_tree_1:
+        root_id = ui_tree_1["id"]
+    else:
+        # If no root id is present, assume a placeholder
+        root_id = "root"
+
+    compare_nodes(ui_tree_1, ui_tree_2, [root_id])
+
+    return diff
+
+
+
+ +
+ + + +
+ +
+ +


+
+

Note

+

Saving the UI tree logs may increase the latency of the system. It is recommended to set the SAVE_UI_TREE flag to False when you do not need the UI tree logs.

+
+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/modules/context/index.html b/modules/context/index.html new file mode 100644 index 00000000..bfe7f7c7 --- /dev/null +++ b/modules/context/index.html @@ -0,0 +1,1071 @@ + + + + + + + + Context - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

Context

+

The Context object is a shared state object that stores the state of the conversation across all Rounds within a Session. It is used to maintain the context of the conversation, as well as the overall status of the conversation.

+

Context Attributes

+

The attributes of the Context object are defined in the ContextNames class, which is an Enum. The ContextNames class specifies various context attributes used throughout the session. Below is the definition:

+
class ContextNames(Enum):
+    """
+    The context names.
+    """
+
+    ID = "ID"  # The ID of the session
+    MODE = "MODE"  # The mode of the session
+    LOG_PATH = "LOG_PATH"  # The folder path to store the logs
+    REQUEST = "REQUEST"  # The current request
+    SUBTASK = "SUBTASK"  # The current subtask processed by the AppAgent
+    PREVIOUS_SUBTASKS = "PREVIOUS_SUBTASKS"  # The previous subtasks processed by the AppAgent
+    HOST_MESSAGE = "HOST_MESSAGE"  # The message from the HostAgent sent to the AppAgent
+    REQUEST_LOGGER = "REQUEST_LOGGER"  # The logger for the LLM request
+    LOGGER = "LOGGER"  # The logger for the session
+    EVALUATION_LOGGER = "EVALUATION_LOGGER"  # The logger for the evaluation
+    ROUND_STEP = "ROUND_STEP"  # The step of all rounds
+    SESSION_STEP = "SESSION_STEP"  # The step of the current session
+    CURRENT_ROUND_ID = "CURRENT_ROUND_ID"  # The ID of the current round
+    APPLICATION_WINDOW = "APPLICATION_WINDOW"  # The window of the application
+    APPLICATION_PROCESS_NAME = "APPLICATION_PROCESS_NAME"  # The process name of the application
+    APPLICATION_ROOT_NAME = "APPLICATION_ROOT_NAME"  # The root name of the application
+    CONTROL_REANNOTATION = "CONTROL_REANNOTATION"  # The re-annotation of the control provided by the AppAgent
+    SESSION_COST = "SESSION_COST"  # The cost of the session
+    ROUND_COST = "ROUND_COST"  # The cost of all rounds
+    ROUND_SUBTASK_AMOUNT = "ROUND_SUBTASK_AMOUNT"  # The amount of subtasks in all rounds
+    CURRENT_ROUND_STEP = "CURRENT_ROUND_STEP"  # The step of the current round
+    CURRENT_ROUND_COST = "CURRENT_ROUND_COST"  # The cost of the current round
+    CURRENT_ROUND_SUBTASK_AMOUNT = "CURRENT_ROUND_SUBTASK_AMOUNT"  # The amount of subtasks in the current round
+    STRUCTURAL_LOGS = "STRUCTURAL_LOGS"  # The structural logs of the session
+
+

Each attribute is a string that represents a specific aspect of the session context, ensuring that all necessary information is accessible and manageable within the application.

+

Attributes Description

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
AttributeDescription
IDThe ID of the session.
MODEThe mode of the session.
LOG_PATHThe folder path to store the logs.
REQUESTThe current request.
SUBTASKThe current subtask processed by the AppAgent.
PREVIOUS_SUBTASKSThe previous subtasks processed by the AppAgent.
HOST_MESSAGEThe message from the HostAgent sent to the AppAgent.
REQUEST_LOGGERThe logger for the LLM request.
LOGGERThe logger for the session.
EVALUATION_LOGGERThe logger for the evaluation.
ROUND_STEPThe step of all rounds.
SESSION_STEPThe step of the current session.
CURRENT_ROUND_IDThe ID of the current round.
APPLICATION_WINDOWThe window of the application.
APPLICATION_PROCESS_NAMEThe process name of the application.
APPLICATION_ROOT_NAMEThe root name of the application.
CONTROL_REANNOTATIONThe re-annotation of the control provided by the AppAgent.
SESSION_COSTThe cost of the session.
ROUND_COSTThe cost of all rounds.
ROUND_SUBTASK_AMOUNTThe amount of subtasks in all rounds.
CURRENT_ROUND_STEPThe step of the current round.
CURRENT_ROUND_COSTThe cost of the current round.
CURRENT_ROUND_SUBTASK_AMOUNTThe amount of subtasks in the current round.
STRUCTURAL_LOGSThe structural logs of the session.
+
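As shown in the reference below, the get/set API validates keys against ContextNames, so a mistyped key raises a KeyError instead of silently creating a new entry. The following is a minimal stand-in that illustrates this contract; the real class lives in module/context.py, and MiniContext plus the reduced enum here are illustrative only.

```python
from enum import Enum
from typing import Any, Dict


class ContextNames(Enum):
    """A small subset of the context names shown above, for illustration."""
    ID = "ID"
    REQUEST = "REQUEST"
    SESSION_COST = "SESSION_COST"


class MiniContext:
    """Illustrative stand-in for the Context get/set contract."""

    def __init__(self) -> None:
        self._context: Dict[str, Any] = {name.name: None for name in ContextNames}

    def get(self, key: ContextNames) -> Any:
        return self._context.get(key.name)

    def set(self, key: ContextNames, value: Any) -> None:
        if key.name in self._context:
            self._context[key.name] = value
        else:
            raise KeyError(f"Key '{key}' is not a valid context name.")
```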

Reference for the Context object

+ + +
+ + + + +
+ + +

The context class that maintains the context for the session and agent.

+ + + + + + + + + +
+ + + + + + + +
+ + + +

+ current_round_cost: Optional[float] + + + property + writable + + +

+ + +
+ +

Get the current round cost.

+
+ +
+ +
+ + + +

+ current_round_step: int + + + property + writable + + +

+ + +
+ +

Get the current round step.

+
+ +
+ +
+ + + +

+ current_round_subtask_amount: int + + + property + writable + + +

+ + +
+ +

Get the current round subtask index.

+
+ +
+ + + +
+ + +

+ add_to_structural_logs(data) + +

+ + +
+ +

Add data to the structural logs.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + data + (Dict[str, Any]) + – +
    +

    The data to add to the structural logs.

    +
    +
  • +
+
+
+ Source code in module/context.py +
274
+275
+276
+277
+278
+279
+280
+281
+282
+283
+284
+285
+286
+287
+288
+289
def add_to_structural_logs(self, data: Dict[str, Any]) -> None:
+    """
+    Add data to the structural logs.
+    :param data: The data to add to the structural logs.
+    """
+
+    round_key = data.get("Round", None)
+    subtask_key = data.get("SubtaskIndex", None)
+
+    if round_key is None or subtask_key is None:
+        return
+
+    remaining_items = {key: data[key] for key in data if key not in ["a", "b"]}
+    self._context[ContextNames.STRUCTURAL_LOGS.name][round_key][subtask_key].append(
+        remaining_items
+    )
+
+
+
+ +
+ +
+ + +

+ filter_structural_logs(round_key, subtask_key, keys) + +

+ + +
+ +

Filter the structural logs.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + round_key + (int) + – +
    +

    The round key.

    +
    +
  • +
  • + subtask_key + (int) + – +
    +

    The subtask key.

    +
    +
  • +
  • + keys + (Union[str, List[str]]) + – +
    +

    The keys to filter.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + Union[List[Any], List[Dict[str, Any]]] + – +
    +

    The filtered structural logs.

    +
    +
  • +
+
+
+ Source code in module/context.py +
291
+292
+293
+294
+295
+296
+297
+298
+299
+300
+301
+302
+303
+304
+305
+306
+307
+308
+309
+310
+311
def filter_structural_logs(
+    self, round_key: int, subtask_key: int, keys: Union[str, List[str]]
+) -> Union[List[Any], List[Dict[str, Any]]]:
+    """
+    Filter the structural logs.
+    :param round_key: The round key.
+    :param subtask_key: The subtask key.
+    :param keys: The keys to filter.
+    :return: The filtered structural logs.
+    """
+
+    structural_logs = self._context[ContextNames.STRUCTURAL_LOGS.name][round_key][
+        subtask_key
+    ]
+
+    if isinstance(keys, str):
+        return [log[keys] for log in structural_logs]
+    elif isinstance(keys, list):
+        return [{key: log[key] for key in keys} for log in structural_logs]
+    else:
+        raise TypeError(f"Keys should be a string or a list of strings.")
+
+
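Note that the keys argument changes the return shape: a single string yields a flat list of values, while a list of strings yields a list of dictionaries. A stand-alone sketch of that behavior on a plain list of log records (the function name here is illustrative):

```python
from typing import Any, Dict, List, Union


def filter_logs(
    logs: List[Dict[str, Any]], keys: Union[str, List[str]]
) -> Union[List[Any], List[Dict[str, Any]]]:
    """Mirror the key-filtering logic of filter_structural_logs on a plain list."""
    if isinstance(keys, str):
        return [log[keys] for log in logs]
    if isinstance(keys, list):
        return [{key: log[key] for key in keys} for log in logs]
    raise TypeError("Keys should be a string or a list of strings.")
```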
+
+ +
+ +
+ + +

+ get(key) + +

+ + +
+ +

Get the value from the context.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + key + (ContextNames) + – +
    +

    The context name.

    +
    +
  • +
+
+ + + + + + + + + + + + +
Returns: +
    +
  • + Any + – +
    +

    The value from the context.

    +
    +
  • +
+
+
+ Source code in module/context.py +
165
+166
+167
+168
+169
+170
+171
+172
+173
def get(self, key: ContextNames) -> Any:
+    """
+    Get the value from the context.
+    :param key: The context name.
+    :return: The value from the context.
+    """
+    # Sync the current round step and cost
+    self._sync_round_values()
+    return self._context.get(key.name)
+
+
+
+ +
+ +
+ + +

+ set(key, value) + +

+ + +
+ +

Set the value in the context.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + key + (ContextNames) + – +
    +

    The context name.

    +
    +
  • +
  • + value + (Any) + – +
    +

    The value to set in the context.

    +
    +
  • +
+
+
+ Source code in module/context.py +
175
+176
+177
+178
+179
+180
+181
+182
+183
+184
+185
+186
+187
+188
+189
+190
+191
def set(self, key: ContextNames, value: Any) -> None:
+    """
+    Set the value in the context.
+    :param key: The context name.
+    :param value: The value to set in the context.
+    """
+    if key.name in self._context:
+        self._context[key.name] = value
+        # Sync the current round step and cost
+        if key == ContextNames.CURRENT_ROUND_STEP:
+            self.current_round_step = value
+        if key == ContextNames.CURRENT_ROUND_COST:
+            self.current_round_cost = value
+        if key == ContextNames.CURRENT_ROUND_SUBTASK_AMOUNT:
+            self.current_round_subtask_amount = value
+    else:
+        raise KeyError(f"Key '{key}' is not a valid context name.")
+
+
+
+ +
+ +
+ + +

+ to_dict() + +

+ + +
+ +

Convert the context to a dictionary.

+ + + + + + + + + + + + + +
Returns: +
    +
  • + Dict[str, Any] + – +
    +

    The dictionary of the context.

    +
    +
  • +
+
+
+ Source code in module/context.py +
313
+314
+315
+316
+317
+318
def to_dict(self) -> Dict[str, Any]:
+    """
+    Convert the context to a dictionary.
+    :return: The dictionary of the context.
+    """
+    return self._context
+
+
+
+ +
+ +
+ + +

+ update_dict(key, value) + +

+ + +
+ +

Add a dictionary to a context key. The value and the context key should be dictionaries.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + key + (ContextNames) + – +
    +

    The context key to update.

    +
    +
  • +
  • + value + (Dict[str, Any]) + – +
    +

    The dictionary to add to the context key.

    +
    +
  • +
+
+
+ Source code in module/context.py +
203
+204
+205
+206
+207
+208
+209
+210
+211
+212
+213
+214
+215
+216
+217
+218
def update_dict(self, key: ContextNames, value: Dict[str, Any]) -> None:
+    """
+    Add a dictionary to a context key. The value and the context key should be dictionaries.
+    :param key: The context key to update.
+    :param value: The dictionary to add to the context key.
+    """
+    if key.name in self._context:
+        context_value = self._context[key.name]
+        if isinstance(value, dict) and isinstance(context_value, dict):
+            self._context[key.name].update(value)
+        else:
+            raise TypeError(
+                f"Value for key '{key.name}' is {key.value}, requires a dictionary."
+            )
+    else:
+        raise KeyError(f"Key '{key.name}' is not a valid context name.")
+
+
+
+ +
+ + + +
+ +
+ +
+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/modules/round/index.html b/modules/round/index.html new file mode 100644 index 00000000..483ecbd0 --- /dev/null +++ b/modules/round/index.html @@ -0,0 +1,1104 @@ + + + + + + + + Round - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

Round

+

A Round is a single interaction between the user and UFO that processes a single user request. A Round is responsible for orchestrating the HostAgent and AppAgent to fulfill the user's request.

+

Round Lifecycle

+

In a Round, the following steps are executed:

+

1. Round Initialization

+

At the beginning of a Round, the Round object is created, and the user's request is processed by the HostAgent to determine the appropriate application to fulfill the request.

+

2. Action Execution

+

Once created, the Round orchestrates the HostAgent and AppAgent to execute the necessary actions to fulfill the user's request. The core logic of a Round is shown below:

+
def run(self) -> None:
+    """
+    Run the round.
+    """
+
+    while not self.is_finished():
+
+        self.agent.handle(self.context)
+
+        self.state = self.agent.state.next_state(self.agent)
+        self.agent = self.agent.state.next_agent(self.agent)
+        self.agent.set_state(self.state)
+
+        # If the subtask ends, capture the last snapshot of the application.
+        if self.state.is_subtask_end():
+            time.sleep(configs["SLEEP_TIME"])
+            self.capture_last_snapshot(sub_round_id=self.subtask_amount)
+            self.subtask_amount += 1
+
+    self.agent.blackboard.add_requests(
+        {"request_{i}".format(i=self.id), self.request}
+    )
+
+    if self.application_window is not None:
+        self.capture_last_snapshot()
+
+    if self._should_evaluate:
+        self.evaluation()
+
+

At each step, the Round processes the user's request by invoking the handle method of the AppAgent or HostAgent based on the current state. The state determines the next agent to handle the request and the next state to transition to.
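The state-driven hand-off can be sketched with minimal stand-ins; the classes below are illustrative, not the real `BasicAgent`/`AgentState` API:

```python
from __future__ import annotations


class AgentState:
    """Minimal stand-in for an agent state."""

    def next_state(self, agent: "Agent") -> "AgentState":
        raise NotImplementedError

    def next_agent(self, agent: "Agent") -> "Agent":
        # By default the same agent keeps handling the request.
        return agent

    def is_round_end(self) -> bool:
        return False


class FinishState(AgentState):
    def next_state(self, agent: "Agent") -> "AgentState":
        return self

    def is_round_end(self) -> bool:
        return True


class ContinueState(AgentState):
    def __init__(self, remaining: int) -> None:
        self.remaining = remaining

    def next_state(self, agent: "Agent") -> "AgentState":
        # Transition to FinishState once no steps remain.
        if self.remaining > 1:
            return ContinueState(self.remaining - 1)
        return FinishState()


class Agent:
    def __init__(self, state: AgentState) -> None:
        self.state = state
        self.handled = 0

    def handle(self, context: dict) -> None:
        self.handled += 1

    def set_state(self, state: AgentState) -> None:
        self.state = state


# The same loop shape as Round.run(): handle, then ask the current
# state for the next state and the next agent.
agent = Agent(ContinueState(3))
while not agent.state.is_round_end():
    agent.handle(context={})
    state = agent.state.next_state(agent)
    agent = agent.state.next_agent(agent)
    agent.set_state(state)
```

With three steps remaining, the loop calls `handle` three times before `ContinueState` hands over to `FinishState` and the round ends.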

+

3. Request Completion

+

The AppAgent completes the actions within the application. If the request spans multiple applications, the HostAgent may switch to a different application to continue the task.

+

4. Round Termination

+

Once the user's request is fulfilled, the Round is terminated, and the results are returned to the user. If configured, the EvaluationAgent evaluates the completeness of the Round.

+

Reference

+ + +
+ + + + +
+

+ Bases: ABC

+ + +

A round of a session in UFO. +A round manages a single user request and consists of multiple steps. +A session may consist of multiple rounds of interactions.

+ +

Initialize a round.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + request + (str) + – +
    +

    The request of the round.

    +
    +
  • +
  • + agent + (BasicAgent) + – +
    +

    The initial agent of the round.

    +
    +
  • +
  • + context + (Context) + – +
    +

    The shared context of the round.

    +
    +
  • +
  • + should_evaluate + (bool) + – +
    +

    Whether to evaluate the round.

    +
    +
  • +
  • + id + (int) + – +
    +

    The id of the round.

    +
    +
  • +
+
+ + + + + +
+ Source code in module/basic.py +
48
+49
+50
+51
+52
+53
+54
+55
+56
+57
+58
+59
+60
+61
+62
+63
+64
+65
+66
+67
+68
+69
+70
+71
+72
def __init__(
+    self,
+    request: str,
+    agent: BasicAgent,
+    context: Context,
+    should_evaluate: bool,
+    id: int,
+) -> None:
+    """
+    Initialize a round.
+    :param request: The request of the round.
+    :param agent: The initial agent of the round.
+    :param context: The shared context of the round.
+    :param should_evaluate: Whether to evaluate the round.
+    :param id: The id of the round.
+    """
+
+    self._request = request
+    self._context = context
+    self._agent = agent
+    self._state = agent.state
+    self._id = id
+    self._should_evaluate = should_evaluate
+
+    self._init_context()
+
+
+ + + +
+ + + + + + + +
+ + + +

+ agent: BasicAgent + + + property + writable + + +

+ + +
+ +

Get the agent of the round. +return: The agent of the round.

+
+ +
+ +
+ + + +

+ application_window: UIAWrapper + + + property + writable + + +

+ + +
+ +

Get the application of the session. +return: The application of the session.

+
+ +
+ +
+ + + +

+ context: Context + + + property + + +

+ + +
+ +

Get the context of the round. +return: The context of the round.

+
+ +
+ +
+ + + +

+ cost: float + + + property + + +

+ + +
+ +

Get the cost of the round. +return: The cost of the round.

+
+ +
+ +
+ + + +

+ id: int + + + property + + +

+ + +
+ +

Get the id of the round. +return: The id of the round.

+
+ +
+ +
+ + + +

+ log_path: str + + + property + + +

+ + +
+ +

Get the log path of the round.

+

return: The log path of the round.

+
+ +
+ +
+ + + +

+ request: str + + + property + + +

+ + +
+ +

Get the request of the round. +return: The request of the round.

+
+ +
+ +
+ + + +

+ state: AgentState + + + property + writable + + +

+ + +
+ +

Get the status of the round. +return: The status of the round.

+
+ +
+ +
+ + + +

+ step: int + + + property + + +

+ + +
+ +

Get the local step of the round. +return: The step of the round.

+
+ +
+ +
+ + + +

+ subtask_amount: int + + + property + writable + + +

+ + +
+ +

Get the subtask amount of the round. +return: The subtask amount of the round.

+
+ +
+ + + +
+ + +

+ capture_last_snapshot(sub_round_id=None) + +

+ + +
+ +

Capture the last snapshot of the application, including the screenshot and the XML file if configured.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + sub_round_id + (Optional[int], default: + None +) + – +
    +

    The id of the sub-round, default is None.

    +
    +
  • +
+
+
+ Source code in module/basic.py +
246
+247
+248
+249
+250
+251
+252
+253
+254
+255
+256
+257
+258
+259
+260
+261
+262
+263
+264
+265
+266
+267
+268
+269
+270
+271
+272
+273
+274
+275
+276
+277
+278
+279
+280
+281
+282
+283
+284
+285
+286
+287
+288
+289
+290
+291
+292
+293
+294
+295
+296
+297
+298
+299
+300
+301
+302
+303
+304
+305
+306
+307
+308
+309
+310
def capture_last_snapshot(self, sub_round_id: Optional[int] = None) -> None:
+    """
+    Capture the last snapshot of the application, including the screenshot and the XML file if configured.
+    :param sub_round_id: The id of the sub-round, default is None.
+    """
+
+    # Capture the final screenshot
+    if sub_round_id is None:
+        screenshot_save_path = self.log_path + f"action_round_{self.id}_final.png"
+    else:
+        screenshot_save_path = (
+            self.log_path
+            + f"action_round_{self.id}_sub_round_{sub_round_id}_final.png"
+        )
+
+    if self.application_window is not None:
+
+        try:
+            PhotographerFacade().capture_app_window_screenshot(
+                self.application_window, save_path=screenshot_save_path
+            )
+
+        except Exception as e:
+            utils.print_with_color(
+                f"Warning: The last snapshot capture failed, due to the error: {e}",
+                "yellow",
+            )
+
+        if configs.get("SAVE_UI_TREE", False):
+            step_ui_tree = ui_tree.UITree(self.application_window)
+
+            ui_tree_path = os.path.join(self.log_path, "ui_trees")
+
+            ui_tree_file_name = (
+                f"ui_tree_round_{self.id}_final.json"
+                if sub_round_id is None
+                else f"ui_tree_round_{self.id}_sub_round_{sub_round_id}_final.json"
+            )
+
+            step_ui_tree.save_ui_tree_to_json(
+                os.path.join(
+                    ui_tree_path,
+                    ui_tree_file_name,
+                )
+            )
+
+        # Save the final XML file
+        if configs["LOG_XML"]:
+            log_abs_path = os.path.abspath(self.log_path)
+            xml_save_path = os.path.join(
+                log_abs_path,
+                (
+                    f"xml/action_round_{self.id}_final.xml"
+                    if sub_round_id is None
+                    else f"xml/action_round_{self.id}_sub_round_{sub_round_id}_final.xml"
+                ),
+            )
+
+            if issubclass(type(self.agent), HostAgent):
+
+                app_agent: AppAgent = self.agent.get_active_appagent()
+                app_agent.Puppeteer.save_to_xml(xml_save_path)
+            elif issubclass(type(self.agent), AppAgent):
+                app_agent: AppAgent = self.agent
+                app_agent.Puppeteer.save_to_xml(xml_save_path)
+
+
+
+ +
+ +
+ + +

+ evaluation() + +

+ + +
+ +

TODO: Evaluate the round.

+ +
+ Source code in module/basic.py +
312
+313
+314
+315
+316
def evaluation(self) -> None:
+    """
+    TODO: Evaluate the round.
+    """
+    pass
+
+
+
+ +
+ +
+ + +

+ is_finished() + +

+ + +
+ +

Check if the round is finished. +return: True if the round is finished, otherwise False.

+ +
+ Source code in module/basic.py +
127
+128
+129
+130
+131
+132
+133
+134
+135
def is_finished(self) -> bool:
+    """
+    Check if the round is finished.
+    return: True if the round is finished, otherwise False.
+    """
+    return (
+        self.state.is_round_end()
+        or self.context.get(ContextNames.SESSION_STEP) >= configs["MAX_STEP"]
+    )
+
+
+
+ +
+ +
+ + +

+ print_cost() + +

+ + +
+ +

Print the total cost of the round.

+ +
+ Source code in module/basic.py +
225
+226
+227
+228
+229
+230
+231
+232
+233
+234
+235
def print_cost(self) -> None:
+    """
+    Print the total cost of the round.
+    """
+
+    total_cost = self.cost
+    if isinstance(total_cost, float):
+        formatted_cost = "${:.2f}".format(total_cost)
+        utils.print_with_color(
+            f"Request total cost for current round is {formatted_cost}", "yellow"
+        )
+
+
+
+ +
+ +
+ + +

+ run() + +

+ + +
+ +

Run the round.

+ +
+ Source code in module/basic.py +
 98
+ 99
+100
+101
+102
+103
+104
+105
+106
+107
+108
+109
+110
+111
+112
+113
+114
+115
+116
+117
+118
+119
+120
+121
+122
+123
+124
+125
def run(self) -> None:
+    """
+    Run the round.
+    """
+
+    while not self.is_finished():
+
+        self.agent.handle(self.context)
+
+        self.state = self.agent.state.next_state(self.agent)
+        self.agent = self.agent.state.next_agent(self.agent)
+        self.agent.set_state(self.state)
+
+        # If the subtask ends, capture the last snapshot of the application.
+        if self.state.is_subtask_end():
+            time.sleep(configs["SLEEP_TIME"])
+            self.capture_last_snapshot(sub_round_id=self.subtask_amount)
+            self.subtask_amount += 1
+
+    self.agent.blackboard.add_requests(
+        {"request_{i}".format(i=self.id), self.request}
+    )
+
+    if self.application_window is not None:
+        self.capture_last_snapshot()
+
+    if self._should_evaluate:
+        self.evaluation()
+
+
+
+ +
+ + + +
+ +
+ +
+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/modules/session/index.html b/modules/session/index.html new file mode 100644 index 00000000..50012ca0 --- /dev/null +++ b/modules/session/index.html @@ -0,0 +1,1536 @@ + + + + + + + + Session - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

Session

+

A Session is a conversation instance between the user and UFO. It is a continuous interaction that starts when the user initiates a request and ends when the request is completed. UFO supports multiple requests within the same session; each request is processed sequentially by a Round of interaction until it is fulfilled. We show the relationship between Session and Round in the following figure:

+

+ Session and Round Image +

+ +

Session Lifecycle

+

The lifecycle of a Session is as follows:

+

1. Session Initialization

+

A Session is initialized when the user starts a conversation with UFO. The Session object is created, and the first Round of interaction is initiated. At this stage, the user's request is processed by the HostAgent to determine the appropriate application to fulfill the request. The Context object is created to store the state of the conversation shared across all Rounds within the Session.

+

2. Session Processing

+

Once the Session is initialized, the Round of interaction begins, which completes a single user request by orchestrating the HostAgent and AppAgent.

+

3. Next Round

+

After the first Round completes, the Session prompts the user for the next request to start the next Round of interaction. This process continues until the user has no more requests. +The core logic of a Session is shown below:

+
def run(self) -> None:
+    """
+    Run the session.
+    """
+
+    while not self.is_finished():
+
+        round = self.create_new_round()
+        if round is None:
+            break
+        round.run()
+
+    if self.application_window is not None:
+        self.capture_last_snapshot()
+
+    if self._should_evaluate and not self.is_error():
+        self.evaluation()
+
+    self.print_cost()
+
+

4. Session Termination

+

If the user has no more requests or decides to end the conversation, the Session is terminated, and the conversation ends. The EvaluationAgent evaluates the completeness of the Session if it is configured to do so.

+

Reference

+ + +
+ + + + +
+

+ Bases: ABC

+ + +

A basic session in UFO. A session consists of multiple rounds of interactions and conversations.

+ +

Initialize a session.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + task + (str) + – +
    +

    The name of current task.

    +
    +
  • +
  • + should_evaluate + (bool) + – +
    +

    Whether to evaluate the session.

    +
    +
  • +
  • + id + (int) + – +
    +

    The id of the session.

    +
    +
  • +
+
+ + + + + +
+ Source code in module/basic.py +
340
+341
+342
+343
+344
+345
+346
+347
+348
+349
+350
+351
+352
+353
+354
+355
+356
+357
+358
+359
+360
+361
+362
+363
+364
+365
+366
+367
+368
def __init__(self, task: str, should_evaluate: bool, id: int) -> None:
+    """
+    Initialize a session.
+    :param task: The name of current task.
+    :param should_evaluate: Whether to evaluate the session.
+    :param id: The id of the session.
+    """
+
+    self._should_evaluate = should_evaluate
+    self._id = id
+
+    # Logging-related properties
+    self.log_path = f"logs/{task}/"
+    utils.create_folder(self.log_path)
+
+    self._rounds: Dict[int, BaseRound] = {}
+
+    self._context = Context()
+    self._init_context()
+    self._finish = False
+
+    self._host_agent: HostAgent = AgentFactory.create_agent(
+        "host",
+        "HostAgent",
+        configs["HOST_AGENT"]["VISUAL_MODE"],
+        configs["HOSTAGENT_PROMPT"],
+        configs["HOSTAGENT_EXAMPLE_PROMPT"],
+        configs["API_PROMPT"],
+    )
+
+
+ + + +
+ + + + + + + +
+ + + +

+ application_window: UIAWrapper + + + property + writable + + +

+ + +
+ +

Get the application of the session. +return: The application of the session.

+
+ +
+ +
+ + + +

+ context: Context + + + property + + +

+ + +
+ +

Get the context of the session. +return: The context of the session.

+
+ +
+ +
+ + + +

+ cost: float + + + property + writable + + +

+ + +
+ +

Get the cost of the session. +return: The cost of the session.

+
+ +
+ +
+ + + +

+ current_round: BaseRound + + + property + + +

+ + +
+ +

Get the current round of the session. +return: The current round of the session.

+
+ +
+ +
+ + + +

+ evaluation_logger: logging.Logger + + + property + + +

+ + +
+ +

Get the logger for evaluation. +return: The logger for evaluation.

+
+ +
+ +
+ + + +

+ id: int + + + property + + +

+ + +
+ +

Get the id of the session. +return: The id of the session.

+
+ +
+ +
+ + + +

+ rounds: Dict[int, BaseRound] + + + property + + +

+ + +
+ +

Get the rounds of the session. +return: The rounds of the session.

+
+ +
+ +
+ + + +

+ session_type: str + + + property + + +

+ + +
+ +

Get the class name of the session. +return: The class name of the session.

+
+ +
+ +
+ + + +

+ step: int + + + property + + +

+ + +
+ +

Get the step of the session. +return: The step of the session.

+
+ +
+ +
+ + + +

+ total_rounds: int + + + property + + +

+ + +
+ +

Get the total number of rounds in the session. +return: The total number of rounds in the session.

+
+ +
+ + + +
+ + +

+ add_round(id, round) + +

+ + +
+ +

Add a round to the session.

+ + + + + + + + + + + + + +
Parameters: +
    +
  • + id + (int) + – +
    +

    The id of the round.

    +
    +
  • +
  • + round + (BaseRound) + – +
    +

    The round to be added.

    +
    +
  • +
+
+
+ Source code in module/basic.py +
412
+413
+414
+415
+416
+417
+418
def add_round(self, id: int, round: BaseRound) -> None:
+    """
+    Add a round to the session.
+    :param id: The id of the round.
+    :param round: The round to be added.
+    """
+    self._rounds[id] = round
+
+
+
+ +
+ +
+ + +

+ capture_last_snapshot() + +

+ + +
+ +

Capture the last snapshot of the application, including the screenshot and the XML file if configured.

+ +
+ Source code in module/basic.py +
660
+661
+662
+663
+664
+665
+666
+667
+668
+669
+670
+671
+672
+673
+674
+675
+676
+677
+678
+679
+680
+681
+682
+683
+684
+685
+686
+687
+688
+689
+690
+691
+692
+693
+694
+695
+696
+697
+698
+699
+700
+701
+702
def capture_last_snapshot(self) -> None:
+    """
+    Capture the last snapshot of the application, including the screenshot and the XML file if configured.
+    """
+
+    # Capture the final screenshot
+    screenshot_save_path = self.log_path + f"action_step_final.png"
+
+    if self.application_window is not None:
+
+        try:
+            PhotographerFacade().capture_app_window_screenshot(
+                self.application_window, save_path=screenshot_save_path
+            )
+
+        except Exception as e:
+            utils.print_with_color(
+                f"Warning: The last snapshot capture failed, due to the error: {e}",
+                "yellow",
+            )
+
+        if configs.get("SAVE_UI_TREE", False):
+            step_ui_tree = ui_tree.UITree(self.application_window)
+
+            ui_tree_path = os.path.join(self.log_path, "ui_trees")
+
+            ui_tree_file_name = "ui_tree_final.json"
+
+            step_ui_tree.save_ui_tree_to_json(
+                os.path.join(
+                    ui_tree_path,
+                    ui_tree_file_name,
+                )
+            )
+
+        # Save the final XML file
+        if configs["LOG_XML"]:
+            log_abs_path = os.path.abspath(self.log_path)
+            xml_save_path = os.path.join(log_abs_path, f"xml/action_step_final.xml")
+
+            app_agent = self._host_agent.get_active_appagent()
+            if app_agent is not None:
+                app_agent.Puppeteer.save_to_xml(xml_save_path)
+
+
+
+ +
+ +
+ + +

+ create_following_round() + +

+ + +
+ +

Create a following round. +return: The following round.

+ +
+ Source code in module/basic.py +
405
+406
+407
+408
+409
+410
def create_following_round(self) -> BaseRound:
+    """
+    Create a following round.
+    return: The following round.
+    """
+    pass
+
+
+
+ +
+ +
+ + +

+ create_new_round() + + + abstractmethod + + +

+ + +
+ +

Create a new round.

+ +
+ Source code in module/basic.py +
390
+391
+392
+393
+394
+395
@abstractmethod
+def create_new_round(self) -> Optional[BaseRound]:
+    """
+    Create a new round.
+    """
+    pass
+
+
+
+ +
+ +
+ + +

+ evaluation() + +

+ + +
+ +

Evaluate the session.

+ +
+ Source code in module/basic.py +
612
+613
+614
+615
+616
+617
+618
+619
+620
+621
+622
+623
+624
+625
+626
+627
+628
+629
+630
+631
+632
+633
+634
+635
+636
+637
+638
+639
+640
+641
+642
+643
+644
+645
+646
+647
+648
+649
+650
def evaluation(self) -> None:
+    """
+    Evaluate the session.
+    """
+    utils.print_with_color("Evaluating the session...", "yellow")
+    evaluator = EvaluationAgent(
+        name="eva_agent",
+        app_root_name=self.context.get(ContextNames.APPLICATION_ROOT_NAME),
+        is_visual=configs["APP_AGENT"]["VISUAL_MODE"],
+        main_prompt=configs["EVALUATION_PROMPT"],
+        example_prompt="",
+        api_prompt=configs["API_PROMPT"],
+    )
+
+    requests = self.request_to_evaluate()
+
+    # Evaluate the session, first use the default setting, if failed, then disable the screenshot evaluation.
+    try:
+        result, cost = evaluator.evaluate(
+            request=requests,
+            log_path=self.log_path,
+            eva_all_screenshots=configs.get("EVA_ALL_SCREENSHOTS", True),
+        )
+    except Exception as e:
+        result, cost = evaluator.evaluate(
+            request=requests,
+            log_path=self.log_path,
+            eva_all_screenshots=False,
+        )
+
+    # Add additional information to the evaluation result.
+    additional_info = {"level": "session", "request": requests, "id": 0}
+    result.update(additional_info)
+
+    self.cost += cost
+
+    evaluator.print_response(result)
+
+    self.evaluation_logger.info(json.dumps(result))
+
+
+
+ +
+ +
+ + +

+ experience_saver() + +

+ + +
+ +

Save the current trajectory as agent experience.

+ +
+ Source code in module/basic.py +
534
+535
+536
+537
+538
+539
+540
+541
+542
+543
+544
+545
+546
+547
+548
+549
+550
+551
+552
+553
+554
+555
+556
+557
+558
+559
+560
+561
def experience_saver(self) -> None:
+    """
+    Save the current trajectory as agent experience.
+    """
+    utils.print_with_color(
+        "Summarizing and saving the execution flow as experience...", "yellow"
+    )
+
+    summarizer = ExperienceSummarizer(
+        configs["APP_AGENT"]["VISUAL_MODE"],
+        configs["EXPERIENCE_PROMPT"],
+        configs["APPAGENT_EXAMPLE_PROMPT"],
+        configs["API_PROMPT"],
+    )
+    experience = summarizer.read_logs(self.log_path)
+    summaries, cost = summarizer.get_summary_list(experience)
+
+    experience_path = configs["EXPERIENCE_SAVED_PATH"]
+    utils.create_folder(experience_path)
+    summarizer.create_or_update_yaml(
+        summaries, os.path.join(experience_path, "experience.yaml")
+    )
+    summarizer.create_or_update_vector_db(
+        summaries, os.path.join(experience_path, "experience_db")
+    )
+
+    self.cost += cost
+    utils.print_with_color("The experience has been saved.", "magenta")
+
+
+
+ +
+ +
+ + +

+ initialize_logger(log_path, log_filename, mode='a', configs=configs) + + + staticmethod + + +

+ + +
+ +

Initialize logging. +log_path: The path of the log file. +log_filename: The name of the log file. +return: The logger.

+ +
+ Source code in module/basic.py +
704
+705
+706
+707
+708
+709
+710
+711
+712
+713
+714
+715
+716
+717
+718
+719
+720
+721
+722
+723
+724
+725
+726
@staticmethod
+def initialize_logger(log_path: str, log_filename: str, mode='a', configs = configs) -> logging.Logger:
+    """
+    Initialize logging.
+    log_path: The path of the log file.
+    log_filename: The name of the log file.
+    return: The logger.
+    """
+    # Code for initializing logging
+    logger = logging.Logger(log_filename)
+
+    if not configs["PRINT_LOG"]:
+        # Remove existing handlers if PRINT_LOG is False
+        logger.handlers = []
+
+    log_file_path = os.path.join(log_path, log_filename)
+    file_handler = logging.FileHandler(log_file_path, mode = mode, encoding="utf-8")
+    formatter = logging.Formatter("%(message)s")
+    file_handler.setFormatter(formatter)
+    logger.addHandler(file_handler)
+    logger.setLevel(configs["LOG_LEVEL"])
+
+    return logger
+
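The logger above attaches a file handler with a bare `%(message)s` formatter, which makes per-step JSON lines easy to write. A self-contained sketch of the same pattern, with plain parameters standing in for the `configs` lookups of the original:

```python
import logging
import os
import tempfile


def initialize_logger(log_path: str, log_filename: str, mode: str = "a",
                      print_log: bool = False,
                      level: int = logging.INFO) -> logging.Logger:
    """Message-only file logger, mirroring the pattern shown above."""
    logger = logging.Logger(log_filename)
    if not print_log:
        # Drop any existing handlers when console printing is disabled.
        logger.handlers = []

    os.makedirs(log_path, exist_ok=True)
    file_handler = logging.FileHandler(
        os.path.join(log_path, log_filename), mode=mode, encoding="utf-8"
    )
    # Emit only the raw message, so JSON lines stay parseable.
    file_handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(file_handler)
    logger.setLevel(level)
    return logger


log_dir = tempfile.mkdtemp()
logger = initialize_logger(log_dir, "response.log")
logger.info('{"step": 0, "status": "ok"}')
for handler in logger.handlers:
    handler.close()

with open(os.path.join(log_dir, "response.log"), encoding="utf-8") as f:
    content = f.read()
```

Because the formatter carries no timestamp or level prefix, each line of `response.log` is exactly the logged string, ready for `json.loads`.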
+
+
+ +
+ +
+ + +

+ is_error() + +

+ + +
+ +

Check if the session is in error state. +return: True if the session is in error state, otherwise False.

+ +
+ Source code in module/basic.py +
582
+583
+584
+585
+586
+587
+588
+589
def is_error(self):
+    """
+    Check if the session is in error state.
+    return: True if the session is in error state, otherwise False.
+    """
+    if self.current_round is not None:
+        return self.current_round.state.name() == AgentStatus.ERROR.value
+    return False
+
+
+
+ +
+ +
+ + +

+ is_finished() + +

+ + +
+ +

Check if the session is ended. +return: True if the session is ended, otherwise False.

+ +
+ Source code in module/basic.py +
591
+592
+593
+594
+595
+596
+597
+598
+599
+600
+601
+602
def is_finished(self) -> bool:
+    """
+    Check if the session is ended.
+    return: True if the session is ended, otherwise False.
+    """
+    if self._finish or self.step >= configs["MAX_STEP"]:
+        return True
+
+    if self.is_error():
+        return True
+
+    return False
+
+
+
+ +
+ +
+ + +

+ next_request() + + + abstractmethod + + +

+ + +
+ +

Get the next request of the session. +return: The request of the session.

+ +
+ Source code in module/basic.py +
397
+398
+399
+400
+401
+402
+403
@abstractmethod
+def next_request(self) -> str:
+    """
+    Get the next request of the session.
+    return: The request of the session.
+    """
+    pass
+
+
+
+ +
+ +
+ + +

+ print_cost() + +

+ + +
+ +

Print the total cost of the session.

+ +
+ Source code in module/basic.py +
563
+564
+565
+566
+567
+568
+569
+570
+571
+572
+573
+574
+575
+576
+577
+578
+579
+580
def print_cost(self) -> None:
+    """
+    Print the total cost of the session.
+    """
+
+    if isinstance(self.cost, float) and self.cost > 0:
+        formatted_cost = "${:.2f}".format(self.cost)
+        utils.print_with_color(
+            f"Total request cost of the session: {formatted_cost}$", "yellow"
+        )
+    else:
+        utils.print_with_color(
+            "Cost is not available for the model {host_model} or {app_model}.".format(
+                host_model=configs["HOST_AGENT"]["API_MODEL"],
+                app_model=configs["APP_AGENT"]["API_MODEL"],
+            ),
+            "yellow",
+        )
+
+
+
+ +
+ +
+ + +

+ request_to_evaluate() + + + abstractmethod + + +

+ + +
+ +

Get the request to evaluate. +return: The request(s) to evaluate.

+ +
+ Source code in module/basic.py +
604
+605
+606
+607
+608
+609
+610
@abstractmethod
+def request_to_evaluate(self) -> str:
+    """
+    Get the request to evaluate.
+    return: The request(s) to evaluate.
+    """
+    pass
+
+
+
+ +
+ +
+ + +

+ run() + +

+ + +
+ +

Run the session.

+ +
+ Source code in module/basic.py +
370
+371
+372
+373
+374
+375
+376
+377
+378
+379
+380
+381
+382
+383
+384
+385
+386
+387
+388
def run(self) -> None:
+    """
+    Run the session.
+    """
+
+    while not self.is_finished():
+
+        round = self.create_new_round()
+        if round is None:
+            break
+        round.run()
+
+    if self.application_window is not None:
+        self.capture_last_snapshot()
+
+    if self._should_evaluate and not self.is_error():
+        self.evaluation()
+
+    self.print_cost()
+
+
+
+ +
+ + + +
+ +
+ +
+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/objects.inv b/objects.inv new file mode 100644 index 00000000..2a06be6c Binary files /dev/null and b/objects.inv differ diff --git a/project_directory_structure/index.html b/project_directory_structure/index.html new file mode 100644 index 00000000..f3903422 --- /dev/null +++ b/project_directory_structure/index.html @@ -0,0 +1,466 @@ + + + + + + + + Project Directory Structure - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + +
  • +
  • +
+
+
+
+
+ +

The UFO project is organized into a well-defined directory structure to facilitate development, deployment, and documentation. Below is an overview of each directory and file, along with their purpose:

+
📦project
+ ┣ 📂documents               # Folder to store project documentation
+ ┣ 📂learner                 # Folder to build the vector database for help documents
+ ┣ 📂model_worker            # Folder to store tools for deploying your own model
+ ┣ 📂record_processor        # Folder to parse human demonstrations from Windows Step Recorder and build the vector database
+ ┣ 📂vetordb                 # Folder to store all data in the vector database for RAG (Retrieval-Augmented Generation)
+ ┣ 📂logs                    # Folder to store logs, generated after the program starts
+ ┗ 📂ufo                     # Directory containing main project code
+    ┣ 📂module               # Directory for the basic module of UFO, e.g., session and round
+    ┣ 📂agents               # Code implementation of agents in UFO
+    ┣ 📂automator            # Implementation of the skill set of agents to automate applications
+    ┣ 📂experience           # Parse and save the agent's self-experience
+    ┣ 📂llm                  # Folder to store the LLM (Large Language Model) implementation
+    ┣ 📂prompter             # Prompt constructor for the agent
+    ┣ 📂prompts              # Prompt templates and files to construct the full prompt
+    ┣ 📂rag                  # Implementation of RAG from different sources to enhance agents' abilities
+    ┣ 📂utils                # Utility functions
+    ┣ 📂config               # Configuration files
+        ┣ 📜config.yaml      # User configuration file for LLM and other settings
+        ┣ 📜config_dev.yaml  # Configuration file for developers
+        ┗ ...
+    ┗ 📄ufo.py               # Main entry point for the UFO client
+
+

Directory and File Descriptions

+

documents

+
    +
  • Purpose: Stores all the project documentation.
  • +
  • Details: This may include design documents, user manuals, API documentation, and any other relevant project documentation.
  • +
+

learner

+
    +
  • Purpose: Used to build the vector database for help documents.
  • +
  • Details: This directory contains scripts and tools to process help documents and create a searchable vector database, enhancing the agents' ability to complete tasks.
  • +
+

model_worker

+
    +
  • Purpose: Contains tools and scripts necessary for deploying custom models.
  • +
  • Details: This includes model deployment configurations, and management tools for integrating custom models into the project.
  • +
+

record_processor

+
    +
  • Purpose: Parses human demonstrations recorded using the Windows Step Recorder and builds the vector database.
  • +
  • Details: This directory includes parsers, data processing scripts, and tools to convert human demonstrations into a format suitable for the agents' retrieval.
  • +
+

vetordb

+
    +
  • Purpose: Stores all data within the vector database for Retrieval-Augmented Generation (RAG).
  • +
  • Details: This directory is essential for maintaining the data that enhances the agents' ability to retrieve relevant information and generate more accurate responses.
  • +
+

logs

+
    +
  • Purpose: Stores log files generated by the application.
  • +
  • Details: This directory helps in monitoring, debugging, and analyzing the application's performance and behavior. Logs are generated dynamically as the application runs.
  • +
+

ufo

+
    +
  • Purpose: The core directory containing the main project code.
  • +
  • +

    Details: This directory is further subdivided into multiple subdirectories, each serving a specific purpose within the project.

    +

    module

    +
      +
    • Purpose: Contains the basic modules of the UFO project, such as session management and rounds.
    • +
    • Details: This includes foundational classes and functions that are used throughout the project.
    • +
    +

    agents

    +
      +
    • Purpose: Houses the code implementations of various agents in the UFO project.
    • +
    • Details: Agents are components that perform specific tasks within the system, and this directory contains their logic, components, and behavior.
    • +
    +

    automator

    +
      +
    • Purpose: Implements the skill set of agents to automate applications.
    • +
    • Details: This includes scripts and tools that enable agents to interact with and automate tasks in various applications, such as mouse and keyboard actions and API calls.
    • +
    +

    experience

    +
      +
    • Purpose: Parses and saves the agent's self-experience.
    • +
    • Details: This directory contains mechanisms for agents to learn from their actions and outcomes, improving their performance over time.
    • +
    +

    llm

    +
      +
    • Purpose: Stores the implementation of the Large Language Model (LLM).
    • +
    • Details: This includes the implementation of APIs for different language models, such as GPT, Gemini, and QWEN, that are used by the agents.
    • +
    +

    prompter

    +
      +
    • Purpose: Constructs prompts for the agents.
    • +
    • Details: This directory includes prompt construction logic and tools that help agents generate meaningful prompts for user interactions.
    • +
    +

    prompts

    +
      +
    • Purpose: Contains prompt templates and files used to construct the full prompt.
    • +
    • Details: This includes predefined prompt structures and content that are used to create meaningful interactions with the agents.
    • +
    +

    rag

    +
      +
    • Purpose: Implements Retrieval-Augmented Generation (RAG) from different sources to enhance the agents' abilities.
    • +
    • Details: This directory includes scripts and tools for integrating various data sources into the RAG framework, improving the accuracy and relevance of the agents' outputs.
    • +
    +

    utils

    +
      +
    • Purpose: Contains utility functions.
    • +
    • Details: This directory includes helper functions, common utilities, and other reusable code snippets that support the project's operations.
    • +
    +

    config

    +
      +
    • Purpose: Stores configuration files.
    • +
    • Details: This directory includes different configuration files for various environments and purposes.
    • +
    • config.yaml: User configuration file for LLM and other settings. You need to rename config.yaml.template to config.yaml and edit the configuration settings as needed.
    • +
    • config_dev.yaml: Developer-specific configuration file with settings tailored for development purposes.
    • +
    +

    ufo.py

    +
      +
    • Purpose: Main entry point for the UFO client.
    • +
    • Details: This script initializes and starts the UFO application.
    • +
    +
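Loading the user configuration described above can be sketched with the standard library alone. This is a minimal sketch assuming a flat `KEY: value` layout; the actual project presumably uses a full YAML parser, and the `API_TYPE`/`API_KEY` keys in the usage example are illustrative, not taken from the real config.yaml:

```python
from pathlib import Path

def load_flat_config(path: str) -> dict:
    """Read top-level `KEY: value` pairs from a simple YAML-style file.

    Stdlib-only sketch; config.yaml may contain nested structures that
    this deliberately does not handle.
    """
    config = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop trailing comments
        if ":" in line and not line.startswith((" ", "\t")):
            key, _, value = line.partition(":")
            config[key.strip()] = value.strip().strip("'\"")
    return config
```

After renaming config.yaml.template to config.yaml, a call such as `load_flat_config("ufo/config/config.yaml")` would return the top-level settings as a plain dictionary.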
  • +
+ +
+
+ +
+
+ +
+ +
+ +
+ + + + + + + + + diff --git a/prompts/api_prompts/index.html b/prompts/api_prompts/index.html new file mode 100644 index 00000000..dfd43c3c --- /dev/null +++ b/prompts/api_prompts/index.html @@ -0,0 +1,357 @@ + + + + + + + + API Prompts - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

API Prompts

+

The API prompts provide the description and usage of the APIs used in UFO. Shared APIs and app-specific APIs are stored in different directories:

+ + + + + + + + + + + + + + + + + +
DirectoryDescription
ufo/prompts/share/base/api.yamlShared APIs used by multiple applications
ufo/prompts/{app_name}APIs specific to an application
+
+

Info

+

You can configure the API prompt used in the config.yaml file. You can find more information about the configuration file here.

+
+
+

Tip

+

You may customize the API prompt for a specific application by adding the API prompt in the application's directory.

+
+

Example API Prompt

+

Below is an example of an API prompt:

+
click_input:
+  summary: |-
+    "click_input" is to click the control item with mouse.
+  class_name: |-
+    ClickInputCommand
+  usage: |-
+    [1] API call: click_input(button: str, double: bool)
+    [2] Args:
+      - button: 'The mouse button to click. One of ''left'', ''right'', ''middle'' or ''x'' (Default: ''left'')'
+      - double: 'Whether to perform a double click or not (Default: False)'
+    [3] Example: click_input(button="left", double=False)
+    [4] Available control item: All control items.
+    [5] Return: None
+
+

To create a new API prompt, follow the template above and add it to the appropriate directory.
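A small validator can catch missing fields before a new API prompt ships. This is a hypothetical helper, not part of UFO itself; the required field names simply mirror the click_input template above:

```python
REQUIRED_FIELDS = ("summary", "class_name", "usage")

def validate_api_prompt(name, spec):
    """Return a list of problems for one API prompt entry (empty = looks OK).

    Hypothetical checker; field names follow the click_input template.
    """
    problems = []
    for field in REQUIRED_FIELDS:
        if not str(spec.get(field, "")).strip():
            problems.append(f"{name}: missing or empty '{field}'")
    # The usage block is expected to document the call signature.
    if "API call:" not in spec.get("usage", ""):
        problems.append(f"{name}: usage should document the API call signature")
    return problems
```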

+ +
+
+ +
+
+ +
+ +
+ +
+ + + + + + + + + diff --git a/prompts/basic_template/index.html b/prompts/basic_template/index.html new file mode 100644 index 00000000..2c261cd7 --- /dev/null +++ b/prompts/basic_template/index.html @@ -0,0 +1,363 @@ + + + + + + + + Basic Prompts - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

Basic Prompt Template

+

The basic prompt template is a fixed format that is used to generate prompts for the HostAgent, AppAgent, FollowerAgent, and EvaluationAgent. It includes the templates for the system and user roles used to construct the agent's prompt.

+

Below are the default file paths for the basic prompt templates:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
AgentFile PathVersion
HostAgentufo/prompts/share/base/host_agent.yamlbase
HostAgentufo/prompts/share/lite/host_agent.yamllite
AppAgentufo/prompts/share/base/app_agent.yamlbase
AppAgentufo/prompts/share/lite/app_agent.yamllite
FollowerAgentufo/prompts/share/base/app_agent.yamlbase
FollowerAgentufo/prompts/share/lite/app_agent.yamllite
EvaluationAgentufo/prompts/evaluation/evaluation_agent.yaml-
+
+

Info

+

You can configure the prompt template used in the config.yaml file. You can find more information about the configuration file here.

+
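The table above maps directly onto a lookup table. The sketch below shows how such a path could be resolved; it is a hypothetical helper for illustration, since the real resolution lives in UFO's configuration and prompter code:

```python
from typing import Optional

# Mirrors the table above: (agent, version) -> template path.
PROMPT_TEMPLATES = {
    ("HostAgent", "base"): "ufo/prompts/share/base/host_agent.yaml",
    ("HostAgent", "lite"): "ufo/prompts/share/lite/host_agent.yaml",
    ("AppAgent", "base"): "ufo/prompts/share/base/app_agent.yaml",
    ("AppAgent", "lite"): "ufo/prompts/share/lite/app_agent.yaml",
    ("FollowerAgent", "base"): "ufo/prompts/share/base/app_agent.yaml",
    ("FollowerAgent", "lite"): "ufo/prompts/share/lite/app_agent.yaml",
    ("EvaluationAgent", None): "ufo/prompts/evaluation/evaluation_agent.yaml",
}

def template_path(agent: str, version: Optional[str] = "base") -> str:
    """Resolve the prompt template path for an agent (hypothetical helper)."""
    if agent == "EvaluationAgent":
        version = None  # the EvaluationAgent has a single, unversioned template
    try:
        return PROMPT_TEMPLATES[(agent, version)]
    except KeyError:
        raise ValueError(f"no prompt template for {agent!r}, version {version!r}")
```

Note how the FollowerAgent reuses the AppAgent template, exactly as in the table.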
+ +
+
+ +
+
+ +
+ +
+ +
+ + + + + + + + + diff --git a/prompts/examples_prompts/index.html b/prompts/examples_prompts/index.html new file mode 100644 index 00000000..cadda8ff --- /dev/null +++ b/prompts/examples_prompts/index.html @@ -0,0 +1,406 @@ + + + + + + + + Examples Prompts - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

Example Prompts

+

The example prompts are used to generate textual demonstration examples for in-context learning. The examples are stored in the ufo/prompts/examples directory, with the following subdirectories:

+ + + + + + + + + + + + + + + + + + + + + +
DirectoryDescription
liteLite version of demonstration examples
non-visualExamples for non-visual LLMs
visualExamples for visual LLMs
+
+

Info

+
+

You can configure the example prompt used in the config.yaml file. You can find more information about the configuration file here.

+

Example Prompts

+

Below are examples for the HostAgent and AppAgent:

+
    +
  • HostAgent:
  • +
+
Request: |-
+    Summarize and add all to do items on Microsoft To Do from the meeting notes email, and write a summary on the meeting_notes.docx.
+Response:
+    Observation: |-
+        The current screenshot shows the Microsoft To Do application is visible, and outlook application and the meeting_notes.docx are available in the list of applications.
+    Thought: |-
+        The user request can be decomposed into three sub-tasks: (1) Summarize all to do items on Microsoft To Do from the meeting_notes email, (2) Add all to do items to Microsoft To Do, and (3) Write a summary on the meeting_notes.docx. I need to open the Microsoft To Do application to complete the first two sub-tasks.
+        Each sub-task will be completed in individual applications sequentially.
+    CurrentSubtask: |-
+        Summarize all to do items from the meeting notes email in Outlook.
+    Message:
+        - (1) You need to first search for the meeting notes email in Outlook to summarize.
+        - (2) Only summarize the to do items from the meeting notes email, without any redundant information.
+    ControlLabel: |-
+        16
+    ControlText: |-
+        Mail - Outlook - Jim
+    Status: |-
+        CONTINUE
+    Plan:
+        - Add all to do items previously summarized from the meeting notes email to one-by-one Microsoft To Do.
+        - Write a summary about the meeting notes email on the meeting_notes.docx.
+    Comment: |-
+        I plan to first summarize all to do items from the meeting notes email in Outlook.
+    Questions: []
+
+
    +
  • AppAgent:
  • +
+
Request: |-
+    How many stars does the Imdiffusion repo have?
+Sub-task: |-
+    Google search for the Imdiffusion repo on github and summarize the number of stars the Imdiffusion repo page visually.
+Response: 
+    Observation: |-
+      I observe that the Edge browser is visible in the screenshot, with the Google search page opened.
+    Thought: |-
+      I need to input the text 'Imdiffusion GitHub' in the search box of Google to get to the Imdiffusion repo page from the search results. The search box is usually in a type of ComboBox.
+    ControlLabel: |-
+      36
+    ControlText: |-
+      搜索
+    Function: |-
+      set_edit_text
+    Args: 
+      {"text": "Imdiffusion GitHub"}
+    Status: |-
+      CONTINUE
+    Plan:
+      - (1) After inputting 'Imdiffusion GitHub', click Google Search to search for the Imdiffusion repo on GitHub.
+      - (2) Once the search results are visible, click the Imdiffusion repo hyperlink in the results to open the repo page.
+      - (3) Observe and summarize the number of stars on the Imdiffusion repo page, and reply to the user request.
+    Comment: |-
+      I plan to use Google search for the Imdiffusion repo on github and summarize the number of stars the Imdiffusion repo page visually.
+    SaveScreenshot:
+      {"save": false, "reason": ""}
+Tips: |-
+    - The search box is usually in a type of ComboBox.
+    - The number of stars of a Github repo page can be found in the repo page visually.
+
+

These examples regulate the output format of the agent's response and provide a structured way to generate demonstration examples for in-context learning.
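As a sketch of how one stored example could be turned into a few-shot message pair, the helper below serializes the Request and Response blocks into chat messages. The formatting is hypothetical; UFO's Prompter class has its own scheme:

```python
import json

def build_demo_messages(example):
    """Turn one stored example into a (user, assistant) chat-message pair
    for in-context learning. Hypothetical formatting, for illustration only.
    """
    user_turn = f"[Example] Request: {example['Request'].strip()}"
    # The agent must answer with a structured object, so the assistant turn
    # is the serialized Response block the model should imitate.
    assistant_turn = json.dumps(example["Response"], ensure_ascii=False, indent=2)
    return [
        {"role": "user", "content": user_turn},
        {"role": "assistant", "content": assistant_turn},
    ]
```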

+ +
+
+ +
+
+ +
+ +
+ +
+ + + + + + + + + diff --git a/prompts/overview/index.html b/prompts/overview/index.html new file mode 100644 index 00000000..cb17cfc7 --- /dev/null +++ b/prompts/overview/index.html @@ -0,0 +1,370 @@ + + + + + + + + Overview - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

Prompts

+

All prompts used in UFO are stored in the ufo/prompts directory. The folder structure is as follows:

+
📦prompts
+ ┣ 📂apps                # Stores API prompts for specific applications
+   ┣ 📂excel            # Stores API prompts for Excel
+   ┣ 📂word             # Stores API prompts for Word
+   ┗ ...
+ ┣ 📂demonstration       # Stores prompts for summarizing demonstrations from humans using Step Recorder
+ ┣ 📂experience          # Stores prompts for summarizing the agent's self-experience
+ ┣ 📂evaluation          # Stores prompts for the EvaluationAgent
+ ┣ 📂examples            # Stores demonstration examples for in-context learning
+   ┣ 📂lite             # Lite version of demonstration examples
+   ┣ 📂non-visual       # Examples for non-visual LLMs
+   ┗ 📂visual           # Examples for visual LLMs
+ ┗ 📂share               # Stores shared prompts
+   ┣ 📂lite             # Lite version of shared prompts
+   ┗ 📂base             # Basic version of shared prompts
+     ┣ 📜api.yaml       # Basic API prompt
+     ┣ 📜app_agent.yaml # Basic AppAgent prompt template
+     ┗ 📜host_agent.yaml # Basic HostAgent prompt template
+
+
+

Note

+

The lite version of prompts is a simplified version of the full prompts, which is used for LLMs that have a limited token budget. However, the lite version is not fully optimized and may lead to suboptimal performance.

+
+
+

Note

+

The non-visual and visual folders contain examples for non-visual and visual LLMs, respectively.

+
+

Agent Prompts

+

Prompts used by an agent usually contain the following information:

+ + + + + + + + + + + + + + + + + + + + + +
PromptDescription
Basic templateA basic template for the agent prompt.
APIA prompt for all skills and APIs used by the agent.
ExamplesDemonstration examples for the agent for in-context learning.
+

You can find these prompts in the share directory. The prompts for specific applications are stored in the apps directory.

+
+

Tip

+

All information is constructed using the agent's Prompter class. You can find more details about the Prompter class in the documentation here.

+
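Conceptually, the three components above are concatenated into the agent's system prompt. The sketch below is a simplified illustration of that assembly; the section headers are invented for readability, and the real composition is performed by the Prompter class:

```python
def assemble_system_prompt(basic_template, api_docs, examples):
    """Join the three prompt components (basic template, API descriptions,
    demonstration examples) into one system prompt. Simplified sketch;
    the section headers are illustrative only.
    """
    sections = [
        basic_template.strip(),
        "## Available APIs\n" + api_docs.strip(),
        "## Examples\n" + examples.strip(),
    ]
    return "\n\n".join(sections)
```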
+ +
+
+ +
+
+ +
+ +
+ +
+ + + + + + + + + diff --git a/search.html b/search.html new file mode 100644 index 00000000..4c2b388c --- /dev/null +++ b/search.html @@ -0,0 +1,303 @@ + + + + + + + + UFO Documentation + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • +
  • +
  • +
+
+
+
+
+ + +

Search Results

+ + + +
+ Searching... +
+ + +
+
+ +
+
+ +
+ +
+ +
+ + + + + +
+ + + + + + + + + diff --git a/search/lunr.js b/search/lunr.js new file mode 100644 index 00000000..aca0a167 --- /dev/null +++ b/search/lunr.js @@ -0,0 +1,3475 @@ +/** + * lunr - http://lunrjs.com - A bit like Solr, but much smaller and not as bright - 2.3.9 + * Copyright (C) 2020 Oliver Nightingale + * @license MIT + */ + +;(function(){ + +/** + * A convenience function for configuring and constructing + * a new lunr Index. + * + * A lunr.Builder instance is created and the pipeline setup + * with a trimmer, stop word filter and stemmer. + * + * This builder object is yielded to the configuration function + * that is passed as a parameter, allowing the list of fields + * and other builder parameters to be customised. + * + * All documents _must_ be added within the passed config function. + * + * @example + * var idx = lunr(function () { + * this.field('title') + * this.field('body') + * this.ref('id') + * + * documents.forEach(function (doc) { + * this.add(doc) + * }, this) + * }) + * + * @see {@link lunr.Builder} + * @see {@link lunr.Pipeline} + * @see {@link lunr.trimmer} + * @see {@link lunr.stopWordFilter} + * @see {@link lunr.stemmer} + * @namespace {function} lunr + */ +var lunr = function (config) { + var builder = new lunr.Builder + + builder.pipeline.add( + lunr.trimmer, + lunr.stopWordFilter, + lunr.stemmer + ) + + builder.searchPipeline.add( + lunr.stemmer + ) + + config.call(builder, builder) + return builder.build() +} + +lunr.version = "2.3.9" +/*! + * lunr.utils + * Copyright (C) 2020 Oliver Nightingale + */ + +/** + * A namespace containing utils for the rest of the lunr library + * @namespace lunr.utils + */ +lunr.utils = {} + +/** + * Print a warning message to the console. + * + * @param {String} message The message to be printed. 
+ * @memberOf lunr.utils + * @function + */ +lunr.utils.warn = (function (global) { + /* eslint-disable no-console */ + return function (message) { + if (global.console && console.warn) { + console.warn(message) + } + } + /* eslint-enable no-console */ +})(this) + +/** + * Convert an object to a string. + * + * In the case of `null` and `undefined` the function returns + * the empty string, in all other cases the result of calling + * `toString` on the passed object is returned. + * + * @param {Any} obj The object to convert to a string. + * @return {String} string representation of the passed object. + * @memberOf lunr.utils + */ +lunr.utils.asString = function (obj) { + if (obj === void 0 || obj === null) { + return "" + } else { + return obj.toString() + } +} + +/** + * Clones an object. + * + * Will create a copy of an existing object such that any mutations + * on the copy cannot affect the original. + * + * Only shallow objects are supported, passing a nested object to this + * function will cause a TypeError. + * + * Objects with primitives, and arrays of primitives are supported. + * + * @param {Object} obj The object to clone. + * @return {Object} a clone of the passed object. + * @throws {TypeError} when a nested object is passed. 
+ * @memberOf Utils + */ +lunr.utils.clone = function (obj) { + if (obj === null || obj === undefined) { + return obj + } + + var clone = Object.create(null), + keys = Object.keys(obj) + + for (var i = 0; i < keys.length; i++) { + var key = keys[i], + val = obj[key] + + if (Array.isArray(val)) { + clone[key] = val.slice() + continue + } + + if (typeof val === 'string' || + typeof val === 'number' || + typeof val === 'boolean') { + clone[key] = val + continue + } + + throw new TypeError("clone is not deep and does not support nested objects") + } + + return clone +} +lunr.FieldRef = function (docRef, fieldName, stringValue) { + this.docRef = docRef + this.fieldName = fieldName + this._stringValue = stringValue +} + +lunr.FieldRef.joiner = "/" + +lunr.FieldRef.fromString = function (s) { + var n = s.indexOf(lunr.FieldRef.joiner) + + if (n === -1) { + throw "malformed field ref string" + } + + var fieldRef = s.slice(0, n), + docRef = s.slice(n + 1) + + return new lunr.FieldRef (docRef, fieldRef, s) +} + +lunr.FieldRef.prototype.toString = function () { + if (this._stringValue == undefined) { + this._stringValue = this.fieldName + lunr.FieldRef.joiner + this.docRef + } + + return this._stringValue +} +/*! + * lunr.Set + * Copyright (C) 2020 Oliver Nightingale + */ + +/** + * A lunr set. + * + * @constructor + */ +lunr.Set = function (elements) { + this.elements = Object.create(null) + + if (elements) { + this.length = elements.length + + for (var i = 0; i < this.length; i++) { + this.elements[elements[i]] = true + } + } else { + this.length = 0 + } +} + +/** + * A complete set that contains all elements. + * + * @static + * @readonly + * @type {lunr.Set} + */ +lunr.Set.complete = { + intersect: function (other) { + return other + }, + + union: function () { + return this + }, + + contains: function () { + return true + } +} + +/** + * An empty set that contains no elements. 
+ * + * @static + * @readonly + * @type {lunr.Set} + */ +lunr.Set.empty = { + intersect: function () { + return this + }, + + union: function (other) { + return other + }, + + contains: function () { + return false + } +} + +/** + * Returns true if this set contains the specified object. + * + * @param {object} object - Object whose presence in this set is to be tested. + * @returns {boolean} - True if this set contains the specified object. + */ +lunr.Set.prototype.contains = function (object) { + return !!this.elements[object] +} + +/** + * Returns a new set containing only the elements that are present in both + * this set and the specified set. + * + * @param {lunr.Set} other - set to intersect with this set. + * @returns {lunr.Set} a new set that is the intersection of this and the specified set. + */ + +lunr.Set.prototype.intersect = function (other) { + var a, b, elements, intersection = [] + + if (other === lunr.Set.complete) { + return this + } + + if (other === lunr.Set.empty) { + return other + } + + if (this.length < other.length) { + a = this + b = other + } else { + a = other + b = this + } + + elements = Object.keys(a.elements) + + for (var i = 0; i < elements.length; i++) { + var element = elements[i] + if (element in b.elements) { + intersection.push(element) + } + } + + return new lunr.Set (intersection) +} + +/** + * Returns a new set combining the elements of this and the specified set. + * + * @param {lunr.Set} other - set to union with this set. + * @return {lunr.Set} a new set that is the union of this and the specified set. + */ + +lunr.Set.prototype.union = function (other) { + if (other === lunr.Set.complete) { + return lunr.Set.complete + } + + if (other === lunr.Set.empty) { + return this + } + + return new lunr.Set(Object.keys(this.elements).concat(Object.keys(other.elements))) +} +/** + * A function to calculate the inverse document frequency for + * a posting. 
This is shared between the builder and the index + * + * @private + * @param {object} posting - The posting for a given term + * @param {number} documentCount - The total number of documents. + */ +lunr.idf = function (posting, documentCount) { + var documentsWithTerm = 0 + + for (var fieldName in posting) { + if (fieldName == '_index') continue // Ignore the term index, its not a field + documentsWithTerm += Object.keys(posting[fieldName]).length + } + + var x = (documentCount - documentsWithTerm + 0.5) / (documentsWithTerm + 0.5) + + return Math.log(1 + Math.abs(x)) +} + +/** + * A token wraps a string representation of a token + * as it is passed through the text processing pipeline. + * + * @constructor + * @param {string} [str=''] - The string token being wrapped. + * @param {object} [metadata={}] - Metadata associated with this token. + */ +lunr.Token = function (str, metadata) { + this.str = str || "" + this.metadata = metadata || {} +} + +/** + * Returns the token string that is being wrapped by this object. + * + * @returns {string} + */ +lunr.Token.prototype.toString = function () { + return this.str +} + +/** + * A token update function is used when updating or optionally + * when cloning a token. + * + * @callback lunr.Token~updateFunction + * @param {string} str - The string representation of the token. + * @param {Object} metadata - All metadata associated with this token. + */ + +/** + * Applies the given function to the wrapped string token. + * + * @example + * token.update(function (str, metadata) { + * return str.toUpperCase() + * }) + * + * @param {lunr.Token~updateFunction} fn - A function to apply to the token string. + * @returns {lunr.Token} + */ +lunr.Token.prototype.update = function (fn) { + this.str = fn(this.str, this.metadata) + return this +} + +/** + * Creates a clone of this token. Optionally a function can be + * applied to the cloned token. 
+ * + * @param {lunr.Token~updateFunction} [fn] - An optional function to apply to the cloned token. + * @returns {lunr.Token} + */ +lunr.Token.prototype.clone = function (fn) { + fn = fn || function (s) { return s } + return new lunr.Token (fn(this.str, this.metadata), this.metadata) +} +/*! + * lunr.tokenizer + * Copyright (C) 2020 Oliver Nightingale + */ + +/** + * A function for splitting a string into tokens ready to be inserted into + * the search index. Uses `lunr.tokenizer.separator` to split strings, change + * the value of this property to change how strings are split into tokens. + * + * This tokenizer will convert its parameter to a string by calling `toString` and + * then will split this string on the character in `lunr.tokenizer.separator`. + * Arrays will have their elements converted to strings and wrapped in a lunr.Token. + * + * Optional metadata can be passed to the tokenizer, this metadata will be cloned and + * added as metadata to every token that is created from the object to be tokenized. 
+ * + * @static + * @param {?(string|object|object[])} obj - The object to convert into tokens + * @param {?object} metadata - Optional metadata to associate with every token + * @returns {lunr.Token[]} + * @see {@link lunr.Pipeline} + */ +lunr.tokenizer = function (obj, metadata) { + if (obj == null || obj == undefined) { + return [] + } + + if (Array.isArray(obj)) { + return obj.map(function (t) { + return new lunr.Token( + lunr.utils.asString(t).toLowerCase(), + lunr.utils.clone(metadata) + ) + }) + } + + var str = obj.toString().toLowerCase(), + len = str.length, + tokens = [] + + for (var sliceEnd = 0, sliceStart = 0; sliceEnd <= len; sliceEnd++) { + var char = str.charAt(sliceEnd), + sliceLength = sliceEnd - sliceStart + + if ((char.match(lunr.tokenizer.separator) || sliceEnd == len)) { + + if (sliceLength > 0) { + var tokenMetadata = lunr.utils.clone(metadata) || {} + tokenMetadata["position"] = [sliceStart, sliceLength] + tokenMetadata["index"] = tokens.length + + tokens.push( + new lunr.Token ( + str.slice(sliceStart, sliceEnd), + tokenMetadata + ) + ) + } + + sliceStart = sliceEnd + 1 + } + + } + + return tokens +} + +/** + * The separator used to split a string into tokens. Override this property to change the behaviour of + * `lunr.tokenizer` behaviour when tokenizing strings. By default this splits on whitespace and hyphens. + * + * @static + * @see lunr.tokenizer + */ +lunr.tokenizer.separator = /[\s\-]+/ +/*! + * lunr.Pipeline + * Copyright (C) 2020 Oliver Nightingale + */ + +/** + * lunr.Pipelines maintain an ordered list of functions to be applied to all + * tokens in documents entering the search index and queries being ran against + * the index. + * + * An instance of lunr.Index created with the lunr shortcut will contain a + * pipeline with a stop word filter and an English language stemmer. Extra + * functions can be added before or after either of these functions or these + * default functions can be removed. 
+ * + * When run the pipeline will call each function in turn, passing a token, the + * index of that token in the original list of all tokens and finally a list of + * all the original tokens. + * + * The output of functions in the pipeline will be passed to the next function + * in the pipeline. To exclude a token from entering the index the function + * should return undefined, the rest of the pipeline will not be called with + * this token. + * + * For serialisation of pipelines to work, all functions used in an instance of + * a pipeline should be registered with lunr.Pipeline. Registered functions can + * then be loaded. If trying to load a serialised pipeline that uses functions + * that are not registered an error will be thrown. + * + * If not planning on serialising the pipeline then registering pipeline functions + * is not necessary. + * + * @constructor + */ +lunr.Pipeline = function () { + this._stack = [] +} + +lunr.Pipeline.registeredFunctions = Object.create(null) + +/** + * A pipeline function maps lunr.Token to lunr.Token. A lunr.Token contains the token + * string as well as all known metadata. A pipeline function can mutate the token string + * or mutate (or add) metadata for a given token. + * + * A pipeline function can indicate that the passed token should be discarded by returning + * null, undefined or an empty string. This token will not be passed to any downstream pipeline + * functions and will not be added to the index. + * + * Multiple tokens can be returned by returning an array of tokens. Each token will be passed + * to any downstream pipeline functions and all will returned tokens will be added to the index. + * + * Any number of pipeline functions may be chained together using a lunr.Pipeline. + * + * @interface lunr.PipelineFunction + * @param {lunr.Token} token - A token from the document being processed. + * @param {number} i - The index of this token in the complete list of tokens for this document/field. 
+ * @param {lunr.Token[]} tokens - All tokens for this document/field. + * @returns {(?lunr.Token|lunr.Token[])} + */ + +/** + * Register a function with the pipeline. + * + * Functions that are used in the pipeline should be registered if the pipeline + * needs to be serialised, or a serialised pipeline needs to be loaded. + * + * Registering a function does not add it to a pipeline, functions must still be + * added to instances of the pipeline for them to be used when running a pipeline. + * + * @param {lunr.PipelineFunction} fn - The function to check for. + * @param {String} label - The label to register this function with + */ +lunr.Pipeline.registerFunction = function (fn, label) { + if (label in this.registeredFunctions) { + lunr.utils.warn('Overwriting existing registered function: ' + label) + } + + fn.label = label + lunr.Pipeline.registeredFunctions[fn.label] = fn +} + +/** + * Warns if the function is not registered as a Pipeline function. + * + * @param {lunr.PipelineFunction} fn - The function to check for. + * @private + */ +lunr.Pipeline.warnIfFunctionNotRegistered = function (fn) { + var isRegistered = fn.label && (fn.label in this.registeredFunctions) + + if (!isRegistered) { + lunr.utils.warn('Function is not registered with pipeline. This may cause problems when serialising the index.\n', fn) + } +} + +/** + * Loads a previously serialised pipeline. + * + * All functions to be loaded must already be registered with lunr.Pipeline. + * If any function from the serialised data has not been registered then an + * error will be thrown. + * + * @param {Object} serialised - The serialised pipeline to load. 
+ * @returns {lunr.Pipeline} + */ +lunr.Pipeline.load = function (serialised) { + var pipeline = new lunr.Pipeline + + serialised.forEach(function (fnName) { + var fn = lunr.Pipeline.registeredFunctions[fnName] + + if (fn) { + pipeline.add(fn) + } else { + throw new Error('Cannot load unregistered function: ' + fnName) + } + }) + + return pipeline +} + +/** + * Adds new functions to the end of the pipeline. + * + * Logs a warning if the function has not been registered. + * + * @param {lunr.PipelineFunction[]} functions - Any number of functions to add to the pipeline. + */ +lunr.Pipeline.prototype.add = function () { + var fns = Array.prototype.slice.call(arguments) + + fns.forEach(function (fn) { + lunr.Pipeline.warnIfFunctionNotRegistered(fn) + this._stack.push(fn) + }, this) +} + +/** + * Adds a single function after a function that already exists in the + * pipeline. + * + * Logs a warning if the function has not been registered. + * + * @param {lunr.PipelineFunction} existingFn - A function that already exists in the pipeline. + * @param {lunr.PipelineFunction} newFn - The new function to add to the pipeline. + */ +lunr.Pipeline.prototype.after = function (existingFn, newFn) { + lunr.Pipeline.warnIfFunctionNotRegistered(newFn) + + var pos = this._stack.indexOf(existingFn) + if (pos == -1) { + throw new Error('Cannot find existingFn') + } + + pos = pos + 1 + this._stack.splice(pos, 0, newFn) +} + +/** + * Adds a single function before a function that already exists in the + * pipeline. + * + * Logs a warning if the function has not been registered. + * + * @param {lunr.PipelineFunction} existingFn - A function that already exists in the pipeline. + * @param {lunr.PipelineFunction} newFn - The new function to add to the pipeline. 
+ */ +lunr.Pipeline.prototype.before = function (existingFn, newFn) { + lunr.Pipeline.warnIfFunctionNotRegistered(newFn) + + var pos = this._stack.indexOf(existingFn) + if (pos == -1) { + throw new Error('Cannot find existingFn') + } + + this._stack.splice(pos, 0, newFn) +} + +/** + * Removes a function from the pipeline. + * + * @param {lunr.PipelineFunction} fn The function to remove from the pipeline. + */ +lunr.Pipeline.prototype.remove = function (fn) { + var pos = this._stack.indexOf(fn) + if (pos == -1) { + return + } + + this._stack.splice(pos, 1) +} + +/** + * Runs the current list of functions that make up the pipeline against the + * passed tokens. + * + * @param {Array} tokens The tokens to run through the pipeline. + * @returns {Array} + */ +lunr.Pipeline.prototype.run = function (tokens) { + var stackLength = this._stack.length + + for (var i = 0; i < stackLength; i++) { + var fn = this._stack[i] + var memo = [] + + for (var j = 0; j < tokens.length; j++) { + var result = fn(tokens[j], j, tokens) + + if (result === null || result === void 0 || result === '') continue + + if (Array.isArray(result)) { + for (var k = 0; k < result.length; k++) { + memo.push(result[k]) + } + } else { + memo.push(result) + } + } + + tokens = memo + } + + return tokens +} + +/** + * Convenience method for passing a string through a pipeline and getting + * strings out. This method takes care of wrapping the passed string in a + * token and mapping the resulting tokens back to strings. + * + * @param {string} str - The string to pass through the pipeline. + * @param {?object} metadata - Optional metadata to associate with the token + * passed to the pipeline. + * @returns {string[]} + */ +lunr.Pipeline.prototype.runString = function (str, metadata) { + var token = new lunr.Token (str, metadata) + + return this.run([token]).map(function (t) { + return t.toString() + }) +} + +/** + * Resets the pipeline by removing any existing processors. 
+ * + */ +lunr.Pipeline.prototype.reset = function () { + this._stack = [] +} + +/** + * Returns a representation of the pipeline ready for serialisation. + * + * Logs a warning if the function has not been registered. + * + * @returns {Array} + */ +lunr.Pipeline.prototype.toJSON = function () { + return this._stack.map(function (fn) { + lunr.Pipeline.warnIfFunctionNotRegistered(fn) + + return fn.label + }) +} +/*! + * lunr.Vector + * Copyright (C) 2020 Oliver Nightingale + */ + +/** + * A vector is used to construct the vector space of documents and queries. These + * vectors support operations to determine the similarity between two documents or + * a document and a query. + * + * Normally no parameters are required for initializing a vector, but in the case of + * loading a previously dumped vector the raw elements can be provided to the constructor. + * + * For performance reasons vectors are implemented with a flat array, where an elements + * index is immediately followed by its value. E.g. [index, value, index, value]. This + * allows the underlying array to be as sparse as possible and still offer decent + * performance when being used for vector calculations. + * + * @constructor + * @param {Number[]} [elements] - The flat list of element index and element value pairs. + */ +lunr.Vector = function (elements) { + this._magnitude = 0 + this.elements = elements || [] +} + + +/** + * Calculates the position within the vector to insert a given index. + * + * This is used internally by insert and upsert. If there are duplicate indexes then + * the position is returned as if the value for that index were to be updated, but it + * is the callers responsibility to check whether there is a duplicate at that index + * + * @param {Number} insertIdx - The index at which the element should be inserted. 
+ * @returns {Number} + */ +lunr.Vector.prototype.positionForIndex = function (index) { + // For an empty vector the tuple can be inserted at the beginning + if (this.elements.length == 0) { + return 0 + } + + var start = 0, + end = this.elements.length / 2, + sliceLength = end - start, + pivotPoint = Math.floor(sliceLength / 2), + pivotIndex = this.elements[pivotPoint * 2] + + while (sliceLength > 1) { + if (pivotIndex < index) { + start = pivotPoint + } + + if (pivotIndex > index) { + end = pivotPoint + } + + if (pivotIndex == index) { + break + } + + sliceLength = end - start + pivotPoint = start + Math.floor(sliceLength / 2) + pivotIndex = this.elements[pivotPoint * 2] + } + + if (pivotIndex == index) { + return pivotPoint * 2 + } + + if (pivotIndex > index) { + return pivotPoint * 2 + } + + if (pivotIndex < index) { + return (pivotPoint + 1) * 2 + } +} + +/** + * Inserts an element at an index within the vector. + * + * Does not allow duplicates, will throw an error if there is already an entry + * for this index. + * + * @param {Number} insertIdx - The index at which the element should be inserted. + * @param {Number} val - The value to be inserted into the vector. + */ +lunr.Vector.prototype.insert = function (insertIdx, val) { + this.upsert(insertIdx, val, function () { + throw "duplicate index" + }) +} + +/** + * Inserts or updates an existing index within the vector. + * + * @param {Number} insertIdx - The index at which the element should be inserted. + * @param {Number} val - The value to be inserted into the vector. 
+ * @param {function} fn - A function that is called for updates, the existing value and the + * requested value are passed as arguments + */ +lunr.Vector.prototype.upsert = function (insertIdx, val, fn) { + this._magnitude = 0 + var position = this.positionForIndex(insertIdx) + + if (this.elements[position] == insertIdx) { + this.elements[position + 1] = fn(this.elements[position + 1], val) + } else { + this.elements.splice(position, 0, insertIdx, val) + } +} + +/** + * Calculates the magnitude of this vector. + * + * @returns {Number} + */ +lunr.Vector.prototype.magnitude = function () { + if (this._magnitude) return this._magnitude + + var sumOfSquares = 0, + elementsLength = this.elements.length + + for (var i = 1; i < elementsLength; i += 2) { + var val = this.elements[i] + sumOfSquares += val * val + } + + return this._magnitude = Math.sqrt(sumOfSquares) +} + +/** + * Calculates the dot product of this vector and another vector. + * + * @param {lunr.Vector} otherVector - The vector to compute the dot product with. + * @returns {Number} + */ +lunr.Vector.prototype.dot = function (otherVector) { + var dotProduct = 0, + a = this.elements, b = otherVector.elements, + aLen = a.length, bLen = b.length, + aVal = 0, bVal = 0, + i = 0, j = 0 + + while (i < aLen && j < bLen) { + aVal = a[i], bVal = b[j] + if (aVal < bVal) { + i += 2 + } else if (aVal > bVal) { + j += 2 + } else if (aVal == bVal) { + dotProduct += a[i + 1] * b[j + 1] + i += 2 + j += 2 + } + } + + return dotProduct +} + +/** + * Calculates the similarity between this vector and another vector. + * + * @param {lunr.Vector} otherVector - The other vector to calculate the + * similarity with. + * @returns {Number} + */ +lunr.Vector.prototype.similarity = function (otherVector) { + return this.dot(otherVector) / this.magnitude() || 0 +} + +/** + * Converts the vector to an array of the elements within the vector. 
+ * + * @returns {Number[]} + */ +lunr.Vector.prototype.toArray = function () { + var output = new Array (this.elements.length / 2) + + for (var i = 1, j = 0; i < this.elements.length; i += 2, j++) { + output[j] = this.elements[i] + } + + return output +} + +/** + * A JSON serializable representation of the vector. + * + * @returns {Number[]} + */ +lunr.Vector.prototype.toJSON = function () { + return this.elements +} +/* eslint-disable */ +/*! + * lunr.stemmer + * Copyright (C) 2020 Oliver Nightingale + * Includes code from - http://tartarus.org/~martin/PorterStemmer/js.txt + */ + +/** + * lunr.stemmer is an english language stemmer, this is a JavaScript + * implementation of the PorterStemmer taken from http://tartarus.org/~martin + * + * @static + * @implements {lunr.PipelineFunction} + * @param {lunr.Token} token - The string to stem + * @returns {lunr.Token} + * @see {@link lunr.Pipeline} + * @function + */ +lunr.stemmer = (function(){ + var step2list = { + "ational" : "ate", + "tional" : "tion", + "enci" : "ence", + "anci" : "ance", + "izer" : "ize", + "bli" : "ble", + "alli" : "al", + "entli" : "ent", + "eli" : "e", + "ousli" : "ous", + "ization" : "ize", + "ation" : "ate", + "ator" : "ate", + "alism" : "al", + "iveness" : "ive", + "fulness" : "ful", + "ousness" : "ous", + "aliti" : "al", + "iviti" : "ive", + "biliti" : "ble", + "logi" : "log" + }, + + step3list = { + "icate" : "ic", + "ative" : "", + "alize" : "al", + "iciti" : "ic", + "ical" : "ic", + "ful" : "", + "ness" : "" + }, + + c = "[^aeiou]", // consonant + v = "[aeiouy]", // vowel + C = c + "[^aeiouy]*", // consonant sequence + V = v + "[aeiou]*", // vowel sequence + + mgr0 = "^(" + C + ")?" + V + C, // [C]VC... is m>0 + meq1 = "^(" + C + ")?" + V + C + "(" + V + ")?$", // [C]VC[V] is m=1 + mgr1 = "^(" + C + ")?" + V + C + V + C, // [C]VCVC... is m>1 + s_v = "^(" + C + ")?" 
+ v; // vowel in stem + + var re_mgr0 = new RegExp(mgr0); + var re_mgr1 = new RegExp(mgr1); + var re_meq1 = new RegExp(meq1); + var re_s_v = new RegExp(s_v); + + var re_1a = /^(.+?)(ss|i)es$/; + var re2_1a = /^(.+?)([^s])s$/; + var re_1b = /^(.+?)eed$/; + var re2_1b = /^(.+?)(ed|ing)$/; + var re_1b_2 = /.$/; + var re2_1b_2 = /(at|bl|iz)$/; + var re3_1b_2 = new RegExp("([^aeiouylsz])\\1$"); + var re4_1b_2 = new RegExp("^" + C + v + "[^aeiouwxy]$"); + + var re_1c = /^(.+?[^aeiou])y$/; + var re_2 = /^(.+?)(ational|tional|enci|anci|izer|bli|alli|entli|eli|ousli|ization|ation|ator|alism|iveness|fulness|ousness|aliti|iviti|biliti|logi)$/; + + var re_3 = /^(.+?)(icate|ative|alize|iciti|ical|ful|ness)$/; + + var re_4 = /^(.+?)(al|ance|ence|er|ic|able|ible|ant|ement|ment|ent|ou|ism|ate|iti|ous|ive|ize)$/; + var re2_4 = /^(.+?)(s|t)(ion)$/; + + var re_5 = /^(.+?)e$/; + var re_5_1 = /ll$/; + var re3_5 = new RegExp("^" + C + v + "[^aeiouwxy]$"); + + var porterStemmer = function porterStemmer(w) { + var stem, + suffix, + firstch, + re, + re2, + re3, + re4; + + if (w.length < 3) { return w; } + + firstch = w.substr(0,1); + if (firstch == "y") { + w = firstch.toUpperCase() + w.substr(1); + } + + // Step 1a + re = re_1a + re2 = re2_1a; + + if (re.test(w)) { w = w.replace(re,"$1$2"); } + else if (re2.test(w)) { w = w.replace(re2,"$1$2"); } + + // Step 1b + re = re_1b; + re2 = re2_1b; + if (re.test(w)) { + var fp = re.exec(w); + re = re_mgr0; + if (re.test(fp[1])) { + re = re_1b_2; + w = w.replace(re,""); + } + } else if (re2.test(w)) { + var fp = re2.exec(w); + stem = fp[1]; + re2 = re_s_v; + if (re2.test(stem)) { + w = stem; + re2 = re2_1b_2; + re3 = re3_1b_2; + re4 = re4_1b_2; + if (re2.test(w)) { w = w + "e"; } + else if (re3.test(w)) { re = re_1b_2; w = w.replace(re,""); } + else if (re4.test(w)) { w = w + "e"; } + } + } + + // Step 1c - replace suffix y or Y by i if preceded by a non-vowel which is not the first letter of the word (so cry -> cri, by -> by, say -> say) + re = 
re_1c; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + w = stem + "i"; + } + + // Step 2 + re = re_2; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + suffix = fp[2]; + re = re_mgr0; + if (re.test(stem)) { + w = stem + step2list[suffix]; + } + } + + // Step 3 + re = re_3; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + suffix = fp[2]; + re = re_mgr0; + if (re.test(stem)) { + w = stem + step3list[suffix]; + } + } + + // Step 4 + re = re_4; + re2 = re2_4; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + re = re_mgr1; + if (re.test(stem)) { + w = stem; + } + } else if (re2.test(w)) { + var fp = re2.exec(w); + stem = fp[1] + fp[2]; + re2 = re_mgr1; + if (re2.test(stem)) { + w = stem; + } + } + + // Step 5 + re = re_5; + if (re.test(w)) { + var fp = re.exec(w); + stem = fp[1]; + re = re_mgr1; + re2 = re_meq1; + re3 = re3_5; + if (re.test(stem) || (re2.test(stem) && !(re3.test(stem)))) { + w = stem; + } + } + + re = re_5_1; + re2 = re_mgr1; + if (re.test(w) && re2.test(w)) { + re = re_1b_2; + w = w.replace(re,""); + } + + // and turn initial Y back to y + + if (firstch == "y") { + w = firstch.toLowerCase() + w.substr(1); + } + + return w; + }; + + return function (token) { + return token.update(porterStemmer); + } +})(); + +lunr.Pipeline.registerFunction(lunr.stemmer, 'stemmer') +/*! + * lunr.stopWordFilter + * Copyright (C) 2020 Oliver Nightingale + */ + +/** + * lunr.generateStopWordFilter builds a stopWordFilter function from the provided + * list of stop words. + * + * The built in lunr.stopWordFilter is built using this generator and can be used + * to generate custom stopWordFilters for applications or non English languages. 
+ * + * @function + * @param {Array} stopWords - The list of stop words to build the filter from. + * @returns {lunr.PipelineFunction} + * @see lunr.Pipeline + * @see lunr.stopWordFilter + */ +lunr.generateStopWordFilter = function (stopWords) { + var words = stopWords.reduce(function (memo, stopWord) { + memo[stopWord] = stopWord + return memo + }, {}) + + return function (token) { + if (token && words[token.toString()] !== token.toString()) return token + } +} + +/** + * lunr.stopWordFilter is an English language stop word list filter; any words + * contained in the list will not be passed through the filter. + * + * This is intended to be used in the Pipeline. If the token does not pass the + * filter then undefined will be returned. + * + * @function + * @implements {lunr.PipelineFunction} + * @param {lunr.Token} token - A token to check for being a stop word. + * @returns {lunr.Token} + * @see {@link lunr.Pipeline} + */ +lunr.stopWordFilter = lunr.generateStopWordFilter([ + 'a', + 'able', + 'about', + 'across', + 'after', + 'all', + 'almost', + 'also', + 'am', + 'among', + 'an', + 'and', + 'any', + 'are', + 'as', + 'at', + 'be', + 'because', + 'been', + 'but', + 'by', + 'can', + 'cannot', + 'could', + 'dear', + 'did', + 'do', + 'does', + 'either', + 'else', + 'ever', + 'every', + 'for', + 'from', + 'get', + 'got', + 'had', + 'has', + 'have', + 'he', + 'her', + 'hers', + 'him', + 'his', + 'how', + 'however', + 'i', + 'if', + 'in', + 'into', + 'is', + 'it', + 'its', + 'just', + 'least', + 'let', + 'like', + 'likely', + 'may', + 'me', + 'might', + 'most', + 'must', + 'my', + 'neither', + 'no', + 'nor', + 'not', + 'of', + 'off', + 'often', + 'on', + 'only', + 'or', + 'other', + 'our', + 'own', + 'rather', + 'said', + 'say', + 'says', + 'she', + 'should', + 'since', + 'so', + 'some', + 'than', + 'that', + 'the', + 'their', + 'them', + 'then', + 'there', + 'these', + 'they', + 'this', + 'tis', + 'to', + 'too', + 'twas', + 'us', + 'wants', + 'was', + 'we', + 'were', + 'what', + 
'when', + 'where', + 'which', + 'while', + 'who', + 'whom', + 'why', + 'will', + 'with', + 'would', + 'yet', + 'you', + 'your' +]) + +lunr.Pipeline.registerFunction(lunr.stopWordFilter, 'stopWordFilter') +/*! + * lunr.trimmer + * Copyright (C) 2020 Oliver Nightingale + */ + +/** + * lunr.trimmer is a pipeline function for trimming non word + * characters from the beginning and end of tokens before they + * enter the index. + * + * This implementation may not work correctly for non latin + * characters and should either be removed or adapted for use + * with languages with non-latin characters. + * + * @static + * @implements {lunr.PipelineFunction} + * @param {lunr.Token} token The token to pass through the filter + * @returns {lunr.Token} + * @see lunr.Pipeline + */ +lunr.trimmer = function (token) { + return token.update(function (s) { + return s.replace(/^\W+/, '').replace(/\W+$/, '') + }) +} + +lunr.Pipeline.registerFunction(lunr.trimmer, 'trimmer') +/*! + * lunr.TokenSet + * Copyright (C) 2020 Oliver Nightingale + */ + +/** + * A token set is used to store the unique list of all tokens + * within an index. Token sets are also used to represent an + * incoming query to the index, this query token set and index + * token set are then intersected to find which tokens to look + * up in the inverted index. + * + * A token set can hold multiple tokens, as in the case of the + * index token set, or it can hold a single token as in the + * case of a simple query token set. + * + * Additionally token sets are used to perform wildcard matching. + * Leading, contained and trailing wildcards are supported, and + * from this edit distance matching can also be provided. + * + * Token sets are implemented as a minimal finite state automata, + * where both common prefixes and suffixes are shared between tokens. + * This helps to reduce the space used for storing the token set. 
+ * + * @constructor + */ +lunr.TokenSet = function () { + this.final = false + this.edges = {} + this.id = lunr.TokenSet._nextId + lunr.TokenSet._nextId += 1 +} + +/** + * Keeps track of the next, auto increment, identifier to assign + * to a new tokenSet. + * + * TokenSets require a unique identifier to be correctly minimised. + * + * @private + */ +lunr.TokenSet._nextId = 1 + +/** + * Creates a TokenSet instance from the given sorted array of words. + * + * @param {String[]} arr - A sorted array of strings to create the set from. + * @returns {lunr.TokenSet} + * @throws Will throw an error if the input array is not sorted. + */ +lunr.TokenSet.fromArray = function (arr) { + var builder = new lunr.TokenSet.Builder + + for (var i = 0, len = arr.length; i < len; i++) { + builder.insert(arr[i]) + } + + builder.finish() + return builder.root +} + +/** + * Creates a token set from a query clause. + * + * @private + * @param {Object} clause - A single clause from lunr.Query. + * @param {string} clause.term - The query clause term. + * @param {number} [clause.editDistance] - The optional edit distance for the term. + * @returns {lunr.TokenSet} + */ +lunr.TokenSet.fromClause = function (clause) { + if ('editDistance' in clause) { + return lunr.TokenSet.fromFuzzyString(clause.term, clause.editDistance) + } else { + return lunr.TokenSet.fromString(clause.term) + } +} + +/** + * Creates a token set representing a single string with a specified + * edit distance. + * + * Insertions, deletions, substitutions and transpositions are each + * treated as an edit distance of 1. + * + * Increasing the allowed edit distance will have a dramatic impact + * on the performance of both creating and intersecting these TokenSets. + * It is advised to keep the edit distance less than 3. + * + * @param {string} str - The string to create the token set from. + * @param {number} editDistance - The allowed edit distance to match. 
+ * @returns {lunr.TokenSet} + */ +lunr.TokenSet.fromFuzzyString = function (str, editDistance) { + var root = new lunr.TokenSet + + var stack = [{ + node: root, + editsRemaining: editDistance, + str: str + }] + + while (stack.length) { + var frame = stack.pop() + + // no edit + if (frame.str.length > 0) { + var char = frame.str.charAt(0), + noEditNode + + if (char in frame.node.edges) { + noEditNode = frame.node.edges[char] + } else { + noEditNode = new lunr.TokenSet + frame.node.edges[char] = noEditNode + } + + if (frame.str.length == 1) { + noEditNode.final = true + } + + stack.push({ + node: noEditNode, + editsRemaining: frame.editsRemaining, + str: frame.str.slice(1) + }) + } + + if (frame.editsRemaining == 0) { + continue + } + + // insertion + if ("*" in frame.node.edges) { + var insertionNode = frame.node.edges["*"] + } else { + var insertionNode = new lunr.TokenSet + frame.node.edges["*"] = insertionNode + } + + if (frame.str.length == 0) { + insertionNode.final = true + } + + stack.push({ + node: insertionNode, + editsRemaining: frame.editsRemaining - 1, + str: frame.str + }) + + // deletion + // can only do a deletion if we have enough edits remaining + // and if there are characters left to delete in the string + if (frame.str.length > 1) { + stack.push({ + node: frame.node, + editsRemaining: frame.editsRemaining - 1, + str: frame.str.slice(1) + }) + } + + // deletion + // just removing the last character from the str + if (frame.str.length == 1) { + frame.node.final = true + } + + // substitution + // can only do a substitution if we have enough edits remaining + // and if there are characters left to substitute + if (frame.str.length >= 1) { + if ("*" in frame.node.edges) { + var substitutionNode = frame.node.edges["*"] + } else { + var substitutionNode = new lunr.TokenSet + frame.node.edges["*"] = substitutionNode + } + + if (frame.str.length == 1) { + substitutionNode.final = true + } + + stack.push({ + node: substitutionNode, + editsRemaining: 
frame.editsRemaining - 1, + str: frame.str.slice(1) + }) + } + + // transposition + // can only do a transposition if there are edits remaining + // and there are enough characters to transpose + if (frame.str.length > 1) { + var charA = frame.str.charAt(0), + charB = frame.str.charAt(1), + transposeNode + + if (charB in frame.node.edges) { + transposeNode = frame.node.edges[charB] + } else { + transposeNode = new lunr.TokenSet + frame.node.edges[charB] = transposeNode + } + + if (frame.str.length == 1) { + transposeNode.final = true + } + + stack.push({ + node: transposeNode, + editsRemaining: frame.editsRemaining - 1, + str: charA + frame.str.slice(2) + }) + } + } + + return root +} + +/** + * Creates a TokenSet from a string. + * + * The string may contain one or more wildcard characters (*) + * that will allow wildcard matching when intersecting with + * another TokenSet. + * + * @param {string} str - The string to create a TokenSet from. + * @returns {lunr.TokenSet} + */ +lunr.TokenSet.fromString = function (str) { + var node = new lunr.TokenSet, + root = node + + /* + * Iterates through all characters within the passed string + * appending a node for each character. + * + * When a wildcard character is found then a self + * referencing edge is introduced to continually match + * any number of any characters. + */ + for (var i = 0, len = str.length; i < len; i++) { + var char = str[i], + final = (i == len - 1) + + if (char == "*") { + node.edges[char] = node + node.final = final + + } else { + var next = new lunr.TokenSet + next.final = final + + node.edges[char] = next + node = next + } + } + + return root +} + +/** + * Converts this TokenSet into an array of strings + * contained within the TokenSet. + * + * This is not intended to be used on a TokenSet that + * contains wildcards, in these cases the results are + * undefined and are likely to cause an infinite loop. 
+ * + * @returns {string[]} + */ +lunr.TokenSet.prototype.toArray = function () { + var words = [] + + var stack = [{ + prefix: "", + node: this + }] + + while (stack.length) { + var frame = stack.pop(), + edges = Object.keys(frame.node.edges), + len = edges.length + + if (frame.node.final) { + /* In Safari, at this point the prefix is sometimes corrupted, see: + * https://github.com/olivernn/lunr.js/issues/279 Calling any + * String.prototype method forces Safari to "cast" this string to what + * it's supposed to be, fixing the bug. */ + frame.prefix.charAt(0) + words.push(frame.prefix) + } + + for (var i = 0; i < len; i++) { + var edge = edges[i] + + stack.push({ + prefix: frame.prefix.concat(edge), + node: frame.node.edges[edge] + }) + } + } + + return words +} + +/** + * Generates a string representation of a TokenSet. + * + * This is intended to allow TokenSets to be used as keys + * in objects, largely to aid the construction and minimisation + * of a TokenSet. As such it is not designed to be a human + * friendly representation of the TokenSet. + * + * @returns {string} + */ +lunr.TokenSet.prototype.toString = function () { + // NOTE: Using Object.keys here as this.edges is very likely + // to enter 'hash-mode' with many keys being added + // + // avoiding a for-in loop here as it leads to the function + // being de-optimised (at least in V8). From some simple + // benchmarks the performance is comparable, but allowing + // V8 to optimize may mean easy performance wins in the future. + + if (this._str) { + return this._str + } + + var str = this.final ? '1' : '0', + labels = Object.keys(this.edges).sort(), + len = labels.length + + for (var i = 0; i < len; i++) { + var label = labels[i], + node = this.edges[label] + + str = str + label + node.id + } + + return str +} + +/** + * Returns a new TokenSet that is the intersection of + * this TokenSet and the passed TokenSet. 
+ * + * This intersection will take into account any wildcards + * contained within the TokenSet. + * + * @param {lunr.TokenSet} b - Another TokenSet to intersect with. + * @returns {lunr.TokenSet} + */ +lunr.TokenSet.prototype.intersect = function (b) { + var output = new lunr.TokenSet, + frame = undefined + + var stack = [{ + qNode: b, + output: output, + node: this + }] + + while (stack.length) { + frame = stack.pop() + + // NOTE: As with the #toString method, we are using + // Object.keys and a for loop instead of a for-in loop + // as both of these objects enter 'hash' mode, causing + // the function to be de-optimised in V8 + var qEdges = Object.keys(frame.qNode.edges), + qLen = qEdges.length, + nEdges = Object.keys(frame.node.edges), + nLen = nEdges.length + + for (var q = 0; q < qLen; q++) { + var qEdge = qEdges[q] + + for (var n = 0; n < nLen; n++) { + var nEdge = nEdges[n] + + if (nEdge == qEdge || qEdge == '*') { + var node = frame.node.edges[nEdge], + qNode = frame.qNode.edges[qEdge], + final = node.final && qNode.final, + next = undefined + + if (nEdge in frame.output.edges) { + // an edge already exists for this character + // no need to create a new node, just set the finality + // bit unless this node is already final + next = frame.output.edges[nEdge] + next.final = next.final || final + + } else { + // no edge exists yet, must create one + // set the finality bit and insert it + // into the output + next = new lunr.TokenSet + next.final = final + frame.output.edges[nEdge] = next + } + + stack.push({ + qNode: qNode, + output: next, + node: node + }) + } + } + } + } + + return output +} +lunr.TokenSet.Builder = function () { + this.previousWord = "" + this.root = new lunr.TokenSet + this.uncheckedNodes = [] + this.minimizedNodes = {} +} + +lunr.TokenSet.Builder.prototype.insert = function (word) { + var node, + commonPrefix = 0 + + if (word < this.previousWord) { + throw new Error ("Out of order word insertion") + } + + for (var i = 0; i < 
word.length && i < this.previousWord.length; i++) { + if (word[i] != this.previousWord[i]) break + commonPrefix++ + } + + this.minimize(commonPrefix) + + if (this.uncheckedNodes.length == 0) { + node = this.root + } else { + node = this.uncheckedNodes[this.uncheckedNodes.length - 1].child + } + + for (var i = commonPrefix; i < word.length; i++) { + var nextNode = new lunr.TokenSet, + char = word[i] + + node.edges[char] = nextNode + + this.uncheckedNodes.push({ + parent: node, + char: char, + child: nextNode + }) + + node = nextNode + } + + node.final = true + this.previousWord = word +} + +lunr.TokenSet.Builder.prototype.finish = function () { + this.minimize(0) +} + +lunr.TokenSet.Builder.prototype.minimize = function (downTo) { + for (var i = this.uncheckedNodes.length - 1; i >= downTo; i--) { + var node = this.uncheckedNodes[i], + childKey = node.child.toString() + + if (childKey in this.minimizedNodes) { + node.parent.edges[node.char] = this.minimizedNodes[childKey] + } else { + // Cache the key for this node since + // we know it can't change anymore + node.child._str = childKey + + this.minimizedNodes[childKey] = node.child + } + + this.uncheckedNodes.pop() + } +} +/*! + * lunr.Index + * Copyright (C) 2020 Oliver Nightingale + */ + +/** + * An index contains the built index of all documents and provides a query interface + * to the index. + * + * Usually instances of lunr.Index will not be created using this constructor; instead + * lunr.Builder should be used to construct new indexes, or lunr.Index.load should be + * used to load previously built and serialized indexes. + * + * @constructor + * @param {Object} attrs - The attributes of the built search index. + * @param {Object} attrs.invertedIndex - An index of term/field to document reference. + * @param {Object} attrs.fieldVectors - Field vectors. + * @param {lunr.TokenSet} attrs.tokenSet - A set of all corpus tokens. + * @param {string[]} attrs.fields - The names of indexed document fields. 
+ * @param {lunr.Pipeline} attrs.pipeline - The pipeline to use for search terms. + */ +lunr.Index = function (attrs) { + this.invertedIndex = attrs.invertedIndex + this.fieldVectors = attrs.fieldVectors + this.tokenSet = attrs.tokenSet + this.fields = attrs.fields + this.pipeline = attrs.pipeline +} + +/** + * A result contains details of a document matching a search query. + * @typedef {Object} lunr.Index~Result + * @property {string} ref - The reference of the document this result represents. + * @property {number} score - A number between 0 and 1 representing how similar this document is to the query. + * @property {lunr.MatchData} matchData - Contains metadata about this match including which term(s) caused the match. + */ + +/** + * Although lunr provides the ability to create queries using lunr.Query, it also provides a simple + * query language which itself is parsed into an instance of lunr.Query. + * + * For programmatically building queries it is advised to directly use lunr.Query, the query language + * is best used for human entered text rather than program generated text. + * + * At its simplest queries can just be a single term, e.g. `hello`, multiple terms are also supported + * and will be combined with OR, e.g `hello world` will match documents that contain either 'hello' + * or 'world', though those that contain both will rank higher in the results. + * + * Wildcards can be included in terms to match one or more unspecified characters, these wildcards can + * be inserted anywhere within the term, and more than one wildcard can exist in a single term. Adding + * wildcards will increase the number of documents that will be found but can also have a negative + * impact on query performance, especially with wildcards at the beginning of a term. + * + * Terms can be restricted to specific fields, e.g. `title:hello`, only documents with the term + * hello in the title field will match this query. 
Using a field not present in the index will lead + * to an error being thrown. + * + * Modifiers can also be added to terms, lunr supports edit distance and boost modifiers on terms. A term + * boost will make documents matching that term score higher, e.g. `foo^5`. Edit distance is also supported + * to provide fuzzy matching, e.g. 'hello~2' will match documents with hello with an edit distance of 2. + * Avoid large values for edit distance to improve query performance. + * + * Each term also supports a presence modifier. By default a term's presence in document is optional, however + * this can be changed to either required or prohibited. For a term's presence to be required in a document the + * term should be prefixed with a '+', e.g. `+foo bar` is a search for documents that must contain 'foo' and + * optionally contain 'bar'. Conversely a leading '-' sets the terms presence to prohibited, i.e. it must not + * appear in a document, e.g. `-foo bar` is a search for documents that do not contain 'foo' but may contain 'bar'. + * + * To escape special characters the backslash character '\' can be used, this allows searches to include + * characters that would normally be considered modifiers, e.g. `foo\~2` will search for a term "foo~2" instead + * of attempting to apply a boost of 2 to the search term "foo". + * + * @typedef {string} lunr.Index~QueryString + * @example Simple single term query + * hello + * @example Multiple term query + * hello world + * @example term scoped to a field + * title:hello + * @example term with a boost of 10 + * hello^10 + * @example term with an edit distance of 2 + * hello~2 + * @example terms with presence modifiers + * -foo +bar baz + */ + +/** + * Performs a search against the index using lunr query syntax. + * + * Results will be returned sorted by their score, the most relevant results + * will be returned first. 
For details on how the score is calculated, please see + * the {@link https://lunrjs.com/guides/searching.html#scoring|guide}. + * + * For more programmatic querying use lunr.Index#query. + * + * @param {lunr.Index~QueryString} queryString - A string containing a lunr query. + * @throws {lunr.QueryParseError} If the passed query string cannot be parsed. + * @returns {lunr.Index~Result[]} + */ +lunr.Index.prototype.search = function (queryString) { + return this.query(function (query) { + var parser = new lunr.QueryParser(queryString, query) + parser.parse() + }) +} + +/** + * A query builder callback provides a query object to be used to express + * the query to perform on the index. + * + * @callback lunr.Index~queryBuilder + * @param {lunr.Query} query - The query object to build up. + * @this lunr.Query + */ + +/** + * Performs a query against the index using the yielded lunr.Query object. + * + * If performing programmatic queries against the index, this method is preferred + * over lunr.Index#search so as to avoid the additional query parsing overhead. + * + * A query object is yielded to the supplied function which should be used to + * express the query to be run against the index. + * + * Note that although this function takes a callback parameter it is _not_ an + * asynchronous operation, the callback is just yielded a query object to be + * customized. + * + * @param {lunr.Index~queryBuilder} fn - A function that is used to build the query. 
+ * @returns {lunr.Index~Result[]} + */ +lunr.Index.prototype.query = function (fn) { + // for each query clause + // * process terms + // * expand terms from token set + // * find matching documents and metadata + // * get document vectors + // * score documents + + var query = new lunr.Query(this.fields), + matchingFields = Object.create(null), + queryVectors = Object.create(null), + termFieldCache = Object.create(null), + requiredMatches = Object.create(null), + prohibitedMatches = Object.create(null) + + /* + * To support field level boosts a query vector is created per + * field. An empty vector is eagerly created to support negated + * queries. + */ + for (var i = 0; i < this.fields.length; i++) { + queryVectors[this.fields[i]] = new lunr.Vector + } + + fn.call(query, query) + + for (var i = 0; i < query.clauses.length; i++) { + /* + * Unless the pipeline has been disabled for this term, which is + * the case for terms with wildcards, we need to pass the clause + * term through the search pipeline. A pipeline returns an array + * of processed terms. Pipeline functions may expand the passed + * term, which means we may end up performing multiple index lookups + * for a single query term. + */ + var clause = query.clauses[i], + terms = null, + clauseMatches = lunr.Set.empty + + if (clause.usePipeline) { + terms = this.pipeline.runString(clause.term, { + fields: clause.fields + }) + } else { + terms = [clause.term] + } + + for (var m = 0; m < terms.length; m++) { + var term = terms[m] + + /* + * Each term returned from the pipeline needs to use the same query + * clause object, e.g. the same boost and or edit distance. The + * simplest way to do this is to re-use the clause object but mutate + * its term property. 
+ */ + clause.term = term + + /* + * From the term in the clause we create a token set which will then + * be used to intersect the indexes token set to get a list of terms + * to lookup in the inverted index + */ + var termTokenSet = lunr.TokenSet.fromClause(clause), + expandedTerms = this.tokenSet.intersect(termTokenSet).toArray() + + /* + * If a term marked as required does not exist in the tokenSet it is + * impossible for the search to return any matches. We set all the field + * scoped required matches set to empty and stop examining any further + * clauses. + */ + if (expandedTerms.length === 0 && clause.presence === lunr.Query.presence.REQUIRED) { + for (var k = 0; k < clause.fields.length; k++) { + var field = clause.fields[k] + requiredMatches[field] = lunr.Set.empty + } + + break + } + + for (var j = 0; j < expandedTerms.length; j++) { + /* + * For each term get the posting and termIndex, this is required for + * building the query vector. + */ + var expandedTerm = expandedTerms[j], + posting = this.invertedIndex[expandedTerm], + termIndex = posting._index + + for (var k = 0; k < clause.fields.length; k++) { + /* + * For each field that this query term is scoped by (by default + * all fields are in scope) we need to get all the document refs + * that have this term in that field. + * + * The posting is the entry in the invertedIndex for the matching + * term from above. + */ + var field = clause.fields[k], + fieldPosting = posting[field], + matchingDocumentRefs = Object.keys(fieldPosting), + termField = expandedTerm + "/" + field, + matchingDocumentsSet = new lunr.Set(matchingDocumentRefs) + + /* + * if the presence of this term is required ensure that the matching + * documents are added to the set of required matches for this clause. 
+ * + */ + if (clause.presence == lunr.Query.presence.REQUIRED) { + clauseMatches = clauseMatches.union(matchingDocumentsSet) + + if (requiredMatches[field] === undefined) { + requiredMatches[field] = lunr.Set.complete + } + } + + /* + * if the presence of this term is prohibited ensure that the matching + * documents are added to the set of prohibited matches for this field, + * creating that set if it does not yet exist. + */ + if (clause.presence == lunr.Query.presence.PROHIBITED) { + if (prohibitedMatches[field] === undefined) { + prohibitedMatches[field] = lunr.Set.empty + } + + prohibitedMatches[field] = prohibitedMatches[field].union(matchingDocumentsSet) + + /* + * Prohibited matches should not be part of the query vector used for + * similarity scoring and no metadata should be extracted so we continue + * to the next field + */ + continue + } + + /* + * The query field vector is populated using the termIndex found for + * the term and a unit value with the appropriate boost applied. + * Using upsert because there could already be an entry in the vector + * for the term we are working with. In that case we just add the scores + * together. 
+ */ + queryVectors[field].upsert(termIndex, clause.boost, function (a, b) { return a + b }) + + /** + * If we've already seen this term, field combo then we've already collected + * the matching documents and metadata, no need to go through all that again + */ + if (termFieldCache[termField]) { + continue + } + + for (var l = 0; l < matchingDocumentRefs.length; l++) { + /* + * All metadata for this term/field/document triple + * are then extracted and collected into an instance + * of lunr.MatchData ready to be returned in the query + * results + */ + var matchingDocumentRef = matchingDocumentRefs[l], + matchingFieldRef = new lunr.FieldRef (matchingDocumentRef, field), + metadata = fieldPosting[matchingDocumentRef], + fieldMatch + + if ((fieldMatch = matchingFields[matchingFieldRef]) === undefined) { + matchingFields[matchingFieldRef] = new lunr.MatchData (expandedTerm, field, metadata) + } else { + fieldMatch.add(expandedTerm, field, metadata) + } + + } + + termFieldCache[termField] = true + } + } + } + + /** + * If the presence was required we need to update the requiredMatches field sets. + * We do this after all fields for the term have collected their matches because + * the clause terms presence is required in _any_ of the fields not _all_ of the + * fields. 
+ */ + if (clause.presence === lunr.Query.presence.REQUIRED) { + for (var k = 0; k < clause.fields.length; k++) { + var field = clause.fields[k] + requiredMatches[field] = requiredMatches[field].intersect(clauseMatches) + } + } + } + + /** + * Need to combine the field scoped required and prohibited + * matching documents into a global set of required and prohibited + * matches + */ + var allRequiredMatches = lunr.Set.complete, + allProhibitedMatches = lunr.Set.empty + + for (var i = 0; i < this.fields.length; i++) { + var field = this.fields[i] + + if (requiredMatches[field]) { + allRequiredMatches = allRequiredMatches.intersect(requiredMatches[field]) + } + + if (prohibitedMatches[field]) { + allProhibitedMatches = allProhibitedMatches.union(prohibitedMatches[field]) + } + } + + var matchingFieldRefs = Object.keys(matchingFields), + results = [], + matches = Object.create(null) + + /* + * If the query is negated (contains only prohibited terms) + * we need to get _all_ fieldRefs currently existing in the + * index. This is only done when we know that the query is + * entirely prohibited terms to avoid any cost of getting all + * fieldRefs unnecessarily. + * + * Additionally, blank MatchData must be created to correctly + * populate the results. + */ + if (query.isNegated()) { + matchingFieldRefs = Object.keys(this.fieldVectors) + + for (var i = 0; i < matchingFieldRefs.length; i++) { + var matchingFieldRef = matchingFieldRefs[i] + var fieldRef = lunr.FieldRef.fromString(matchingFieldRef) + matchingFields[matchingFieldRef] = new lunr.MatchData + } + } + + for (var i = 0; i < matchingFieldRefs.length; i++) { + /* + * Currently we have document fields that match the query, but we + * need to return documents. The matchData and scores are combined + * from multiple fields belonging to the same document. + * + * Scores are calculated by field, using the query vectors created + * above, and combined into a final document score using addition. 
+ */ + var fieldRef = lunr.FieldRef.fromString(matchingFieldRefs[i]), + docRef = fieldRef.docRef + + if (!allRequiredMatches.contains(docRef)) { + continue + } + + if (allProhibitedMatches.contains(docRef)) { + continue + } + + var fieldVector = this.fieldVectors[fieldRef], + score = queryVectors[fieldRef.fieldName].similarity(fieldVector), + docMatch + + if ((docMatch = matches[docRef]) !== undefined) { + docMatch.score += score + docMatch.matchData.combine(matchingFields[fieldRef]) + } else { + var match = { + ref: docRef, + score: score, + matchData: matchingFields[fieldRef] + } + matches[docRef] = match + results.push(match) + } + } + + /* + * Sort the results objects by score, highest first. + */ + return results.sort(function (a, b) { + return b.score - a.score + }) +} + +/** + * Prepares the index for JSON serialization. + * + * The schema for this JSON blob will be described in a + * separate JSON schema file. + * + * @returns {Object} + */ +lunr.Index.prototype.toJSON = function () { + var invertedIndex = Object.keys(this.invertedIndex) + .sort() + .map(function (term) { + return [term, this.invertedIndex[term]] + }, this) + + var fieldVectors = Object.keys(this.fieldVectors) + .map(function (ref) { + return [ref, this.fieldVectors[ref].toJSON()] + }, this) + + return { + version: lunr.version, + fields: this.fields, + fieldVectors: fieldVectors, + invertedIndex: invertedIndex, + pipeline: this.pipeline.toJSON() + } +} + +/** + * Loads a previously serialized lunr.Index + * + * @param {Object} serializedIndex - A previously serialized lunr.Index + * @returns {lunr.Index} + */ +lunr.Index.load = function (serializedIndex) { + var attrs = {}, + fieldVectors = {}, + serializedVectors = serializedIndex.fieldVectors, + invertedIndex = Object.create(null), + serializedInvertedIndex = serializedIndex.invertedIndex, + tokenSetBuilder = new lunr.TokenSet.Builder, + pipeline = lunr.Pipeline.load(serializedIndex.pipeline) + + if (serializedIndex.version != 
lunr.version) { + lunr.utils.warn("Version mismatch when loading serialised index. Current version of lunr '" + lunr.version + "' does not match serialized index '" + serializedIndex.version + "'") + } + + for (var i = 0; i < serializedVectors.length; i++) { + var tuple = serializedVectors[i], + ref = tuple[0], + elements = tuple[1] + + fieldVectors[ref] = new lunr.Vector(elements) + } + + for (var i = 0; i < serializedInvertedIndex.length; i++) { + var tuple = serializedInvertedIndex[i], + term = tuple[0], + posting = tuple[1] + + tokenSetBuilder.insert(term) + invertedIndex[term] = posting + } + + tokenSetBuilder.finish() + + attrs.fields = serializedIndex.fields + + attrs.fieldVectors = fieldVectors + attrs.invertedIndex = invertedIndex + attrs.tokenSet = tokenSetBuilder.root + attrs.pipeline = pipeline + + return new lunr.Index(attrs) +} +/*! + * lunr.Builder + * Copyright (C) 2020 Oliver Nightingale + */ + +/** + * lunr.Builder performs indexing on a set of documents and + * returns instances of lunr.Index ready for querying. + * + * All configuration of the index is done via the builder, the + * fields to index, the document reference, the text processing + * pipeline and document scoring parameters are all set on the + * builder before indexing. + * + * @constructor + * @property {string} _ref - Internal reference to the document reference field. + * @property {string[]} _fields - Internal reference to the document fields to index. + * @property {object} invertedIndex - The inverted index maps terms to document fields. + * @property {object} documentTermFrequencies - Keeps track of document term frequencies. + * @property {object} documentLengths - Keeps track of the length of documents added to the index. + * @property {lunr.tokenizer} tokenizer - Function for splitting strings into tokens for indexing. + * @property {lunr.Pipeline} pipeline - The pipeline performs text processing on tokens before indexing. 
+ * @property {lunr.Pipeline} searchPipeline - A pipeline for processing search terms before querying the index. + * @property {number} documentCount - Keeps track of the total number of documents indexed. + * @property {number} _b - A parameter to control field length normalization, setting this to 0 disables normalization, 1 fully normalizes field lengths, the default value is 0.75. + * @property {number} _k1 - A parameter to control how quickly an increase in term frequency results in term frequency saturation, the default value is 1.2. + * @property {number} termIndex - A counter incremented for each unique term, used to identify a term's position in the vector space. + * @property {array} metadataWhitelist - A list of metadata keys that have been whitelisted for entry in the index. + */ +lunr.Builder = function () { + this._ref = "id" + this._fields = Object.create(null) + this._documents = Object.create(null) + this.invertedIndex = Object.create(null) + this.fieldTermFrequencies = {} + this.fieldLengths = {} + this.tokenizer = lunr.tokenizer + this.pipeline = new lunr.Pipeline + this.searchPipeline = new lunr.Pipeline + this.documentCount = 0 + this._b = 0.75 + this._k1 = 1.2 + this.termIndex = 0 + this.metadataWhitelist = [] +} + +/** + * Sets the document field used as the document reference. Every document must have this field. + * The type of this field in the document should be a string, if it is not a string it will be + * coerced into a string by calling toString. + * + * The default ref is 'id'. + * + * The ref should _not_ be changed during indexing, it should be set before any documents are + * added to the index. Changing it during indexing can lead to inconsistent results. + * + * @param {string} ref - The name of the reference field in the document. + */ +lunr.Builder.prototype.ref = function (ref) { + this._ref = ref +} + +/** + * A function that is used to extract a field from a document. 
+ * + * Lunr expects a field to be at the top level of a document, if however the field + * is deeply nested within a document an extractor function can be used to extract + * the right field for indexing. + * + * @callback fieldExtractor + * @param {object} doc - The document being added to the index. + * @returns {?(string|object|object[])} obj - The object that will be indexed for this field. + * @example Extracting a nested field + * function (doc) { return doc.nested.field } + */ + +/** + * Adds a field to the list of document fields that will be indexed. Every document being + * indexed should have this field. Null values for this field in indexed documents will + * not cause errors but will limit the chance of that document being retrieved by searches. + * + * All fields should be added before adding documents to the index. Adding fields after + * a document has been indexed will have no effect on already indexed documents. + * + * Fields can be boosted at build time. This allows terms within that field to have more + * importance when ranking search results. Use a field boost to specify that matches within + * one field are more important than other fields. + * + * @param {string} fieldName - The name of a field to index in all documents. + * @param {object} attributes - Optional attributes associated with this field. + * @param {number} [attributes.boost=1] - Boost applied to all terms within this field. + * @param {fieldExtractor} [attributes.extractor] - Function to extract a field from a document. + * @throws {RangeError} fieldName cannot contain unsupported characters '/' + */ +lunr.Builder.prototype.field = function (fieldName, attributes) { + if (/\//.test(fieldName)) { + throw new RangeError ("Field '" + fieldName + "' contains illegal character '/'") + } + + this._fields[fieldName] = attributes || {} +} + +/** + * A parameter to tune the amount of field length normalisation that is applied when + * calculating relevance scores. 
A value of 0 will completely disable any normalisation + * and a value of 1 will fully normalise field lengths. The default is 0.75. Values of b + * will be clamped to the range 0 - 1. + * + * @param {number} number - The value to set for this tuning parameter. + */ +lunr.Builder.prototype.b = function (number) { + if (number < 0) { + this._b = 0 + } else if (number > 1) { + this._b = 1 + } else { + this._b = number + } +} + +/** + * A parameter that controls the speed at which a rise in term frequency results in term + * frequency saturation. The default value is 1.2. Setting this to a higher value will give + * slower saturation levels, a lower value will result in quicker saturation. + * + * @param {number} number - The value to set for this tuning parameter. + */ +lunr.Builder.prototype.k1 = function (number) { + this._k1 = number +} + +/** + * Adds a document to the index. + * + * Before adding fields to the index the index should have been fully setup, with the document + * ref and all fields to index already having been specified. + * + * The document must have a field name as specified by the ref (by default this is 'id') and + * it should have all fields defined for indexing, though null or undefined values will not + * cause errors. + * + * Entire documents can be boosted at build time. Applying a boost to a document indicates that + * this document should rank higher in search results than other documents. + * + * @param {object} doc - The document to add to the index. + * @param {object} attributes - Optional attributes associated with this document. + * @param {number} [attributes.boost=1] - Boost applied to all terms within this document. 
+ */ +lunr.Builder.prototype.add = function (doc, attributes) { + var docRef = doc[this._ref], + fields = Object.keys(this._fields) + + this._documents[docRef] = attributes || {} + this.documentCount += 1 + + for (var i = 0; i < fields.length; i++) { + var fieldName = fields[i], + extractor = this._fields[fieldName].extractor, + field = extractor ? extractor(doc) : doc[fieldName], + tokens = this.tokenizer(field, { + fields: [fieldName] + }), + terms = this.pipeline.run(tokens), + fieldRef = new lunr.FieldRef (docRef, fieldName), + fieldTerms = Object.create(null) + + this.fieldTermFrequencies[fieldRef] = fieldTerms + this.fieldLengths[fieldRef] = 0 + + // store the length of this field for this document + this.fieldLengths[fieldRef] += terms.length + + // calculate term frequencies for this field + for (var j = 0; j < terms.length; j++) { + var term = terms[j] + + if (fieldTerms[term] == undefined) { + fieldTerms[term] = 0 + } + + fieldTerms[term] += 1 + + // add to inverted index + // create an initial posting if one doesn't exist + if (this.invertedIndex[term] == undefined) { + var posting = Object.create(null) + posting["_index"] = this.termIndex + this.termIndex += 1 + + for (var k = 0; k < fields.length; k++) { + posting[fields[k]] = Object.create(null) + } + + this.invertedIndex[term] = posting + } + + // add an entry for this term/fieldName/docRef to the invertedIndex + if (this.invertedIndex[term][fieldName][docRef] == undefined) { + this.invertedIndex[term][fieldName][docRef] = Object.create(null) + } + + // store all whitelisted metadata about this token in the + // inverted index + for (var l = 0; l < this.metadataWhitelist.length; l++) { + var metadataKey = this.metadataWhitelist[l], + metadata = term.metadata[metadataKey] + + if (this.invertedIndex[term][fieldName][docRef][metadataKey] == undefined) { + this.invertedIndex[term][fieldName][docRef][metadataKey] = [] + } + + this.invertedIndex[term][fieldName][docRef][metadataKey].push(metadata) + } + } 
+ + } +} + +/** + * Calculates the average document length for this index + * + * @private + */ +lunr.Builder.prototype.calculateAverageFieldLengths = function () { + + var fieldRefs = Object.keys(this.fieldLengths), + numberOfFields = fieldRefs.length, + accumulator = {}, + documentsWithField = {} + + for (var i = 0; i < numberOfFields; i++) { + var fieldRef = lunr.FieldRef.fromString(fieldRefs[i]), + field = fieldRef.fieldName + + documentsWithField[field] || (documentsWithField[field] = 0) + documentsWithField[field] += 1 + + accumulator[field] || (accumulator[field] = 0) + accumulator[field] += this.fieldLengths[fieldRef] + } + + var fields = Object.keys(this._fields) + + for (var i = 0; i < fields.length; i++) { + var fieldName = fields[i] + accumulator[fieldName] = accumulator[fieldName] / documentsWithField[fieldName] + } + + this.averageFieldLength = accumulator +} + +/** + * Builds a vector space model of every document using lunr.Vector + * + * @private + */ +lunr.Builder.prototype.createFieldVectors = function () { + var fieldVectors = {}, + fieldRefs = Object.keys(this.fieldTermFrequencies), + fieldRefsLength = fieldRefs.length, + termIdfCache = Object.create(null) + + for (var i = 0; i < fieldRefsLength; i++) { + var fieldRef = lunr.FieldRef.fromString(fieldRefs[i]), + fieldName = fieldRef.fieldName, + fieldLength = this.fieldLengths[fieldRef], + fieldVector = new lunr.Vector, + termFrequencies = this.fieldTermFrequencies[fieldRef], + terms = Object.keys(termFrequencies), + termsLength = terms.length + + + var fieldBoost = this._fields[fieldName].boost || 1, + docBoost = this._documents[fieldRef.docRef].boost || 1 + + for (var j = 0; j < termsLength; j++) { + var term = terms[j], + tf = termFrequencies[term], + termIndex = this.invertedIndex[term]._index, + idf, score, scoreWithPrecision + + if (termIdfCache[term] === undefined) { + idf = lunr.idf(this.invertedIndex[term], this.documentCount) + termIdfCache[term] = idf + } else { + idf = 
termIdfCache[term] + } + + score = idf * ((this._k1 + 1) * tf) / (this._k1 * (1 - this._b + this._b * (fieldLength / this.averageFieldLength[fieldName])) + tf) + score *= fieldBoost + score *= docBoost + scoreWithPrecision = Math.round(score * 1000) / 1000 + // Converts 1.23456789 to 1.234. + // Reducing the precision so that the vectors take up less + // space when serialised. Doing it now so that they behave + // the same before and after serialisation. Also, this is + // the fastest approach to reducing a number's precision in + // JavaScript. + + fieldVector.insert(termIndex, scoreWithPrecision) + } + + fieldVectors[fieldRef] = fieldVector + } + + this.fieldVectors = fieldVectors +} + +/** + * Creates a token set of all tokens in the index using lunr.TokenSet + * + * @private + */ +lunr.Builder.prototype.createTokenSet = function () { + this.tokenSet = lunr.TokenSet.fromArray( + Object.keys(this.invertedIndex).sort() + ) +} + +/** + * Builds the index, creating an instance of lunr.Index. + * + * This completes the indexing process and should only be called + * once all documents have been added to the index. + * + * @returns {lunr.Index} + */ +lunr.Builder.prototype.build = function () { + this.calculateAverageFieldLengths() + this.createFieldVectors() + this.createTokenSet() + + return new lunr.Index({ + invertedIndex: this.invertedIndex, + fieldVectors: this.fieldVectors, + tokenSet: this.tokenSet, + fields: Object.keys(this._fields), + pipeline: this.searchPipeline + }) +} + +/** + * Applies a plugin to the index builder. + * + * A plugin is a function that is called with the index builder as its context. + * Plugins can be used to customise or extend the behaviour of the index + * in some way. A plugin is just a function that encapsulates the custom + * behaviour that should be applied when building the index. + * + * The plugin function will be called with the index builder as its argument, additional + * arguments can also be passed when calling use. 
The function will be called + * with the index builder as its context. + * + * @param {Function} plugin The plugin to apply. + */ +lunr.Builder.prototype.use = function (fn) { + var args = Array.prototype.slice.call(arguments, 1) + args.unshift(this) + fn.apply(this, args) +} +/** + * Contains and collects metadata about a matching document. + * A single instance of lunr.MatchData is returned as part of every + * lunr.Index~Result. + * + * @constructor + * @param {string} term - The term this match data is associated with + * @param {string} field - The field in which the term was found + * @param {object} metadata - The metadata recorded about this term in this field + * @property {object} metadata - A cloned collection of metadata associated with this document. + * @see {@link lunr.Index~Result} + */ +lunr.MatchData = function (term, field, metadata) { + var clonedMetadata = Object.create(null), + metadataKeys = Object.keys(metadata || {}) + + // Cloning the metadata to prevent the original + // being mutated during match data combination. + // Metadata is kept in an array within the inverted + // index so cloning the data can be done with + // Array#slice + for (var i = 0; i < metadataKeys.length; i++) { + var key = metadataKeys[i] + clonedMetadata[key] = metadata[key].slice() + } + + this.metadata = Object.create(null) + + if (term !== undefined) { + this.metadata[term] = Object.create(null) + this.metadata[term][field] = clonedMetadata + } +} + +/** + * An instance of lunr.MatchData will be created for every term that matches a + * document. However only one instance is required in a lunr.Index~Result. This + * method combines metadata from another instance of lunr.MatchData with this + * object's metadata. + * + * @param {lunr.MatchData} otherMatchData - Another instance of match data to merge with this one. 
+ * @see {@link lunr.Index~Result} + */ +lunr.MatchData.prototype.combine = function (otherMatchData) { + var terms = Object.keys(otherMatchData.metadata) + + for (var i = 0; i < terms.length; i++) { + var term = terms[i], + fields = Object.keys(otherMatchData.metadata[term]) + + if (this.metadata[term] == undefined) { + this.metadata[term] = Object.create(null) + } + + for (var j = 0; j < fields.length; j++) { + var field = fields[j], + keys = Object.keys(otherMatchData.metadata[term][field]) + + if (this.metadata[term][field] == undefined) { + this.metadata[term][field] = Object.create(null) + } + + for (var k = 0; k < keys.length; k++) { + var key = keys[k] + + if (this.metadata[term][field][key] == undefined) { + this.metadata[term][field][key] = otherMatchData.metadata[term][field][key] + } else { + this.metadata[term][field][key] = this.metadata[term][field][key].concat(otherMatchData.metadata[term][field][key]) + } + + } + } + } +} + +/** + * Add metadata for a term/field pair to this instance of match data. 
+ * + * @param {string} term - The term this match data is associated with + * @param {string} field - The field in which the term was found + * @param {object} metadata - The metadata recorded about this term in this field + */ +lunr.MatchData.prototype.add = function (term, field, metadata) { + if (!(term in this.metadata)) { + this.metadata[term] = Object.create(null) + this.metadata[term][field] = metadata + return + } + + if (!(field in this.metadata[term])) { + this.metadata[term][field] = metadata + return + } + + var metadataKeys = Object.keys(metadata) + + for (var i = 0; i < metadataKeys.length; i++) { + var key = metadataKeys[i] + + if (key in this.metadata[term][field]) { + this.metadata[term][field][key] = this.metadata[term][field][key].concat(metadata[key]) + } else { + this.metadata[term][field][key] = metadata[key] + } + } +} +/** + * A lunr.Query provides a programmatic way of defining queries to be performed + * against a {@link lunr.Index}. + * + * Prefer constructing a lunr.Query using the {@link lunr.Index#query} method + * so the query object is pre-initialized with the right index fields. + * + * @constructor + * @property {lunr.Query~Clause[]} clauses - An array of query clauses. + * @property {string[]} allFields - An array of all available fields in a lunr.Index. + */ +lunr.Query = function (allFields) { + this.clauses = [] + this.allFields = allFields +} + +/** + * Constants for indicating what kind of automatic wildcard insertion will be used when constructing a query clause. + * + * This allows wildcards to be added to the beginning and end of a term without having to manually do any string + * concatenation. + * + * The wildcard constants can be bitwise combined to select both leading and trailing wildcards. 
+ * + * @constant + * @default + * @property {number} wildcard.NONE - The term will have no wildcards inserted, this is the default behaviour + * @property {number} wildcard.LEADING - Prepend the term with a wildcard, unless a leading wildcard already exists + * @property {number} wildcard.TRAILING - Append a wildcard to the term, unless a trailing wildcard already exists + * @see lunr.Query~Clause + * @see lunr.Query#clause + * @see lunr.Query#term + * @example query term with trailing wildcard + * query.term('foo', { wildcard: lunr.Query.wildcard.TRAILING }) + * @example query term with leading and trailing wildcard + * query.term('foo', { + * wildcard: lunr.Query.wildcard.LEADING | lunr.Query.wildcard.TRAILING + * }) + */ + +lunr.Query.wildcard = new String ("*") +lunr.Query.wildcard.NONE = 0 +lunr.Query.wildcard.LEADING = 1 +lunr.Query.wildcard.TRAILING = 2 + +/** + * Constants for indicating what kind of presence a term must have in matching documents. + * + * @constant + * @enum {number} + * @see lunr.Query~Clause + * @see lunr.Query#clause + * @see lunr.Query#term + * @example query term with required presence + * query.term('foo', { presence: lunr.Query.presence.REQUIRED }) + */ +lunr.Query.presence = { + /** + * Term's presence in a document is optional, this is the default value. + */ + OPTIONAL: 1, + + /** + * Term's presence in a document is required, documents that do not contain + * this term will not be returned. + */ + REQUIRED: 2, + + /** + * Term's presence in a document is prohibited, documents that do contain + * this term will not be returned. + */ + PROHIBITED: 3 +} + +/** + * A single clause in a {@link lunr.Query} contains a term and details on how to + * match that term against a {@link lunr.Index}. + * + * @typedef {Object} lunr.Query~Clause + * @property {string[]} fields - The fields in an index this clause should be matched against. + * @property {number} [boost=1] - Any boost that should be applied when matching this clause. 
+ * @property {number} [editDistance] - Whether the term should have fuzzy matching applied, and how fuzzy the match should be. + * @property {boolean} [usePipeline] - Whether the term should be passed through the search pipeline. + * @property {number} [wildcard=lunr.Query.wildcard.NONE] - Whether the term should have wildcards appended or prepended. + * @property {number} [presence=lunr.Query.presence.OPTIONAL] - The term's presence in any matching documents. + */ + +/** + * Adds a {@link lunr.Query~Clause} to this query. + * + * Unless the clause contains the fields to be matched, all fields will be matched. In addition + * a default boost of 1 is applied to the clause. + * + * @param {lunr.Query~Clause} clause - The clause to add to this query. + * @see lunr.Query~Clause + * @returns {lunr.Query} + */ +lunr.Query.prototype.clause = function (clause) { + if (!('fields' in clause)) { + clause.fields = this.allFields + } + + if (!('boost' in clause)) { + clause.boost = 1 + } + + if (!('usePipeline' in clause)) { + clause.usePipeline = true + } + + if (!('wildcard' in clause)) { + clause.wildcard = lunr.Query.wildcard.NONE + } + + if ((clause.wildcard & lunr.Query.wildcard.LEADING) && (clause.term.charAt(0) != lunr.Query.wildcard)) { + clause.term = "*" + clause.term + } + + if ((clause.wildcard & lunr.Query.wildcard.TRAILING) && (clause.term.slice(-1) != lunr.Query.wildcard)) { + clause.term = "" + clause.term + "*" + } + + if (!('presence' in clause)) { + clause.presence = lunr.Query.presence.OPTIONAL + } + + this.clauses.push(clause) + + return this +} + +/** + * A negated query is one in which every clause has a presence of + * prohibited. These queries require some special processing to return + * the expected results. 
+ * + * @returns boolean + */ +lunr.Query.prototype.isNegated = function () { + for (var i = 0; i < this.clauses.length; i++) { + if (this.clauses[i].presence != lunr.Query.presence.PROHIBITED) { + return false + } + } + + return true +} + +/** + * Adds a term to the current query, under the covers this will create a {@link lunr.Query~Clause} + * to the list of clauses that make up this query. + * + * The term is used as is, i.e. no tokenization will be performed by this method. Instead conversion + * to a token or token-like string should be done before calling this method. + * + * The term will be converted to a string by calling `toString`. Multiple terms can be passed as an + * array, each term in the array will share the same options. + * + * @param {object|object[]} term - The term(s) to add to the query. + * @param {object} [options] - Any additional properties to add to the query clause. + * @returns {lunr.Query} + * @see lunr.Query#clause + * @see lunr.Query~Clause + * @example adding a single term to a query + * query.term("foo") + * @example adding a single term to a query and specifying search fields, term boost and automatic trailing wildcard + * query.term("foo", { + * fields: ["title"], + * boost: 10, + * wildcard: lunr.Query.wildcard.TRAILING + * }) + * @example using lunr.tokenizer to convert a string to tokens before using them as terms + * query.term(lunr.tokenizer("foo bar")) + */ +lunr.Query.prototype.term = function (term, options) { + if (Array.isArray(term)) { + term.forEach(function (t) { this.term(t, lunr.utils.clone(options)) }, this) + return this + } + + var clause = options || {} + clause.term = term.toString() + + this.clause(clause) + + return this +} +lunr.QueryParseError = function (message, start, end) { + this.name = "QueryParseError" + this.message = message + this.start = start + this.end = end +} + +lunr.QueryParseError.prototype = new Error +lunr.QueryLexer = function (str) { + this.lexemes = [] + this.str = str + this.length 
= str.length + this.pos = 0 + this.start = 0 + this.escapeCharPositions = [] +} + +lunr.QueryLexer.prototype.run = function () { + var state = lunr.QueryLexer.lexText + + while (state) { + state = state(this) + } +} + +lunr.QueryLexer.prototype.sliceString = function () { + var subSlices = [], + sliceStart = this.start, + sliceEnd = this.pos + + for (var i = 0; i < this.escapeCharPositions.length; i++) { + sliceEnd = this.escapeCharPositions[i] + subSlices.push(this.str.slice(sliceStart, sliceEnd)) + sliceStart = sliceEnd + 1 + } + + subSlices.push(this.str.slice(sliceStart, this.pos)) + this.escapeCharPositions.length = 0 + + return subSlices.join('') +} + +lunr.QueryLexer.prototype.emit = function (type) { + this.lexemes.push({ + type: type, + str: this.sliceString(), + start: this.start, + end: this.pos + }) + + this.start = this.pos +} + +lunr.QueryLexer.prototype.escapeCharacter = function () { + this.escapeCharPositions.push(this.pos - 1) + this.pos += 1 +} + +lunr.QueryLexer.prototype.next = function () { + if (this.pos >= this.length) { + return lunr.QueryLexer.EOS + } + + var char = this.str.charAt(this.pos) + this.pos += 1 + return char +} + +lunr.QueryLexer.prototype.width = function () { + return this.pos - this.start +} + +lunr.QueryLexer.prototype.ignore = function () { + if (this.start == this.pos) { + this.pos += 1 + } + + this.start = this.pos +} + +lunr.QueryLexer.prototype.backup = function () { + this.pos -= 1 +} + +lunr.QueryLexer.prototype.acceptDigitRun = function () { + var char, charCode + + do { + char = this.next() + charCode = char.charCodeAt(0) + } while (charCode > 47 && charCode < 58) + + if (char != lunr.QueryLexer.EOS) { + this.backup() + } +} + +lunr.QueryLexer.prototype.more = function () { + return this.pos < this.length +} + +lunr.QueryLexer.EOS = 'EOS' +lunr.QueryLexer.FIELD = 'FIELD' +lunr.QueryLexer.TERM = 'TERM' +lunr.QueryLexer.EDIT_DISTANCE = 'EDIT_DISTANCE' +lunr.QueryLexer.BOOST = 'BOOST' +lunr.QueryLexer.PRESENCE = 
'PRESENCE' + +lunr.QueryLexer.lexField = function (lexer) { + lexer.backup() + lexer.emit(lunr.QueryLexer.FIELD) + lexer.ignore() + return lunr.QueryLexer.lexText +} + +lunr.QueryLexer.lexTerm = function (lexer) { + if (lexer.width() > 1) { + lexer.backup() + lexer.emit(lunr.QueryLexer.TERM) + } + + lexer.ignore() + + if (lexer.more()) { + return lunr.QueryLexer.lexText + } +} + +lunr.QueryLexer.lexEditDistance = function (lexer) { + lexer.ignore() + lexer.acceptDigitRun() + lexer.emit(lunr.QueryLexer.EDIT_DISTANCE) + return lunr.QueryLexer.lexText +} + +lunr.QueryLexer.lexBoost = function (lexer) { + lexer.ignore() + lexer.acceptDigitRun() + lexer.emit(lunr.QueryLexer.BOOST) + return lunr.QueryLexer.lexText +} + +lunr.QueryLexer.lexEOS = function (lexer) { + if (lexer.width() > 0) { + lexer.emit(lunr.QueryLexer.TERM) + } +} + +// This matches the separator used when tokenising fields +// within a document. These should match otherwise it is +// not possible to search for some tokens within a document. +// +// It is possible for the user to change the separator on the +// tokenizer so it _might_ clash with any other of the special +// characters already used within the search string, e.g. :. +// +// This means that it is possible to change the separator in +// such a way that makes some words unsearchable using a search +// string. 
+lunr.QueryLexer.termSeparator = lunr.tokenizer.separator + +lunr.QueryLexer.lexText = function (lexer) { + while (true) { + var char = lexer.next() + + if (char == lunr.QueryLexer.EOS) { + return lunr.QueryLexer.lexEOS + } + + // Escape character is '\' + if (char.charCodeAt(0) == 92) { + lexer.escapeCharacter() + continue + } + + if (char == ":") { + return lunr.QueryLexer.lexField + } + + if (char == "~") { + lexer.backup() + if (lexer.width() > 0) { + lexer.emit(lunr.QueryLexer.TERM) + } + return lunr.QueryLexer.lexEditDistance + } + + if (char == "^") { + lexer.backup() + if (lexer.width() > 0) { + lexer.emit(lunr.QueryLexer.TERM) + } + return lunr.QueryLexer.lexBoost + } + + // "+" indicates term presence is required + // checking for length to ensure that only + // leading "+" are considered + if (char == "+" && lexer.width() === 1) { + lexer.emit(lunr.QueryLexer.PRESENCE) + return lunr.QueryLexer.lexText + } + + // "-" indicates term presence is prohibited + // checking for length to ensure that only + // leading "-" are considered + if (char == "-" && lexer.width() === 1) { + lexer.emit(lunr.QueryLexer.PRESENCE) + return lunr.QueryLexer.lexText + } + + if (char.match(lunr.QueryLexer.termSeparator)) { + return lunr.QueryLexer.lexTerm + } + } +} + +lunr.QueryParser = function (str, query) { + this.lexer = new lunr.QueryLexer (str) + this.query = query + this.currentClause = {} + this.lexemeIdx = 0 +} + +lunr.QueryParser.prototype.parse = function () { + this.lexer.run() + this.lexemes = this.lexer.lexemes + + var state = lunr.QueryParser.parseClause + + while (state) { + state = state(this) + } + + return this.query +} + +lunr.QueryParser.prototype.peekLexeme = function () { + return this.lexemes[this.lexemeIdx] +} + +lunr.QueryParser.prototype.consumeLexeme = function () { + var lexeme = this.peekLexeme() + this.lexemeIdx += 1 + return lexeme +} + +lunr.QueryParser.prototype.nextClause = function () { + var completedClause = this.currentClause + 
this.query.clause(completedClause) + this.currentClause = {} +} + +lunr.QueryParser.parseClause = function (parser) { + var lexeme = parser.peekLexeme() + + if (lexeme == undefined) { + return + } + + switch (lexeme.type) { + case lunr.QueryLexer.PRESENCE: + return lunr.QueryParser.parsePresence + case lunr.QueryLexer.FIELD: + return lunr.QueryParser.parseField + case lunr.QueryLexer.TERM: + return lunr.QueryParser.parseTerm + default: + var errorMessage = "expected either a field or a term, found " + lexeme.type + + if (lexeme.str.length >= 1) { + errorMessage += " with value '" + lexeme.str + "'" + } + + throw new lunr.QueryParseError (errorMessage, lexeme.start, lexeme.end) + } +} + +lunr.QueryParser.parsePresence = function (parser) { + var lexeme = parser.consumeLexeme() + + if (lexeme == undefined) { + return + } + + switch (lexeme.str) { + case "-": + parser.currentClause.presence = lunr.Query.presence.PROHIBITED + break + case "+": + parser.currentClause.presence = lunr.Query.presence.REQUIRED + break + default: + var errorMessage = "unrecognised presence operator'" + lexeme.str + "'" + throw new lunr.QueryParseError (errorMessage, lexeme.start, lexeme.end) + } + + var nextLexeme = parser.peekLexeme() + + if (nextLexeme == undefined) { + var errorMessage = "expecting term or field, found nothing" + throw new lunr.QueryParseError (errorMessage, lexeme.start, lexeme.end) + } + + switch (nextLexeme.type) { + case lunr.QueryLexer.FIELD: + return lunr.QueryParser.parseField + case lunr.QueryLexer.TERM: + return lunr.QueryParser.parseTerm + default: + var errorMessage = "expecting term or field, found '" + nextLexeme.type + "'" + throw new lunr.QueryParseError (errorMessage, nextLexeme.start, nextLexeme.end) + } +} + +lunr.QueryParser.parseField = function (parser) { + var lexeme = parser.consumeLexeme() + + if (lexeme == undefined) { + return + } + + if (parser.query.allFields.indexOf(lexeme.str) == -1) { + var possibleFields = 
parser.query.allFields.map(function (f) { return "'" + f + "'" }).join(', '), + errorMessage = "unrecognised field '" + lexeme.str + "', possible fields: " + possibleFields + + throw new lunr.QueryParseError (errorMessage, lexeme.start, lexeme.end) + } + + parser.currentClause.fields = [lexeme.str] + + var nextLexeme = parser.peekLexeme() + + if (nextLexeme == undefined) { + var errorMessage = "expecting term, found nothing" + throw new lunr.QueryParseError (errorMessage, lexeme.start, lexeme.end) + } + + switch (nextLexeme.type) { + case lunr.QueryLexer.TERM: + return lunr.QueryParser.parseTerm + default: + var errorMessage = "expecting term, found '" + nextLexeme.type + "'" + throw new lunr.QueryParseError (errorMessage, nextLexeme.start, nextLexeme.end) + } +} + +lunr.QueryParser.parseTerm = function (parser) { + var lexeme = parser.consumeLexeme() + + if (lexeme == undefined) { + return + } + + parser.currentClause.term = lexeme.str.toLowerCase() + + if (lexeme.str.indexOf("*") != -1) { + parser.currentClause.usePipeline = false + } + + var nextLexeme = parser.peekLexeme() + + if (nextLexeme == undefined) { + parser.nextClause() + return + } + + switch (nextLexeme.type) { + case lunr.QueryLexer.TERM: + parser.nextClause() + return lunr.QueryParser.parseTerm + case lunr.QueryLexer.FIELD: + parser.nextClause() + return lunr.QueryParser.parseField + case lunr.QueryLexer.EDIT_DISTANCE: + return lunr.QueryParser.parseEditDistance + case lunr.QueryLexer.BOOST: + return lunr.QueryParser.parseBoost + case lunr.QueryLexer.PRESENCE: + parser.nextClause() + return lunr.QueryParser.parsePresence + default: + var errorMessage = "Unexpected lexeme type '" + nextLexeme.type + "'" + throw new lunr.QueryParseError (errorMessage, nextLexeme.start, nextLexeme.end) + } +} + +lunr.QueryParser.parseEditDistance = function (parser) { + var lexeme = parser.consumeLexeme() + + if (lexeme == undefined) { + return + } + + var editDistance = parseInt(lexeme.str, 10) + + if 
(isNaN(editDistance)) { + var errorMessage = "edit distance must be numeric" + throw new lunr.QueryParseError (errorMessage, lexeme.start, lexeme.end) + } + + parser.currentClause.editDistance = editDistance + + var nextLexeme = parser.peekLexeme() + + if (nextLexeme == undefined) { + parser.nextClause() + return + } + + switch (nextLexeme.type) { + case lunr.QueryLexer.TERM: + parser.nextClause() + return lunr.QueryParser.parseTerm + case lunr.QueryLexer.FIELD: + parser.nextClause() + return lunr.QueryParser.parseField + case lunr.QueryLexer.EDIT_DISTANCE: + return lunr.QueryParser.parseEditDistance + case lunr.QueryLexer.BOOST: + return lunr.QueryParser.parseBoost + case lunr.QueryLexer.PRESENCE: + parser.nextClause() + return lunr.QueryParser.parsePresence + default: + var errorMessage = "Unexpected lexeme type '" + nextLexeme.type + "'" + throw new lunr.QueryParseError (errorMessage, nextLexeme.start, nextLexeme.end) + } +} + +lunr.QueryParser.parseBoost = function (parser) { + var lexeme = parser.consumeLexeme() + + if (lexeme == undefined) { + return + } + + var boost = parseInt(lexeme.str, 10) + + if (isNaN(boost)) { + var errorMessage = "boost must be numeric" + throw new lunr.QueryParseError (errorMessage, lexeme.start, lexeme.end) + } + + parser.currentClause.boost = boost + + var nextLexeme = parser.peekLexeme() + + if (nextLexeme == undefined) { + parser.nextClause() + return + } + + switch (nextLexeme.type) { + case lunr.QueryLexer.TERM: + parser.nextClause() + return lunr.QueryParser.parseTerm + case lunr.QueryLexer.FIELD: + parser.nextClause() + return lunr.QueryParser.parseField + case lunr.QueryLexer.EDIT_DISTANCE: + return lunr.QueryParser.parseEditDistance + case lunr.QueryLexer.BOOST: + return lunr.QueryParser.parseBoost + case lunr.QueryLexer.PRESENCE: + parser.nextClause() + return lunr.QueryParser.parsePresence + default: + var errorMessage = "Unexpected lexeme type '" + nextLexeme.type + "'" + throw new lunr.QueryParseError (errorMessage, 
nextLexeme.start, nextLexeme.end) + } +} + + /** + * export the module via AMD, CommonJS or as a browser global + * Export code from https://github.com/umdjs/umd/blob/master/returnExports.js + */ + ;(function (root, factory) { + if (typeof define === 'function' && define.amd) { + // AMD. Register as an anonymous module. + define(factory) + } else if (typeof exports === 'object') { + /** + * Node. Does not work with strict CommonJS, but + * only CommonJS-like environments that support module.exports, + * like Node. + */ + module.exports = factory() + } else { + // Browser globals (root is window) + root.lunr = factory() + } + }(this, function () { + /** + * Just return a value to define the module export. + * This example returns an object, but the module + * can return a function as the exported value. + */ + return lunr + })) +})(); diff --git a/search/main.js b/search/main.js new file mode 100644 index 00000000..a5e469d7 --- /dev/null +++ b/search/main.js @@ -0,0 +1,109 @@ +function getSearchTermFromLocation() { + var sPageURL = window.location.search.substring(1); + var sURLVariables = sPageURL.split('&'); + for (var i = 0; i < sURLVariables.length; i++) { + var sParameterName = sURLVariables[i].split('='); + if (sParameterName[0] == 'q') { + return decodeURIComponent(sParameterName[1].replace(/\+/g, '%20')); + } + } +} + +function joinUrl (base, path) { + if (path.substring(0, 1) === "/") { + // path starts with `/`. Thus it is absolute. 
+ return path; + } + if (base.substring(base.length-1) === "/") { + // base ends with `/` + return base + path; + } + return base + "/" + path; +} + +function escapeHtml (value) { + return value.replace(/&/g, '&amp;') + .replace(/"/g, '&quot;') + .replace(/</g, '&lt;') + .replace(/>/g, '&gt;'); +} + +function formatResult (location, title, summary) { + return '<article><h3><a href="' + joinUrl(base_url, location) + '">'+ escapeHtml(title) + '</a></h3><p>' + escapeHtml(summary) +'</p></article>'; +} + +function displayResults (results) { + var search_results = document.getElementById("mkdocs-search-results"); + while (search_results.firstChild) { + search_results.removeChild(search_results.firstChild); + } + if (results.length > 0){ + for (var i=0; i < results.length; i++){ + var result = results[i]; + var html = formatResult(result.location, result.title, result.summary); + search_results.insertAdjacentHTML('beforeend', html); + } + } else { + var noResultsText = search_results.getAttribute('data-no-results-text'); + if (!noResultsText) { + noResultsText = "No results found"; + } + search_results.insertAdjacentHTML('beforeend', '

<p>' + noResultsText + '</p>
'); + } +} + +function doSearch () { + var query = document.getElementById('mkdocs-search-query').value; + if (query.length > min_search_length) { + if (!window.Worker) { + displayResults(search(query)); + } else { + searchWorker.postMessage({query: query}); + } + } else { + // Clear results for short queries + displayResults([]); + } +} + +function initSearch () { + var search_input = document.getElementById('mkdocs-search-query'); + if (search_input) { + search_input.addEventListener("keyup", doSearch); + } + var term = getSearchTermFromLocation(); + if (term) { + search_input.value = term; + doSearch(); + } +} + +function onWorkerMessage (e) { + if (e.data.allowSearch) { + initSearch(); + } else if (e.data.results) { + var results = e.data.results; + displayResults(results); + } else if (e.data.config) { + min_search_length = e.data.config.min_search_length-1; + } +} + +if (!window.Worker) { + console.log('Web Worker API not supported'); + // load index in main thread + $.getScript(joinUrl(base_url, "search/worker.js")).done(function () { + console.log('Loaded worker'); + init(); + window.postMessage = function (msg) { + onWorkerMessage({data: msg}); + }; + }).fail(function (jqxhr, settings, exception) { + console.error('Could not load worker.js'); + }); +} else { + // Wrap search in a web worker + var searchWorker = new Worker(joinUrl(base_url, "search/worker.js")); + searchWorker.postMessage({init: true}); + searchWorker.onmessage = onWorkerMessage; +} diff --git a/search/search_index.json b/search/search_index.json new file mode 100644 index 00000000..05363430 --- /dev/null +++ b/search/search_index.json @@ -0,0 +1 @@ +{"config":{"indexing":"full","lang":["en"],"min_search_length":3,"prebuild_index":false,"separator":"[\\s\\-]+"},"docs":[{"location":"","text":"Welcome to UFO's Document! 
\u2002 \u2002 \u2002 \u2002 \u2002 Introduction UFO is a UI-Focused multi-agent framework to fulfill user requests on Windows OS by seamlessly navigating and operating within individual applications or across multiple applications. \ud83d\udd4c Framework UFO operates as a multi-agent framework, encompassing: HostAgent \ud83e\udd16 , tasked with choosing an application for fulfilling user requests. This agent may also switch to a different application when a request spans multiple applications, and the task is partially completed in the preceding application. AppAgent \ud83d\udc7e , responsible for iteratively executing actions on the selected applications until the task is successfully concluded within a specific application. Application Automator \ud83c\udfae , tasked with translating actions from HostAgent and AppAgent into interactions with the application through UI controls, native APIs, or AI tools. Check out more details here . Both agents leverage the multi-modal capabilities of Visual Language Models (VLMs) to comprehend the application UI and fulfill the user's request. For more details, please consult our technical report . \ud83d\ude80 Quick Start Please follow the Quick Start Guide to get started with UFO. \ud83d\udca5 Highlights First Windows Agent - UFO is the pioneering agent framework capable of translating user requests in natural language into actionable operations on Windows OS. Agent as an Expert - UFO is enhanced by Retrieval Augmented Generation (RAG) from heterogeneous sources, including offline help documents, online search engines, and human demonstrations, making the agent an application \"expert\". Rich Skill Set - UFO is equipped with a diverse set of skills to support comprehensive automation, such as mouse, keyboard, native API, and \"Copilot\". Interactive Mode - UFO facilitates multiple sub-requests from users within the same session, enabling the seamless completion of complex tasks. 
Agent Customization - UFO allows users to customize their own agents by providing additional information. The agent will proactively query users for details when necessary to better tailor its behavior. Scalable AppAgent Creation - UFO offers extensibility, allowing users and app developers to create their own AppAgents in an easy and scalable way. \ud83c\udf10 Media Coverage Check out our official deep dive of UFO on this Youtube Video . UFO sightings have garnered attention from various media outlets, including: Microsoft's UFO abducts traditional user interfaces for a smarter Windows experience \ud83d\ude80 UFO & GPT-4-V: Sit back and relax, mientras GPT lo hace todo\ud83c\udf0c The AI PC - The Future of Computers? - Microsoft UFO \u4e0b\u4e00\u4ee3Windows\u7cfb\u7edf\u66dd\u5149\uff1a\u57fa\u4e8eGPT-4V\uff0cAgent\u8de8\u5e94\u7528\u8c03\u5ea6\uff0c\u4ee3\u53f7UFO \u4e0b\u4e00\u4ee3\u667a\u80fd\u7248 Windows \u8981\u6765\u4e86\uff1f\u5fae\u8f6f\u63a8\u51fa\u9996\u4e2a Windows Agent\uff0c\u547d\u540d\u4e3a UFO\uff01 Microsoft\u767a\u306e\u30aa\u30fc\u30d7\u30f3\u30bd\u30fc\u30b9\u7248\u300cUFO\u300d\u767b\u5834\uff01\u3000Windows\u3092\u81ea\u52d5\u64cd\u7e26\u3059\u308bAI\u30a8\u30fc\u30b8\u30a7\u30f3\u30c8\u3092\u8a66\u3059 \u2753Get help \u2754GitHub Issues (preferred) For other communications, please contact ufo-agent@microsoft.com \ud83d\udcda Citation Our technical report paper can be found here . Note that the previous AppAgent and ActAgent in the paper are renamed to HostAgent and AppAgent in the code base to better reflect their functions. 
If you use UFO in your research, please cite our paper: @article{ufo, title={{UFO: A UI-Focused Agent for Windows OS Interaction}}, author={Zhang, Chaoyun and Li, Liqun and He, Shilin and Zhang, Xu and Qiao, Bo and Qin, Si and Ma, Minghua and Kang, Yu and Lin, Qingwei and Rajmohan, Saravan and Zhang, Dongmei and Zhang, Qi}, journal={arXiv preprint arXiv:2402.07939}, year={2024} } \ud83c\udfa8 Related Projects If you're interested in data analytics agent frameworks, check out TaskWeaver , a code-first LLM agent framework designed for seamlessly planning and executing data analytics tasks. For more information on GUI agents, refer to our survey paper: Large Language Model-Brained GUI Agents: A Survey . You can also explore the survey through: - GitHub Repository - Searchable Website window.dataLayer = window.dataLayer || []; function gtag(){dataLayer.push(arguments);} gtag('js', new Date()); gtag('config', 'G-FX17ZGJYGC');","title":"Home"},{"location":"#welcome-to-ufos-document","text":"","title":"Welcome to UFO's Document!"},{"location":"#introduction","text":"UFO is a UI-Focused multi-agent framework to fulfill user requests on Windows OS by seamlessly navigating and operating within individual applications or across multiple applications.","title":"Introduction"},{"location":"#framework","text":"UFO operates as a multi-agent framework, encompassing: HostAgent \ud83e\udd16 , tasked with choosing an application for fulfilling user requests. This agent may also switch to a different application when a request spans multiple applications, and the task is partially completed in the preceding application. AppAgent \ud83d\udc7e , responsible for iteratively executing actions on the selected applications until the task is successfully concluded within a specific application. Application Automator \ud83c\udfae , tasked with translating actions from HostAgent and AppAgent into interactions with the application through UI controls, native APIs, or AI tools. 
Check out more details here . Both agents leverage the multi-modal capabilities of Visual Language Model (VLM) to comprehend the application UI and fulfill the user's request. For more details, please consult our technical report .","title":"\ud83d\udd4c Framework"},{"location":"#quick-start","text":"Please follow the Quick Start Guide to get started with UFO.","title":"\ud83d\ude80 Quick Start"},{"location":"#highlights","text":"First Windows Agent - UFO is the pioneering agent framework capable of translating user requests in natural language into actionable operations on Windows OS. Agent as an Expert - UFO is enhanced by Retrieval Augmented Generation (RAG) from heterogeneous sources, including offline help documents, online search engines, and human demonstrations, making the agent an application \"expert\". Rich Skill Set - UFO is equipped with a diverse set of skills to support comprehensive automation, such as mouse, keyboard, native API, and \"Copilot\". Interactive Mode - UFO facilitates multiple sub-requests from users within the same session, enabling the seamless completion of complex tasks. Agent Customization - UFO allows users to customize their own agents by providing additional information. The agent will proactively query users for details when necessary to better tailor its behavior. Scalable AppAgent Creation - UFO offers extensibility, allowing users and app developers to create their own AppAgents in an easy and scalable way.","title":"\ud83d\udca5 Highlights"},{"location":"#media-coverage","text":"Check out our official deep dive of UFO on this Youtube Video . UFO sightings have garnered attention from various media outlets, including: Microsoft's UFO abducts traditional user interfaces for a smarter Windows experience \ud83d\ude80 UFO & GPT-4-V: Sit back and relax, mientras GPT lo hace todo\ud83c\udf0c The AI PC - The Future of Computers? 
- Microsoft UFO \u4e0b\u4e00\u4ee3Windows\u7cfb\u7edf\u66dd\u5149\uff1a\u57fa\u4e8eGPT-4V\uff0cAgent\u8de8\u5e94\u7528\u8c03\u5ea6\uff0c\u4ee3\u53f7UFO \u4e0b\u4e00\u4ee3\u667a\u80fd\u7248 Windows \u8981\u6765\u4e86\uff1f\u5fae\u8f6f\u63a8\u51fa\u9996\u4e2a Windows Agent\uff0c\u547d\u540d\u4e3a UFO\uff01 Microsoft\u767a\u306e\u30aa\u30fc\u30d7\u30f3\u30bd\u30fc\u30b9\u7248\u300cUFO\u300d\u767b\u5834\uff01\u3000Windows\u3092\u81ea\u52d5\u64cd\u7e26\u3059\u308bAI\u30a8\u30fc\u30b8\u30a7\u30f3\u30c8\u3092\u8a66\u3059","title":"\ud83c\udf10 Media Coverage"},{"location":"#get-help","text":"\u2754GitHub Issues (preferred) For other communications, please contact ufo-agent@microsoft.com","title":"\u2753Get help"},{"location":"#citation","text":"Our technical report paper can be found here . Note that the previous AppAgent and ActAgent in the paper are renamed to HostAgent and AppAgent in the code base to better reflect their functions. If you use UFO in your research, please cite our paper: @article{ufo, title={{UFO: A UI-Focused Agent for Windows OS Interaction}}, author={Zhang, Chaoyun and Li, Liqun and He, Shilin and Zhang, Xu and Qiao, Bo and Qin, Si and Ma, Minghua and Kang, Yu and Lin, Qingwei and Rajmohan, Saravan and Zhang, Dongmei and Zhang, Qi}, journal={arXiv preprint arXiv:2402.07939}, year={2024} }","title":"\ud83d\udcda Citation"},{"location":"#related-projects","text":"If you're interested in data analytics agent frameworks, check out TaskWeaver , a code-first LLM agent framework designed for seamlessly planning and executing data analytics tasks. For more information on GUI agents, refer to our survey paper: Large Language Model-Brained GUI Agents: A Survey . 
You can also explore the survey through: - GitHub Repository - Searchable Website window.dataLayer = window.dataLayer || []; function gtag(){dataLayer.push(arguments);} gtag('js', new Date()); gtag('config', 'G-FX17ZGJYGC');","title":"\ud83c\udfa8 Related Projects"},{"location":"faq/","text":"FAQ We provide answers to some frequently asked questions about UFO. Q1: Why is it called UFO? A: UFO stands for U I Fo cused agent. The name is inspired by the concept of an unidentified flying object (UFO) that is mysterious and futuristic. Q2: Can I use UFO on Linux or macOS? A: UFO is currently only supported on Windows OS. Q3: Why is the latency of UFO high? A: The latency of UFO depends on the response time of the LLMs and the network speed. If you are using GPT, it usually takes dozens of seconds to generate a response in one step. The workload of the GPT endpoint may also affect the latency. Q4: What models does UFO support? A: UFO supports various language models, including OpenAI and Azure OpenAI models, QWEN, Google Gemini, Ollama, and more. You can find the full list of supported models in the Supported Models section of the documentation. Q5: Can I use non-vision models in UFO? A: Yes, you can use non-vision models in UFO. You can set the VISUAL_MODE to False in the config.yaml file to disable the visual mode and use non-vision models. However, UFO is designed to work with vision models, and using non-vision models may affect the performance. Q6: Can I host my own LLM endpoint? A: Yes, you can host your custom LLM endpoint and configure UFO to use it. Check the documentation in the Supported Models section for more details. Q7: Can I use non-English requests in UFO? A: It depends on the language model you are using. Most LLMs support multiple languages, and you can specify the language in the request. However, the performance may vary for different languages. 
Q8: Why does it show the error Error making API request: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')) ? A: This means the LLM endpoint is not accessible. You can check the network connection (e.g. VPN) and the status of the LLM endpoint. Info To get more support, please submit an issue on the GitHub Issues , or send an email to ufo-agent@microsoft.com .","title":"FAQ"},{"location":"faq/#faq","text":"We provide answers to some frequently asked questions about UFO.","title":"FAQ"},{"location":"faq/#q1-why-is-it-called-ufo","text":"A: UFO stands for U I Fo cused agent. The name is inspired by the concept of an unidentified flying object (UFO) that is mysterious and futuristic.","title":"Q1: Why is it called UFO?"},{"location":"faq/#q2-can-i-use-ufo-on-linux-or-macos","text":"A: UFO is currently only supported on Windows OS.","title":"Q2: Can I use UFO on Linux or macOS?"},{"location":"faq/#q3-why-the-latency-of-ufo-is-high","text":"A: The latency of UFO depends on the response time of the LLMs and the network speed. If you are using GPT, it usually takes dozens of seconds to generate a response in one step. The workload of the GPT endpoint may also affect the latency.","title":"Q3: Why is the latency of UFO high?"},{"location":"faq/#q4-what-models-does-ufo-support","text":"A: UFO supports various language models, including OpenAI and Azure OpenAI models, QWEN, Google Gemini, Ollama, and more. You can find the full list of supported models in the Supported Models section of the documentation.","title":"Q4: What models does UFO support?"},{"location":"faq/#q5-can-i-use-non-vision-models-in-ufo","text":"A: Yes, you can use non-vision models in UFO. You can set the VISUAL_MODE to False in the config.yaml file to disable the visual mode and use non-vision models. 
However, UFO is designed to work with vision models, and using non-vision models may affect the performance.","title":"Q5: Can I use non-vision models in UFO?"},{"location":"faq/#q6-can-i-host-my-own-llm-endpoint","text":"A: Yes, you can host your custom LLM endpoint and configure UFO to use it. Check the documentation in the Supported Models section for more details.","title":"Q6: Can I host my own LLM endpoint?"},{"location":"faq/#q7-can-i-use-non-english-requests-in-ufo","text":"A: It depends on the language model you are using. Most LLMs support multiple languages, and you can specify the language in the request. However, the performance may vary for different languages.","title":"Q7: Can I use non-English requests in UFO?"},{"location":"faq/#q8-why-it-shows-the-error-error-making-api-request-connection-aborted-remotedisconnectedremote-end-closed-connection-without-response","text":"A: This means the LLM endpoint is not accessible. You can check the network connection (e.g. VPN) and the status of the LLM endpoint. Info To get more support, please submit an issue on the GitHub Issues , or send an email to ufo-agent@microsoft.com .","title":"Q8: Why does it show the error Error making API request: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))?"},{"location":"project_directory_structure/","text":"The UFO project is organized into a well-defined directory structure to facilitate development, deployment, and documentation. 
Below is an overview of each directory and file, along with their purpose: \ud83d\udce6project \u2523 \ud83d\udcc2documents # Folder to store project documentation \u2523 \ud83d\udcc2learner # Folder to build the vector database for help documents \u2523 \ud83d\udcc2model_worker # Folder to store tools for deploying your own model \u2523 \ud83d\udcc2record_processor # Folder to parse human demonstrations from Windows Step Recorder and build the vector database \u2523 \ud83d\udcc2vetordb # Folder to store all data in the vector database for RAG (Retrieval-Augmented Generation) \u2523 \ud83d\udcc2logs # Folder to store logs, generated after the program starts \u2517 \ud83d\udcc2ufo # Directory containing main project code \u2523 \ud83d\udcc2module # Directory for the basic module of UFO, e.g., session and round \u2523 \ud83d\udcc2agents # Code implementation of agents in UFO \u2523 \ud83d\udcc2automator # Implementation of the skill set of agents to automate applications \u2523 \ud83d\udcc2experience # Parse and save the agent's self-experience \u2523 \ud83d\udcc2llm # Folder to store the LLM (Large Language Model) implementation \u2523 \ud83d\udcc2prompter # Prompt constructor for the agent \u2523 \ud83d\udcc2prompts # Prompt templates and files to construct the full prompt \u2523 \ud83d\udcc2rag # Implementation of RAG from different sources to enhance agents' abilities \u2523 \ud83d\udcc2utils # Utility functions \u2523 \ud83d\udcc2config # Configuration files \u2523 \ud83d\udcdcconfig.yaml # User configuration file for LLM and other settings \u2523 \ud83d\udcdcconfig_dev.yaml # Configuration file for developers \u2517 ... \u2517 \ud83d\udcc4ufo.py # Main entry point for the UFO client Directory and File Descriptions documents Purpose: Stores all the project documentation. Details: This may include design documents, user manuals, API documentation, and any other relevant project documentation. learner Purpose: Used to build the vector database for help documents. 
Details: This directory contains scripts and tools to process help documents and create a searchable vector database, enhancing the agents' ability for task completion. model_worker Purpose: Contains tools and scripts necessary for deploying custom models. Details: This includes model deployment configurations, and management tools for integrating custom models into the project. record_processor Purpose: Parses human demonstrations recorded using the Windows Step Recorder and builds the vector database. Details: This directory includes parsers, data processing scripts, and tools to convert human demonstrations into a format suitable for agent's retrieval. vetordb Purpose: Stores all data within the vector database for Retrieval-Augmented Generation (RAG). Details: This directory is essential for maintaining the data that enhances the agents' ability to retrieve relevant information and generate more accurate responses. logs Purpose: Stores log files generated by the application. Details: This directory helps in monitoring, debugging, and analyzing the application's performance and behavior. Logs are generated dynamically as the application runs. ufo Purpose: The core directory containing the main project code. Details: This directory is further subdivided into multiple subdirectories, each serving a specific purpose within the project. module Purpose: Contains the basic modules of the UFO project, such as session management and rounds. Details: This includes foundational classes and functions that are used throughout the project. agents Purpose: Houses the code implementations of various agents in the UFO project. Details: Agents are components that perform specific tasks within the system, and this directory contains their logic, components, and behavior. automator Purpose: Implements the skill set of agents to automate applications. 
Details: This includes scripts and tools that enable agents to interact with and automate tasks in various applications, such as mouse and keyboard actions and API calls. experience Purpose: Parses and saves the agent's self-experience. Details: This directory contains mechanisms for agents to learn from their actions and outcomes, improving their performance over time. llm Purpose: Stores the implementation of the Large Language Model (LLM). Details: This includes the implementation of APIs for different language models, such as GPT, Gemini, QWEN, etc., that are used by the agents. prompter Purpose: Constructs prompts for the agents. Details: This directory includes prompt construction logic and tools that help agents generate meaningful prompts for user interactions. prompts Purpose: Contains prompt templates and files used to construct the full prompt. Details: This includes predefined prompt structures and content that are used to create meaningful interactions with the agents. rag Purpose: Implements Retrieval-Augmented Generation (RAG) from different sources to enhance the agents' abilities. Details: This directory includes scripts and tools for integrating various data sources into the RAG framework, improving the accuracy and relevance of the agents' outputs. utils Purpose: Contains utility functions. Details: This directory includes helper functions, common utilities, and other reusable code snippets that support the project's operations. config Purpose: Stores configuration files. Details: This directory includes different configuration files for various environments and purposes. config.yaml: User configuration file for LLM and other settings. You need to rename config.yaml.template to config.yaml and edit the configuration settings as needed. config_dev.yaml : Developer-specific configuration file with settings tailored for development purposes. ufo.py Purpose: Main entry point for the UFO client. 
Details: This script initializes and starts the UFO application.","title":"Project Directory Structure"},{"location":"project_directory_structure/#directory-and-file-descriptions","text":"","title":"Directory and File Descriptions"},{"location":"project_directory_structure/#documents","text":"Purpose: Stores all the project documentation. Details: This may include design documents, user manuals, API documentation, and any other relevant project documentation.","title":"documents"},{"location":"project_directory_structure/#learner","text":"Purpose: Used to build the vector database for help documents. Details: This directory contains scripts and tools to process help documents and create a searchable vector database, enhancing the agents' ability for task completion.","title":"learner"},{"location":"project_directory_structure/#model_worker","text":"Purpose: Contains tools and scripts necessary for deploying custom models. Details: This includes model deployment configurations and management tools for integrating custom models into the project.","title":"model_worker"},{"location":"project_directory_structure/#record_processor","text":"Purpose: Parses human demonstrations recorded using the Windows Step Recorder and builds the vector database. Details: This directory includes parsers, data processing scripts, and tools to convert human demonstrations into a format suitable for the agent's retrieval.","title":"record_processor"},{"location":"project_directory_structure/#vetordb","text":"Purpose: Stores all data within the vector database for Retrieval-Augmented Generation (RAG). Details: This directory is essential for maintaining the data that enhances the agents' ability to retrieve relevant information and generate more accurate responses.","title":"vetordb"},{"location":"project_directory_structure/#logs","text":"Purpose: Stores log files generated by the application. 
Details: This directory helps in monitoring, debugging, and analyzing the application's performance and behavior. Logs are generated dynamically as the application runs.","title":"logs"},{"location":"project_directory_structure/#ufo","text":"Purpose: The core directory containing the main project code. Details: This directory is further subdivided into multiple subdirectories, each serving a specific purpose within the project.","title":"ufo"},{"location":"project_directory_structure/#module","text":"Purpose: Contains the basic modules of the UFO project, such as session management and rounds. Details: This includes foundational classes and functions that are used throughout the project.","title":"module"},{"location":"project_directory_structure/#agents","text":"Purpose: Houses the code implementations of various agents in the UFO project. Details: Agents are components that perform specific tasks within the system, and this directory contains their logic, components, and behavior.","title":"agents"},{"location":"project_directory_structure/#automator","text":"Purpose: Implements the skill set of agents to automate applications. Details: This includes scripts and tools that enable agents to interact with and automate tasks in various applications, such as mouse and keyboard actions and API calls.","title":"automator"},{"location":"project_directory_structure/#experience","text":"Purpose: Parses and saves the agent's self-experience. Details: This directory contains mechanisms for agents to learn from their actions and outcomes, improving their performance over time.","title":"experience"},{"location":"project_directory_structure/#llm","text":"Purpose: Stores the implementation of the Large Language Model (LLM). 
Details: This includes the implementation of APIs for different language models, such as GPT, Gemini, QWEN, etc., that are used by the agents.","title":"llm"},{"location":"project_directory_structure/#prompter","text":"Purpose: Constructs prompts for the agents. Details: This directory includes prompt construction logic and tools that help agents generate meaningful prompts for user interactions.","title":"prompter"},{"location":"project_directory_structure/#prompts","text":"Purpose: Contains prompt templates and files used to construct the full prompt. Details: This includes predefined prompt structures and content that are used to create meaningful interactions with the agents.","title":"prompts"},{"location":"project_directory_structure/#rag","text":"Purpose: Implements Retrieval-Augmented Generation (RAG) from different sources to enhance the agents' abilities. Details: This directory includes scripts and tools for integrating various data sources into the RAG framework, improving the accuracy and relevance of the agents' outputs.","title":"rag"},{"location":"project_directory_structure/#utils","text":"Purpose: Contains utility functions. Details: This directory includes helper functions, common utilities, and other reusable code snippets that support the project's operations.","title":"utils"},{"location":"project_directory_structure/#config","text":"Purpose: Stores configuration files. Details: This directory includes different configuration files for various environments and purposes. config.yaml: User configuration file for LLM and other settings. You need to rename config.yaml.template to config.yaml and edit the configuration settings as needed. config_dev.yaml : Developer-specific configuration file with settings tailored for development purposes.","title":"config"},{"location":"project_directory_structure/#ufopy","text":"Purpose: Main entry point for the UFO client. 
Details: This script initializes and starts the UFO application.","title":"ufo.py"},{"location":"about/CODE_OF_CONDUCT/","text":"Microsoft Open Source Code of Conduct This project has adopted the Microsoft Open Source Code of Conduct . Resources: Microsoft Open Source Code of Conduct Microsoft Code of Conduct FAQ Contact opencode@microsoft.com with questions or concerns","title":"Code of Conduct"},{"location":"about/CODE_OF_CONDUCT/#microsoft-open-source-code-of-conduct","text":"This project has adopted the Microsoft Open Source Code of Conduct . Resources: Microsoft Open Source Code of Conduct Microsoft Code of Conduct FAQ Contact opencode@microsoft.com with questions or concerns","title":"Microsoft Open Source Code of Conduct"},{"location":"about/CONTRIBUTING/","text":"Contributing This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com. When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repositories using our CLA. Note You should submit your pull request to the pre-release branch, not the main branch. This project has adopted the Microsoft Open Source Code of Conduct . For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.","title":"Contributing"},{"location":"about/CONTRIBUTING/#contributing","text":"This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. 
For details, visit https://cla.microsoft.com. When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repositories using our CLA. Note You should submit your pull request to the pre-release branch, not the main branch. This project has adopted the Microsoft Open Source Code of Conduct . For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.","title":"Contributing"},{"location":"about/DISCLAIMER/","text":"Disclaimer: Code Execution and Data Handling Notice By choosing to run the provided code, you acknowledge and agree to the following terms and conditions regarding the functionality and data handling practices: 1. Code Functionality: The code you are about to execute has the capability to capture screenshots of your working desktop environment and active applications. These screenshots will be processed and sent to the GPT model for inference. 2. Data Privacy and Storage: It is crucial to note that Microsoft, the provider of this code, explicitly states that it does not collect or save any of the transmitted data. The captured screenshots are processed in real-time for the purpose of inference, and no permanent storage or record of this data is retained by Microsoft. 3. User Responsibility: By running the code, you understand and accept the responsibility for the content and nature of the data present on your desktop during the execution period. It is your responsibility to ensure that no sensitive or confidential information is visible or captured during this process. 4. Security Measures: Microsoft has implemented security measures to safeguard the action execution. However, it is recommended that you run the code in a secure and controlled environment to minimize potential risks. 
Ensure that you are running the latest security updates on your system. 5. Consent for Inference: You explicitly provide consent for the GPT model to analyze the captured screenshots for the purpose of generating relevant outputs. This consent is inherent in the act of executing the code. 6. No Guarantee of Accuracy: The outputs generated by the GPT model are based on patterns learned during training and may not always be accurate or contextually relevant. Microsoft does not guarantee the accuracy or suitability of the inferences made by the model. 7. Indemnification: Users agree to defend, indemnify, and hold Microsoft harmless from and against all damages, costs, and attorneys' fees in connection with any claims arising from the use of this Repo. 8. Reporting Infringements: If anyone believes that this Repo infringes on their rights, please notify the project owner via the provided project owner email. Microsoft will investigate and take appropriate actions as necessary. 9. Modifications to the Disclaimer: Microsoft reserves the right to update or modify this disclaimer at any time without prior notice. It is your responsibility to review the disclaimer periodically for any changes. By proceeding to execute the code, you acknowledge that you have read, understood, and agreed to the terms outlined in this disclaimer. If you do not agree with these terms, refrain from running the provided code.","title":"Disclaimer"},{"location":"about/DISCLAIMER/#disclaimer-code-execution-and-data-handling-notice","text":"By choosing to run the provided code, you acknowledge and agree to the following terms and conditions regarding the functionality and data handling practices:","title":"Disclaimer: Code Execution and Data Handling Notice"},{"location":"about/DISCLAIMER/#1-code-functionality","text":"The code you are about to execute has the capability to capture screenshots of your working desktop environment and active applications. 
These screenshots will be processed and sent to the GPT model for inference.","title":"1. Code Functionality:"},{"location":"about/DISCLAIMER/#2-data-privacy-and-storage","text":"It is crucial to note that Microsoft, the provider of this code, explicitly states that it does not collect or save any of the transmitted data. The captured screenshots are processed in real-time for the purpose of inference, and no permanent storage or record of this data is retained by Microsoft.","title":"2. Data Privacy and Storage:"},{"location":"about/DISCLAIMER/#3-user-responsibility","text":"By running the code, you understand and accept the responsibility for the content and nature of the data present on your desktop during the execution period. It is your responsibility to ensure that no sensitive or confidential information is visible or captured during this process.","title":"3. User Responsibility:"},{"location":"about/DISCLAIMER/#4-security-measures","text":"Microsoft has implemented security measures to safeguard the action execution. However, it is recommended that you run the code in a secure and controlled environment to minimize potential risks. Ensure that you are running the latest security updates on your system.","title":"4. Security Measures:"},{"location":"about/DISCLAIMER/#5-consent-for-inference","text":"You explicitly provide consent for the GPT model to analyze the captured screenshots for the purpose of generating relevant outputs. This consent is inherent in the act of executing the code.","title":"5. Consent for Inference:"},{"location":"about/DISCLAIMER/#6-no-guarantee-of-accuracy","text":"The outputs generated by the GPT model are based on patterns learned during training and may not always be accurate or contextually relevant. Microsoft does not guarantee the accuracy or suitability of the inferences made by the model.","title":"6. 
No Guarantee of Accuracy:"},{"location":"about/DISCLAIMER/#7-indemnification","text":"Users agree to defend, indemnify, and hold Microsoft harmless from and against all damages, costs, and attorneys' fees in connection with any claims arising from the use of this Repo.","title":"7. Indemnification:"},{"location":"about/DISCLAIMER/#8-reporting-infringements","text":"If anyone believes that this Repo infringes on their rights, please notify the project owner via the provided project owner email. Microsoft will investigate and take appropriate actions as necessary.","title":"8. Reporting Infringements:"},{"location":"about/DISCLAIMER/#9-modifications-to-the-disclaimer","text":"Microsoft reserves the right to update or modify this disclaimer at any time without prior notice. It is your responsibility to review the disclaimer periodically for any changes. By proceeding to execute the code, you acknowledge that you have read, understood, and agreed to the terms outlined in this disclaimer. If you do not agree with these terms, refrain from running the provided code.","title":"9. Modifications to the Disclaimer:"},{"location":"about/LICENSE/","text":"Copyright (c) Microsoft Corporation. MIT License Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED AS IS , WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.","title":"License"},{"location":"about/LICENSE/#mit-license","text":"Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED AS IS , WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.","title":"MIT License"},{"location":"about/SUPPORT/","text":"Support How to file issues and get help This project uses GitHub Issues to track bugs and feature requests. Please search the existing issues before filing new issues to avoid duplicates. For new issues, file your bug or feature request as a new Issue. You may use GitHub Issues to raise questions, bug reports, and feature requests. For help and questions about using this project, please contact ufo-agent@microsoft.com . 
Microsoft Support Policy Support for this PROJECT or PRODUCT is limited to the resources listed above.","title":"Support"},{"location":"about/SUPPORT/#support","text":"","title":"Support"},{"location":"about/SUPPORT/#how-to-file-issues-and-get-help","text":"This project uses GitHub Issues to track bugs and feature requests. Please search the existing issues before filing new issues to avoid duplicates. For new issues, file your bug or feature request as a new Issue. You may use GitHub Issues to raise questions, bug reports, and feature requests. For help and questions about using this project, please contact ufo-agent@microsoft.com .","title":"How to file issues and get help"},{"location":"about/SUPPORT/#microsoft-support-policy","text":"Support for this PROJECT or PRODUCT is limited to the resources listed above.","title":"Microsoft Support Policy"},{"location":"advanced_usage/customization/","text":"Customization Sometimes, UFO may need additional context or information to complete a task. This information is important and customized for each user. UFO can ask the user for additional information and save it in the local memory for future reference. This customization feature allows UFO to provide a more personalized experience to the user. Scenario Let's consider a scenario where UFO needs additional information to complete a task. UFO is tasked with booking a cab for the user. To book a cab, UFO needs to know the exact address of the user. UFO will ask the user for the address and save it in the local memory for future reference. Next time, when UFO is asked to complete a task that requires the user's address, UFO will use the saved address to complete the task, without asking the user again. Implementation We currently implement the customization feature in the HostAgent class. When the HostAgent needs additional information, it will transition to the PENDING state and ask the user for the information. 
The user will provide the information, and the HostAgent will save it in the local memory base for future reference. The saved information is stored in the blackboard and can be accessed by all agents in the session. Note The customization memory base is only saved in a local file . This information will not be uploaded to the cloud or any other storage, to protect the user's privacy. Configuration You can configure the customization feature by setting the following fields in the config_dev.yaml file. Configuration Option Description Type Default Value USE_CUSTOMIZATION Whether to enable the customization. Boolean True QA_PAIR_FILE The path for the historical QA pairs. String \"customization/historical_qa.txt\" QA_PAIR_NUM The number of QA pairs for the customization. Integer 20","title":"Customization"},{"location":"advanced_usage/customization/#customization","text":"Sometimes, UFO may need additional context or information to complete a task. This information is important and customized for each user. UFO can ask the user for additional information and save it in the local memory for future reference. This customization feature allows UFO to provide a more personalized experience to the user.","title":"Customization"},{"location":"advanced_usage/customization/#scenario","text":"Let's consider a scenario where UFO needs additional information to complete a task. UFO is tasked with booking a cab for the user. To book a cab, UFO needs to know the exact address of the user. UFO will ask the user for the address and save it in the local memory for future reference. Next time, when UFO is asked to complete a task that requires the user's address, UFO will use the saved address to complete the task, without asking the user again.","title":"Scenario"},{"location":"advanced_usage/customization/#implementation","text":"We currently implement the customization feature in the HostAgent class. 
When the HostAgent needs additional information, it will transition to the PENDING state and ask the user for the information. The user will provide the information, and the HostAgent will save it in the local memory base for future reference. The saved information is stored in the blackboard and can be accessed by all agents in the session. Note The customization memory base is only saved in a local file . This information will not be uploaded to the cloud or any other storage, to protect the user's privacy.","title":"Implementation"},{"location":"advanced_usage/customization/#configuration","text":"You can configure the customization feature by setting the following fields in the config_dev.yaml file. Configuration Option Description Type Default Value USE_CUSTOMIZATION Whether to enable the customization. Boolean True QA_PAIR_FILE The path for the historical QA pairs. String \"customization/historical_qa.txt\" QA_PAIR_NUM The number of QA pairs for the customization. Integer 20","title":"Configuration"},{"location":"advanced_usage/follower_mode/","text":"Follower Mode The Follower mode is a feature of UFO in which the agent follows a list of pre-defined steps in natural language to take actions on applications. Different from the normal mode, this mode creates a FollowerAgent that follows the plan list provided by the user to interact with the application, instead of generating the plan itself. This mode is useful for debugging and software testing or verification. Quick Start Step 1: Create a Plan file Before starting the Follower mode, you need to create a plan file that contains the list of steps for the agent to follow. The plan file is a JSON file that contains the following fields: Field Description Type task The task description. String steps The list of steps for the agent to follow. List of Strings object The application or file to interact with. 
String Below is an example of a plan file: { \"task\": \"Type in a text of 'Test For Fun' with heading 1 level\", \"steps\": [ \"1.type in 'Test For Fun'\", \"2.Select the 'Test For Fun' text\", \"3.Click 'Home' tab to show the 'Styles' ribbon tab\", \"4.Click 'Styles' ribbon tab to show the style 'Heading 1'\", \"5.Click 'Heading 1' style to apply the style to the selected text\" ], \"object\": \"draft.docx\" } Note The object field is the application or file that the agent will interact with. The object must be active (can be minimized) when starting the Follower mode. Step 2: Start the Follower Mode To start the Follower mode, run the following command: # assume you are in the cloned UFO folder python ufo.py --task_name {task_name} --mode follower --plan {plan_file} Tip Replace {task_name} with the name of the task and {plan_file} with the path to the plan file. Step 3: Run in Batch (Optional) You can also run the Follower mode in batch mode by providing a folder containing multiple plan files. The agent will follow the plans in the folder one by one. To run in batch mode, run the following command: # assume you are in the cloned UFO folder python ufo.py --task_name {task_name} --mode follower --plan {plan_folder} UFO will automatically detect the plan files in the folder and run them one by one. Tip Replace {task_name} with the name of the task and {plan_folder} with the path to the folder containing plan files. Evaluation You may want to evaluate whether the task is completed successfully by following the plan. UFO will call the EvaluationAgent to evaluate the task if EVA_SESSION is set to True in the config_dev.yaml file. You can check the evaluation log in the logs/{task_name}/evaluation.log file. References The follower mode employs a PlanReader to parse the plan file and create a FollowerSession to follow the plan. PlanReader The PlanReader is located in the ufo/module/sessions/plan_reader.py file. The reader for a plan file. Initialize a plan reader. 
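To make the PlanReader API below concrete, here is a self-contained sketch that mirrors the methods quoted in the source listings on this page (condensed for illustration; the real class lives in ufo/module/sessions/plan_reader.py, and details such as exact request wording differ) and walks a shortened version of the example plan:

```python
import json
import tempfile
from typing import List, Optional


class PlanReader:
    """Condensed re-creation of the PlanReader shown in this page's listings."""

    def __init__(self, plan_file: str):
        # Load the plan JSON and queue up its steps.
        with open(plan_file, "r") as f:
            self.plan = json.load(f)
        self.remaining_steps = self.get_steps()

    def get_task(self) -> str:
        return self.plan.get("task", "")

    def get_steps(self) -> List[str]:
        return self.plan.get("steps", [])

    def get_operation_object(self) -> str:
        return self.plan.get("object", "")

    def get_initial_request(self) -> str:
        # "{task} in {object}", as in the documented get_initial_request.
        return f"{self.get_task()} in {self.get_operation_object()}"

    def next_step(self) -> Optional[str]:
        # Pop steps in order until the plan is exhausted.
        if self.remaining_steps:
            return self.remaining_steps.pop(0)
        return None

    def task_finished(self) -> bool:
        return not self.remaining_steps


# Walk a shortened version of the example plan above.
plan = {
    "task": "Type in a text of 'Test For Fun' with heading 1 level",
    "steps": ["1.type in 'Test For Fun'", "2.Select the 'Test For Fun' text"],
    "object": "draft.docx",
}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(plan, f)
    path = f.name

reader = PlanReader(path)
print(reader.get_operation_object())  # draft.docx
while not reader.task_finished():
    print(reader.next_step())
```

This also illustrates why the Follower mode needs no planning from the LLM: each round simply consumes the next pre-written step from the queue.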
Parameters: plan_file ( str ) \u2013 The path of the plan file. Source code in module/sessions/plan_reader.py 17 18 19 20 21 22 23 24 25 def __init__ ( self , plan_file : str ): \"\"\" Initialize a plan reader. :param plan_file: The path of the plan file. \"\"\" with open ( plan_file , \"r\" ) as f : self . plan = json . load ( f ) self . remaining_steps = self . get_steps () get_host_agent_request () Get the request for the host agent. Returns: str \u2013 The request for the host agent. Source code in module/sessions/plan_reader.py 64 65 66 67 68 69 70 71 72 73 74 75 76 77 def get_host_agent_request ( self ) -> str : \"\"\" Get the request for the host agent. :return: The request for the host agent. \"\"\" object_name = self . get_operation_object () request = ( f \"Open and select the application of { object_name } , and output the FINISH status immediately. \" \"You must output the selected application with their control text and label even if it is already open.\" ) return request get_initial_request () Get the initial request in the plan. Returns: str \u2013 The initial request. Source code in module/sessions/plan_reader.py 51 52 53 54 55 56 57 58 59 60 61 62 def get_initial_request ( self ) -> str : \"\"\" Get the initial request in the plan. :return: The initial request. \"\"\" task = self . get_task () object_name = self . get_operation_object () request = f \" { task } in { object_name } \" return request get_operation_object () Get the operation object in the step. Returns: str \u2013 The operation object. Source code in module/sessions/plan_reader.py 43 44 45 46 47 48 49 def get_operation_object ( self ) -> str : \"\"\" Get the operation object in the step. :return: The operation object. \"\"\" return self . plan . get ( \"object\" , \"\" ) get_steps () Get the steps in the plan. Returns: List [ str ] \u2013 The steps in the plan. 
Source code in module/sessions/plan_reader.py 35 36 37 38 39 40 41 def get_steps ( self ) -> List [ str ]: \"\"\" Get the steps in the plan. :return: The steps in the plan. \"\"\" return self . plan . get ( \"steps\" , []) get_task () Get the task name. Returns: str \u2013 The task name. Source code in module/sessions/plan_reader.py 27 28 29 30 31 32 33 def get_task ( self ) -> str : \"\"\" Get the task name. :return: The task name. \"\"\" return self . plan . get ( \"task\" , \"\" ) next_step () Get the next step in the plan. Returns: Optional [ str ] \u2013 The next step. Source code in module/sessions/plan_reader.py 79 80 81 82 83 84 85 86 87 88 89 def next_step ( self ) -> Optional [ str ]: \"\"\" Get the next step in the plan. :return: The next step. \"\"\" if self . remaining_steps : step = self . remaining_steps . pop ( 0 ) return step return None task_finished () Check if the task is finished. Returns: bool \u2013 True if the task is finished, False otherwise. Source code in module/sessions/plan_reader.py 91 92 93 94 95 96 97 def task_finished ( self ) -> bool : \"\"\" Check if the task is finished. :return: True if the task is finished, False otherwise. \"\"\" return not self . remaining_steps FollowerSession The FollowerSession is also located in the ufo/module/sessions/session.py file. Bases: BaseSession A session for following a list of plan for action taken. This session is used for the follower agent, which accepts a plan file to follow using the PlanReader. Initialize a session. Parameters: task ( str ) \u2013 The name of current task. plan_file ( str ) \u2013 The path of the plan file to follow. should_evaluate ( bool ) \u2013 Whether to evaluate the session. id ( int ) \u2013 The id of the session. Source code in module/sessions/session.py 197 198 199 200 201 202 203 204 205 206 207 208 209 210 def __init__ ( self , task : str , plan_file : str , should_evaluate : bool , id : int ) -> None : \"\"\" Initialize a session. 
:param task: The name of current task. :param plan_file: The path of the plan file to follow. :param should_evaluate: Whether to evaluate the session. :param id: The id of the session. \"\"\" super () . __init__ ( task , should_evaluate , id ) self . plan_reader = PlanReader ( plan_file ) create_new_round () Create a new round. Source code in module/sessions/session.py 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 def create_new_round ( self ) -> None : \"\"\" Create a new round. \"\"\" # Get a request for the new round. request = self . next_request () # Create a new round and return None if the session is finished. if self . is_finished (): return None if self . total_rounds == 0 : utils . print_with_color ( \"Complete the following request:\" , \"yellow\" ) utils . print_with_color ( self . plan_reader . get_initial_request (), \"cyan\" ) agent = self . _host_agent else : agent = self . _host_agent . get_active_appagent () # Clear the memory and set the state to continue the app agent. agent . clear_memory () agent . blackboard . requests . clear () agent . set_state ( ContinueAppAgentState ()) round = BaseRound ( request = request , agent = agent , context = self . context , should_evaluate = configs . get ( \"EVA_ROUND\" , False ), id = self . total_rounds , ) self . add_round ( round . id , round ) return round next_request () Get the request for the new round. Source code in module/sessions/session.py 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 def next_request ( self ) -> str : \"\"\" Get the request for the new round. \"\"\" # If the task is finished, return an empty string. if self . plan_reader . task_finished (): self . _finish = True return \"\" # Get the request from the plan reader. if self . total_rounds == 0 : return self . plan_reader . get_host_agent_request () else : return self . plan_reader . 
next_step () request_to_evaluate () Check if the session should be evaluated. Returns: bool \u2013 True if the session should be evaluated, False otherwise. Source code in module/sessions/session.py 273 274 275 276 277 278 279 def request_to_evaluate ( self ) -> bool : \"\"\" Check if the session should be evaluated. :return: True if the session should be evaluated, False otherwise. \"\"\" return self . plan_reader . get_task ()","title":"Follower Mode"},{"location":"advanced_usage/follower_mode/#follower-mode","text":"The Follower mode is a feature of UFO in which the agent follows a list of pre-defined steps in natural language to take actions on applications. Different from the normal mode, this mode creates a FollowerAgent that follows the plan list provided by the user to interact with the application, instead of generating the plan itself. This mode is useful for debugging and software testing or verification.","title":"Follower Mode"},{"location":"advanced_usage/follower_mode/#quick-start","text":"","title":"Quick Start"},{"location":"advanced_usage/follower_mode/#step-1-create-a-plan-file","text":"Before starting the Follower mode, you need to create a plan file that contains the list of steps for the agent to follow. The plan file is a JSON file that contains the following fields: Field Description Type task The task description. String steps The list of steps for the agent to follow. List of Strings object The application or file to interact with. String Below is an example of a plan file: { \"task\": \"Type in a text of 'Test For Fun' with heading 1 level\", \"steps\": [ \"1.type in 'Test For Fun'\", \"2.Select the 'Test For Fun' text\", \"3.Click 'Home' tab to show the 'Styles' ribbon tab\", \"4.Click 'Styles' ribbon tab to show the style 'Heading 1'\", \"5.Click 'Heading 1' style to apply the style to the selected text\" ], \"object\": \"draft.docx\" } Note The object field is the application or file that the agent will interact with. 
The object must be active (can be minimized) when starting the Follower mode.","title":"Step 1: Create a Plan file"},{"location":"advanced_usage/follower_mode/#step-2-start-the-follower-mode","text":"To start the Follower mode, run the following command: # assume you are in the cloned UFO folder python ufo.py --task_name {task_name} --mode follower --plan {plan_file} Tip Replace {task_name} with the name of the task and {plan_file} with the path to the plan file.","title":"Step 2: Start the Follower Mode"},{"location":"advanced_usage/follower_mode/#step-3-run-in-batch-optional","text":"You can also run the Follower mode in batch mode by providing a folder containing multiple plan files. The agent will follow the plans in the folder one by one. To run in batch mode, run the following command: # assume you are in the cloned UFO folder python ufo.py --task_name {task_name} --mode follower --plan {plan_folder} UFO will automatically detect the plan files in the folder and run them one by one. Tip Replace {task_name} with the name of the task and {plan_folder} with the path to the folder containing plan files.","title":"Step 3: Run in Batch (Optional)"},{"location":"advanced_usage/follower_mode/#evaluation","text":"You may want to evaluate whether the task is completed successfully by following the plan. UFO will call the EvaluationAgent to evaluate the task if EVA_SESSION is set to True in the config_dev.yaml file. You can check the evaluation log in the logs/{task_name}/evaluation.log file.","title":"Evaluation"},{"location":"advanced_usage/follower_mode/#references","text":"The follower mode employs a PlanReader to parse the plan file and create a FollowerSession to follow the plan.","title":"References"},{"location":"advanced_usage/follower_mode/#planreader","text":"The PlanReader is located in the ufo/module/sessions/plan_reader.py file. The reader for a plan file. Initialize a plan reader. Parameters: plan_file ( str ) \u2013 The path of the plan file. 
Source code in module/sessions/plan_reader.py 17 18 19 20 21 22 23 24 25 def __init__ ( self , plan_file : str ): \"\"\" Initialize a plan reader. :param plan_file: The path of the plan file. \"\"\" with open ( plan_file , \"r\" ) as f : self . plan = json . load ( f ) self . remaining_steps = self . get_steps ()","title":"PlanReader"},{"location":"advanced_usage/follower_mode/#module.sessions.plan_reader.PlanReader.get_host_agent_request","text":"Get the request for the host agent. Returns: str \u2013 The request for the host agent. Source code in module/sessions/plan_reader.py 64 65 66 67 68 69 70 71 72 73 74 75 76 77 def get_host_agent_request ( self ) -> str : \"\"\" Get the request for the host agent. :return: The request for the host agent. \"\"\" object_name = self . get_operation_object () request = ( f \"Open and select the application of { object_name } , and output the FINISH status immediately. \" \"You must output the selected application with their control text and label even if it is already open.\" ) return request","title":"get_host_agent_request"},{"location":"advanced_usage/follower_mode/#module.sessions.plan_reader.PlanReader.get_initial_request","text":"Get the initial request in the plan. Returns: str \u2013 The initial request. Source code in module/sessions/plan_reader.py 51 52 53 54 55 56 57 58 59 60 61 62 def get_initial_request ( self ) -> str : \"\"\" Get the initial request in the plan. :return: The initial request. \"\"\" task = self . get_task () object_name = self . get_operation_object () request = f \" { task } in { object_name } \" return request","title":"get_initial_request"},{"location":"advanced_usage/follower_mode/#module.sessions.plan_reader.PlanReader.get_operation_object","text":"Get the operation object in the step. Returns: str \u2013 The operation object. Source code in module/sessions/plan_reader.py 43 44 45 46 47 48 49 def get_operation_object ( self ) -> str : \"\"\" Get the operation object in the step. 
:return: The operation object. \"\"\" return self . plan . get ( \"object\" , \"\" )","title":"get_operation_object"},{"location":"advanced_usage/follower_mode/#module.sessions.plan_reader.PlanReader.get_steps","text":"Get the steps in the plan. Returns: List [ str ] \u2013 The steps in the plan. Source code in module/sessions/plan_reader.py 35 36 37 38 39 40 41 def get_steps ( self ) -> List [ str ]: \"\"\" Get the steps in the plan. :return: The steps in the plan. \"\"\" return self . plan . get ( \"steps\" , [])","title":"get_steps"},{"location":"advanced_usage/follower_mode/#module.sessions.plan_reader.PlanReader.get_task","text":"Get the task name. Returns: str \u2013 The task name. Source code in module/sessions/plan_reader.py 27 28 29 30 31 32 33 def get_task ( self ) -> str : \"\"\" Get the task name. :return: The task name. \"\"\" return self . plan . get ( \"task\" , \"\" )","title":"get_task"},{"location":"advanced_usage/follower_mode/#module.sessions.plan_reader.PlanReader.next_step","text":"Get the next step in the plan. Returns: Optional [ str ] \u2013 The next step. Source code in module/sessions/plan_reader.py 79 80 81 82 83 84 85 86 87 88 89 def next_step ( self ) -> Optional [ str ]: \"\"\" Get the next step in the plan. :return: The next step. \"\"\" if self . remaining_steps : step = self . remaining_steps . pop ( 0 ) return step return None","title":"next_step"},{"location":"advanced_usage/follower_mode/#module.sessions.plan_reader.PlanReader.task_finished","text":"Check if the task is finished. Returns: bool \u2013 True if the task is finished, False otherwise. Source code in module/sessions/plan_reader.py 91 92 93 94 95 96 97 def task_finished ( self ) -> bool : \"\"\" Check if the task is finished. :return: True if the task is finished, False otherwise. \"\"\" return not self . 
remaining_steps","title":"task_finished"},{"location":"advanced_usage/follower_mode/#followersession","text":"The FollowerSession is also located in the ufo/module/sessions/session.py file. Bases: BaseSession A session for following a list of planned actions. This session is used for the follower agent, which accepts a plan file to follow using the PlanReader. Initialize a session. Parameters: task ( str ) \u2013 The name of current task. plan_file ( str ) \u2013 The path of the plan file to follow. should_evaluate ( bool ) \u2013 Whether to evaluate the session. id ( int ) \u2013 The id of the session. Source code in module/sessions/session.py 197 198 199 200 201 202 203 204 205 206 207 208 209 210 def __init__ ( self , task : str , plan_file : str , should_evaluate : bool , id : int ) -> None : \"\"\" Initialize a session. :param task: The name of current task. :param plan_file: The path of the plan file to follow. :param should_evaluate: Whether to evaluate the session. :param id: The id of the session. \"\"\" super () . __init__ ( task , should_evaluate , id ) self . plan_reader = PlanReader ( plan_file )","title":"FollowerSession"},{"location":"advanced_usage/follower_mode/#module.sessions.session.FollowerSession.create_new_round","text":"Create a new round. Source code in module/sessions/session.py 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 def create_new_round ( self ) -> None : \"\"\" Create a new round. \"\"\" # Get a request for the new round. request = self . next_request () # Create a new round and return None if the session is finished. if self . is_finished (): return None if self . total_rounds == 0 : utils . print_with_color ( \"Complete the following request:\" , \"yellow\" ) utils . print_with_color ( self . plan_reader . get_initial_request (), \"cyan\" ) agent = self . _host_agent else : agent = self . _host_agent . 
get_active_appagent () # Clear the memory and set the state to continue the app agent. agent . clear_memory () agent . blackboard . requests . clear () agent . set_state ( ContinueAppAgentState ()) round = BaseRound ( request = request , agent = agent , context = self . context , should_evaluate = configs . get ( \"EVA_ROUND\" , False ), id = self . total_rounds , ) self . add_round ( round . id , round ) return round","title":"create_new_round"},{"location":"advanced_usage/follower_mode/#module.sessions.session.FollowerSession.next_request","text":"Get the request for the new round. Source code in module/sessions/session.py 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 def next_request ( self ) -> str : \"\"\" Get the request for the new round. \"\"\" # If the task is finished, return an empty string. if self . plan_reader . task_finished (): self . _finish = True return \"\" # Get the request from the plan reader. if self . total_rounds == 0 : return self . plan_reader . get_host_agent_request () else : return self . plan_reader . next_step ()","title":"next_request"},{"location":"advanced_usage/follower_mode/#module.sessions.session.FollowerSession.request_to_evaluate","text":"Check if the session should be evaluated. Returns: bool \u2013 True if the session should be evaluated, False otherwise. Source code in module/sessions/session.py 273 274 275 276 277 278 279 def request_to_evaluate ( self ) -> bool : \"\"\" Check if the session should be evaluated. :return: True if the session should be evaluated, False otherwise. \"\"\" return self . plan_reader . get_task ()","title":"request_to_evaluate"},{"location":"advanced_usage/control_filtering/icon_filtering/","text":"Icon Filter The icon control filter is a method to filter the controls based on the similarity between the control icon image and the agent's plan using the image/text embeddings. 
Configuration To activate the icon control filtering, you need to add ICON to the CONTROL_FILTER list in the config_dev.yaml file. Below is the detailed icon control filter configuration in the config_dev.yaml file: CONTROL_FILTER : A list of filtering methods that you want to apply to the controls. To activate the icon control filtering, add ICON to the list. CONTROL_FILTER_TOP_K_ICON : The number of controls to keep after filtering. CONTROL_FILTER_MODEL_ICON_NAME : The control filter model name for icon similarity. By default, it is set to \"clip-ViT-B-32\". Reference Bases: BasicControlFilter A class that represents an icon model for control filtering. control_filter ( control_dicts , cropped_icons_dict , plans , top_k ) Filters control items based on their scores and returns the top-k items. Parameters: control_dicts \u2013 The dictionary of all control items. cropped_icons_dict \u2013 The dictionary of the cropped icons. plans \u2013 The plans to compare the control icons against. top_k \u2013 The number of top items to return. Returns: \u2013 The list of top-k control items based on their scores. Source code in automator/ui_control/control_filter.py 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 def control_filter ( self , control_dicts , cropped_icons_dict , plans , top_k ): \"\"\" Filters control items based on their scores and returns the top-k items. :param control_dicts: The dictionary of all control items. :param cropped_icons_dict: The dictionary of the cropped icons. :param plans: The plans to compare the control icons against. :param top_k: The number of top items to return. :return: The list of top-k control items based on their scores. \"\"\" scores_items = [] filtered_control_dict = {} for label , cropped_icon in cropped_icons_dict . items (): score = self . control_filter_score ( cropped_icon , plans ) scores_items . append (( score , label )) topk_scores_items = heapq . 
nlargest ( top_k , scores_items , key = lambda x : x [ 0 ]) topk_labels = [ scores_items [ 1 ] for scores_items in topk_scores_items ] for label , control_item in control_dicts . items (): if label in topk_labels : filtered_control_dict [ label ] = control_item return filtered_control_dict control_filter_score ( control_icon , plans ) Calculates the score of a control icon based on its similarity to the given keywords. Parameters: control_icon \u2013 The control icon image. plans \u2013 The plan to compare the control icon against. Returns: \u2013 The maximum similarity score between the control icon and the keywords. Source code in automator/ui_control/control_filter.py 240 241 242 243 244 245 246 247 248 249 250 def control_filter_score ( self , control_icon , plans ): \"\"\" Calculates the score of a control icon based on its similarity to the given keywords. :param control_icon: The control icon image. :param plans: The plan to compare the control icon against. :return: The maximum similarity score between the control icon and the keywords. \"\"\" plans_embedding = self . get_embedding ( plans ) control_icon_embedding = self . get_embedding ( control_icon ) return max ( self . cos_sim ( control_icon_embedding , plans_embedding ) . tolist ()[ 0 ])","title":"Icon Filtering"},{"location":"advanced_usage/control_filtering/icon_filtering/#icon-filter","text":"The icon control filter is a method to filter the controls based on the similarity between the control icon image and the agent's plan using the image/text embeddings.","title":"Icon Filter"},{"location":"advanced_usage/control_filtering/icon_filtering/#configuration","text":"To activate the icon control filtering, you need to add ICON to the CONTROL_FILTER list in the config_dev.yaml file. Below is the detailed icon control filter configuration in the config_dev.yaml file: CONTROL_FILTER : A list of filtering methods that you want to apply to the controls. 
To activate the icon control filtering, add ICON to the list. CONTROL_FILTER_TOP_K_ICON : The number of controls to keep after filtering. CONTROL_FILTER_MODEL_ICON_NAME : The control filter model name for icon similarity. By default, it is set to \"clip-ViT-B-32\".","title":"Configuration"},{"location":"advanced_usage/control_filtering/icon_filtering/#reference","text":"Bases: BasicControlFilter A class that represents an icon model for control filtering.","title":"Reference"},{"location":"advanced_usage/control_filtering/icon_filtering/#automator.ui_control.control_filter.IconControlFilter.control_filter","text":"Filters control items based on their scores and returns the top-k items. Parameters: control_dicts \u2013 The dictionary of all control items. cropped_icons_dict \u2013 The dictionary of the cropped icons. plans \u2013 The plans to compare the control icons against. top_k \u2013 The number of top items to return. Returns: \u2013 The list of top-k control items based on their scores. Source code in automator/ui_control/control_filter.py 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 def control_filter ( self , control_dicts , cropped_icons_dict , plans , top_k ): \"\"\" Filters control items based on their scores and returns the top-k items. :param control_dicts: The dictionary of all control items. :param cropped_icons_dict: The dictionary of the cropped icons. :param plans: The plans to compare the control icons against. :param top_k: The number of top items to return. :return: The list of top-k control items based on their scores. \"\"\" scores_items = [] filtered_control_dict = {} for label , cropped_icon in cropped_icons_dict . items (): score = self . control_filter_score ( cropped_icon , plans ) scores_items . append (( score , label )) topk_scores_items = heapq . 
nlargest ( top_k , scores_items , key = lambda x : x [ 0 ]) topk_labels = [ scores_items [ 1 ] for scores_items in topk_scores_items ] for label , control_item in control_dicts . items (): if label in topk_labels : filtered_control_dict [ label ] = control_item return filtered_control_dict","title":"control_filter"},{"location":"advanced_usage/control_filtering/icon_filtering/#automator.ui_control.control_filter.IconControlFilter.control_filter_score","text":"Calculates the score of a control icon based on its similarity to the given keywords. Parameters: control_icon \u2013 The control icon image. plans \u2013 The plan to compare the control icon against. Returns: \u2013 The maximum similarity score between the control icon and the keywords. Source code in automator/ui_control/control_filter.py 240 241 242 243 244 245 246 247 248 249 250 def control_filter_score ( self , control_icon , plans ): \"\"\" Calculates the score of a control icon based on its similarity to the given keywords. :param control_icon: The control icon image. :param plans: The plan to compare the control icon against. :return: The maximum similarity score between the control icon and the keywords. \"\"\" plans_embedding = self . get_embedding ( plans ) control_icon_embedding = self . get_embedding ( control_icon ) return max ( self . cos_sim ( control_icon_embedding , plans_embedding ) . tolist ()[ 0 ])","title":"control_filter_score"},{"location":"advanced_usage/control_filtering/overview/","text":"Control Filtering There may be many control items in the application, which may not be relevant to the task. UFO can filter out the irrelevant controls and only focus on the relevant ones. This filtering process can reduce the complexity of the task. Except for configuring the control types for selection in CONTROL_LIST in config_dev.yaml , UFO also supports filtering the controls based on semantic similarity or keyword matching between the agent's plan and the control's information. 
We currently support the following filtering methods: Filtering Method Description Text Filter the controls based on the control text. Semantic Filter the controls based on the semantic similarity. Icon Filter the controls based on the control icon image. Configuration You can activate the control filtering by setting the CONTROL_FILTER in the config_dev.yaml file. The CONTROL_FILTER is a list of filtering methods that you want to apply to the controls, which can be TEXT , SEMANTIC , or ICON . You can configure multiple filtering methods in the CONTROL_FILTER list. Reference The implementation of the control filtering is based on the BasicControlFilter class located in the ufo/automator/ui_control/control_filter.py file. Concrete filtering classes inherit from the BasicControlFilter class and implement the control_filter method to filter the controls based on the specific filtering method. BasicControlFilter represents a model for filtering control items. __new__ ( model_path ) Creates a new instance of BasicControlFilter. Parameters: model_path \u2013 The path to the model. Returns: \u2013 The BasicControlFilter instance. Source code in automator/ui_control/control_filter.py 72 73 74 75 76 77 78 79 80 81 82 def __new__ ( cls , model_path ): \"\"\" Creates a new instance of BasicControlFilter. :param model_path: The path to the model. :return: The BasicControlFilter instance. \"\"\" if model_path not in cls . _instances : instance = super ( BasicControlFilter , cls ) . __new__ ( cls ) instance . model = cls . load_model ( model_path ) cls . _instances [ model_path ] = instance return cls . _instances [ model_path ] control_filter ( control_dicts , plans , ** kwargs ) abstractmethod Calculates the cosine similarity between the embeddings of the given keywords and the control item. Parameters: control_dicts \u2013 The control item to be compared with the plans. plans \u2013 The plans to be used for calculating the similarity. Returns: \u2013 The filtered control items. 
Source code in automator/ui_control/control_filter.py 104 105 106 107 108 109 110 111 112 @abstractmethod def control_filter ( self , control_dicts , plans , ** kwargs ): \"\"\" Calculates the cosine similarity between the embeddings of the given keywords and the control item. :param control_dicts: The control item to be compared with the plans. :param plans: The plans to be used for calculating the similarity. :return: The filtered control items. \"\"\" pass cos_sim ( embedding1 , embedding2 ) staticmethod Computes the cosine similarity between two embeddings. Parameters: embedding1 \u2013 The first embedding. embedding2 \u2013 The second embedding. Returns: float \u2013 The cosine similarity between the two embeddings. Source code in automator/ui_control/control_filter.py 153 154 155 156 157 158 159 160 161 162 163 @staticmethod def cos_sim ( embedding1 , embedding2 ) -> float : \"\"\" Computes the cosine similarity between two embeddings. :param embedding1: The first embedding. :param embedding2: The second embedding. :return: The cosine similarity between the two embeddings. \"\"\" import sentence_transformers return sentence_transformers . util . cos_sim ( embedding1 , embedding2 ) get_embedding ( content ) Encodes the given object into an embedding. Parameters: content \u2013 The content to encode. Returns: \u2013 The embedding of the object. Source code in automator/ui_control/control_filter.py 95 96 97 98 99 100 101 102 def get_embedding ( self , content ): \"\"\" Encodes the given object into an embedding. :param content: The content to encode. :return: The embedding of the object. \"\"\" return self . model . encode ( content ) load_model ( model_path ) staticmethod Loads the model from the given model path. Parameters: model_path \u2013 The path to the model. Returns: \u2013 The loaded model. 
Source code in automator/ui_control/control_filter.py 84 85 86 87 88 89 90 91 92 93 @staticmethod def load_model ( model_path ): \"\"\" Loads the model from the given model path. :param model_path: The path to the model. :return: The loaded model. \"\"\" import sentence_transformers return sentence_transformers . SentenceTransformer ( model_path ) plans_to_keywords ( plans ) staticmethod Gets keywords from the plan. We only consider the words in the plan that are alphabetic or Chinese characters. Parameters: plans ( List [ str ] ) \u2013 The plan to be parsed. Returns: List [ str ] \u2013 A list of keywords extracted from the plan. Source code in automator/ui_control/control_filter.py 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 @staticmethod def plans_to_keywords ( plans : List [ str ]) -> List [ str ]: \"\"\" Gets keywords from the plan. We only consider the words in the plan that are alphabetic or Chinese characters. :param plans: The plan to be parsed. :return: A list of keywords extracted from the plan. \"\"\" keywords = [] for plan in plans : words = plan . replace ( \"'\" , \"\" ) . strip ( \".\" ) . split () words = [ word for word in words if word . isalpha () or bool ( re . fullmatch ( r \"[\\u4e00-\\u9fa5]+\" , word )) ] keywords . extend ( words ) return keywords remove_stopwords ( keywords ) staticmethod Removes stopwords from the given list of keywords. If you are using stopwords for the first time, you need to download them using nltk.download('stopwords'). Parameters: keywords \u2013 The list of keywords to be filtered. Returns: \u2013 The list of keywords with the stopwords removed. Source code in automator/ui_control/control_filter.py 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 @staticmethod def remove_stopwords ( keywords ): \"\"\" Removes stopwords from the given list of keywords. If you are using stopwords for the first time, you need to download them using nltk.download('stopwords'). 
:param keywords: The list of keywords to be filtered. :return: The list of keywords with the stopwords removed. \"\"\" try : from nltk.corpus import stopwords stopwords_list = stopwords . words ( \"english\" ) except LookupError as e : import nltk nltk . download ( \"stopwords\" ) stopwords_list = nltk . corpus . stopwords . words ( \"english\" ) return [ keyword for keyword in keywords if keyword in stopwords_list ]","title":"remove_stopwords"},{"location":"advanced_usage/control_filtering/overview/#control-filtering","text":"There may be many control items in the application, which may not be relevant to the task. UFO can filter out the irrelevant controls and only focus on the relevant ones. This filtering process can reduce the complexity of the task. Except for configuring the control types for selection in CONTROL_LIST in config_dev.yaml , UFO also supports filtering the controls based on semantic similarity or keyword matching between the agent's plan and the control's information. We currently support the following filtering methods: Filtering Method Description Text Filter the controls based on the control text. Semantic Filter the controls based on the semantic similarity. Icon Filter the controls based on the control icon image.","title":"Control Filtering"},{"location":"advanced_usage/control_filtering/overview/#configuration","text":"You can activate the control filtering by setting the CONTROL_FILTER in the config_dev.yaml file. The CONTROL_FILTER is a list of filtering methods that you want to apply to the controls, which can be TEXT , SEMANTIC , or ICON . You can configure multiple filtering methods in the CONTROL_FILTER list.","title":"Configuration"},{"location":"advanced_usage/control_filtering/overview/#reference","text":"The implementation of the control filtering is based on the BasicControlFilter class located in the ufo/automator/ui_control/control_filter.py file. 
Concrete filtering classes inherit from the BasicControlFilter class and implement the control_filter method to filter the controls based on the specific filtering method. BasicControlFilter represents a model for filtering control items.","title":"Reference"},{"location":"advanced_usage/control_filtering/overview/#automator.ui_control.control_filter.BasicControlFilter.__new__","text":"Creates a new instance of BasicControlFilter. Parameters: model_path \u2013 The path to the model. Returns: \u2013 The BasicControlFilter instance. Source code in automator/ui_control/control_filter.py 72 73 74 75 76 77 78 79 80 81 82 def __new__ ( cls , model_path ): \"\"\" Creates a new instance of BasicControlFilter. :param model_path: The path to the model. :return: The BasicControlFilter instance. \"\"\" if model_path not in cls . _instances : instance = super ( BasicControlFilter , cls ) . __new__ ( cls ) instance . model = cls . load_model ( model_path ) cls . _instances [ model_path ] = instance return cls . _instances [ model_path ]","title":"__new__"},{"location":"advanced_usage/control_filtering/overview/#automator.ui_control.control_filter.BasicControlFilter.control_filter","text":"Calculates the cosine similarity between the embeddings of the given keywords and the control item. Parameters: control_dicts \u2013 The control item to be compared with the plans. plans \u2013 The plans to be used for calculating the similarity. Returns: \u2013 The filtered control items. Source code in automator/ui_control/control_filter.py 104 105 106 107 108 109 110 111 112 @abstractmethod def control_filter ( self , control_dicts , plans , ** kwargs ): \"\"\" Calculates the cosine similarity between the embeddings of the given keywords and the control item. :param control_dicts: The control item to be compared with the plans. :param plans: The plans to be used for calculating the similarity. :return: The filtered control items. 
\"\"\" pass","title":"control_filter"},{"location":"advanced_usage/control_filtering/overview/#automator.ui_control.control_filter.BasicControlFilter.cos_sim","text":"Computes the cosine similarity between two embeddings. Parameters: embedding1 \u2013 The first embedding. embedding2 \u2013 The second embedding. Returns: float \u2013 The cosine similarity between the two embeddings. Source code in automator/ui_control/control_filter.py 153 154 155 156 157 158 159 160 161 162 163 @staticmethod def cos_sim ( embedding1 , embedding2 ) -> float : \"\"\" Computes the cosine similarity between two embeddings. :param embedding1: The first embedding. :param embedding2: The second embedding. :return: The cosine similarity between the two embeddings. \"\"\" import sentence_transformers return sentence_transformers . util . cos_sim ( embedding1 , embedding2 )","title":"cos_sim"},{"location":"advanced_usage/control_filtering/overview/#automator.ui_control.control_filter.BasicControlFilter.get_embedding","text":"Encodes the given object into an embedding. Parameters: content \u2013 The content to encode. Returns: \u2013 The embedding of the object. Source code in automator/ui_control/control_filter.py 95 96 97 98 99 100 101 102 def get_embedding ( self , content ): \"\"\" Encodes the given object into an embedding. :param content: The content to encode. :return: The embedding of the object. \"\"\" return self . model . encode ( content )","title":"get_embedding"},{"location":"advanced_usage/control_filtering/overview/#automator.ui_control.control_filter.BasicControlFilter.load_model","text":"Loads the model from the given model path. Parameters: model_path \u2013 The path to the model. Returns: \u2013 The loaded model. Source code in automator/ui_control/control_filter.py 84 85 86 87 88 89 90 91 92 93 @staticmethod def load_model ( model_path ): \"\"\" Loads the model from the given model path. :param model_path: The path to the model. :return: The loaded model. 
\"\"\" import sentence_transformers return sentence_transformers . SentenceTransformer ( model_path )","title":"load_model"},{"location":"advanced_usage/control_filtering/overview/#automator.ui_control.control_filter.BasicControlFilter.plans_to_keywords","text":"Gets keywords from the plan. We only consider the words in the plan that are alphabetic or Chinese characters. Parameters: plans ( List [ str ] ) \u2013 The plan to be parsed. Returns: List [ str ] \u2013 A list of keywords extracted from the plan. Source code in automator/ui_control/control_filter.py 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 @staticmethod def plans_to_keywords ( plans : List [ str ]) -> List [ str ]: \"\"\" Gets keywords from the plan. We only consider the words in the plan that are alphabetic or Chinese characters. :param plans: The plan to be parsed. :return: A list of keywords extracted from the plan. \"\"\" keywords = [] for plan in plans : words = plan . replace ( \"'\" , \"\" ) . strip ( \".\" ) . split () words = [ word for word in words if word . isalpha () or bool ( re . fullmatch ( r \"[\\u4e00-\\u9fa5]+\" , word )) ] keywords . extend ( words ) return keywords","title":"plans_to_keywords"},{"location":"advanced_usage/control_filtering/overview/#automator.ui_control.control_filter.BasicControlFilter.remove_stopwords","text":"Removes stopwords from the given list of keywords. If you are using stopwords for the first time, you need to download them using nltk.download('stopwords'). Parameters: keywords \u2013 The list of keywords to be filtered. Returns: \u2013 The list of keywords with the stopwords removed. Source code in automator/ui_control/control_filter.py 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 @staticmethod def remove_stopwords ( keywords ): \"\"\" Removes stopwords from the given list of keywords. If you are using stopwords for the first time, you need to download them using nltk.download('stopwords'). 
:param keywords: The list of keywords to be filtered. :return: The list of keywords with the stopwords removed. \"\"\" try : from nltk.corpus import stopwords stopwords_list = stopwords . words ( \"english\" ) except LookupError as e : import nltk nltk . download ( \"stopwords\" ) stopwords_list = nltk . corpus . stopwords . words ( \"english\" ) return [ keyword for keyword in keywords if keyword in stopwords_list ]","title":"remove_stopwords"},{"location":"advanced_usage/control_filtering/semantic_filtering/","text":"Semantic Control Filter The semantic control filter is a method to filter the controls based on the semantic similarity between the agent's plan and the control's text using their embeddings. Configuration To activate the semantic control filtering, you need to add SEMANTIC to the CONTROL_FILTER list in the config_dev.yaml file. Below is the detailed semantic control filter configuration in the config_dev.yaml file: CONTROL_FILTER : A list of filtering methods that you want to apply to the controls. To activate the semantic control filtering, add SEMANTIC to the list. CONTROL_FILTER_TOP_K_SEMANTIC : The number of controls to keep after filtering. CONTROL_FILTER_MODEL_SEMANTIC_NAME : The control filter model name for semantic similarity. By default, it is set to \"all-MiniLM-L6-v2\". Reference Bases: BasicControlFilter A class that represents a semantic model for control filtering. control_filter ( control_dicts , plans , top_k ) Filters control items based on their similarity to a set of keywords. Parameters: control_dicts \u2013 The dictionary of control items to be filtered. plans \u2013 The list of plans to be used for filtering. top_k \u2013 The number of top control items to return. Returns: \u2013 The filtered control items. 
Source code in automator/ui_control/control_filter.py 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 def control_filter ( self , control_dicts , plans , top_k ): \"\"\" Filters control items based on their similarity to a set of keywords. :param control_dicts: The dictionary of control items to be filtered. :param plans: The list of plans to be used for filtering. :param top_k: The number of top control items to return. :return: The filtered control items. \"\"\" scores_items = [] filtered_control_dict = {} for label , control_item in control_dicts . items (): control_text = control_item . element_info . name . lower () score = self . control_filter_score ( control_text , plans ) scores_items . append (( label , score )) topk_scores_items = heapq . nlargest ( top_k , ( scores_items ), key = lambda x : x [ 1 ]) topk_items = [ score_item [ 0 ] for score_item in topk_scores_items ] for label , control_item in control_dicts . items (): if label in topk_items : filtered_control_dict [ label ] = control_item return filtered_control_dict control_filter_score ( control_text , plans ) Calculates the score for a control item based on the similarity between its text and a set of keywords. Parameters: control_text \u2013 The text of the control item. plans \u2013 The plan to be used for calculating the similarity. Returns: \u2013 The score (0-1) indicating the similarity between the control text and the keywords. Source code in automator/ui_control/control_filter.py 197 198 199 200 201 202 203 204 205 206 207 def control_filter_score ( self , control_text , plans ): \"\"\" Calculates the score for a control item based on the similarity between its text and a set of keywords. :param control_text: The text of the control item. :param plans: The plan to be used for calculating the similarity. :return: The score (0-1) indicating the similarity between the control text and the keywords. 
\"\"\" plan_embedding = self . get_embedding ( plans ) control_text_embedding = self . get_embedding ( control_text ) return max ( self . cos_sim ( control_text_embedding , plan_embedding ) . tolist ()[ 0 ])","title":"Semantic Filtering"},{"location":"advanced_usage/control_filtering/semantic_filtering/#sematic-control-filter","text":"The semantic control filter is a method to filter the controls based on the semantic similarity between the agent's plan and the control's text using their embeddings.","title":"Semantic Control Filter"},{"location":"advanced_usage/control_filtering/semantic_filtering/#configuration","text":"To activate the semantic control filtering, you need to add SEMANTIC to the CONTROL_FILTER list in the config_dev.yaml file. Below is the detailed semantic control filter configuration in the config_dev.yaml file: CONTROL_FILTER : A list of filtering methods that you want to apply to the controls. To activate the semantic control filtering, add SEMANTIC to the list. CONTROL_FILTER_TOP_K_SEMANTIC : The number of controls to keep after filtering. CONTROL_FILTER_MODEL_SEMANTIC_NAME : The control filter model name for semantic similarity. By default, it is set to \"all-MiniLM-L6-v2\".","title":"Configuration"},{"location":"advanced_usage/control_filtering/semantic_filtering/#reference","text":"Bases: BasicControlFilter A class that represents a semantic model for control filtering.","title":"Reference"},{"location":"advanced_usage/control_filtering/semantic_filtering/#automator.ui_control.control_filter.SemanticControlFilter.control_filter","text":"Filters control items based on their similarity to a set of keywords. Parameters: control_dicts \u2013 The dictionary of control items to be filtered. plans \u2013 The list of plans to be used for filtering. top_k \u2013 The number of top control items to return. Returns: \u2013 The filtered control items. 
Source code in automator/ui_control/control_filter.py 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 def control_filter ( self , control_dicts , plans , top_k ): \"\"\" Filters control items based on their similarity to a set of keywords. :param control_dicts: The dictionary of control items to be filtered. :param plans: The list of plans to be used for filtering. :param top_k: The number of top control items to return. :return: The filtered control items. \"\"\" scores_items = [] filtered_control_dict = {} for label , control_item in control_dicts . items (): control_text = control_item . element_info . name . lower () score = self . control_filter_score ( control_text , plans ) scores_items . append (( label , score )) topk_scores_items = heapq . nlargest ( top_k , ( scores_items ), key = lambda x : x [ 1 ]) topk_items = [ score_item [ 0 ] for score_item in topk_scores_items ] for label , control_item in control_dicts . items (): if label in topk_items : filtered_control_dict [ label ] = control_item return filtered_control_dict","title":"control_filter"},{"location":"advanced_usage/control_filtering/semantic_filtering/#automator.ui_control.control_filter.SemanticControlFilter.control_filter_score","text":"Calculates the score for a control item based on the similarity between its text and a set of keywords. Parameters: control_text \u2013 The text of the control item. plans \u2013 The plan to be used for calculating the similarity. Returns: \u2013 The score (0-1) indicating the similarity between the control text and the keywords. Source code in automator/ui_control/control_filter.py 197 198 199 200 201 202 203 204 205 206 207 def control_filter_score ( self , control_text , plans ): \"\"\" Calculates the score for a control item based on the similarity between its text and a set of keywords. :param control_text: The text of the control item. 
:param plans: The plan to be used for calculating the similarity. :return: The score (0-1) indicating the similarity between the control text and the keywords. \"\"\" plan_embedding = self . get_embedding ( plans ) control_text_embedding = self . get_embedding ( control_text ) return max ( self . cos_sim ( control_text_embedding , plan_embedding ) . tolist ()[ 0 ])","title":"control_filter_score"},{"location":"advanced_usage/control_filtering/text_filtering/","text":"Text Control Filter The text control filter is a method to filter the controls based on the control text. The agent's plan on the current step usually contains some keywords or phrases. This method filters the controls based on the matching between the control text and the keywords or phrases in the agent's plan. Configuration To activate the text control filtering, you need to add TEXT to the CONTROL_FILTER list in the config_dev.yaml file. Below is the detailed text control filter configuration in the config_dev.yaml file: CONTROL_FILTER : A list of filtering methods that you want to apply to the controls. To activate the text control filtering, add TEXT to the list. CONTROL_FILTER_TOP_K_PLAN : The number of agent's plan keywords or phrases to use for filtering the controls. Reference A class that provides methods for filtering control items based on plans. control_filter ( control_dicts , plans ) staticmethod Filters control items based on keywords. Parameters: control_dicts ( Dict ) \u2013 The dictionary of control items to be filtered. plans ( List [ str ] ) \u2013 The list of plans to be used for filtering. Returns: Dict \u2013 The filtered control items. Source code in automator/ui_control/control_filter.py 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 @staticmethod def control_filter ( control_dicts : Dict , plans : List [ str ]) -> Dict : \"\"\" Filters control items based on keywords. :param control_dicts: The dictionary of control items to be filtered. 
:param plans: The list of plans to be used for filtering. :return: The filtered control items. \"\"\" filtered_control_dict = {} keywords = BasicControlFilter . plans_to_keywords ( plans ) for label , control_item in control_dicts . items (): control_text = control_item . element_info . name . lower () if any ( keyword in control_text or control_text in keyword for keyword in keywords ): filtered_control_dict [ label ] = control_item return filtered_control_dict","title":"Text Filtering"},{"location":"advanced_usage/control_filtering/text_filtering/#text-control-filter","text":"The text control filter is a method to filter the controls based on the control text. The agent's plan on the current step usually contains some keywords or phrases. This method filters the controls based on the matching between the control text and the keywords or phrases in the agent's plan.","title":"Text Control Filter"},{"location":"advanced_usage/control_filtering/text_filtering/#configuration","text":"To activate the text control filtering, you need to add TEXT to the CONTROL_FILTER list in the config_dev.yaml file. Below is the detailed text control filter configuration in the config_dev.yaml file: CONTROL_FILTER : A list of filtering methods that you want to apply to the controls. To activate the text control filtering, add TEXT to the list. CONTROL_FILTER_TOP_K_PLAN : The number of agent's plan keywords or phrases to use for filtering the controls.","title":"Configuration"},{"location":"advanced_usage/control_filtering/text_filtering/#reference","text":"A class that provides methods for filtering control items based on plans.","title":"Reference"},{"location":"advanced_usage/control_filtering/text_filtering/#automator.ui_control.control_filter.TextControlFilter.control_filter","text":"Filters control items based on keywords. Parameters: control_dicts ( Dict ) \u2013 The dictionary of control items to be filtered. 
plans ( List [ str ] ) \u2013 The list of plans to be used for filtering. Returns: Dict \u2013 The filtered control items. Source code in automator/ui_control/control_filter.py 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 @staticmethod def control_filter ( control_dicts : Dict , plans : List [ str ]) -> Dict : \"\"\" Filters control items based on keywords. :param control_dicts: The dictionary of control items to be filtered. :param plans: The list of plans to be used for filtering. :return: The filtered control items. \"\"\" filtered_control_dict = {} keywords = BasicControlFilter . plans_to_keywords ( plans ) for label , control_item in control_dicts . items (): control_text = control_item . element_info . name . lower () if any ( keyword in control_text or control_text in keyword for keyword in keywords ): filtered_control_dict [ label ] = control_item return filtered_control_dict","title":"control_filter"},{"location":"advanced_usage/reinforce_appagent/experience_learning/","text":"Learning from Self-Experience When UFO successfully completes a task, the user can choose to save the successful experience to reinforce the AppAgent. The AppAgent can learn from its own successful experiences to improve its performance in the future. 
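The keyword matching used by the text control filter documented above can be reproduced in isolation. The sketch below is a minimal illustration, assuming control items are plain strings keyed by label; the real implementation reads control_item.element_info.name and also keeps Chinese-character words when extracting keywords.

```python
# Simplified sketch of the text control filter's keyword matching.
# Assumption: control items are plain strings keyed by label.

def plans_to_keywords(plans):
    # Keep only purely alphabetic words from each plan step.
    keywords = []
    for plan in plans:
        words = plan.replace(chr(39), '').strip('.').split()
        keywords.extend(w for w in words if w.isalpha())
    return keywords

def text_filter(control_dicts, plans):
    # A control survives if its text contains a keyword, or vice versa.
    keywords = plans_to_keywords(plans)
    return {
        label: text
        for label, text in control_dicts.items()
        if any(k in text.lower() or text.lower() in k for k in keywords)
    }

controls = {'1': 'save', '2': 'bold', '3': 'insert table'}
plan = ['Click the save button to store the document.']
print(text_filter(controls, plan))  # → {'1': 'save'}
```

In the real pipeline, CONTROL_FILTER_TOP_K_PLAN bounds how many plan keywords or phrases participate in this matching.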
Mechanism Step 1: Complete a Session Event : UFO completes a session Step 2: Ask User to Save Experience Action : The agent prompts the user with a choice to save the successful experience Step 3: User Chooses to Save Action : If the user chooses to save the experience Step 4: Summarize and Save the Experience Tool : ExperienceSummarizer Process : Summarize the experience into a demonstration example Save the demonstration example in the EXPERIENCE_SAVED_PATH as specified in the config_dev.yaml file The demonstration example includes similar fields as those used in the AppAgent's prompt Step 5: Retrieve and Utilize Saved Experience When : The AppAgent encounters a similar task in the future Action : Retrieve the saved experience from the experience database Outcome : Use the retrieved experience to generate a plan Workflow Diagram graph TD; A[Complete Session] --> B[Ask User to Save Experience] B --> C[User Chooses to Save] C --> D[Summarize with ExperienceSummarizer] D --> E[Save in EXPERIENCE_SAVED_PATH] F[AppAgent Encounters Similar Task] --> G[Retrieve Saved Experience] G --> H[Generate Plan] Activate the Learning from Self-Experience Step 1: Configure the AppAgent Configure the following parameters to allow UFO to use the RAG from its self-experience: Configuration Option Description Type Default Value RAG_EXPERIENCE Whether to use the RAG from its self-experience Boolean False RAG_EXPERIENCE_RETRIEVED_TOPK The topk for the offline retrieved documents Integer 5 Reference Experience Summarizer The ExperienceSummarizer class is located in the ufo/experience/experience_summarizer.py file. The ExperienceSummarizer class provides the following methods to summarize the experience: The ExperienceSummarizer class is the summarizer for the experience learning. Initialize the ApplicationAgentPrompter. Parameters: is_visual ( bool ) \u2013 Whether the request is for visual model. prompt_template ( str ) \u2013 The path of the prompt template. 
example_prompt_template ( str ) \u2013 The path of the example prompt template. api_prompt_template ( str ) \u2013 The path of the api prompt template. Source code in experience/summarizer.py 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 def __init__ ( self , is_visual : bool , prompt_template : str , example_prompt_template : str , api_prompt_template : str , ): \"\"\" Initialize the ApplicationAgentPrompter. :param is_visual: Whether the request is for visual model. :param prompt_template: The path of the prompt template. :param example_prompt_template: The path of the example prompt template. :param api_prompt_template: The path of the api prompt template. \"\"\" self . is_visual = is_visual self . prompt_template = prompt_template self . example_prompt_template = example_prompt_template self . api_prompt_template = api_prompt_template build_prompt ( log_partition ) Build the prompt. Parameters: log_partition ( dict ) \u2013 The log partition. return: The prompt. Source code in experience/summarizer.py 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 def build_prompt ( self , log_partition : dict ) -> list : \"\"\" Build the prompt. :param log_partition: The log partition. return: The prompt. \"\"\" experience_prompter = ExperiencePrompter ( self . is_visual , self . prompt_template , self . example_prompt_template , self . api_prompt_template , ) experience_system_prompt = experience_prompter . system_prompt_construction () experience_user_prompt = experience_prompter . user_content_construction ( log_partition ) experience_prompt = experience_prompter . prompt_construction ( experience_system_prompt , experience_user_prompt ) return experience_prompt create_or_update_vector_db ( summaries , db_path ) staticmethod Create or update the vector database. Parameters: summaries ( list ) \u2013 The summaries. db_path ( str ) \u2013 The path of the vector database. 
Source code in experience/summarizer.py 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 @staticmethod def create_or_update_vector_db ( summaries : list , db_path : str ): \"\"\" Create or update the vector database. :param summaries: The summaries. :param db_path: The path of the vector database. \"\"\" document_list = [] for summary in summaries : request = summary [ \"request\" ] document_list . append ( Document ( page_content = request , metadata = summary )) db = FAISS . from_documents ( document_list , get_hugginface_embedding ()) # Check if the db exists, if not, create a new one. if os . path . exists ( db_path ): prev_db = FAISS . load_local ( db_path , get_hugginface_embedding ()) db . merge_from ( prev_db ) db . save_local ( db_path ) print ( f \"Updated vector DB successfully: { db_path } \" ) create_or_update_yaml ( summaries , yaml_path ) staticmethod Create or update the YAML file. Parameters: summaries ( list ) \u2013 The summaries. yaml_path ( str ) \u2013 The path of the YAML file. Source code in experience/summarizer.py 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 @staticmethod def create_or_update_yaml ( summaries : list , yaml_path : str ): \"\"\" Create or update the YAML file. :param summaries: The summaries. :param yaml_path: The path of the YAML file. \"\"\" # Check if the file exists, if not, create a new one if not os . path . exists ( yaml_path ): with open ( yaml_path , \"w\" ): pass print ( f \"Created new YAML file: { yaml_path } \" ) # Read existing data from the YAML file with open ( yaml_path , \"r\" ) as file : existing_data = yaml . 
safe_load ( file ) # Initialize index and existing_data if file is empty index = len ( existing_data ) if existing_data else 0 existing_data = existing_data or {} # Update data with new summaries for i , summary in enumerate ( summaries ): example = { f \"example { index + i } \" : summary } existing_data . update ( example ) # Write updated data back to the YAML file with open ( yaml_path , \"w\" ) as file : yaml . safe_dump ( existing_data , file , default_flow_style = False , sort_keys = False ) print ( f \"Updated existing YAML file successfully: { yaml_path } \" ) get_summary ( prompt_message ) Get the summary. Parameters: prompt_message ( list ) \u2013 The prompt message. return: The summary and the cost. Source code in experience/summarizer.py 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 def get_summary ( self , prompt_message : list ) -> Tuple [ dict , float ]: \"\"\" Get the summary. :param prompt_message: The prompt message. return: The summary and the cost. \"\"\" # Get the completion for the prompt message response_string , cost = get_completion ( prompt_message , \"APPAGENT\" , use_backup_engine = True ) try : response_json = json_parser ( response_string ) except : response_json = None # Restructure the response if response_json : summary = dict () summary [ \"example\" ] = {} for key in [ \"Observation\" , \"Thought\" , \"ControlLabel\" , \"ControlText\" , \"Function\" , \"Args\" , \"Status\" , \"Plan\" , \"Comment\" , ]: summary [ \"example\" ][ key ] = response_json . get ( key , \"\" ) summary [ \"Tips\" ] = response_json . get ( \"Tips\" , \"\" ) return summary , cost get_summary_list ( logs ) Get the summary list. Parameters: logs ( list ) \u2013 The logs. return: The summary list and the total cost. 
Source code in experience/summarizer.py 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 def get_summary_list ( self , logs : list ) -> Tuple [ list , float ]: \"\"\" Get the summary list. :param logs: The logs. return: The summary list and the total cost. \"\"\" summaries = [] total_cost = 0.0 for log_partition in logs : prompt = self . build_prompt ( log_partition ) summary , cost = self . get_summary ( prompt ) summary [ \"request\" ] = ExperienceLogLoader . get_user_request ( log_partition ) summary [ \"app_list\" ] = ExperienceLogLoader . get_app_list ( log_partition ) summaries . append ( summary ) total_cost += cost return summaries , total_cost read_logs ( log_path ) staticmethod Read the log. Parameters: log_path ( str ) \u2013 The path of the log file. Source code in experience/summarizer.py 117 118 119 120 121 122 123 124 125 @staticmethod def read_logs ( log_path : str ) -> list : \"\"\" Read the log. :param log_path: The path of the log file. \"\"\" replay_loader = ExperienceLogLoader ( log_path ) logs = replay_loader . create_logs () return logs Experience Retriever The ExperienceRetriever class is located in the ufo/rag/retriever.py file. The ExperienceRetriever class provides the following methods to retrieve the experience: Bases: Retriever Class to create experience retrievers. Create a new ExperienceRetriever. Parameters: db_path \u2013 The path to the database. Source code in rag/retriever.py 131 132 133 134 135 136 def __init__ ( self , db_path ) -> None : \"\"\" Create a new ExperienceRetriever. :param db_path: The path to the database. \"\"\" self . indexer = self . get_indexer ( db_path ) get_indexer ( db_path ) Create an experience indexer. Parameters: db_path ( str ) \u2013 The path to the database. Source code in rag/retriever.py 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 def get_indexer ( self , db_path : str ): \"\"\" Create an experience indexer. :param db_path: The path to the database. 
\"\"\" try : db = FAISS . load_local ( db_path , get_hugginface_embedding ()) return db except : # print_with_color( # \"Warning: Failed to load experience indexer from {path}.\".format( # path=db_path # ), # \"yellow\", # ) return None","title":"Experience Learning"},{"location":"advanced_usage/reinforce_appagent/experience_learning/#learning-from-self-experience","text":"When UFO successfully completes a task, the user can choose to save the successful experience to reinforce the AppAgent. The AppAgent can learn from its own successful experiences to improve its performance in the future.","title":"Learning from Self-Experience"},{"location":"advanced_usage/reinforce_appagent/experience_learning/#mechanism","text":"","title":"Mechanism"},{"location":"advanced_usage/reinforce_appagent/experience_learning/#step-1-complete-a-session","text":"Event : UFO completes a session","title":"Step 1: Complete a Session"},{"location":"advanced_usage/reinforce_appagent/experience_learning/#step-2-ask-user-to-save-experience","text":"Action : The agent prompts the user with a choice to save the successful experience","title":"Step 2: Ask User to Save Experience"},{"location":"advanced_usage/reinforce_appagent/experience_learning/#step-3-user-chooses-to-save","text":"Action : If the user chooses to save the experience","title":"Step 3: User Chooses to Save"},{"location":"advanced_usage/reinforce_appagent/experience_learning/#step-4-summarize-and-save-the-experience","text":"Tool : ExperienceSummarizer Process : Summarize the experience into a demonstration example Save the demonstration example in the EXPERIENCE_SAVED_PATH as specified in the config_dev.yaml file The demonstration example includes similar fields as those used in the AppAgent's prompt","title":"Step 4: Summarize and Save the Experience"},{"location":"advanced_usage/reinforce_appagent/experience_learning/#step-5-retrieve-and-utilize-saved-experience","text":"When : The AppAgent encounters a similar task in the future 
Action : Retrieve the saved experience from the experience database Outcome : Use the retrieved experience to generate a plan","title":"Step 5: Retrieve and Utilize Saved Experience"},{"location":"advanced_usage/reinforce_appagent/experience_learning/#workflow-diagram","text":"graph TD; A[Complete Session] --> B[Ask User to Save Experience] B --> C[User Chooses to Save] C --> D[Summarize with ExperienceSummarizer] D --> E[Save in EXPERIENCE_SAVED_PATH] F[AppAgent Encounters Similar Task] --> G[Retrieve Saved Experience] G --> H[Generate Plan]","title":"Workflow Diagram"},{"location":"advanced_usage/reinforce_appagent/experience_learning/#activate-the-learning-from-self-experience","text":"","title":"Activate the Learning from Self-Experience"},{"location":"advanced_usage/reinforce_appagent/experience_learning/#step-1-configure-the-appagent","text":"Configure the following parameters to allow UFO to use the RAG from its self-experience: Configuration Option Description Type Default Value RAG_EXPERIENCE Whether to use the RAG from its self-experience Boolean False RAG_EXPERIENCE_RETRIEVED_TOPK The topk for the offline retrieved documents Integer 5","title":"Step 1: Configure the AppAgent"},{"location":"advanced_usage/reinforce_appagent/experience_learning/#reference","text":"","title":"Reference"},{"location":"advanced_usage/reinforce_appagent/experience_learning/#experience-summarizer","text":"The ExperienceSummarizer class is located in the ufo/experience/experience_summarizer.py file. The ExperienceSummarizer class provides the following methods to summarize the experience: The ExperienceSummarizer class is the summarizer for the experience learning. Initialize the ApplicationAgentPrompter. Parameters: is_visual ( bool ) \u2013 Whether the request is for visual model. prompt_template ( str ) \u2013 The path of the prompt template. example_prompt_template ( str ) \u2013 The path of the example prompt template. 
api_prompt_template ( str ) \u2013 The path of the api prompt template. Source code in experience/summarizer.py 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 def __init__ ( self , is_visual : bool , prompt_template : str , example_prompt_template : str , api_prompt_template : str , ): \"\"\" Initialize the ApplicationAgentPrompter. :param is_visual: Whether the request is for visual model. :param prompt_template: The path of the prompt template. :param example_prompt_template: The path of the example prompt template. :param api_prompt_template: The path of the api prompt template. \"\"\" self . is_visual = is_visual self . prompt_template = prompt_template self . example_prompt_template = example_prompt_template self . api_prompt_template = api_prompt_template","title":"Experience Summarizer"},{"location":"advanced_usage/reinforce_appagent/experience_learning/#experience.summarizer.ExperienceSummarizer.build_prompt","text":"Build the prompt. Parameters: log_partition ( dict ) \u2013 The log partition. return: The prompt. Source code in experience/summarizer.py 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 def build_prompt ( self , log_partition : dict ) -> list : \"\"\" Build the prompt. :param log_partition: The log partition. return: The prompt. \"\"\" experience_prompter = ExperiencePrompter ( self . is_visual , self . prompt_template , self . example_prompt_template , self . api_prompt_template , ) experience_system_prompt = experience_prompter . system_prompt_construction () experience_user_prompt = experience_prompter . user_content_construction ( log_partition ) experience_prompt = experience_prompter . prompt_construction ( experience_system_prompt , experience_user_prompt ) return experience_prompt","title":"build_prompt"},{"location":"advanced_usage/reinforce_appagent/experience_learning/#experience.summarizer.ExperienceSummarizer.create_or_update_vector_db","text":"Create or update the vector database. 
Parameters: summaries ( list ) \u2013 The summaries. db_path ( str ) \u2013 The path of the vector database. Source code in experience/summarizer.py 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 @staticmethod def create_or_update_vector_db ( summaries : list , db_path : str ): \"\"\" Create or update the vector database. :param summaries: The summaries. :param db_path: The path of the vector database. \"\"\" document_list = [] for summary in summaries : request = summary [ \"request\" ] document_list . append ( Document ( page_content = request , metadata = summary )) db = FAISS . from_documents ( document_list , get_hugginface_embedding ()) # Check if the db exists, if not, create a new one. if os . path . exists ( db_path ): prev_db = FAISS . load_local ( db_path , get_hugginface_embedding ()) db . merge_from ( prev_db ) db . save_local ( db_path ) print ( f \"Updated vector DB successfully: { db_path } \" )","title":"create_or_update_vector_db"},{"location":"advanced_usage/reinforce_appagent/experience_learning/#experience.summarizer.ExperienceSummarizer.create_or_update_yaml","text":"Create or update the YAML file. Parameters: summaries ( list ) \u2013 The summaries. yaml_path ( str ) \u2013 The path of the YAML file. Source code in experience/summarizer.py 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 @staticmethod def create_or_update_yaml ( summaries : list , yaml_path : str ): \"\"\" Create or update the YAML file. :param summaries: The summaries. :param yaml_path: The path of the YAML file. \"\"\" # Check if the file exists, if not, create a new one if not os . path . exists ( yaml_path ): with open ( yaml_path , \"w\" ): pass print ( f \"Created new YAML file: { yaml_path } \" ) # Read existing data from the YAML file with open ( yaml_path , \"r\" ) as file : existing_data = yaml . 
safe_load ( file ) # Initialize index and existing_data if file is empty index = len ( existing_data ) if existing_data else 0 existing_data = existing_data or {} # Update data with new summaries for i , summary in enumerate ( summaries ): example = { f \"example { index + i } \" : summary } existing_data . update ( example ) # Write updated data back to the YAML file with open ( yaml_path , \"w\" ) as file : yaml . safe_dump ( existing_data , file , default_flow_style = False , sort_keys = False ) print ( f \"Updated existing YAML file successfully: { yaml_path } \" )","title":"create_or_update_yaml"},{"location":"advanced_usage/reinforce_appagent/experience_learning/#experience.summarizer.ExperienceSummarizer.get_summary","text":"Get the summary. Parameters: prompt_message ( list ) \u2013 The prompt message. return: The summary and the cost. Source code in experience/summarizer.py 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 def get_summary ( self , prompt_message : list ) -> Tuple [ dict , float ]: \"\"\" Get the summary. :param prompt_message: The prompt message. return: The summary and the cost. \"\"\" # Get the completion for the prompt message response_string , cost = get_completion ( prompt_message , \"APPAGENT\" , use_backup_engine = True ) try : response_json = json_parser ( response_string ) except : response_json = None # Restructure the response if response_json : summary = dict () summary [ \"example\" ] = {} for key in [ \"Observation\" , \"Thought\" , \"ControlLabel\" , \"ControlText\" , \"Function\" , \"Args\" , \"Status\" , \"Plan\" , \"Comment\" , ]: summary [ \"example\" ][ key ] = response_json . get ( key , \"\" ) summary [ \"Tips\" ] = response_json . get ( \"Tips\" , \"\" ) return summary , cost","title":"get_summary"},{"location":"advanced_usage/reinforce_appagent/experience_learning/#experience.summarizer.ExperienceSummarizer.get_summary_list","text":"Get the summary list. 
Parameters: logs ( list ) \u2013 The logs. return: The summary list and the total cost. Source code in experience/summarizer.py 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 def get_summary_list ( self , logs : list ) -> Tuple [ list , float ]: \"\"\" Get the summary list. :param logs: The logs. return: The summary list and the total cost. \"\"\" summaries = [] total_cost = 0.0 for log_partition in logs : prompt = self . build_prompt ( log_partition ) summary , cost = self . get_summary ( prompt ) summary [ \"request\" ] = ExperienceLogLoader . get_user_request ( log_partition ) summary [ \"app_list\" ] = ExperienceLogLoader . get_app_list ( log_partition ) summaries . append ( summary ) total_cost += cost return summaries , total_cost","title":"get_summary_list"},{"location":"advanced_usage/reinforce_appagent/experience_learning/#experience.summarizer.ExperienceSummarizer.read_logs","text":"Read the log. Parameters: log_path ( str ) \u2013 The path of the log file. Source code in experience/summarizer.py 117 118 119 120 121 122 123 124 125 @staticmethod def read_logs ( log_path : str ) -> list : \"\"\" Read the log. :param log_path: The path of the log file. \"\"\" replay_loader = ExperienceLogLoader ( log_path ) logs = replay_loader . create_logs () return logs","title":"read_logs"},{"location":"advanced_usage/reinforce_appagent/experience_learning/#experience-retriever","text":"The ExperienceRetriever class is located in the ufo/rag/retriever.py file. The ExperienceRetriever class provides the following methods to retrieve the experience: Bases: Retriever Class to create experience retrievers. Create a new ExperienceRetriever. Parameters: db_path \u2013 The path to the database. Source code in rag/retriever.py 131 132 133 134 135 136 def __init__ ( self , db_path ) -> None : \"\"\" Create a new ExperienceRetriever. :param db_path: The path to the database. \"\"\" self . indexer = self . 
get_indexer ( db_path )","title":"Experience Retriever"},{"location":"advanced_usage/reinforce_appagent/experience_learning/#rag.retriever.ExperienceRetriever.get_indexer","text":"Create an experience indexer. Parameters: db_path ( str ) \u2013 The path to the database. Source code in rag/retriever.py 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 def get_indexer ( self , db_path : str ): \"\"\" Create an experience indexer. :param db_path: The path to the database. \"\"\" try : db = FAISS . load_local ( db_path , get_hugginface_embedding ()) return db except : # print_with_color( # \"Warning: Failed to load experience indexer from {path}.\".format( # path=db_path # ), # \"yellow\", # ) return None","title":"get_indexer"},{"location":"advanced_usage/reinforce_appagent/learning_from_bing_search/","text":"Learning from Bing Search UFO provides the capability to reinforce the AppAgent by searching for information on Bing to obtain up-to-date knowledge for niche tasks or applications beyond the AppAgent 's knowledge. Mechanism Upon receiving a request, the AppAgent constructs a Bing search query based on the request and retrieves the search results from Bing. The AppAgent then extracts the relevant information from the top-k search results from Bing and generates a plan based on the retrieved information. Activate the Learning from Bing Search Step 1: Obtain Bing API Key To use the Bing search, you need to obtain a Bing API key. You can follow the instructions on the Microsoft Azure Bing Search API to get the API key. 
Step 2: Configure the AppAgent Configure the following parameters to allow UFO to use online Bing search for the decision-making process: Configuration Option Description Type Default Value RAG_ONLINE_SEARCH Whether to use the Bing search Boolean False BING_API_KEY The Bing search API key String \"\" RAG_ONLINE_SEARCH_TOPK The topk for the online search Integer 5 RAG_ONLINE_RETRIEVED_TOPK The topk for the online retrieved searched results Integer 1 Reference Bases: Retriever Class to create online retrievers. Create a new OnlineDocRetriever. :query: The query to create an indexer for. :top_k: The number of documents to retrieve. Source code in rag/retriever.py 162 163 164 165 166 167 168 169 def __init__ ( self , query : str , top_k : int ) -> None : \"\"\" Create a new OnlineDocRetriever. :query: The query to create an indexer for. :top_k: The number of documents to retrieve. \"\"\" self . query = query self . indexer = self . get_indexer ( top_k ) get_indexer ( top_k ) Create an online search indexer. Parameters: top_k ( int ) \u2013 The number of documents to retrieve. Returns: \u2013 The created indexer. Source code in rag/retriever.py 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 def get_indexer ( self , top_k : int ): \"\"\" Create an online search indexer. :param top_k: The number of documents to retrieve. :return: The created indexer. \"\"\" bing_retriever = web_search . BingSearchWeb () result_list = bing_retriever . search ( self . query , top_k = top_k ) documents = bing_retriever . create_documents ( result_list ) if len ( documents ) == 0 : return None indexer = bing_retriever . create_indexer ( documents ) print_with_color ( \"Online indexer created successfully for {num} searched results.\" . 
format ( num = len ( documents ) ), \"cyan\" , ) return indexer","title":"Learning from Bing Search"},{"location":"advanced_usage/reinforce_appagent/learning_from_bing_search/#learning-from-bing-search","text":"UFO provides the capability to reinforce the AppAgent by searching for information on Bing to obtain up-to-date knowledge for niche tasks or applications beyond the AppAgent 's knowledge.","title":"Learning from Bing Search"},{"location":"advanced_usage/reinforce_appagent/learning_from_bing_search/#mechanism","text":"Upon receiving a request, the AppAgent constructs a Bing search query based on the request and retrieves the search results from Bing. The AppAgent then extracts the relevant information from the top-k search results from Bing and generates a plan based on the retrieved information.","title":"Mechanism"},{"location":"advanced_usage/reinforce_appagent/learning_from_bing_search/#activate-the-learning-from-bing-search","text":"","title":"Activate the Learning from Bing Search"},{"location":"advanced_usage/reinforce_appagent/learning_from_bing_search/#step-1-obtain-bing-api-key","text":"To use the Bing search, you need to obtain a Bing API key. 
You can follow the instructions on the Microsoft Azure Bing Search API to get the API key.","title":"Step 1: Obtain Bing API Key"},{"location":"advanced_usage/reinforce_appagent/learning_from_bing_search/#step-2-configure-the-appagent","text":"Configure the following parameters to allow UFO to use online Bing search for the decision-making process: Configuration Option Description Type Default Value RAG_ONLINE_SEARCH Whether to use the Bing search Boolean False BING_API_KEY The Bing search API key String \"\" RAG_ONLINE_SEARCH_TOPK The topk for the online search Integer 5 RAG_ONLINE_RETRIEVED_TOPK The topk for the online retrieved searched results Integer 1","title":"Step 2: Configure the AppAgent"},{"location":"advanced_usage/reinforce_appagent/learning_from_bing_search/#reference","text":"Bases: Retriever Class to create online retrievers. Create a new OnlineDocRetriever. :query: The query to create an indexer for. :top_k: The number of documents to retrieve. Source code in rag/retriever.py 162 163 164 165 166 167 168 169 def __init__ ( self , query : str , top_k : int ) -> None : \"\"\" Create a new OnlineDocRetriever. :query: The query to create an indexer for. :top_k: The number of documents to retrieve. \"\"\" self . query = query self . indexer = self . get_indexer ( top_k )","title":"Reference"},{"location":"advanced_usage/reinforce_appagent/learning_from_bing_search/#rag.retriever.OnlineDocRetriever.get_indexer","text":"Create an online search indexer. Parameters: top_k ( int ) \u2013 The number of documents to retrieve. Returns: \u2013 The created indexer. Source code in rag/retriever.py 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 def get_indexer ( self , top_k : int ): \"\"\" Create an online search indexer. :param top_k: The number of documents to retrieve. :return: The created indexer. \"\"\" bing_retriever = web_search . BingSearchWeb () result_list = bing_retriever . search ( self . 
query , top_k = top_k ) documents = bing_retriever . create_documents ( result_list ) if len ( documents ) == 0 : return None indexer = bing_retriever . create_indexer ( documents ) print_with_color ( \"Online indexer created successfully for {num} searched results.\" . format ( num = len ( documents ) ), \"cyan\" , ) return indexer","title":"get_indexer"},{"location":"advanced_usage/reinforce_appagent/learning_from_demonstration/","text":"Learning from User Demonstration For complex tasks, users can demonstrate the task using Step Recorder to record the action trajectories. UFO can learn from these user demonstrations to improve the AppAgent's performance. Mechanism UFO uses the Step Recorder tool to record the task and action trajectories. The recorded demonstration is saved as a zip file. The DemonstrationSummarizer class extracts and summarizes the demonstration. The summarized demonstration is saved in the DEMONSTRATION_SAVED_PATH as specified in the config_dev.yaml file. When the AppAgent encounters a similar task, the DemonstrationRetriever class retrieves the saved demonstration from the demonstration database and generates a plan based on the retrieved demonstration. Info You can find how to record the task and action trajectories using the Step Recorder tool in the User Demonstration Provision document. You can find a demo video of learning from user demonstrations: Activating Learning from User Demonstrations Step 1: User Demonstration Please follow the steps in the User Demonstration Provision document to provide user demonstrations. 
Step 2: Configure the AppAgent Configure the following parameters to allow UFO to use RAG from user demonstrations: Configuration Option Description Type Default Value RAG_DEMONSTRATION Whether to use RAG from user demonstrations Boolean False RAG_DEMONSTRATION_RETRIEVED_TOPK The top K documents to retrieve offline Integer 5 RAG_DEMONSTRATION_COMPLETION_N The number of completion choices for the demonstration result Integer 3 Reference Demonstration Summarizer The DemonstrationSummarizer class is located in the record_processor/summarizer/summarizer.py file. The DemonstrationSummarizer class provides methods to summarize the demonstration: The DemonstrationSummarizer class is the summarizer for demonstration learning. It summarizes the demonstration record into a list of summaries and saves them to the YAML file and the vector database. A sample of the summary is as follows: { \"example\": { \"Observation\": \"Word.exe is opened.\", \"Thought\": \"The user is trying to create a new file.\", \"ControlLabel\": \"1\", \"ControlText\": \"Sample Control Text\", \"Function\": \"CreateFile\", \"Args\": \"filename='new_file.txt'\", \"Status\": \"Success\", \"Plan\": \"Create a new file named 'new_file.txt'.\", \"Comment\": \"The user successfully created a new file.\" }, \"Tips\": \"You can use the 'CreateFile' function to create a new file.\" } Initialize the DemonstrationSummarizer. Parameters: is_visual ( bool ) \u2013 Whether the request is for visual model. prompt_template ( str ) \u2013 The path of the prompt template. demonstration_prompt_template ( str ) \u2013 The path of the example prompt template for demonstration. api_prompt_template ( str ) \u2013 The path of the api prompt template. 
Source code in summarizer/summarizer.py 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 def __init__ ( self , is_visual : bool , prompt_template : str , demonstration_prompt_template : str , api_prompt_template : str , completion_num : int = 1 , ): \"\"\" Initialize the DemonstrationSummarizer. :param is_visual: Whether the request is for visual model. :param prompt_template: The path of the prompt template. :param demonstration_prompt_template: The path of the example prompt template for demonstration. :param api_prompt_template: The path of the api prompt template. \"\"\" self . is_visual = is_visual self . prompt_template = prompt_template self . demonstration_prompt_template = demonstration_prompt_template self . api_prompt_template = api_prompt_template self . completion_num = completion_num __build_prompt ( demo_record ) Build the prompt by the user demonstration record. Parameters: demo_record ( DemonstrationRecord ) \u2013 The user demonstration record. return: The prompt. Source code in summarizer/summarizer.py 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 def __build_prompt ( self , demo_record : DemonstrationRecord ) -> list : \"\"\" Build the prompt by the user demonstration record. :param demo_record: The user demonstration record. return: The prompt. \"\"\" demonstration_prompter = DemonstrationPrompter ( self . is_visual , self . prompt_template , self . demonstration_prompt_template , self . api_prompt_template , ) demonstration_system_prompt = ( demonstration_prompter . system_prompt_construction () ) demonstration_user_prompt = demonstration_prompter . user_content_construction ( demo_record ) demonstration_prompt = demonstration_prompter . prompt_construction ( demonstration_system_prompt , demonstration_user_prompt ) return demonstration_prompt __parse_response ( response_string ) Parse the response string to a dict of summary. Parameters: response_string ( str ) \u2013 The response string. 
return: The summary dict. Source code in summarizer/summarizer.py 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 def __parse_response ( self , response_string : str ) -> dict : \"\"\" Parse the response string to a dict of summary. :param response_string: The response string. return: The summary dict. \"\"\" try : response_json = json_parser ( response_string ) except : response_json = None # Restructure the response, in case any of the keys are missing, set them to empty string. if response_json : summary = dict () summary [ \"example\" ] = {} for key in [ \"Observation\" , \"Thought\" , \"ControlLabel\" , \"ControlText\" , \"Function\" , \"Args\" , \"Status\" , \"Plan\" , \"Comment\" , ]: summary [ \"example\" ][ key ] = response_json . get ( key , \"\" ) summary [ \"Tips\" ] = response_json . get ( \"Tips\" , \"\" ) return summary create_or_update_vector_db ( summaries , db_path ) staticmethod Create or update the vector database. Parameters: summaries ( list ) \u2013 The summaries. db_path ( str ) \u2013 The path of the vector database. Source code in summarizer/summarizer.py 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 @staticmethod def create_or_update_vector_db ( summaries : list , db_path : str ): \"\"\" Create or update the vector database. :param summaries: The summaries. :param db_path: The path of the vector database. \"\"\" document_list = [] for summary in summaries : request = summary [ \"request\" ] document_list . append ( Document ( page_content = request , metadata = summary )) db = FAISS . from_documents ( document_list , get_hugginface_embedding ()) # Check if the db exists, if not, create a new one. if os . path . exists ( db_path ): prev_db = FAISS . load_local ( db_path , get_hugginface_embedding ()) db . merge_from ( prev_db ) db . 
save_local ( db_path ) print ( f \"Updated vector DB successfully: { db_path } \" ) create_or_update_yaml ( summaries , yaml_path ) staticmethod Create or update the YAML file. Parameters: summaries ( list ) \u2013 The summaries. yaml_path ( str ) \u2013 The path of the YAML file. Source code in summarizer/summarizer.py 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 @staticmethod def create_or_update_yaml ( summaries : list , yaml_path : str ): \"\"\" Create or update the YAML file. :param summaries: The summaries. :param yaml_path: The path of the YAML file. \"\"\" # Check if the file exists, if not, create a new one if not os . path . exists ( yaml_path ): with open ( yaml_path , \"w\" ): pass print ( f \"Created new YAML file: { yaml_path } \" ) # Read existing data from the YAML file with open ( yaml_path , \"r\" ) as file : existing_data = yaml . safe_load ( file ) # Initialize index and existing_data if file is empty index = len ( existing_data ) if existing_data else 0 existing_data = existing_data or {} # Update data with new summaries for i , summary in enumerate ( summaries ): example = { f \"example { index + i } \" : summary } existing_data . update ( example ) # Write updated data back to the YAML file with open ( yaml_path , \"w\" ) as file : yaml . safe_dump ( existing_data , file , default_flow_style = False , sort_keys = False ) print ( f \"Updated existing YAML file successfully: { yaml_path } \" ) get_summary_list ( record ) Get the summary list for a record Parameters: record ( DemonstrationRecord ) \u2013 The demonstration record. 
return: The summary list for the user defined completion number and the cost Source code in summarizer/summarizer.py 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 def get_summary_list ( self , record : DemonstrationRecord ) -> Tuple [ list , float ]: \"\"\" Get the summary list for a record :param record: The demonstration record. return: The summary list for the user defined completion number and the cost \"\"\" prompt = self . __build_prompt ( record ) response_string_list , cost = get_completions ( prompt , \"APPAGENT\" , use_backup_engine = True , n = self . completion_num ) summaries = [] for response_string in response_string_list : summary = self . __parse_response ( response_string ) if summary : summary [ \"request\" ] = record . get_request () summary [ \"app_list\" ] = record . get_applications () summaries . append ( summary ) return summaries , cost Demonstration Retriever The DemonstrationRetriever class is located in the rag/retriever.py file. The DemonstrationRetriever class provides methods to retrieve the demonstration: Bases: Retriever Class to create demonstration retrievers. Create a new DemonstrationRetriever. :db_path: The path to the database. Source code in rag/retriever.py 198 199 200 201 202 203 def __init__ ( self , db_path ) -> None : \"\"\" Create a new DemonstrationRetriever. :db_path: The path to the database. \"\"\" self . indexer = self . get_indexer ( db_path ) get_indexer ( db_path ) Create a demonstration indexer. :db_path: The path to the database. Source code in rag/retriever.py 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 def get_indexer ( self , db_path : str ): \"\"\" Create a demonstration indexer. :db_path: The path to the database. \"\"\" try : db = FAISS . 
load_local ( db_path , get_hugginface_embedding ()) return db except : # print_with_color( # \"Warning: Failed to load demonstration indexer from {path}.\".format( # path=db_path # ), # \"yellow\", # ) return None","title":"Learning from User Demonstration"},{"location":"advanced_usage/reinforce_appagent/learning_from_demonstration/#learning-from-user-demonstration","text":"For complex tasks, users can demonstrate the task using Step Recorder to record the action trajectories. UFO can learn from these user demonstrations to improve the AppAgent's performance.","title":"Learning from User Demonstration"},{"location":"advanced_usage/reinforce_appagent/learning_from_demonstration/#mechanism","text":"UFO uses the Step Recorder tool to record the task and action trajectories. The recorded demonstration is saved as a zip file. The DemonstrationSummarizer class extracts and summarizes the demonstration. The summarized demonstration is saved in the DEMONSTRATION_SAVED_PATH as specified in the config_dev.yaml file. When the AppAgent encounters a similar task, the DemonstrationRetriever class retrieves the saved demonstration from the demonstration database and generates a plan based on the retrieved demonstration. Info You can find how to record the task and action trajectories using the Step Recorder tool in the User Demonstration Provision document. 
You can find a demo video of learning from user demonstrations:","title":"Mechanism"},{"location":"advanced_usage/reinforce_appagent/learning_from_demonstration/#activating-learning-from-user-demonstrations","text":"","title":"Activating Learning from User Demonstrations"},{"location":"advanced_usage/reinforce_appagent/learning_from_demonstration/#step-1-user-demonstration","text":"Please follow the steps in the User Demonstration Provision document to provide user demonstrations.","title":"Step 1: User Demonstration"},{"location":"advanced_usage/reinforce_appagent/learning_from_demonstration/#step-2-configure-the-appagent","text":"Configure the following parameters to allow UFO to use RAG from user demonstrations: Configuration Option Description Type Default Value RAG_DEMONSTRATION Whether to use RAG from user demonstrations Boolean False RAG_DEMONSTRATION_RETRIEVED_TOPK The top K documents to retrieve offline Integer 5 RAG_DEMONSTRATION_COMPLETION_N The number of completion choices for the demonstration result Integer 3","title":"Step 2: Configure the AppAgent"},{"location":"advanced_usage/reinforce_appagent/learning_from_demonstration/#reference","text":"","title":"Reference"},{"location":"advanced_usage/reinforce_appagent/learning_from_demonstration/#demonstration-summarizer","text":"The DemonstrationSummarizer class is located in the record_processor/summarizer/summarizer.py file. The DemonstrationSummarizer class provides methods to summarize the demonstration: The DemonstrationSummarizer class is the summarizer for demonstration learning. It summarizes the demonstration record into a list of summaries and saves them to the YAML file and the vector database. 
A sample of the summary is as follows: { \"example\": { \"Observation\": \"Word.exe is opened.\", \"Thought\": \"The user is trying to create a new file.\", \"ControlLabel\": \"1\", \"ControlText\": \"Sample Control Text\", \"Function\": \"CreateFile\", \"Args\": \"filename='new_file.txt'\", \"Status\": \"Success\", \"Plan\": \"Create a new file named 'new_file.txt'.\", \"Comment\": \"The user successfully created a new file.\" }, \"Tips\": \"You can use the 'CreateFile' function to create a new file.\" } Initialize the DemonstrationSummarizer. Parameters: is_visual ( bool ) \u2013 Whether the request is for visual model. prompt_template ( str ) \u2013 The path of the prompt template. demonstration_prompt_template ( str ) \u2013 The path of the example prompt template for demonstration. api_prompt_template ( str ) \u2013 The path of the api prompt template. Source code in summarizer/summarizer.py 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 def __init__ ( self , is_visual : bool , prompt_template : str , demonstration_prompt_template : str , api_prompt_template : str , completion_num : int = 1 , ): \"\"\" Initialize the DemonstrationSummarizer. :param is_visual: Whether the request is for visual model. :param prompt_template: The path of the prompt template. :param demonstration_prompt_template: The path of the example prompt template for demonstration. :param api_prompt_template: The path of the api prompt template. \"\"\" self . is_visual = is_visual self . prompt_template = prompt_template self . demonstration_prompt_template = demonstration_prompt_template self . api_prompt_template = api_prompt_template self . completion_num = completion_num","title":"Demonstration Summarizer"},{"location":"advanced_usage/reinforce_appagent/learning_from_demonstration/#summarizer.summarizer.DemonstrationSummarizer.__build_prompt","text":"Build the prompt by the user demonstration record. 
Parameters: demo_record ( DemonstrationRecord ) \u2013 The user demonstration record. return: The prompt. Source code in summarizer/summarizer.py 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 def __build_prompt ( self , demo_record : DemonstrationRecord ) -> list : \"\"\" Build the prompt by the user demonstration record. :param demo_record: The user demonstration record. return: The prompt. \"\"\" demonstration_prompter = DemonstrationPrompter ( self . is_visual , self . prompt_template , self . demonstration_prompt_template , self . api_prompt_template , ) demonstration_system_prompt = ( demonstration_prompter . system_prompt_construction () ) demonstration_user_prompt = demonstration_prompter . user_content_construction ( demo_record ) demonstration_prompt = demonstration_prompter . prompt_construction ( demonstration_system_prompt , demonstration_user_prompt ) return demonstration_prompt","title":"__build_prompt"},{"location":"advanced_usage/reinforce_appagent/learning_from_demonstration/#summarizer.summarizer.DemonstrationSummarizer.__parse_response","text":"Parse the response string to a dict of summary. Parameters: response_string ( str ) \u2013 The response string. return: The summary dict. Source code in summarizer/summarizer.py 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 def __parse_response ( self , response_string : str ) -> dict : \"\"\" Parse the response string to a dict of summary. :param response_string: The response string. return: The summary dict. \"\"\" try : response_json = json_parser ( response_string ) except : response_json = None # Restructure the response, in case any of the keys are missing, set them to empty string. 
if response_json : summary = dict () summary [ \"example\" ] = {} for key in [ \"Observation\" , \"Thought\" , \"ControlLabel\" , \"ControlText\" , \"Function\" , \"Args\" , \"Status\" , \"Plan\" , \"Comment\" , ]: summary [ \"example\" ][ key ] = response_json . get ( key , \"\" ) summary [ \"Tips\" ] = response_json . get ( \"Tips\" , \"\" ) return summary","title":"__parse_response"},{"location":"advanced_usage/reinforce_appagent/learning_from_demonstration/#summarizer.summarizer.DemonstrationSummarizer.create_or_update_vector_db","text":"Create or update the vector database. Parameters: summaries ( list ) \u2013 The summaries. db_path ( str ) \u2013 The path of the vector database. Source code in summarizer/summarizer.py 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 @staticmethod def create_or_update_vector_db ( summaries : list , db_path : str ): \"\"\" Create or update the vector database. :param summaries: The summaries. :param db_path: The path of the vector database. \"\"\" document_list = [] for summary in summaries : request = summary [ \"request\" ] document_list . append ( Document ( page_content = request , metadata = summary )) db = FAISS . from_documents ( document_list , get_hugginface_embedding ()) # Check if the db exists, if not, create a new one. if os . path . exists ( db_path ): prev_db = FAISS . load_local ( db_path , get_hugginface_embedding ()) db . merge_from ( prev_db ) db . save_local ( db_path ) print ( f \"Updated vector DB successfully: { db_path } \" )","title":"create_or_update_vector_db"},{"location":"advanced_usage/reinforce_appagent/learning_from_demonstration/#summarizer.summarizer.DemonstrationSummarizer.create_or_update_yaml","text":"Create or update the YAML file. Parameters: summaries ( list ) \u2013 The summaries. yaml_path ( str ) \u2013 The path of the YAML file. 
Source code in summarizer/summarizer.py 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 @staticmethod def create_or_update_yaml ( summaries : list , yaml_path : str ): \"\"\" Create or update the YAML file. :param summaries: The summaries. :param yaml_path: The path of the YAML file. \"\"\" # Check if the file exists, if not, create a new one if not os . path . exists ( yaml_path ): with open ( yaml_path , \"w\" ): pass print ( f \"Created new YAML file: { yaml_path } \" ) # Read existing data from the YAML file with open ( yaml_path , \"r\" ) as file : existing_data = yaml . safe_load ( file ) # Initialize index and existing_data if file is empty index = len ( existing_data ) if existing_data else 0 existing_data = existing_data or {} # Update data with new summaries for i , summary in enumerate ( summaries ): example = { f \"example { index + i } \" : summary } existing_data . update ( example ) # Write updated data back to the YAML file with open ( yaml_path , \"w\" ) as file : yaml . safe_dump ( existing_data , file , default_flow_style = False , sort_keys = False ) print ( f \"Updated existing YAML file successfully: { yaml_path } \" )","title":"create_or_update_yaml"},{"location":"advanced_usage/reinforce_appagent/learning_from_demonstration/#summarizer.summarizer.DemonstrationSummarizer.get_summary_list","text":"Get the summary list for a record Parameters: record ( DemonstrationRecord ) \u2013 The demonstration record. return: The summary list for the user defined completion number and the cost Source code in summarizer/summarizer.py 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 def get_summary_list ( self , record : DemonstrationRecord ) -> Tuple [ list , float ]: \"\"\" Get the summary list for a record :param record: The demonstration record. return: The summary list for the user defined completion number and the cost \"\"\" prompt = self . 
__build_prompt ( record ) response_string_list , cost = get_completions ( prompt , \"APPAGENT\" , use_backup_engine = True , n = self . completion_num ) summaries = [] for response_string in response_string_list : summary = self . __parse_response ( response_string ) if summary : summary [ \"request\" ] = record . get_request () summary [ \"app_list\" ] = record . get_applications () summaries . append ( summary ) return summaries , cost","title":"get_summary_list"},{"location":"advanced_usage/reinforce_appagent/learning_from_demonstration/#demonstration-retriever","text":"The DemonstrationRetriever class is located in the rag/retriever.py file. The DemonstrationRetriever class provides methods to retrieve the demonstration: Bases: Retriever Class to create demonstration retrievers. Create a new DemonstrationRetriever. :db_path: The path to the database. Source code in rag/retriever.py 198 199 200 201 202 203 def __init__ ( self , db_path ) -> None : \"\"\" Create a new DemonstrationRetriever. :db_path: The path to the database. \"\"\" self . indexer = self . get_indexer ( db_path )","title":"Demonstration Retriever"},{"location":"advanced_usage/reinforce_appagent/learning_from_demonstration/#rag.retriever.DemonstrationRetriever.get_indexer","text":"Create a demonstration indexer. :db_path: The path to the database. Source code in rag/retriever.py 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 def get_indexer ( self , db_path : str ): \"\"\" Create a demonstration indexer. :db_path: The path to the database. \"\"\" try : db = FAISS . 
load_local ( db_path , get_hugginface_embedding ()) return db except : # print_with_color( # \"Warning: Failed to load demonstration indexer from {path}.\".format( # path=db_path # ), # \"yellow\", # ) return None","title":"get_indexer"},{"location":"advanced_usage/reinforce_appagent/learning_from_help_document/","text":"Learning from Help Documents Users or applications can provide help documents to the AppAgent to reinforce its capabilities. The AppAgent can retrieve knowledge from these documents to improve its understanding of the task, generate high-quality plans, and interact more efficiently with the application. You can find how to provide help documents to the AppAgent in the Help Document Provision section. Mechanism The help documents are provided in the format of task-solution pairs . Upon receiving a request, the AppAgent retrieves the relevant help documents by matching the request with the task descriptions in the help documents and generates a plan based on the retrieved solutions. Note Since the retrieved help documents may not be relevant to the request, the AppAgent will only take them as references to generate the plan. Activate the Learning from Help Documents Follow the steps below to activate the learning from help documents: Step 1: Provide Help Documents Please follow the steps in the Help Document Provision document to provide help documents to the AppAgent. Step 2: Configure the AppAgent Configure the following parameters in the config.yaml file to activate the learning from help documents: Configuration Option Description Type Default Value RAG_OFFLINE_DOCS Whether to use the offline RAG Boolean False RAG_OFFLINE_DOCS_RETRIEVED_TOPK The topk for the offline retrieved documents Integer 1 Reference Bases: Retriever Class to create offline retrievers. Create a new OfflineDocRetriever. :appname: The name of the application. 
Source code in rag/retriever.py 78 79 80 81 82 83 84 85 def __init__ ( self , app_name : str ) -> None : \"\"\" Create a new OfflineDocRetriever. :appname: The name of the application. \"\"\" self . app_name = app_name indexer_path = self . get_offline_indexer_path () self . indexer = self . get_indexer ( indexer_path ) get_indexer ( path ) Load the retriever. Parameters: path ( str ) \u2013 The path to load the retriever from. Returns: \u2013 The loaded retriever. Source code in rag/retriever.py 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 def get_indexer ( self , path : str ): \"\"\" Load the retriever. :param path: The path to load the retriever from. :return: The loaded retriever. \"\"\" if path : print_with_color ( \"Loading offline indexer from {path} ...\" . format ( path = path ), \"cyan\" ) else : return None try : db = FAISS . load_local ( path , get_hugginface_embedding ()) return db except : # print_with_color( # \"Warning: Failed to load offline indexer from {path}.\".format( # path=path # ), # \"yellow\", # ) return None get_offline_indexer_path () Get the path to the offline indexer. Returns: \u2013 The path to the offline indexer. Source code in rag/retriever.py 87 88 89 90 91 92 93 94 95 96 97 def get_offline_indexer_path ( self ): \"\"\" Get the path to the offline indexer. :return: The path to the offline indexer. \"\"\" offline_records = get_offline_learner_indexer_config () for key in offline_records : if key . lower () in self . app_name . lower (): return offline_records [ key ] return None","title":"Learning from Help Document"},{"location":"advanced_usage/reinforce_appagent/learning_from_help_document/#learning-from-help-documents","text":"Users or applications can provide help documents to the AppAgent to reinforce its capabilities. 
The AppAgent can retrieve knowledge from these documents to improve its understanding of the task, generate high-quality plans, and interact more efficiently with the application. You can find how to provide help documents to the AppAgent in the Help Document Provision section.","title":"Learning from Help Documents"},{"location":"advanced_usage/reinforce_appagent/learning_from_help_document/#mechanism","text":"The help documents are provided in the format of task-solution pairs . Upon receiving a request, the AppAgent retrieves the relevant help documents by matching the request with the task descriptions in the help documents and generates a plan based on the retrieved solutions. Note Since the retrieved help documents may not be relevant to the request, the AppAgent will only take them as references to generate the plan.","title":"Mechanism"},{"location":"advanced_usage/reinforce_appagent/learning_from_help_document/#activate-the-learning-from-help-documents","text":"Follow the steps below to activate learning from help documents:","title":"Activate the Learning from Help Documents"},{"location":"advanced_usage/reinforce_appagent/learning_from_help_document/#step-1-provide-help-documents","text":"Please follow the steps in the Help Document Provision document to provide help documents to the AppAgent.","title":"Step 1: Provide Help Documents"},{"location":"advanced_usage/reinforce_appagent/learning_from_help_document/#step-2-configure-the-appagent","text":"Configure the following parameters in the config.yaml file to activate learning from help documents: Configuration Option Description Type Default Value RAG_OFFLINE_DOCS Whether to use the offline RAG Boolean False RAG_OFFLINE_DOCS_RETRIEVED_TOPK The top-k for the offline retrieved documents Integer 1","title":"Step 2: Configure the AppAgent"},{"location":"advanced_usage/reinforce_appagent/learning_from_help_document/#reference","text":"Bases: Retriever Class to create offline retrievers. 
Create a new OfflineDocRetriever. :appname: The name of the application. Source code in rag/retriever.py 78 79 80 81 82 83 84 85 def __init__ ( self , app_name : str ) -> None : \"\"\" Create a new OfflineDocRetriever. :appname: The name of the application. \"\"\" self . app_name = app_name indexer_path = self . get_offline_indexer_path () self . indexer = self . get_indexer ( indexer_path )","title":"Reference"},{"location":"advanced_usage/reinforce_appagent/learning_from_help_document/#rag.retriever.OfflineDocRetriever.get_indexer","text":"Load the retriever. Parameters: path ( str ) \u2013 The path to load the retriever from. Returns: \u2013 The loaded retriever. Source code in rag/retriever.py 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 def get_indexer ( self , path : str ): \"\"\" Load the retriever. :param path: The path to load the retriever from. :return: The loaded retriever. \"\"\" if path : print_with_color ( \"Loading offline indexer from {path} ...\" . format ( path = path ), \"cyan\" ) else : return None try : db = FAISS . load_local ( path , get_hugginface_embedding ()) return db except : # print_with_color( # \"Warning: Failed to load offline indexer from {path}.\".format( # path=path # ), # \"yellow\", # ) return None","title":"get_indexer"},{"location":"advanced_usage/reinforce_appagent/learning_from_help_document/#rag.retriever.OfflineDocRetriever.get_offline_indexer_path","text":"Get the path to the offline indexer. Returns: \u2013 The path to the offline indexer. Source code in rag/retriever.py 87 88 89 90 91 92 93 94 95 96 97 def get_offline_indexer_path ( self ): \"\"\" Get the path to the offline indexer. :return: The path to the offline indexer. \"\"\" offline_records = get_offline_learner_indexer_config () for key in offline_records : if key . lower () in self . app_name . 
lower (): return offline_records [ key ] return None","title":"get_offline_indexer_path"},{"location":"advanced_usage/reinforce_appagent/overview/","text":"Reinforcing AppAgent UFO provides versatile mechanisms to reinforce the AppAgent's capabilities through RAG (Retrieval-Augmented Generation) and other techniques. These enhance the AppAgent's understanding of the task, improve the quality of the generated plans, and increase the efficiency of the AppAgent's interactions with the application. We currently support the following reinforcement methods: Reinforcement Method Description Learning from Help Documents Reinforce the AppAgent by retrieving knowledge from help documents. Learning from Bing Search Reinforce the AppAgent by searching for information on Bing to obtain up-to-date knowledge. Learning from Self-Experience Reinforce the AppAgent by learning from its own successful experiences. Learning from User Demonstrations Reinforce the AppAgent by learning from action trajectories demonstrated by users. Knowledge Provision UFO provides knowledge to the AppAgent through a context_provision method defined in the AppAgent class: def context_provision(self, request: str = \"\") -> None: \"\"\" Provision the context for the app agent. :param request: The Bing search query. \"\"\" # Load the offline document indexer for the app agent if available. if configs[\"RAG_OFFLINE_DOCS\"]: utils.print_with_color( \"Loading offline help document indexer for {app}...\".format( app=self._process_name ), \"magenta\", ) self.build_offline_docs_retriever() # Load the online search indexer for the app agent if available. if configs[\"RAG_ONLINE_SEARCH\"] and request: utils.print_with_color(\"Creating a Bing search indexer...\", \"magenta\") self.build_online_search_retriever( request, configs[\"RAG_ONLINE_SEARCH_TOPK\"] ) # Load the experience indexer for the app agent if available. 
if configs[\"RAG_EXPERIENCE\"]: utils.print_with_color(\"Creating an experience indexer...\", \"magenta\") experience_path = configs[\"EXPERIENCE_SAVED_PATH\"] db_path = os.path.join(experience_path, \"experience_db\") self.build_experience_retriever(db_path) # Load the demonstration indexer for the app agent if available. if configs[\"RAG_DEMONSTRATION\"]: utils.print_with_color(\"Creating an demonstration indexer...\", \"magenta\") demonstration_path = configs[\"DEMONSTRATION_SAVED_PATH\"] db_path = os.path.join(demonstration_path, \"demonstration_db\") self.build_human_demonstration_retriever(db_path) The context_provision method loads the offline document indexer, online search indexer, experience indexer, and demonstration indexer for the AppAgent based on the configuration settings in the config_dev.yaml file. Reference UFO employs the Retriever class located in the ufo/rag/retriever.py file to retrieve knowledge from various sources. The Retriever class provides the following methods to retrieve knowledge: Bases: ABC Class to retrieve documents. Create a new Retriever. Source code in rag/retriever.py 42 43 44 45 46 47 48 49 def __init__ ( self ) -> None : \"\"\" Create a new Retriever. \"\"\" self . indexer = self . get_indexer () pass get_indexer () abstractmethod Get the indexer. Returns: \u2013 The indexer. Source code in rag/retriever.py 51 52 53 54 55 56 57 @abstractmethod def get_indexer ( self ): \"\"\" Get the indexer. :return: The indexer. \"\"\" pass retrieve ( query , top_k , filter = None ) Retrieve the document from the given query. :filter: The filter to apply to the retrieved documents. Parameters: query ( str ) \u2013 The query to retrieve the document from. top_k ( int ) \u2013 The number of documents to retrieve. Returns: \u2013 The document from the given query. 
Source code in rag/retriever.py 59 60 61 62 63 64 65 66 67 68 69 70 def retrieve ( self , query : str , top_k : int , filter = None ): \"\"\" Retrieve the document from the given query. :param query: The query to retrieve the document from. :param top_k: The number of documents to retrieve. :filter: The filter to apply to the retrieved documents. :return: The document from the given query. \"\"\" if not self . indexer : return None return self . indexer . similarity_search ( query , top_k , filter = filter )","title":"Overview"},{"location":"advanced_usage/reinforce_appagent/overview/#reinforcing-appagent","text":"UFO provides versatile mechanisms to reinforce the AppAgent's capabilities through RAG (Retrieval-Augmented Generation) and other techniques. These enhance the AppAgent's understanding of the task, improve the quality of the generated plans, and increase the efficiency of the AppAgent's interactions with the application. We currently support the following reinforcement methods: Reinforcement Method Description Learning from Help Documents Reinforce the AppAgent by retrieving knowledge from help documents. Learning from Bing Search Reinforce the AppAgent by searching for information on Bing to obtain up-to-date knowledge. Learning from Self-Experience Reinforce the AppAgent by learning from its own successful experiences. Learning from User Demonstrations Reinforce the AppAgent by learning from action trajectories demonstrated by users.","title":"Reinforcing AppAgent"},{"location":"advanced_usage/reinforce_appagent/overview/#knowledge-provision","text":"UFO provides knowledge to the AppAgent through a context_provision method defined in the AppAgent class: def context_provision(self, request: str = \"\") -> None: \"\"\" Provision the context for the app agent. :param request: The Bing search query. \"\"\" # Load the offline document indexer for the app agent if available. 
if configs[\"RAG_OFFLINE_DOCS\"]: utils.print_with_color( \"Loading offline help document indexer for {app}...\".format( app=self._process_name ), \"magenta\", ) self.build_offline_docs_retriever() # Load the online search indexer for the app agent if available. if configs[\"RAG_ONLINE_SEARCH\"] and request: utils.print_with_color(\"Creating a Bing search indexer...\", \"magenta\") self.build_online_search_retriever( request, configs[\"RAG_ONLINE_SEARCH_TOPK\"] ) # Load the experience indexer for the app agent if available. if configs[\"RAG_EXPERIENCE\"]: utils.print_with_color(\"Creating an experience indexer...\", \"magenta\") experience_path = configs[\"EXPERIENCE_SAVED_PATH\"] db_path = os.path.join(experience_path, \"experience_db\") self.build_experience_retriever(db_path) # Load the demonstration indexer for the app agent if available. if configs[\"RAG_DEMONSTRATION\"]: utils.print_with_color(\"Creating an demonstration indexer...\", \"magenta\") demonstration_path = configs[\"DEMONSTRATION_SAVED_PATH\"] db_path = os.path.join(demonstration_path, \"demonstration_db\") self.build_human_demonstration_retriever(db_path) The context_provision method loads the offline document indexer, online search indexer, experience indexer, and demonstration indexer for the AppAgent based on the configuration settings in the config_dev.yaml file.","title":"Knowledge Provision"},{"location":"advanced_usage/reinforce_appagent/overview/#reference","text":"UFO employs the Retriever class located in the ufo/rag/retriever.py file to retrieve knowledge from various sources. The Retriever class provides the following methods to retrieve knowledge: Bases: ABC Class to retrieve documents. Create a new Retriever. Source code in rag/retriever.py 42 43 44 45 46 47 48 49 def __init__ ( self ) -> None : \"\"\" Create a new Retriever. \"\"\" self . indexer = self . 
get_indexer () pass","title":"Reference"},{"location":"advanced_usage/reinforce_appagent/overview/#rag.retriever.Retriever.get_indexer","text":"Get the indexer. Returns: \u2013 The indexer. Source code in rag/retriever.py 51 52 53 54 55 56 57 @abstractmethod def get_indexer ( self ): \"\"\" Get the indexer. :return: The indexer. \"\"\" pass","title":"get_indexer"},{"location":"advanced_usage/reinforce_appagent/overview/#rag.retriever.Retriever.retrieve","text":"Retrieve the document from the given query. :filter: The filter to apply to the retrieved documents. Parameters: query ( str ) \u2013 The query to retrieve the document from. top_k ( int ) \u2013 The number of documents to retrieve. Returns: \u2013 The document from the given query. Source code in rag/retriever.py 59 60 61 62 63 64 65 66 67 68 69 70 def retrieve ( self , query : str , top_k : int , filter = None ): \"\"\" Retrieve the document from the given query. :param query: The query to retrieve the document from. :param top_k: The number of documents to retrieve. :filter: The filter to apply to the retrieved documents. :return: The document from the given query. \"\"\" if not self . indexer : return None return self . indexer . similarity_search ( query , top_k , filter = filter )","title":"retrieve"},{"location":"agents/app_agent/","text":"AppAgent \ud83d\udc7e An AppAgent is responsible for iteratively executing actions on the selected applications until the task is successfully concluded within a specific application. The AppAgent is created by the HostAgent to fulfill a sub-task within a Round . The AppAgent is responsible for executing the necessary actions within the application to fulfill the user's request. 
The AppAgent has the following features: ReAct with the Application - The AppAgent recursively interacts with the application in a workflow of observation->thought->action, leveraging the multi-modal capabilities of Visual Language Models (VLMs) to comprehend the application UI and fulfill the user's request. Comprehension Enhancement - The AppAgent is enhanced by Retrieval Augmented Generation (RAG) from heterogeneous sources, including external knowledge bases and demonstration libraries, making the agent an application \"expert\". Versatile Skill Set - The AppAgent is equipped with a diverse set of skills to support comprehensive automation, such as mouse, keyboard, native APIs, and \"Copilot\". Tip You can find how to enhance the AppAgent with external knowledge bases and demonstration libraries in the Reinforcing AppAgent documentation. We show the framework of the AppAgent in the following diagram: AppAgent Input To interact with the application, the AppAgent receives the following inputs: Input Description Type User Request The user's request in natural language. String Sub-Task The sub-task description to be executed by the AppAgent , assigned by the HostAgent . String Current Application The name of the application to be interacted with. String Control Information Index, name and control type of available controls in the application. List of Dictionaries Application Screenshots Screenshots of the application, including a clean screenshot, an annotated screenshot with labeled controls, and a screenshot with a rectangle around the selected control at the previous step (optional). List of Strings Previous Sub-Tasks The previous sub-tasks and their completion status. List of Strings Previous Plan The previous plan for the following steps. List of Strings HostAgent Message The message from the HostAgent for the completion of the sub-task. String Retrieved Information The retrieved information from external knowledge bases or demonstration libraries. 
String Blackboard The shared memory space for storing and sharing information among the agents. Dictionary Below is an example of the annotated application screenshot with labeled controls. This follows the Set-of-Mark paradigm. By processing these inputs, the AppAgent determines the necessary actions to fulfill the user's request within the application. Tip Whether to concatenate the clean screenshot and annotated screenshot can be configured in the CONCAT_SCREENSHOT field in the config_dev.yaml file. Tip Whether to include the screenshot with a rectangle around the selected control at the previous step can be configured in the INCLUDE_LAST_SCREENSHOT field in the config_dev.yaml file. AppAgent Output With the inputs provided, the AppAgent generates the following outputs: Output Description Type Observation The observation of the current application screenshots. String Thought The logical reasoning process of the AppAgent . String ControlLabel The index of the selected control to interact with. String ControlText The name of the selected control to interact with. String Function The function to be executed on the selected control. String Args The arguments required for the function execution. List of Strings Status The status of the agent, mapped to the AgentState . String Plan The plan for the following steps after the current action. List of Strings Comment Additional comments or information provided to the user. String SaveScreenshot The flag to save the screenshot of the application to the blackboard for future reference. 
Boolean Below is an example of the AppAgent output: { \"Observation\": \"Application screenshot\", \"Thought\": \"Logical reasoning process\", \"ControlLabel\": \"Control index\", \"ControlText\": \"Control name\", \"Function\": \"Function name\", \"Args\": [\"arg1\", \"arg2\"], \"Status\": \"AgentState\", \"Plan\": [\"Step 1\", \"Step 2\"], \"Comment\": \"Additional comments\", \"SaveScreenshot\": true } Info The AppAgent output is formatted as a JSON object by LLMs and can be parsed by the json.loads method in Python. AppAgent State The AppAgent state is managed by a state machine that determines the next action to be executed based on the current state, as defined in the ufo/agents/states/app_agent_states.py module. The states include: State Description CONTINUE The AppAgent continues executing the current action. FINISH The AppAgent has completed the current sub-task. ERROR The AppAgent encountered an error during execution. FAIL The AppAgent believes the current sub-task is unachievable. CONFIRM The AppAgent is confirming the user's input or action. SCREENSHOT The AppAgent believes the current screenshot is not clear enough to annotate the controls and requests a new screenshot. The state machine diagram for the AppAgent is shown below: The AppAgent progresses through these states to execute the necessary actions within the application and fulfill the sub-task assigned by the HostAgent . Knowledge Enhancement The AppAgent is enhanced by Retrieval Augmented Generation (RAG) from heterogeneous sources, including external knowledge bases and demonstration libraries. The AppAgent leverages this knowledge to enhance its comprehension of the application and learn from demonstrations to improve its performance. Learning from Help Documents Users can provide help documents to the AppAgent in the config.yaml file to enhance its comprehension of the application and improve its performance. Tip Please find the detailed configuration in the documentation . 
Tip You may also refer here for how to provide help documents to the AppAgent . In the AppAgent , it calls the build_offline_docs_retriever to build a help document retriever, and uses the retrived_documents_prompt_helper to construct the prompt for the AppAgent . Learning from Bing Search Since help documents may not cover all the information, or the information may be outdated, the AppAgent can also leverage Bing search to retrieve the latest information. You can activate Bing search and configure the search engine in the config.yaml file. Tip Please find the detailed configuration in the documentation . Tip You may also refer here for the implementation of Bing search in the AppAgent . In the AppAgent , it calls the build_online_search_retriever to build a Bing search retriever, and uses the retrived_documents_prompt_helper to construct the prompt for the AppAgent . Learning from Self-Demonstrations You may save successful action trajectories in the AppAgent to learn from self-demonstrations and improve its performance. After the completion of a session , the AppAgent will ask the user whether to save the action trajectories for future reference. You may configure the use of self-demonstrations in the config.yaml file. Tip You can find details of the configuration in the documentation . Tip You may also refer here for the implementation of self-demonstrations in the AppAgent . In the AppAgent , it calls the build_experience_retriever to build a self-demonstration retriever, and uses the rag_experience_retrieve to retrieve the demonstration for the AppAgent . Learning from Human Demonstrations In addition to self-demonstrations, you can also provide human demonstrations to the AppAgent to enhance its performance by using the Step Recorder tool built into the Windows OS. The AppAgent will learn from the human demonstrations to improve its performance and achieve better personalization. 
The use of human demonstrations can be configured in the config.yaml file. Tip You can find details of the configuration in the documentation . Tip You may also refer here for the implementation of human demonstrations in the AppAgent . In the AppAgent , it calls the build_human_demonstration_retriever to build a human demonstration retriever, and uses the rag_experience_retrieve to retrieve the demonstration for the AppAgent . Skill Set for Automation The AppAgent is equipped with a versatile skill set to support comprehensive automation within the application by calling the create_puppeteer_interface method. The skills include: Skill Description UI Automation Mimicking user interactions with the application UI controls using the UI Automation and Win32 API. Native API Accessing the application's native API to execute specific functions and actions. In-App Agent Leveraging the in-app agent to interact with the application's internal functions and features. By utilizing these skills, the AppAgent can efficiently interact with the application and fulfill the user's request. You can find more details in the Automator documentation and the code in the ufo/automator module. Reference Bases: BasicAgent The AppAgent class that manages the interaction with the application. Initialize the AppAgent. :name: The name of the agent. Parameters: process_name ( str ) \u2013 The process name of the app. app_root_name ( str ) \u2013 The root name of the app. is_visual ( bool ) \u2013 The flag indicating whether the agent is visual or not. main_prompt ( str ) \u2013 The main prompt file path. example_prompt ( str ) \u2013 The example prompt file path. api_prompt ( str ) \u2013 The API prompt file path. skip_prompter ( bool , default: False ) \u2013 The flag indicating whether to skip the prompter initialization. 
Source code in agents/agent/app_agent.py 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 def __init__ ( self , name : str , process_name : str , app_root_name : str , is_visual : bool , main_prompt : str , example_prompt : str , api_prompt : str , skip_prompter : bool = False , ) -> None : \"\"\" Initialize the AppAgent. :name: The name of the agent. :param process_name: The process name of the app. :param app_root_name: The root name of the app. :param is_visual: The flag indicating whether the agent is visual or not. :param main_prompt: The main prompt file path. :param example_prompt: The example prompt file path. :param api_prompt: The API prompt file path. :param skip_prompter: The flag indicating whether to skip the prompter initialization. \"\"\" super () . __init__ ( name = name ) if not skip_prompter : self . prompter = self . get_prompter ( is_visual , main_prompt , example_prompt , api_prompt , app_root_name ) self . _process_name = process_name self . _app_root_name = app_root_name self . offline_doc_retriever = None self . online_doc_retriever = None self . experience_retriever = None self . human_demonstration_retriever = None self . Puppeteer = self . create_puppeteer_interface () self . set_state ( ContinueAppAgentState ()) status_manager : AppAgentStatus property Get the status manager. build_experience_retriever ( db_path ) Build the experience retriever. Parameters: db_path ( str ) \u2013 The path to the experience database. Returns: None \u2013 The experience retriever. Source code in agents/agent/app_agent.py 346 347 348 349 350 351 352 353 354 def build_experience_retriever ( self , db_path : str ) -> None : \"\"\" Build the experience retriever. :param db_path: The path to the experience database. :return: The experience retriever. \"\"\" self . experience_retriever = self . retriever_factory . 
create_retriever ( \"experience\" , db_path ) build_human_demonstration_retriever ( db_path ) Build the human demonstration retriever. Parameters: db_path ( str ) \u2013 The path to the human demonstration database. Returns: None \u2013 The human demonstration retriever. Source code in agents/agent/app_agent.py 356 357 358 359 360 361 362 363 364 def build_human_demonstration_retriever ( self , db_path : str ) -> None : \"\"\" Build the human demonstration retriever. :param db_path: The path to the human demonstration database. :return: The human demonstration retriever. \"\"\" self . human_demonstration_retriever = self . retriever_factory . create_retriever ( \"demonstration\" , db_path ) build_offline_docs_retriever () Build the offline docs retriever. Source code in agents/agent/app_agent.py 328 329 330 331 332 333 334 def build_offline_docs_retriever ( self ) -> None : \"\"\" Build the offline docs retriever. \"\"\" self . offline_doc_retriever = self . retriever_factory . create_retriever ( \"offline\" , self . _app_root_name ) build_online_search_retriever ( request , top_k ) Build the online search retriever. Parameters: request ( str ) \u2013 The request for online Bing search. top_k ( int ) \u2013 The number of documents to retrieve. Source code in agents/agent/app_agent.py 336 337 338 339 340 341 342 343 344 def build_online_search_retriever ( self , request : str , top_k : int ) -> None : \"\"\" Build the online search retriever. :param request: The request for online Bing search. :param top_k: The number of documents to retrieve. \"\"\" self . online_doc_retriever = self . retriever_factory . create_retriever ( \"online\" , request , top_k ) context_provision ( request = '' ) Provision the context for the app agent. Parameters: request ( str , default: '' ) \u2013 The request sent to the Bing search retriever. 
Source code in agents/agent/app_agent.py 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 def context_provision ( self , request : str = \"\" ) -> None : \"\"\" Provision the context for the app agent. :param request: The request sent to the Bing search retriever. \"\"\" # Load the offline document indexer for the app agent if available. if configs [ \"RAG_OFFLINE_DOCS\" ]: utils . print_with_color ( \"Loading offline help document indexer for {app} ...\" . format ( app = self . _process_name ), \"magenta\" , ) self . build_offline_docs_retriever () # Load the online search indexer for the app agent if available. if configs [ \"RAG_ONLINE_SEARCH\" ] and request : utils . print_with_color ( \"Creating a Bing search indexer...\" , \"magenta\" ) self . build_online_search_retriever ( request , configs [ \"RAG_ONLINE_SEARCH_TOPK\" ] ) # Load the experience indexer for the app agent if available. if configs [ \"RAG_EXPERIENCE\" ]: utils . print_with_color ( \"Creating an experience indexer...\" , \"magenta\" ) experience_path = configs [ \"EXPERIENCE_SAVED_PATH\" ] db_path = os . path . join ( experience_path , \"experience_db\" ) self . build_experience_retriever ( db_path ) # Load the demonstration indexer for the app agent if available. if configs [ \"RAG_DEMONSTRATION\" ]: utils . print_with_color ( \"Creating an demonstration indexer...\" , \"magenta\" ) demonstration_path = configs [ \"DEMONSTRATION_SAVED_PATH\" ] db_path = os . path . join ( demonstration_path , \"demonstration_db\" ) self . build_human_demonstration_retriever ( db_path ) create_puppeteer_interface () Create the Puppeteer interface to automate the app. Returns: AppPuppeteer \u2013 The Puppeteer interface. Source code in agents/agent/app_agent.py 299 300 301 302 303 304 def create_puppeteer_interface ( self ) -> puppeteer . AppPuppeteer : \"\"\" Create the Puppeteer interface to automate the app. 
:return: The Puppeteer interface. \"\"\" return puppeteer . AppPuppeteer ( self . _process_name , self . _app_root_name ) external_knowledge_prompt_helper ( request , offline_top_k , online_top_k ) Retrieve the external knowledge and construct the prompt. Parameters: request ( str ) \u2013 The request. offline_top_k ( int ) \u2013 The number of offline documents to retrieve. online_top_k ( int ) \u2013 The number of online documents to retrieve. Returns: str \u2013 The prompt message for the external_knowledge. Source code in agents/agent/app_agent.py 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 def external_knowledge_prompt_helper ( self , request : str , offline_top_k : int , online_top_k : int ) -> str : \"\"\" Retrieve the external knowledge and construct the prompt. :param request: The request. :param offline_top_k: The number of offline documents to retrieve. :param online_top_k: The number of online documents to retrieve. :return: The prompt message for the external_knowledge. \"\"\" retrieved_docs = \"\" # Retrieve offline documents and construct the prompt if self . offline_doc_retriever : offline_docs = self . offline_doc_retriever . retrieve ( \"How to {query} for {app} \" . format ( query = request , app = self . _process_name ), offline_top_k , filter = None , ) offline_docs_prompt = self . prompter . retrived_documents_prompt_helper ( \"Help Documents\" , \"Document\" , [ doc . metadata [ \"text\" ] for doc in offline_docs ], ) retrieved_docs += offline_docs_prompt # Retrieve online documents and construct the prompt if self . online_doc_retriever : online_search_docs = self . online_doc_retriever . retrieve ( request , online_top_k , filter = None ) online_docs_prompt = self . prompter . retrived_documents_prompt_helper ( \"Online Search Results\" , \"Search Result\" , [ doc . 
page_content for doc in online_search_docs ], ) retrieved_docs += online_docs_prompt return retrieved_docs get_prompter ( is_visual , main_prompt , example_prompt , api_prompt , app_root_name ) Get the prompt for the agent. Parameters: is_visual ( bool ) \u2013 The flag indicating whether the agent is visual or not. main_prompt ( str ) \u2013 The main prompt file path. example_prompt ( str ) \u2013 The example prompt file path. api_prompt ( str ) \u2013 The API prompt file path. app_root_name ( str ) \u2013 The root name of the app. Returns: AppAgentPrompter \u2013 The prompter instance. Source code in agents/agent/app_agent.py 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 def get_prompter ( self , is_visual : bool , main_prompt : str , example_prompt : str , api_prompt : str , app_root_name : str , ) -> AppAgentPrompter : \"\"\" Get the prompt for the agent. :param is_visual: The flag indicating whether the agent is visual or not. :param main_prompt: The main prompt file path. :param example_prompt: The example prompt file path. :param api_prompt: The API prompt file path. :param app_root_name: The root name of the app. :return: The prompter instance. \"\"\" return AppAgentPrompter ( is_visual , main_prompt , example_prompt , api_prompt , app_root_name ) message_constructor ( dynamic_examples , dynamic_tips , dynamic_knowledge , image_list , control_info , prev_subtask , plan , request , subtask , host_message , include_last_screenshot ) Construct the prompt message for the AppAgent. Parameters: dynamic_examples ( str ) \u2013 The dynamic examples retrieved from the self-demonstration and human demonstration. dynamic_tips ( str ) \u2013 The dynamic tips retrieved from the self-demonstration and human demonstration. dynamic_knowledge ( str ) \u2013 The dynamic knowledge retrieved from the external knowledge base. image_list ( List ) \u2013 The list of screenshot images. control_info ( str ) \u2013 The control information. 
plan ( List [ str ] ) \u2013 The plan list. request ( str ) \u2013 The overall user request. subtask ( str ) \u2013 The subtask for the current AppAgent to process. host_message ( List [ str ] ) \u2013 The message from the HostAgent. include_last_screenshot ( bool ) \u2013 The flag indicating whether to include the last screenshot. Returns: List [ Dict [ str , Union [ str , List [ Dict [ str , str ]]]]] \u2013 The prompt message. Source code in agents/agent/app_agent.py 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 def message_constructor ( self , dynamic_examples : str , dynamic_tips : str , dynamic_knowledge : str , image_list : List , control_info : str , prev_subtask : List [ Dict [ str , str ]], plan : List [ str ], request : str , subtask : str , host_message : List [ str ], include_last_screenshot : bool , ) -> List [ Dict [ str , Union [ str , List [ Dict [ str , str ]]]]]: \"\"\" Construct the prompt message for the AppAgent. :param dynamic_examples: The dynamic examples retrieved from the self-demonstration and human demonstration. :param dynamic_tips: The dynamic tips retrieved from the self-demonstration and human demonstration. :param dynamic_knowledge: The dynamic knowledge retrieved from the external knowledge base. :param image_list: The list of screenshot images. :param control_info: The control information. :param plan: The plan list. :param request: The overall user request. :param subtask: The subtask for the current AppAgent to process. :param host_message: The message from the HostAgent. :param include_last_screenshot: The flag indicating whether to include the last screenshot. :return: The prompt message. \"\"\" appagent_prompt_system_message = self . prompter . system_prompt_construction ( dynamic_examples , dynamic_tips ) appagent_prompt_user_message = self . prompter . 
user_content_construction ( image_list = image_list , control_item = control_info , prev_subtask = prev_subtask , prev_plan = plan , user_request = request , subtask = subtask , current_application = self . _process_name , host_message = host_message , retrieved_docs = dynamic_knowledge , include_last_screenshot = include_last_screenshot , ) if not self . blackboard . is_empty (): blackboard_prompt = self . blackboard . blackboard_to_prompt () appagent_prompt_user_message = ( blackboard_prompt + appagent_prompt_user_message ) appagent_prompt_message = self . prompter . prompt_construction ( appagent_prompt_system_message , appagent_prompt_user_message ) return appagent_prompt_message print_response ( response_dict ) Print the response. Parameters: response_dict ( Dict ) \u2013 The response dictionary to print. Source code in agents/agent/app_agent.py 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 def print_response ( self , response_dict : Dict ) -> None : \"\"\" Print the response. :param response_dict: The response dictionary to print. \"\"\" control_text = response_dict . get ( \"ControlText\" ) control_label = response_dict . get ( \"ControlLabel\" ) if not control_text and not control_label : control_text = \"[No control selected.]\" control_label = \"[No control label selected.]\" observation = response_dict . get ( \"Observation\" ) thought = response_dict . get ( \"Thought\" ) plan = response_dict . get ( \"Plan\" ) status = response_dict . get ( \"Status\" ) comment = response_dict . get ( \"Comment\" ) function_call = response_dict . get ( \"Function\" ) args = utils . revise_line_breaks ( response_dict . get ( \"Args\" )) # Generate the function call string action = self . Puppeteer . get_command_string ( function_call , args ) utils . 
print_with_color ( \"Observations\ud83d\udc40: {observation} \" . format ( observation = observation ), \"cyan\" ) utils . print_with_color ( \"Thoughts\ud83d\udca1: {thought} \" . format ( thought = thought ), \"green\" ) utils . print_with_color ( \"Selected item\ud83d\udd79\ufe0f: {control_text} , Label: {label} \" . format ( control_text = control_text , label = control_label ), \"yellow\" , ) utils . print_with_color ( \"Action applied\u2692\ufe0f: {action} \" . format ( action = action ), \"blue\" ) utils . print_with_color ( \"Status\ud83d\udcca: {status} \" . format ( status = status ), \"blue\" ) utils . print_with_color ( \"Next Plan\ud83d\udcda: {plan} \" . format ( plan = \" \\n \" . join ( plan )), \"cyan\" ) utils . print_with_color ( \"Comment\ud83d\udcac: {comment} \" . format ( comment = comment ), \"green\" ) screenshot_saving = response_dict . get ( \"SaveScreenshot\" , {}) if screenshot_saving . get ( \"save\" , False ): utils . print_with_color ( \"Notice: The current screenshot\ud83d\udcf8 is saved to the blackboard.\" , \"yellow\" , ) utils . print_with_color ( \"Saving reason: {reason} \" . format ( reason = screenshot_saving . get ( \"reason\" ) ), \"yellow\" , ) process ( context ) Process the agent. Parameters: context ( Context ) \u2013 The context. Source code in agents/agent/app_agent.py 290 291 292 293 294 295 296 297 def process ( self , context : Context ) -> None : \"\"\" Process the agent. :param context: The context. \"\"\" self . processor = AppAgentProcessor ( agent = self , context = context ) self . processor . process () self . status = self . processor . status process_comfirmation () Process the user confirmation. Returns: bool \u2013 The decision. Source code in agents/agent/app_agent.py 306 307 308 309 310 311 312 313 314 315 316 317 318 319 def process_comfirmation ( self ) -> bool : \"\"\" Process the user confirmation. :return: The decision. \"\"\" action = self . processor . action control_text = self . processor . 
control_text decision = interactor . sensitive_step_asker ( action , control_text ) if not decision : utils . print_with_color ( \"The user has canceled the action.\" , \"red\" ) return decision rag_demonstration_retrieve ( request , demonstration_top_k ) Retrieving demonstration examples for the user request. Parameters: request ( str ) \u2013 The user request. demonstration_top_k ( int ) \u2013 The number of documents to retrieve. Returns: str \u2013 The retrieved examples and tips string. Source code in agents/agent/app_agent.py 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 def rag_demonstration_retrieve ( self , request : str , demonstration_top_k : int ) -> str : \"\"\" Retrieving demonstration examples for the user request. :param request: The user request. :param demonstration_top_k: The number of documents to retrieve. :return: The retrieved examples and tips string. \"\"\" # Retrieve demonstration examples. demonstration_docs = self . human_demonstration_retriever . retrieve ( request , demonstration_top_k ) if demonstration_docs : examples = [ doc . metadata . get ( \"example\" , {}) for doc in demonstration_docs ] tips = [ doc . metadata . get ( \"Tips\" , \"\" ) for doc in demonstration_docs ] else : examples = [] tips = [] return examples , tips rag_experience_retrieve ( request , experience_top_k ) Retrieving experience examples for the user request. Parameters: request ( str ) \u2013 The user request. experience_top_k ( int ) \u2013 The number of documents to retrieve. Returns: str \u2013 The retrieved examples and tips string. Source code in agents/agent/app_agent.py 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 def rag_experience_retrieve ( self , request : str , experience_top_k : int ) -> str : \"\"\" Retrieving experience examples for the user request. :param request: The user request. :param experience_top_k: The number of documents to retrieve. 
:return: The retrieved examples and tips string. \"\"\" # Retrieve experience examples. Only retrieve the examples that are related to the current application. experience_docs = self . experience_retriever . retrieve ( request , experience_top_k , filter = lambda x : self . _app_root_name . lower () in [ app . lower () for app in x [ \"app_list\" ]], ) if experience_docs : examples = [ doc . metadata . get ( \"example\" , {}) for doc in experience_docs ] tips = [ doc . metadata . get ( \"Tips\" , \"\" ) for doc in experience_docs ] else : examples = [] tips = [] return examples , tips","title":"AppAgent"},{"location":"agents/app_agent/#appagent","text":"An AppAgent is responsible for iteratively executing actions on the selected applications until the task is successfully concluded within a specific application. The AppAgent is created by the HostAgent to fulfill a sub-task within a Round . The AppAgent is responsible for executing the necessary actions within the application to fulfill the user's request. The AppAgent has the following features: ReAct with the Application - The AppAgent recursively interacts with the application in a workflow of observation->thought->action, leveraging the multi-modal capabilities of Visual Language Models (VLMs) to comprehend the application UI and fulfill the user's request. Comprehension Enhancement - The AppAgent is enhanced by Retrieval Augmented Generation (RAG) from heterogeneous sources, including external knowledge bases and demonstration libraries, making the agent an application \"expert\". Versatile Skill Set - The AppAgent is equipped with a diverse set of skills to support comprehensive automation, such as mouse, keyboard, native APIs, and \"Copilot\". Tip You can find how to enhance the AppAgent with external knowledge bases and demonstration libraries in the Reinforcing AppAgent documentation. 
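The observation->thought->action workflow described above can be sketched as a simple loop. Everything below is an illustrative stand-in: `observe`, `think`, `act`, and `react_loop` are hypothetical names invented for this sketch, not UFO's actual API.

```python
from typing import Dict, List

# Hypothetical sketch of the AppAgent's ReAct loop; not UFO's real implementation.

def observe(step: int) -> str:
    """Capture an (annotated) screenshot plus control info for the current step."""
    return f"screenshot_and_controls_{step}"

def think(observation: str) -> Dict:
    """Ask the VLM for a decision; a real implementation would call an LLM here."""
    done = observation.endswith("_2")  # toy stopping condition for the sketch
    return {"Function": "click_input", "Status": "FINISH" if done else "CONTINUE"}

def act(decision: Dict) -> None:
    """Execute the chosen function on the selected control (no-op in this sketch)."""
    pass

def react_loop(max_steps: int = 5) -> List[Dict]:
    """Iterate observe -> think -> act until the agent reports a terminal status."""
    history = []
    for step in range(max_steps):
        decision = think(observe(step))
        act(decision)
        history.append(decision)
        if decision["Status"] != "CONTINUE":
            break
    return history

trajectory = react_loop()
print(len(trajectory))  # 3: two CONTINUE steps, then FINISH
```

The loop terminates either when the model emits a non-`CONTINUE` status (mirroring the agent states listed on this page) or when a step budget is exhausted.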
We show the framework of the AppAgent in the following diagram:","title":"AppAgent \ud83d\udc7e"},{"location":"agents/app_agent/#appagent-input","text":"To interact with the application, the AppAgent receives the following inputs: Input Description Type User Request The user's request in natural language. String Sub-Task The sub-task description to be executed by the AppAgent , assigned by the HostAgent . String Current Application The name of the application to be interacted with. String Control Information Index, name and control type of available controls in the application. List of Dictionaries Application Screenshots Screenshots of the application, including a clean screenshot, an annotated screenshot with labeled controls, and a screenshot with a rectangle around the selected control at the previous step (optional). List of Strings Previous Sub-Tasks The previous sub-tasks and their completion status. List of Strings Previous Plan The previous plan for the following steps. List of Strings HostAgent Message The message from the HostAgent for the completion of the sub-task. String Retrieved Information The retrieved information from external knowledge bases or demonstration libraries. String Blackboard The shared memory space for storing and sharing information among the agents. Dictionary Below is an example of the annotated application screenshot with labeled controls. This follows the Set-of-Mark paradigm.","title":"AppAgent Input"},{"location":"agents/app_agent/#appagent-output","text":"With the inputs provided, the AppAgent generates the following outputs: Output Description Type Observation The observation of the current application screenshots. String Thought The logical reasoning process of the AppAgent . String ControlLabel The index of the selected control to interact with. String ControlText The name of the selected control to interact with. String Function The function to be executed on the selected control. 
String Args The arguments required for the function execution. List of Strings Status The status of the agent, mapped to the AgentState . String Plan The plan for the following steps after the current action. List of Strings Comment Additional comments or information provided to the user. String SaveScreenshot The flag to save the screenshot of the application to the blackboard for future reference. Boolean Below is an example of the AppAgent output: { \"Observation\": \"Application screenshot\", \"Thought\": \"Logical reasoning process\", \"ControlLabel\": \"Control index\", \"ControlText\": \"Control name\", \"Function\": \"Function name\", \"Args\": [\"arg1\", \"arg2\"], \"Status\": \"AgentState\", \"Plan\": [\"Step 1\", \"Step 2\"], \"Comment\": \"Additional comments\", \"SaveScreenshot\": true } Info The AppAgent output is formatted as a JSON object by LLMs and can be parsed by the json.loads method in Python.","title":"AppAgent Output"},{"location":"agents/app_agent/#appagent-state","text":"The AppAgent state is managed by a state machine that determines the next action to be executed based on the current state, as defined in the ufo/agents/states/app_agent_states.py module. The states include: State Description CONTINUE The AppAgent continues executing the current action. FINISH The AppAgent has completed the current sub-task. ERROR The AppAgent encountered an error during execution. FAIL The AppAgent believes the current sub-task is unachievable. CONFIRM The AppAgent is confirming the user's input or action. SCREENSHOT The AppAgent believes the current screenshot is not clear in annotating the control and requests a new screenshot. The state machine diagram for the AppAgent is shown below:","title":"AppAgent State"},{"location":"agents/app_agent/#knowledge-enhancement","text":"The AppAgent is enhanced by Retrieval Augmented Generation (RAG) from heterogeneous sources, including external knowledge bases and demonstration libraries. 
The AppAgent leverages this knowledge to enhance its comprehension of the application and learn from demonstrations to improve its performance.","title":"Knowledge Enhancement"},{"location":"agents/app_agent/#learning-from-help-documents","text":"Users can provide help documents to the AppAgent via the config.yaml file to enhance its comprehension of the application and improve its performance. Tip Please find the detailed configuration in the documentation . Tip You may also refer here for how to provide help documents to the AppAgent . In the AppAgent , it calls the build_offline_docs_retriever to build a help document retriever, and uses the retrived_documents_prompt_helper to construct the prompt for the AppAgent .","title":"Learning from Help Documents"},{"location":"agents/app_agent/#learning-from-bing-search","text":"Since help documents may not cover all the information or the information may be outdated, the AppAgent can also leverage Bing search to retrieve the latest information. You can activate Bing search and configure the search engine in the config.yaml file. Tip Please find the detailed configuration in the documentation . Tip You may also refer here for the implementation of Bing search in the AppAgent . In the AppAgent , it calls the build_online_search_retriever to build a Bing search retriever, and uses the retrived_documents_prompt_helper to construct the prompt for the AppAgent .","title":"Learning from Bing Search"},{"location":"agents/app_agent/#learning-from-self-demonstrations","text":"You may save successful action trajectories in the AppAgent to learn from self-demonstrations and improve its performance. After the completion of a session , the AppAgent will ask the user whether to save the action trajectories for future reference. You may configure the use of self-demonstrations in the config.yaml file. Tip You can find details of the configuration in the documentation . 
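The retrieval behaviour covered in these sections is switched on and off in config.yaml. The key names below are taken from the `context_provision` source listing quoted later on this page; the values and paths are illustrative placeholders, not shipped defaults:

```yaml
# Illustrative config.yaml fragment; key names from context_provision,
# values and paths are placeholders.
RAG_OFFLINE_DOCS: True            # retrieve from offline help documents
RAG_ONLINE_SEARCH: True           # retrieve from Bing search
RAG_ONLINE_SEARCH_TOPK: 5         # number of online results to retrieve
RAG_EXPERIENCE: True              # retrieve self-demonstration experience
EXPERIENCE_SAVED_PATH: "vectordb/experience/"        # hypothetical path
RAG_DEMONSTRATION: True           # retrieve human demonstrations
DEMONSTRATION_SAVED_PATH: "vectordb/demonstration/"  # hypothetical path
```

Each enabled flag causes `context_provision` to build the corresponding retriever (offline docs, online search, experience, or demonstration) before the agent processes a request.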
Tip You may also refer here for the implementation of self-demonstrations in the AppAgent . In the AppAgent , it calls the build_experience_retriever to build a self-demonstration retriever, and uses the rag_experience_retrieve to retrieve the demonstration for the AppAgent .","title":"Learning from Self-Demonstrations"},{"location":"agents/app_agent/#learning-from-human-demonstrations","text":"In addition to self-demonstrations, you can also provide human demonstrations to the AppAgent to enhance its performance by using the Step Recorder tool built into the Windows OS. The AppAgent will learn from the human demonstrations to improve its performance and achieve better personalization. The use of human demonstrations can be configured in the config.yaml file. Tip You can find details of the configuration in the documentation . Tip You may also refer here for the implementation of human demonstrations in the AppAgent . In the AppAgent , it calls the build_human_demonstration_retriever to build a human demonstration retriever, and uses the rag_demonstration_retrieve to retrieve the demonstration for the AppAgent .","title":"Learning from Human Demonstrations"},{"location":"agents/app_agent/#skill-set-for-automation","text":"The AppAgent is equipped with a versatile skill set to support comprehensive automation within the application by calling the create_puppeteer_interface method. The skills include: Skill Description UI Automation Mimicking user interactions with the application UI controls using the UI Automation and Win32 API. Native API Accessing the application's native API to execute specific functions and actions. In-App Agent Leveraging the in-app agent to interact with the application's internal functions and features. By utilizing these skills, the AppAgent can efficiently interact with the application and fulfill the user's request. 
You can find more details in the Automator documentation and the code in the ufo/automator module.","title":"Skill Set for Automation"},{"location":"agents/app_agent/#reference","text":"Bases: BasicAgent The AppAgent class that manages the interaction with the application. Initialize the AppAgent. :name: The name of the agent. Parameters: process_name ( str ) \u2013 The process name of the app. app_root_name ( str ) \u2013 The root name of the app. is_visual ( bool ) \u2013 The flag indicating whether the agent is visual or not. main_prompt ( str ) \u2013 The main prompt file path. example_prompt ( str ) \u2013 The example prompt file path. api_prompt ( str ) \u2013 The API prompt file path. skip_prompter ( bool , default: False ) \u2013 The flag indicating whether to skip the prompter initialization. Source code in agents/agent/app_agent.py 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 def __init__ ( self , name : str , process_name : str , app_root_name : str , is_visual : bool , main_prompt : str , example_prompt : str , api_prompt : str , skip_prompter : bool = False , ) -> None : \"\"\" Initialize the AppAgent. :name: The name of the agent. :param process_name: The process name of the app. :param app_root_name: The root name of the app. :param is_visual: The flag indicating whether the agent is visual or not. :param main_prompt: The main prompt file path. :param example_prompt: The example prompt file path. :param api_prompt: The API prompt file path. :param skip_prompter: The flag indicating whether to skip the prompter initialization. \"\"\" super () . __init__ ( name = name ) if not skip_prompter : self . prompter = self . get_prompter ( is_visual , main_prompt , example_prompt , api_prompt , app_root_name ) self . _process_name = process_name self . _app_root_name = app_root_name self . offline_doc_retriever = None self . online_doc_retriever = None self . experience_retriever = None self . 
human_demonstration_retriever = None self . Puppeteer = self . create_puppeteer_interface () self . set_state ( ContinueAppAgentState ())","title":"Reference"},{"location":"agents/app_agent/#agents.agent.app_agent.AppAgent.status_manager","text":"Get the status manager.","title":"status_manager"},{"location":"agents/app_agent/#agents.agent.app_agent.AppAgent.build_experience_retriever","text":"Build the experience retriever. Parameters: db_path ( str ) \u2013 The path to the experience database. Returns: None \u2013 The experience retriever. Source code in agents/agent/app_agent.py 346 347 348 349 350 351 352 353 354 def build_experience_retriever ( self , db_path : str ) -> None : \"\"\" Build the experience retriever. :param db_path: The path to the experience database. :return: The experience retriever. \"\"\" self . experience_retriever = self . retriever_factory . create_retriever ( \"experience\" , db_path )","title":"build_experience_retriever"},{"location":"agents/app_agent/#agents.agent.app_agent.AppAgent.build_human_demonstration_retriever","text":"Build the human demonstration retriever. Parameters: db_path ( str ) \u2013 The path to the human demonstration database. Returns: None \u2013 The human demonstration retriever. Source code in agents/agent/app_agent.py 356 357 358 359 360 361 362 363 364 def build_human_demonstration_retriever ( self , db_path : str ) -> None : \"\"\" Build the human demonstration retriever. :param db_path: The path to the human demonstration database. :return: The human demonstration retriever. \"\"\" self . human_demonstration_retriever = self . retriever_factory . create_retriever ( \"demonstration\" , db_path )","title":"build_human_demonstration_retriever"},{"location":"agents/app_agent/#agents.agent.app_agent.AppAgent.build_offline_docs_retriever","text":"Build the offline docs retriever. 
Source code in agents/agent/app_agent.py 328 329 330 331 332 333 334 def build_offline_docs_retriever ( self ) -> None : \"\"\" Build the offline docs retriever. \"\"\" self . offline_doc_retriever = self . retriever_factory . create_retriever ( \"offline\" , self . _app_root_name )","title":"build_offline_docs_retriever"},{"location":"agents/app_agent/#agents.agent.app_agent.AppAgent.build_online_search_retriever","text":"Build the online search retriever. Parameters: request ( str ) \u2013 The request for online Bing search. top_k ( int ) \u2013 The number of documents to retrieve. Source code in agents/agent/app_agent.py 336 337 338 339 340 341 342 343 344 def build_online_search_retriever ( self , request : str , top_k : int ) -> None : \"\"\" Build the online search retriever. :param request: The request for online Bing search. :param top_k: The number of documents to retrieve. \"\"\" self . online_doc_retriever = self . retriever_factory . create_retriever ( \"online\" , request , top_k )","title":"build_online_search_retriever"},{"location":"agents/app_agent/#agents.agent.app_agent.AppAgent.context_provision","text":"Provision the context for the app agent. Parameters: request ( str , default: '' ) \u2013 The request sent to the Bing search retriever. Source code in agents/agent/app_agent.py 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 def context_provision ( self , request : str = \"\" ) -> None : \"\"\" Provision the context for the app agent. :param request: The request sent to the Bing search retriever. \"\"\" # Load the offline document indexer for the app agent if available. if configs [ \"RAG_OFFLINE_DOCS\" ]: utils . print_with_color ( \"Loading offline help document indexer for {app} ...\" . format ( app = self . _process_name ), \"magenta\" , ) self . build_offline_docs_retriever () # Load the online search indexer for the app agent if available. 
if configs [ \"RAG_ONLINE_SEARCH\" ] and request : utils . print_with_color ( \"Creating a Bing search indexer...\" , \"magenta\" ) self . build_online_search_retriever ( request , configs [ \"RAG_ONLINE_SEARCH_TOPK\" ] ) # Load the experience indexer for the app agent if available. if configs [ \"RAG_EXPERIENCE\" ]: utils . print_with_color ( \"Creating an experience indexer...\" , \"magenta\" ) experience_path = configs [ \"EXPERIENCE_SAVED_PATH\" ] db_path = os . path . join ( experience_path , \"experience_db\" ) self . build_experience_retriever ( db_path ) # Load the demonstration indexer for the app agent if available. if configs [ \"RAG_DEMONSTRATION\" ]: utils . print_with_color ( \"Creating an demonstration indexer...\" , \"magenta\" ) demonstration_path = configs [ \"DEMONSTRATION_SAVED_PATH\" ] db_path = os . path . join ( demonstration_path , \"demonstration_db\" ) self . build_human_demonstration_retriever ( db_path )","title":"context_provision"},{"location":"agents/app_agent/#agents.agent.app_agent.AppAgent.create_puppeteer_interface","text":"Create the Puppeteer interface to automate the app. Returns: AppPuppeteer \u2013 The Puppeteer interface. Source code in agents/agent/app_agent.py 299 300 301 302 303 304 def create_puppeteer_interface ( self ) -> puppeteer . AppPuppeteer : \"\"\" Create the Puppeteer interface to automate the app. :return: The Puppeteer interface. \"\"\" return puppeteer . AppPuppeteer ( self . _process_name , self . _app_root_name )","title":"create_puppeteer_interface"},{"location":"agents/app_agent/#agents.agent.app_agent.AppAgent.external_knowledge_prompt_helper","text":"Retrieve the external knowledge and construct the prompt. Parameters: request ( str ) \u2013 The request. offline_top_k ( int ) \u2013 The number of offline documents to retrieve. online_top_k ( int ) \u2013 The number of online documents to retrieve. Returns: str \u2013 The prompt message for the external_knowledge. 
Source code in agents/agent/app_agent.py 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 def external_knowledge_prompt_helper ( self , request : str , offline_top_k : int , online_top_k : int ) -> str : \"\"\" Retrieve the external knowledge and construct the prompt. :param request: The request. :param offline_top_k: The number of offline documents to retrieve. :param online_top_k: The number of online documents to retrieve. :return: The prompt message for the external_knowledge. \"\"\" retrieved_docs = \"\" # Retrieve offline documents and construct the prompt if self . offline_doc_retriever : offline_docs = self . offline_doc_retriever . retrieve ( \"How to {query} for {app} \" . format ( query = request , app = self . _process_name ), offline_top_k , filter = None , ) offline_docs_prompt = self . prompter . retrived_documents_prompt_helper ( \"Help Documents\" , \"Document\" , [ doc . metadata [ \"text\" ] for doc in offline_docs ], ) retrieved_docs += offline_docs_prompt # Retrieve online documents and construct the prompt if self . online_doc_retriever : online_search_docs = self . online_doc_retriever . retrieve ( request , online_top_k , filter = None ) online_docs_prompt = self . prompter . retrived_documents_prompt_helper ( \"Online Search Results\" , \"Search Result\" , [ doc . page_content for doc in online_search_docs ], ) retrieved_docs += online_docs_prompt return retrieved_docs","title":"external_knowledge_prompt_helper"},{"location":"agents/app_agent/#agents.agent.app_agent.AppAgent.get_prompter","text":"Get the prompt for the agent. Parameters: is_visual ( bool ) \u2013 The flag indicating whether the agent is visual or not. main_prompt ( str ) \u2013 The main prompt file path. example_prompt ( str ) \u2013 The example prompt file path. api_prompt ( str ) \u2013 The API prompt file path. 
app_root_name ( str ) \u2013 The root name of the app. Returns: AppAgentPrompter \u2013 The prompter instance. Source code in agents/agent/app_agent.py 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 def get_prompter ( self , is_visual : bool , main_prompt : str , example_prompt : str , api_prompt : str , app_root_name : str , ) -> AppAgentPrompter : \"\"\" Get the prompt for the agent. :param is_visual: The flag indicating whether the agent is visual or not. :param main_prompt: The main prompt file path. :param example_prompt: The example prompt file path. :param api_prompt: The API prompt file path. :param app_root_name: The root name of the app. :return: The prompter instance. \"\"\" return AppAgentPrompter ( is_visual , main_prompt , example_prompt , api_prompt , app_root_name )","title":"get_prompter"},{"location":"agents/app_agent/#agents.agent.app_agent.AppAgent.message_constructor","text":"Construct the prompt message for the AppAgent. Parameters: dynamic_examples ( str ) \u2013 The dynamic examples retrieved from the self-demonstration and human demonstration. dynamic_tips ( str ) \u2013 The dynamic tips retrieved from the self-demonstration and human demonstration. dynamic_knowledge ( str ) \u2013 The dynamic knowledge retrieved from the external knowledge base. image_list ( List ) \u2013 The list of screenshot images. control_info ( str ) \u2013 The control information. plan ( List [ str ] ) \u2013 The plan list. request ( str ) \u2013 The overall user request. subtask ( str ) \u2013 The subtask for the current AppAgent to process. host_message ( List [ str ] ) \u2013 The message from the HostAgent. include_last_screenshot ( bool ) \u2013 The flag indicating whether to include the last screenshot. Returns: List [ Dict [ str , Union [ str , List [ Dict [ str , str ]]]]] \u2013 The prompt message. 
Source code in agents/agent/app_agent.py 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 def message_constructor ( self , dynamic_examples : str , dynamic_tips : str , dynamic_knowledge : str , image_list : List , control_info : str , prev_subtask : List [ Dict [ str , str ]], plan : List [ str ], request : str , subtask : str , host_message : List [ str ], include_last_screenshot : bool , ) -> List [ Dict [ str , Union [ str , List [ Dict [ str , str ]]]]]: \"\"\" Construct the prompt message for the AppAgent. :param dynamic_examples: The dynamic examples retrieved from the self-demonstration and human demonstration. :param dynamic_tips: The dynamic tips retrieved from the self-demonstration and human demonstration. :param dynamic_knowledge: The dynamic knowledge retrieved from the external knowledge base. :param image_list: The list of screenshot images. :param control_info: The control information. :param plan: The plan list. :param request: The overall user request. :param subtask: The subtask for the current AppAgent to process. :param host_message: The message from the HostAgent. :param include_last_screenshot: The flag indicating whether to include the last screenshot. :return: The prompt message. \"\"\" appagent_prompt_system_message = self . prompter . system_prompt_construction ( dynamic_examples , dynamic_tips ) appagent_prompt_user_message = self . prompter . user_content_construction ( image_list = image_list , control_item = control_info , prev_subtask = prev_subtask , prev_plan = plan , user_request = request , subtask = subtask , current_application = self . _process_name , host_message = host_message , retrieved_docs = dynamic_knowledge , include_last_screenshot = include_last_screenshot , ) if not self . blackboard . is_empty (): blackboard_prompt = self . blackboard . 
blackboard_to_prompt () appagent_prompt_user_message = ( blackboard_prompt + appagent_prompt_user_message ) appagent_prompt_message = self . prompter . prompt_construction ( appagent_prompt_system_message , appagent_prompt_user_message ) return appagent_prompt_message","title":"message_constructor"},{"location":"agents/app_agent/#agents.agent.app_agent.AppAgent.print_response","text":"Print the response. Parameters: response_dict ( Dict ) \u2013 The response dictionary to print. Source code in agents/agent/app_agent.py 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 def print_response ( self , response_dict : Dict ) -> None : \"\"\" Print the response. :param response_dict: The response dictionary to print. \"\"\" control_text = response_dict . get ( \"ControlText\" ) control_label = response_dict . get ( \"ControlLabel\" ) if not control_text and not control_label : control_text = \"[No control selected.]\" control_label = \"[No control label selected.]\" observation = response_dict . get ( \"Observation\" ) thought = response_dict . get ( \"Thought\" ) plan = response_dict . get ( \"Plan\" ) status = response_dict . get ( \"Status\" ) comment = response_dict . get ( \"Comment\" ) function_call = response_dict . get ( \"Function\" ) args = utils . revise_line_breaks ( response_dict . get ( \"Args\" )) # Generate the function call string action = self . Puppeteer . get_command_string ( function_call , args ) utils . print_with_color ( \"Observations\ud83d\udc40: {observation} \" . format ( observation = observation ), \"cyan\" ) utils . print_with_color ( \"Thoughts\ud83d\udca1: {thought} \" . format ( thought = thought ), \"green\" ) utils . print_with_color ( \"Selected item\ud83d\udd79\ufe0f: {control_text} , Label: {label} \" . 
format ( control_text = control_text , label = control_label ), \"yellow\" , ) utils . print_with_color ( \"Action applied\u2692\ufe0f: {action} \" . format ( action = action ), \"blue\" ) utils . print_with_color ( \"Status\ud83d\udcca: {status} \" . format ( status = status ), \"blue\" ) utils . print_with_color ( \"Next Plan\ud83d\udcda: {plan} \" . format ( plan = \" \\n \" . join ( plan )), \"cyan\" ) utils . print_with_color ( \"Comment\ud83d\udcac: {comment} \" . format ( comment = comment ), \"green\" ) screenshot_saving = response_dict . get ( \"SaveScreenshot\" , {}) if screenshot_saving . get ( \"save\" , False ): utils . print_with_color ( \"Notice: The current screenshot\ud83d\udcf8 is saved to the blackboard.\" , \"yellow\" , ) utils . print_with_color ( \"Saving reason: {reason} \" . format ( reason = screenshot_saving . get ( \"reason\" ) ), \"yellow\" , )","title":"print_response"},{"location":"agents/app_agent/#agents.agent.app_agent.AppAgent.process","text":"Process the agent. Parameters: context ( Context ) \u2013 The context. Source code in agents/agent/app_agent.py 290 291 292 293 294 295 296 297 def process ( self , context : Context ) -> None : \"\"\" Process the agent. :param context: The context. \"\"\" self . processor = AppAgentProcessor ( agent = self , context = context ) self . processor . process () self . status = self . processor . status","title":"process"},{"location":"agents/app_agent/#agents.agent.app_agent.AppAgent.process_comfirmation","text":"Process the user confirmation. Returns: bool \u2013 The decision. Source code in agents/agent/app_agent.py 306 307 308 309 310 311 312 313 314 315 316 317 318 319 def process_comfirmation ( self ) -> bool : \"\"\" Process the user confirmation. :return: The decision. \"\"\" action = self . processor . action control_text = self . processor . control_text decision = interactor . sensitive_step_asker ( action , control_text ) if not decision : utils . 
print_with_color ( \"The user has canceled the action.\" , \"red\" ) return decision","title":"process_comfirmation"},{"location":"agents/app_agent/#agents.agent.app_agent.AppAgent.rag_demonstration_retrieve","text":"Retrieving demonstration examples for the user request. Parameters: request ( str ) \u2013 The user request. demonstration_top_k ( int ) \u2013 The number of documents to retrieve. Returns: str \u2013 The retrieved examples and tips string. Source code in agents/agent/app_agent.py 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 def rag_demonstration_retrieve ( self , request : str , demonstration_top_k : int ) -> str : \"\"\" Retrieving demonstration examples for the user request. :param request: The user request. :param demonstration_top_k: The number of documents to retrieve. :return: The retrieved examples and tips string. \"\"\" # Retrieve demonstration examples. demonstration_docs = self . human_demonstration_retriever . retrieve ( request , demonstration_top_k ) if demonstration_docs : examples = [ doc . metadata . get ( \"example\" , {}) for doc in demonstration_docs ] tips = [ doc . metadata . get ( \"Tips\" , \"\" ) for doc in demonstration_docs ] else : examples = [] tips = [] return examples , tips","title":"rag_demonstration_retrieve"},{"location":"agents/app_agent/#agents.agent.app_agent.AppAgent.rag_experience_retrieve","text":"Retrieving experience examples for the user request. Parameters: request ( str ) \u2013 The user request. experience_top_k ( int ) \u2013 The number of documents to retrieve. Returns: str \u2013 The retrieved examples and tips string. Source code in agents/agent/app_agent.py 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 def rag_experience_retrieve ( self , request : str , experience_top_k : int ) -> str : \"\"\" Retrieving experience examples for the user request. :param request: The user request. 
:param experience_top_k: The number of documents to retrieve. :return: The retrieved examples and tips string. \"\"\" # Retrieve experience examples. Only retrieve the examples that are related to the current application. experience_docs = self . experience_retriever . retrieve ( request , experience_top_k , filter = lambda x : self . _app_root_name . lower () in [ app . lower () for app in x [ \"app_list\" ]], ) if experience_docs : examples = [ doc . metadata . get ( \"example\" , {}) for doc in experience_docs ] tips = [ doc . metadata . get ( \"Tips\" , \"\" ) for doc in experience_docs ] else : examples = [] tips = [] return examples , tips","title":"rag_experience_retrieve"},{"location":"agents/evaluation_agent/","text":"EvaluationAgent \ud83e\uddd0 The objective of the EvaluationAgent is to evaluate whether a Session or Round has been successfully completed. The EvaluationAgent assesses the performance of the HostAgent and AppAgent in fulfilling the request. You can configure whether to enable the EvaluationAgent in the config_dev.yaml file and the detailed documentation can be found here . Note The EvaluationAgent is fully LLM-driven and conducts evaluations based on the action trajectories and screenshots. It may not be 100% accurate since the LLM may make mistakes. Configuration To enable the EvaluationAgent , you can configure the following parameters in the config_dev.yaml file to evaluate the task completion status at different levels: Configuration Option Description Type Default Value EVA_SESSION Whether to include the session in the evaluation. Boolean True EVA_ROUND Whether to include the round in the evaluation. Boolean False EVA_ALL_SCREENSHOTS Whether to include all the screenshots in the evaluation. Boolean True Evaluation Inputs The EvaluationAgent takes the following inputs for evaluation: Input Description Type User Request The user's request to be evaluated. String APIs Description The description of the APIs used in the execution.
List of Strings Action Trajectories The action trajectories executed by the HostAgent and AppAgent . List of Strings Screenshots The screenshots captured during the execution. List of Images For more details on how to construct the inputs, please refer to the EvaluationAgentPrompter class in ufo/prompter/eva_prompter.py . Tip You can configure whether to use all screenshots or only the first and last screenshot for evaluation in the EVA_ALL_SCREENSHOTS of the config_dev.yaml file. Evaluation Outputs The EvaluationAgent generates the following outputs after evaluation: Output Description Type reason The detailed reason for the judgment, obtained by observing the screenshot differences. String sub_scores The sub-scores obtained by decomposing the evaluation into multiple sub-goals. List of Dictionaries complete The completion status of the evaluation, can be yes , no , or unsure . String Below is an example of the evaluation output: { \"reason\": \"The agent successfully completed the task of sending 'hello' to Zac on Microsoft Teams. The initial screenshot shows the Microsoft Teams application with the chat window of Chaoyun Zhang open. The agent then focused on the chat window, input the message 'hello', and clicked the Send button. The final screenshot confirms that the message 'hello' was sent to Zac.\", \"sub_scores\": { \"correct application focus\": \"yes\", \"correct message input\": \"yes\", \"message sent successfully\": \"yes\" }, \"complete\": \"yes\"} Info The log of the evaluation results will be saved in the logs/{task_name}/evaluation.log file. The EvaluationAgent employs the CoT mechanism to first decompose the evaluation into multiple sub-goals and then evaluate each sub-goal separately. The sub-scores are then aggregated to determine the overall completion status of the evaluation. Reference Bases: BasicAgent The agent for evaluation. Initialize the FollowAgent. :agent_type: The type of the agent.
:is_visual: The flag indicating whether the agent is visual or not. Source code in agents/agent/evaluation_agent.py 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 def __init__ ( self , name : str , app_root_name : str , is_visual : bool , main_prompt : str , example_prompt : str , api_prompt : str , ): \"\"\" Initialize the FollowAgent. :agent_type: The type of the agent. :is_visual: The flag indicating whether the agent is visual or not. \"\"\" super () . __init__ ( name = name ) self . _app_root_name = app_root_name self . prompter = self . get_prompter ( is_visual , main_prompt , example_prompt , api_prompt , app_root_name , ) status_manager : EvaluatonAgentStatus property Get the status manager. evaluate ( request , log_path , eva_all_screenshots = True ) Evaluate the task completion. Parameters: log_path ( str ) \u2013 The path to the log file. Returns: Tuple [ Dict [ str , str ], float ] \u2013 The evaluation result and the cost of LLM. Source code in agents/agent/evaluation_agent.py 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 def evaluate ( self , request : str , log_path : str , eva_all_screenshots : bool = True ) -> Tuple [ Dict [ str , str ], float ]: \"\"\" Evaluate the task completion. :param log_path: The path to the log file. :return: The evaluation result and the cost of LLM. \"\"\" message = self . message_constructor ( log_path = log_path , request = request , eva_all_screenshots = eva_all_screenshots ) result , cost = self . get_response ( message = message , namescope = \"app\" , use_backup_engine = True ) result = json_parser ( result ) return result , cost get_prompter ( is_visual , prompt_template , example_prompt_template , api_prompt_template , root_name = None ) Get the prompter for the agent. 
Source code in agents/agent/evaluation_agent.py 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 def get_prompter ( self , is_visual , prompt_template : str , example_prompt_template : str , api_prompt_template : str , root_name : Optional [ str ] = None , ) -> EvaluationAgentPrompter : \"\"\" Get the prompter for the agent. \"\"\" return EvaluationAgentPrompter ( is_visual = is_visual , prompt_template = prompt_template , example_prompt_template = example_prompt_template , api_prompt_template = api_prompt_template , root_name = root_name , ) message_constructor ( log_path , request , eva_all_screenshots = True ) Construct the message. Parameters: log_path ( str ) \u2013 The path to the log file. request ( str ) \u2013 The request. eva_all_screenshots ( bool , default: True ) \u2013 The flag indicating whether to evaluate all screenshots. Returns: Dict [ str , Any ] \u2013 The message. Source code in agents/agent/evaluation_agent.py 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 def message_constructor ( self , log_path : str , request : str , eva_all_screenshots : bool = True ) -> Dict [ str , Any ]: \"\"\" Construct the message. :param log_path: The path to the log file. :param request: The request. :param eva_all_screenshots: The flag indicating whether to evaluate all screenshots. :return: The message. \"\"\" evaagent_prompt_system_message = self . prompter . system_prompt_construction () evaagent_prompt_user_message = self . prompter . user_content_construction ( log_path = log_path , request = request , eva_all_screenshots = eva_all_screenshots ) evaagent_prompt_message = self . prompter . prompt_construction ( evaagent_prompt_system_message , evaagent_prompt_user_message ) return evaagent_prompt_message print_response ( response_dict ) Print the response of the evaluation. Parameters: response_dict ( Dict [ str , Any ] ) \u2013 The response dictionary. 
Source code in agents/agent/evaluation_agent.py 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 def print_response ( self , response_dict : Dict [ str , Any ]) -> None : \"\"\" Print the response of the evaluation. :param response_dict: The response dictionary. \"\"\" emoji_map = { \"yes\" : \"\u2705\" , \"no\" : \"\u274c\" , \"maybe\" : \"\u2753\" , } complete = emoji_map . get ( response_dict . get ( \"complete\" ), response_dict . get ( \"complete\" ) ) sub_scores = response_dict . get ( \"sub_scores\" , {}) reason = response_dict . get ( \"reason\" , \"\" ) print_with_color ( f \"Evaluation result\ud83e\uddd0:\" , \"magenta\" ) print_with_color ( f \"[Sub-scores\ud83d\udcca:]\" , \"green\" ) for score , evaluation in sub_scores . items (): print_with_color ( f \" { score } : { emoji_map . get ( evaluation , evaluation ) } \" , \"green\" ) print_with_color ( \"[Task is complete\ud83d\udcaf:] {complete} \" . format ( complete = complete ), \"cyan\" ) print_with_color ( f \"[Reason\ud83e\udd14:] { reason } \" . format ( reason = reason ), \"blue\" ) process_comfirmation () Comfirmation, currently do nothing. Source code in agents/agent/evaluation_agent.py 124 125 126 127 128 def process_comfirmation ( self ) -> None : \"\"\" Comfirmation, currently do nothing. \"\"\" pass","title":"EvaluationAgent"},{"location":"agents/evaluation_agent/#evaluationagent","text":"The objective of the EvaluationAgent is to evaluate whether a Session or Round has been successfully completed. The EvaluationAgent assesses the performance of the HostAgent and AppAgent in fulfilling the request. You can configure whether to enable the EvaluationAgent in the config_dev.yaml file and the detailed documentation can be found here . Note The EvaluationAgent is fully LLM-driven and conducts evaluations based on the action trajectories and screenshots. 
It may not be 100% accurate since the LLM may make mistakes.","title":"EvaluationAgent \ud83e\uddd0"},{"location":"agents/evaluation_agent/#configuration","text":"To enable the EvaluationAgent , you can configure the following parameters in the config_dev.yaml file to evaluate the task completion status at different levels: Configuration Option Description Type Default Value EVA_SESSION Whether to include the session in the evaluation. Boolean True EVA_ROUND Whether to include the round in the evaluation. Boolean False EVA_ALL_SCREENSHOTS Whether to include all the screenshots in the evaluation. Boolean True","title":"Configuration"},{"location":"agents/evaluation_agent/#evaluation-inputs","text":"The EvaluationAgent takes the following inputs for evaluation: Input Description Type User Request The user's request to be evaluated. String APIs Description The description of the APIs used in the execution. List of Strings Action Trajectories The action trajectories executed by the HostAgent and AppAgent . List of Strings Screenshots The screenshots captured during the execution. List of Images For more details on how to construct the inputs, please refer to the EvaluationAgentPrompter class in ufo/prompter/eva_prompter.py . Tip You can configure whether to use all screenshots or only the first and last screenshot for evaluation in the EVA_ALL_SCREENSHOTS of the config_dev.yaml file.","title":"Evaluation Inputs"},{"location":"agents/evaluation_agent/#evaluation-outputs","text":"The EvaluationAgent generates the following outputs after evaluation: Output Description Type reason The detailed reason for the judgment, obtained by observing the screenshot differences. String sub_scores The sub-scores obtained by decomposing the evaluation into multiple sub-goals. List of Dictionaries complete The completion status of the evaluation, can be yes , no , or unsure .
String Below is an example of the evaluation output: { \"reason\": \"The agent successfully completed the task of sending 'hello' to Zac on Microsoft Teams. The initial screenshot shows the Microsoft Teams application with the chat window of Chaoyun Zhang open. The agent then focused on the chat window, input the message 'hello', and clicked the Send button. The final screenshot confirms that the message 'hello' was sent to Zac.\", \"sub_scores\": { \"correct application focus\": \"yes\", \"correct message input\": \"yes\", \"message sent successfully\": \"yes\" }, \"complete\": \"yes\"} Info The log of the evaluation results will be saved in the logs/{task_name}/evaluation.log file. The EvaluationAgent employs the CoT mechanism to first decompose the evaluation into multiple sub-goals and then evaluate each sub-goal separately. The sub-scores are then aggregated to determine the overall completion status of the evaluation.","title":"Evaluation Outputs"},{"location":"agents/evaluation_agent/#reference","text":"Bases: BasicAgent The agent for evaluation. Initialize the FollowAgent. :agent_type: The type of the agent. :is_visual: The flag indicating whether the agent is visual or not. Source code in agents/agent/evaluation_agent.py 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 def __init__ ( self , name : str , app_root_name : str , is_visual : bool , main_prompt : str , example_prompt : str , api_prompt : str , ): \"\"\" Initialize the FollowAgent. :agent_type: The type of the agent. :is_visual: The flag indicating whether the agent is visual or not. \"\"\" super () . __init__ ( name = name ) self . _app_root_name = app_root_name self . prompter = self . 
get_prompter ( is_visual , main_prompt , example_prompt , api_prompt , app_root_name , )","title":"Reference"},{"location":"agents/evaluation_agent/#agents.agent.evaluation_agent.EvaluationAgent.status_manager","text":"Get the status manager.","title":"status_manager"},{"location":"agents/evaluation_agent/#agents.agent.evaluation_agent.EvaluationAgent.evaluate","text":"Evaluate the task completion. Parameters: log_path ( str ) \u2013 The path to the log file. Returns: Tuple [ Dict [ str , str ], float ] \u2013 The evaluation result and the cost of LLM. Source code in agents/agent/evaluation_agent.py 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 def evaluate ( self , request : str , log_path : str , eva_all_screenshots : bool = True ) -> Tuple [ Dict [ str , str ], float ]: \"\"\" Evaluate the task completion. :param log_path: The path to the log file. :return: The evaluation result and the cost of LLM. \"\"\" message = self . message_constructor ( log_path = log_path , request = request , eva_all_screenshots = eva_all_screenshots ) result , cost = self . get_response ( message = message , namescope = \"app\" , use_backup_engine = True ) result = json_parser ( result ) return result , cost","title":"evaluate"},{"location":"agents/evaluation_agent/#agents.agent.evaluation_agent.EvaluationAgent.get_prompter","text":"Get the prompter for the agent. Source code in agents/agent/evaluation_agent.py 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 def get_prompter ( self , is_visual , prompt_template : str , example_prompt_template : str , api_prompt_template : str , root_name : Optional [ str ] = None , ) -> EvaluationAgentPrompter : \"\"\" Get the prompter for the agent. 
\"\"\" return EvaluationAgentPrompter ( is_visual = is_visual , prompt_template = prompt_template , example_prompt_template = example_prompt_template , api_prompt_template = api_prompt_template , root_name = root_name , )","title":"get_prompter"},{"location":"agents/evaluation_agent/#agents.agent.evaluation_agent.EvaluationAgent.message_constructor","text":"Construct the message. Parameters: log_path ( str ) \u2013 The path to the log file. request ( str ) \u2013 The request. eva_all_screenshots ( bool , default: True ) \u2013 The flag indicating whether to evaluate all screenshots. Returns: Dict [ str , Any ] \u2013 The message. Source code in agents/agent/evaluation_agent.py 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 def message_constructor ( self , log_path : str , request : str , eva_all_screenshots : bool = True ) -> Dict [ str , Any ]: \"\"\" Construct the message. :param log_path: The path to the log file. :param request: The request. :param eva_all_screenshots: The flag indicating whether to evaluate all screenshots. :return: The message. \"\"\" evaagent_prompt_system_message = self . prompter . system_prompt_construction () evaagent_prompt_user_message = self . prompter . user_content_construction ( log_path = log_path , request = request , eva_all_screenshots = eva_all_screenshots ) evaagent_prompt_message = self . prompter . prompt_construction ( evaagent_prompt_system_message , evaagent_prompt_user_message ) return evaagent_prompt_message","title":"message_constructor"},{"location":"agents/evaluation_agent/#agents.agent.evaluation_agent.EvaluationAgent.print_response","text":"Print the response of the evaluation. Parameters: response_dict ( Dict [ str , Any ] ) \u2013 The response dictionary. 
Source code in agents/agent/evaluation_agent.py 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 def print_response ( self , response_dict : Dict [ str , Any ]) -> None : \"\"\" Print the response of the evaluation. :param response_dict: The response dictionary. \"\"\" emoji_map = { \"yes\" : \"\u2705\" , \"no\" : \"\u274c\" , \"maybe\" : \"\u2753\" , } complete = emoji_map . get ( response_dict . get ( \"complete\" ), response_dict . get ( \"complete\" ) ) sub_scores = response_dict . get ( \"sub_scores\" , {}) reason = response_dict . get ( \"reason\" , \"\" ) print_with_color ( f \"Evaluation result\ud83e\uddd0:\" , \"magenta\" ) print_with_color ( f \"[Sub-scores\ud83d\udcca:]\" , \"green\" ) for score , evaluation in sub_scores . items (): print_with_color ( f \" { score } : { emoji_map . get ( evaluation , evaluation ) } \" , \"green\" ) print_with_color ( \"[Task is complete\ud83d\udcaf:] {complete} \" . format ( complete = complete ), \"cyan\" ) print_with_color ( f \"[Reason\ud83e\udd14:] { reason } \" . format ( reason = reason ), \"blue\" )","title":"print_response"},{"location":"agents/evaluation_agent/#agents.agent.evaluation_agent.EvaluationAgent.process_comfirmation","text":"Comfirmation, currently do nothing. Source code in agents/agent/evaluation_agent.py 124 125 126 127 128 def process_comfirmation ( self ) -> None : \"\"\" Comfirmation, currently do nothing. \"\"\" pass","title":"process_comfirmation"},{"location":"agents/follower_agent/","text":"Follower Agent \ud83d\udeb6\ud83c\udffd\u200d\u2642\ufe0f The FollowerAgent is inherited from the AppAgent and is responsible for following the user's instructions to perform specific tasks within the application. The FollowerAgent is designed to execute a series of actions based on the user's guidance. It is particularly useful for software testing, when clear instructions are provided to validate the application's behavior. 
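The FollowerAgent executes a step-wise plan supplied by the user as a JSON file (see the Usage section below for the documented format with `task`, `steps`, and `object` fields). As a rough illustration of how such a plan can be parsed and iterated, here is a minimal, self-contained Python sketch; the `FollowerPlan` dataclass and `load_plan` helper are hypothetical names for this example, not part of the UFO codebase:

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class FollowerPlan:
    """A parsed follower-mode plan (fields mirror the documented JSON example)."""

    task: str          # overall task description
    steps: List[str]   # ordered, step-by-step instructions to execute
    object: str        # the file/object the task operates on


def load_plan(raw: str) -> FollowerPlan:
    """Parse a follower-mode plan from its JSON text (hypothetical helper)."""
    data = json.loads(raw)
    return FollowerPlan(
        task=data["task"],
        steps=list(data["steps"]),
        object=data.get("object", ""),
    )


# The JSON example from the Usage section, embedded as a string.
raw = """{
  "task": "Type in a bold text of 'Test For Fun'",
  "steps": [
    "1.type in 'Test For Fun'",
    "2.select the text of 'Test For Fun'",
    "3.click on the bold"
  ],
  "object": "draft.docx"
}"""

plan = load_plan(raw)
# A follower-style loop would hand each instruction to the agent in order,
# rather than letting the agent plan the next action itself.
for step in plan.steps:
    print(step)
```

This sketch only shows the plan-file handling; in UFO itself the parsed steps are consumed by the follower-mode `Session` and `Processor` described below.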
Different from the AppAgent The FollowerAgent shares most of the functionalities with the AppAgent , but it is designed to follow the step-by-step instructions provided by the user, instead of doing its own reasoning to determine the next action. Usage The FollowerAgent is available in follower mode. You can find more details in the documentation . It also uses a different Session and Processor to handle the user's instructions. The step-wise instructions are provided by the user in a JSON file, which is then parsed by the FollowerAgent to execute the actions. An example of the JSON file is shown below: { \"task\": \"Type in a bold text of 'Test For Fun'\", \"steps\": [ \"1.type in 'Test For Fun'\", \"2.select the text of 'Test For Fun'\", \"3.click on the bold\" ], \"object\": \"draft.docx\" } Reference Bases: AppAgent The FollowerAgent class is the manager of a FollowedAgent that follows the step-by-step instructions for action execution within an application. It is a subclass of the AppAgent, which completes the action execution within the application. Initialize the FollowAgent. Parameters: name ( str ) \u2013 The name of the agent. process_name ( str ) \u2013 The process name of the app. app_root_name ( str ) \u2013 The root name of the app. is_visual ( bool ) \u2013 The flag indicating whether the agent is visual or not. main_prompt ( str ) \u2013 The main prompt file path. example_prompt ( str ) \u2013 The example prompt file path. api_prompt ( str ) \u2013 The API prompt file path. app_info_prompt ( str ) \u2013 The app information prompt file path. Source code in agents/agent/follower_agent.py 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 def __init__ ( self , name : str , process_name : str , app_root_name : str , is_visual : bool , main_prompt : str , example_prompt : str , api_prompt : str , app_info_prompt : str , ): \"\"\" Initialize the FollowAgent.
:param name: The name of the agent. :param process_name: The process name of the app. :param app_root_name: The root name of the app. :param is_visual: The flag indicating whether the agent is visual or not. :param main_prompt: The main prompt file path. :param example_prompt: The example prompt file path. :param api_prompt: The API prompt file path. :param app_info_prompt: The app information prompt file path. \"\"\" super () . __init__ ( name = name , process_name = process_name , app_root_name = app_root_name , is_visual = is_visual , main_prompt = main_prompt , example_prompt = example_prompt , api_prompt = api_prompt , skip_prompter = True , ) self . prompter = self . get_prompter ( is_visual , main_prompt , example_prompt , api_prompt , app_info_prompt , app_root_name , ) get_prompter ( is_visual , main_prompt , example_prompt , api_prompt , app_info_prompt , app_root_name = '' ) Get the prompter for the follower agent. Parameters: is_visual ( str ) \u2013 The flag indicating whether the agent is visual or not. main_prompt ( str ) \u2013 The main prompt file path. example_prompt ( str ) \u2013 The example prompt file path. api_prompt ( str ) \u2013 The API prompt file path. app_info_prompt ( str ) \u2013 The app information prompt file path. app_root_name ( str , default: '' ) \u2013 The root name of the app. Returns: FollowerAgentPrompter \u2013 The prompter instance. Source code in agents/agent/follower_agent.py 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 def get_prompter ( self , is_visual : str , main_prompt : str , example_prompt : str , api_prompt : str , app_info_prompt : str , app_root_name : str = \"\" , ) -> FollowerAgentPrompter : \"\"\" Get the prompter for the follower agent. :param is_visual: The flag indicating whether the agent is visual or not. :param main_prompt: The main prompt file path. :param example_prompt: The example prompt file path. :param api_prompt: The API prompt file path. 
:param app_info_prompt: The app information prompt file path. :param app_root_name: The root name of the app. :return: The prompter instance. \"\"\" return FollowerAgentPrompter ( is_visual , main_prompt , example_prompt , api_prompt , app_info_prompt , app_root_name , ) message_constructor ( dynamic_examples , dynamic_tips , dynamic_knowledge , image_list , control_info , prev_subtask , plan , request , subtask , host_message , current_state , state_diff , include_last_screenshot ) Construct the prompt message for the FollowAgent. Parameters: dynamic_examples ( str ) \u2013 The dynamic examples retrieved from the self-demonstration and human demonstration. dynamic_tips ( str ) \u2013 The dynamic tips retrieved from the self-demonstration and human demonstration. dynamic_knowledge ( str ) \u2013 The dynamic knowledge retrieved from the self-demonstration and human demonstration. image_list ( List [ str ] ) \u2013 The list of screenshot images. control_info ( str ) \u2013 The control information. prev_subtask ( List [ str ] ) \u2013 The previous subtask. plan ( List [ str ] ) \u2013 The plan. request ( str ) \u2013 The request. subtask ( str ) \u2013 The subtask. host_message ( List [ str ] ) \u2013 The host message. current_state ( Dict [ str , str ] ) \u2013 The current state of the app. state_diff ( Dict [ str , str ] ) \u2013 The state difference between the current state and the previous state. include_last_screenshot ( bool ) \u2013 The flag indicating whether the last screenshot should be included. Returns: List [ Dict [ str , str ]] \u2013 The prompt message. 
Source code in agents/agent/follower_agent.py 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 def message_constructor ( self , dynamic_examples : str , dynamic_tips : str , dynamic_knowledge : str , image_list : List [ str ], control_info : str , prev_subtask : List [ str ], plan : List [ str ], request : str , subtask : str , host_message : List [ str ], current_state : Dict [ str , str ], state_diff : Dict [ str , str ], include_last_screenshot : bool , ) -> List [ Dict [ str , str ]]: \"\"\" Construct the prompt message for the FollowAgent. :param dynamic_examples: The dynamic examples retrieved from the self-demonstration and human demonstration. :param dynamic_tips: The dynamic tips retrieved from the self-demonstration and human demonstration. :param dynamic_knowledge: The dynamic knowledge retrieved from the self-demonstration and human demonstration. :param image_list: The list of screenshot images. :param control_info: The control information. :param prev_subtask: The previous subtask. :param plan: The plan. :param request: The request. :param subtask: The subtask. :param host_message: The host message. :param current_state: The current state of the app. :param state_diff: The state difference between the current state and the previous state. :param include_last_screenshot: The flag indicating whether the last screenshot should be included. :return: The prompt message. \"\"\" followagent_prompt_system_message = self . prompter . system_prompt_construction ( dynamic_examples , dynamic_tips ) followagent_prompt_user_message = self . prompter . user_content_construction ( image_list = image_list , control_item = control_info , prev_subtask = prev_subtask , prev_plan = plan , user_request = request , subtask = subtask , current_application = self . 
_process_name , host_message = host_message , retrieved_docs = dynamic_knowledge , current_state = current_state , state_diff = state_diff , include_last_screenshot = include_last_screenshot , ) followagent_prompt_message = self . prompter . prompt_construction ( followagent_prompt_system_message , followagent_prompt_user_message ) return followagent_prompt_message","title":"FollowerAgent"},{"location":"agents/follower_agent/#follower-agent","text":"The FollowerAgent is inherited from the AppAgent and is responsible for following the user's instructions to perform specific tasks within the application. The FollowerAgent is designed to execute a series of actions based on the user's guidance. It is particularly useful for software testing, when clear instructions are provided to validate the application's behavior.","title":"Follower Agent \ud83d\udeb6\ud83c\udffd\u200d\u2642\ufe0f"},{"location":"agents/follower_agent/#different-from-the-appagent","text":"The FollowerAgent shares most of the functionalities with the AppAgent , but it is designed to follow the step-by-step instructions provided by the user, instead of doing its own reasoning to determine the next action.","title":"Different from the AppAgent"},{"location":"agents/follower_agent/#usage","text":"The FollowerAgent is available in follower mode. You can find more details in the documentation . It also uses a different Session and Processor to handle the user's instructions. The step-wise instructions are provided by the user in a JSON file, which is then parsed by the FollowerAgent to execute the actions.
An example of the JSON file is shown below: { \"task\": \"Type in a bold text of 'Test For Fun'\", \"steps\": [ \"1.type in 'Test For Fun'\", \"2.select the text of 'Test For Fun'\", \"3.click on the bold\" ], \"object\": \"draft.docx\" }","title":"Usage"},{"location":"agents/follower_agent/#reference","text":"Bases: AppAgent The FollowerAgent class is the manager of a FollowedAgent that follows the step-by-step instructions for action execution within an application. It is a subclass of the AppAgent, which completes the action execution within the application. Initialize the FollowAgent. Parameters: name ( str ) \u2013 The name of the agent. process_name ( str ) \u2013 The process name of the app. app_root_name ( str ) \u2013 The root name of the app. is_visual ( bool ) \u2013 The flag indicating whether the agent is visual or not. main_prompt ( str ) \u2013 The main prompt file path. example_prompt ( str ) \u2013 The example prompt file path. api_prompt ( str ) \u2013 The API prompt file path. app_info_prompt ( str ) \u2013 The app information prompt file path. Source code in agents/agent/follower_agent.py 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 def __init__ ( self , name : str , process_name : str , app_root_name : str , is_visual : bool , main_prompt : str , example_prompt : str , api_prompt : str , app_info_prompt : str , ): \"\"\" Initialize the FollowAgent. :param name: The name of the agent. :param process_name: The process name of the app. :param app_root_name: The root name of the app. :param is_visual: The flag indicating whether the agent is visual or not. :param main_prompt: The main prompt file path. :param example_prompt: The example prompt file path. :param api_prompt: The API prompt file path. :param app_info_prompt: The app information prompt file path. \"\"\" super () .
__init__ ( name = name , process_name = process_name , app_root_name = app_root_name , is_visual = is_visual , main_prompt = main_prompt , example_prompt = example_prompt , api_prompt = api_prompt , skip_prompter = True , ) self . prompter = self . get_prompter ( is_visual , main_prompt , example_prompt , api_prompt , app_info_prompt , app_root_name , )","title":"Reference"},{"location":"agents/follower_agent/#agents.agent.follower_agent.FollowerAgent.get_prompter","text":"Get the prompter for the follower agent. Parameters: is_visual ( str ) \u2013 The flag indicating whether the agent is visual or not. main_prompt ( str ) \u2013 The main prompt file path. example_prompt ( str ) \u2013 The example prompt file path. api_prompt ( str ) \u2013 The API prompt file path. app_info_prompt ( str ) \u2013 The app information prompt file path. app_root_name ( str , default: '' ) \u2013 The root name of the app. Returns: FollowerAgentPrompter \u2013 The prompter instance. Source code in agents/agent/follower_agent.py 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 def get_prompter ( self , is_visual : str , main_prompt : str , example_prompt : str , api_prompt : str , app_info_prompt : str , app_root_name : str = \"\" , ) -> FollowerAgentPrompter : \"\"\" Get the prompter for the follower agent. :param is_visual: The flag indicating whether the agent is visual or not. :param main_prompt: The main prompt file path. :param example_prompt: The example prompt file path. :param api_prompt: The API prompt file path. :param app_info_prompt: The app information prompt file path. :param app_root_name: The root name of the app. :return: The prompter instance. 
\"\"\" return FollowerAgentPrompter ( is_visual , main_prompt , example_prompt , api_prompt , app_info_prompt , app_root_name , )","title":"get_prompter"},{"location":"agents/follower_agent/#agents.agent.follower_agent.FollowerAgent.message_constructor","text":"Construct the prompt message for the FollowAgent. Parameters: dynamic_examples ( str ) \u2013 The dynamic examples retrieved from the self-demonstration and human demonstration. dynamic_tips ( str ) \u2013 The dynamic tips retrieved from the self-demonstration and human demonstration. dynamic_knowledge ( str ) \u2013 The dynamic knowledge retrieved from the self-demonstration and human demonstration. image_list ( List [ str ] ) \u2013 The list of screenshot images. control_info ( str ) \u2013 The control information. prev_subtask ( List [ str ] ) \u2013 The previous subtask. plan ( List [ str ] ) \u2013 The plan. request ( str ) \u2013 The request. subtask ( str ) \u2013 The subtask. host_message ( List [ str ] ) \u2013 The host message. current_state ( Dict [ str , str ] ) \u2013 The current state of the app. state_diff ( Dict [ str , str ] ) \u2013 The state difference between the current state and the previous state. include_last_screenshot ( bool ) \u2013 The flag indicating whether the last screenshot should be included. Returns: List [ Dict [ str , str ]] \u2013 The prompt message. 
Source code in agents/agent/follower_agent.py 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 def message_constructor ( self , dynamic_examples : str , dynamic_tips : str , dynamic_knowledge : str , image_list : List [ str ], control_info : str , prev_subtask : List [ str ], plan : List [ str ], request : str , subtask : str , host_message : List [ str ], current_state : Dict [ str , str ], state_diff : Dict [ str , str ], include_last_screenshot : bool , ) -> List [ Dict [ str , str ]]: \"\"\" Construct the prompt message for the FollowAgent. :param dynamic_examples: The dynamic examples retrieved from the self-demonstration and human demonstration. :param dynamic_tips: The dynamic tips retrieved from the self-demonstration and human demonstration. :param dynamic_knowledge: The dynamic knowledge retrieved from the self-demonstration and human demonstration. :param image_list: The list of screenshot images. :param control_info: The control information. :param prev_subtask: The previous subtask. :param plan: The plan. :param request: The request. :param subtask: The subtask. :param host_message: The host message. :param current_state: The current state of the app. :param state_diff: The state difference between the current state and the previous state. :param include_last_screenshot: The flag indicating whether the last screenshot should be included. :return: The prompt message. \"\"\" followagent_prompt_system_message = self . prompter . system_prompt_construction ( dynamic_examples , dynamic_tips ) followagent_prompt_user_message = self . prompter . user_content_construction ( image_list = image_list , control_item = control_info , prev_subtask = prev_subtask , prev_plan = plan , user_request = request , subtask = subtask , current_application = self . 
_process_name , host_message = host_message , retrieved_docs = dynamic_knowledge , current_state = current_state , state_diff = state_diff , include_last_screenshot = include_last_screenshot , ) followagent_prompt_message = self . prompter . prompt_construction ( followagent_prompt_system_message , followagent_prompt_user_message ) return followagent_prompt_message","title":"message_constructor"},{"location":"agents/host_agent/","text":"HostAgent \ud83e\udd16 The HostAgent assumes five primary responsibilities: User Engagement : The HostAgent engages with the user to understand their request and analyze their intent. It also converses with the user to gather additional information when necessary. AppAgent Management : The HostAgent manages the creation and registration of AppAgents to fulfill the user's request. It also orchestrates the interaction between the AppAgents and the application. Task Management : The HostAgent analyzes the user's request to decompose it into sub-tasks and distribute them among the AppAgents . It also manages the scheduling, orchestration, coordination, and monitoring of the AppAgents to ensure the successful completion of the user's request. Bash Command Execution : The HostAgent can execute bash commands to open applications or execute system commands to support the user's request and the AppAgents ' execution. Communication : The HostAgent communicates with the AppAgents to exchange information. It also manages the Blackboard to store and share information among the agents, as shown below: The HostAgent activates its Processor to process the user's request and decompose it into sub-tasks. Each sub-task is then assigned to an AppAgent for execution. The HostAgent monitors the progress of the AppAgents and ensures the successful completion of the user's request. HostAgent Input The HostAgent receives the following inputs: Input Description Type User Request The user's request in natural language. 
String Application Information Information about the existing active applications. List of Strings Desktop Screenshots Screenshots of the desktop to provide context to the HostAgent . Image Previous Sub-Tasks The previous sub-tasks and their completion status. List of Strings Previous Plan The previous plan for the following sub-tasks. List of Strings Blackboard The shared memory space for storing and sharing information among the agents. Dictionary By processing these inputs, the HostAgent determines the appropriate application to fulfill the user's request and orchestrates the AppAgents to execute the necessary actions. HostAgent Output With the inputs provided, the HostAgent generates the following outputs: Output Description Type Observation The observation of current desktop screenshots. String Thought The logical reasoning process of the HostAgent . String Current Sub-Task The current sub-task to be executed by the AppAgent . String Message The message to be sent to the AppAgent for the completion of the sub-task. String ControlLabel The index of the selected application to execute the sub-task. String ControlText The name of the selected application to execute the sub-task. String Plan The plan for the following sub-tasks after the current sub-task. List of Strings Status The status of the agent, mapped to the AgentState . String Comment Additional comments or information provided to the user. String Questions The questions to be asked to the user for additional information. List of Strings Bash The bash command to be executed by the HostAgent . It can be used to open applications or execute system commands. 
String Below is an example of the HostAgent output: { \"Observation\": \"Desktop screenshot\", \"Thought\": \"Logical reasoning process\", \"Current Sub-Task\": \"Sub-task description\", \"Message\": \"Message to AppAgent\", \"ControlLabel\": \"Application index\", \"ControlText\": \"Application name\", \"Plan\": [\"Sub-task 1\", \"Sub-task 2\"], \"Status\": \"AgentState\", \"Comment\": \"Additional comments\", \"Questions\": [\"Question 1\", \"Question 2\"], \"Bash\": \"Bash command\" } Info The HostAgent output is formatted as a JSON object by LLMs and can be parsed by the json.loads method in Python. HostAgent State The HostAgent progresses through different states, as defined in the ufo/agents/states/host_agent_states.py module. The states include: State Description CONTINUE The HostAgent is ready to process the user's request and employ the Processor to decompose it into sub-tasks. ASSIGN The HostAgent is assigning the sub-tasks to the AppAgents for execution. FINISH The overall task is completed, and the HostAgent is ready to return the results to the user. ERROR An error occurred during the processing of the user's request, and the HostAgent is unable to proceed. FAIL The HostAgent believes the task is unachievable and cannot proceed further. PENDING The HostAgent is waiting for additional information from the user to proceed. The state machine diagram for the HostAgent is shown below: The HostAgent transitions between these states based on the user's request, the application information, and the progress of the AppAgents in executing the sub-tasks. Task Decomposition Upon receiving the user's request, the HostAgent decomposes it into sub-tasks and assigns each sub-task to an AppAgent for execution. The HostAgent determines the appropriate application to fulfill the user's request based on the application information and the user's request. It then orchestrates the AppAgents to execute the necessary actions to complete the sub-tasks. 
We show the task decomposition process in the following figure: Creating and Registering AppAgents When the HostAgent determines the need for a new AppAgent to fulfill a sub-task, it creates an instance of the AppAgent and registers it with the HostAgent , by calling the create_subagent method: def create_subagent( self, agent_type: str, agent_name: str, process_name: str, app_root_name: str, is_visual: bool, main_prompt: str, example_prompt: str, api_prompt: str, *args, **kwargs, ) -> BasicAgent: \"\"\" Create an SubAgent hosted by the HostAgent. :param agent_type: The type of the agent to create. :param agent_name: The name of the SubAgent. :param process_name: The process name of the app. :param app_root_name: The root name of the app. :param is_visual: The flag indicating whether the agent is visual or not. :param main_prompt: The main prompt file path. :param example_prompt: The example prompt file path. :param api_prompt: The API prompt file path. :return: The created SubAgent. \"\"\" app_agent = self.agent_factory.create_agent( agent_type, agent_name, process_name, app_root_name, is_visual, main_prompt, example_prompt, api_prompt, *args, **kwargs, ) self.appagent_dict[agent_name] = app_agent app_agent.host = self self._active_appagent = app_agent return app_agent The HostAgent then assigns the sub-task to the AppAgent for execution and monitors its progress. Reference Bases: BasicAgent The HostAgent class is the manager of AppAgents. Initialize the HostAgent. :name: The name of the agent. Parameters: is_visual ( bool ) \u2013 The flag indicating whether the agent is visual or not. main_prompt ( str ) \u2013 The main prompt file path. example_prompt ( str ) \u2013 The example prompt file path. api_prompt ( str ) \u2013 The API prompt file path. 
Source code in agents/agent/host_agent.py 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 def __init__ ( self , name : str , is_visual : bool , main_prompt : str , example_prompt : str , api_prompt : str , ) -> None : \"\"\" Initialize the HostAgent. :name: The name of the agent. :param is_visual: The flag indicating whether the agent is visual or not. :param main_prompt: The main prompt file path. :param example_prompt: The example prompt file path. :param api_prompt: The API prompt file path. \"\"\" super () . __init__ ( name = name ) self . prompter = self . get_prompter ( is_visual , main_prompt , example_prompt , api_prompt ) self . offline_doc_retriever = None self . online_doc_retriever = None self . experience_retriever = None self . human_demonstration_retriever = None self . agent_factory = AgentFactory () self . appagent_dict = {} self . _active_appagent = None self . _blackboard = Blackboard () self . set_state ( ContinueHostAgentState ()) self . Puppeteer = self . create_puppeteer_interface () blackboard property Get the blackboard. status_manager : HostAgentStatus property Get the status manager. sub_agent_amount : int property Get the amount of sub agents. Returns: int \u2013 The amount of sub agents. create_app_agent ( application_window_name , application_root_name , request , mode ) Create the app agent for the host agent. Parameters: application_window_name ( str ) \u2013 The name of the application window. application_root_name ( str ) \u2013 The name of the application root. request ( str ) \u2013 The user request. mode ( str ) \u2013 The mode of the session. Returns: AppAgent \u2013 The app agent. 
Source code in agents/agent/host_agent.py 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 def create_app_agent ( self , application_window_name : str , application_root_name : str , request : str , mode : str , ) -> AppAgent : \"\"\" Create the app agent for the host agent. :param application_window_name: The name of the application window. :param application_root_name: The name of the application root. :param request: The user request. :param mode: The mode of the session. :return: The app agent. \"\"\" if mode == \"normal\" : agent_name = \"AppAgent/ {root} / {process} \" . format ( root = application_root_name , process = application_window_name ) app_agent : AppAgent = self . create_subagent ( agent_type = \"app\" , agent_name = agent_name , process_name = application_window_name , app_root_name = application_root_name , is_visual = configs [ \"APP_AGENT\" ][ \"VISUAL_MODE\" ], main_prompt = configs [ \"APPAGENT_PROMPT\" ], example_prompt = configs [ \"APPAGENT_EXAMPLE_PROMPT\" ], api_prompt = configs [ \"API_PROMPT\" ], ) elif mode == \"follower\" : # Load additional app info prompt. app_info_prompt = configs . get ( \"APP_INFO_PROMPT\" , None ) agent_name = \"FollowerAgent/ {root} / {process} \" . format ( root = application_root_name , process = application_window_name ) # Create the app agent in the follower mode. app_agent = self . 
create_subagent ( agent_type = \"follower\" , agent_name = agent_name , process_name = application_window_name , app_root_name = application_root_name , is_visual = configs [ \"APP_AGENT\" ][ \"VISUAL_MODE\" ], main_prompt = configs [ \"FOLLOWERAHENT_PROMPT\" ], example_prompt = configs [ \"APPAGENT_EXAMPLE_PROMPT\" ], api_prompt = configs [ \"API_PROMPT\" ], app_info_prompt = app_info_prompt , ) else : raise ValueError ( f \"The { mode } mode is not supported.\" ) # Create the COM receiver for the app agent. if configs . get ( \"USE_APIS\" , False ): app_agent . Puppeteer . receiver_manager . create_api_receiver ( application_root_name , application_window_name ) # Provision the context for the app agent, including the all retrievers. app_agent . context_provision ( request ) return app_agent create_puppeteer_interface () Create the Puppeteer interface to automate the app. Returns: AppPuppeteer \u2013 The Puppeteer interface. Source code in agents/agent/host_agent.py 213 214 215 216 217 218 def create_puppeteer_interface ( self ) -> puppeteer . AppPuppeteer : \"\"\" Create the Puppeteer interface to automate the app. :return: The Puppeteer interface. \"\"\" return puppeteer . AppPuppeteer ( \"\" , \"\" ) create_subagent ( agent_type , agent_name , process_name , app_root_name , is_visual , main_prompt , example_prompt , api_prompt , * args , ** kwargs ) Create an SubAgent hosted by the HostAgent. Parameters: agent_type ( str ) \u2013 The type of the agent to create. agent_name ( str ) \u2013 The name of the SubAgent. process_name ( str ) \u2013 The process name of the app. app_root_name ( str ) \u2013 The root name of the app. is_visual ( bool ) \u2013 The flag indicating whether the agent is visual or not. main_prompt ( str ) \u2013 The main prompt file path. example_prompt ( str ) \u2013 The example prompt file path. api_prompt ( str ) \u2013 The API prompt file path. Returns: BasicAgent \u2013 The created SubAgent. 
Source code in agents/agent/host_agent.py 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 def create_subagent ( self , agent_type : str , agent_name : str , process_name : str , app_root_name : str , is_visual : bool , main_prompt : str , example_prompt : str , api_prompt : str , * args , ** kwargs , ) -> BasicAgent : \"\"\" Create an SubAgent hosted by the HostAgent. :param agent_type: The type of the agent to create. :param agent_name: The name of the SubAgent. :param process_name: The process name of the app. :param app_root_name: The root name of the app. :param is_visual: The flag indicating whether the agent is visual or not. :param main_prompt: The main prompt file path. :param example_prompt: The example prompt file path. :param api_prompt: The API prompt file path. :return: The created SubAgent. \"\"\" app_agent = self . agent_factory . create_agent ( agent_type , agent_name , process_name , app_root_name , is_visual , main_prompt , example_prompt , api_prompt , * args , ** kwargs , ) self . appagent_dict [ agent_name ] = app_agent app_agent . host = self self . _active_appagent = app_agent return app_agent get_active_appagent () Get the active app agent. Returns: AppAgent \u2013 The active app agent. Source code in agents/agent/host_agent.py 150 151 152 153 154 155 def get_active_appagent ( self ) -> AppAgent : \"\"\" Get the active app agent. :return: The active app agent. \"\"\" return self . _active_appagent get_prompter ( is_visual , main_prompt , example_prompt , api_prompt ) Get the prompt for the agent. Parameters: is_visual ( bool ) \u2013 The flag indicating whether the agent is visual or not. main_prompt ( str ) \u2013 The main prompt file path. example_prompt ( str ) \u2013 The example prompt file path. api_prompt ( str ) \u2013 The API prompt file path. Returns: HostAgentPrompter \u2013 The prompter instance. 
Source code in agents/agent/host_agent.py 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 def get_prompter ( self , is_visual : bool , main_prompt : str , example_prompt : str , api_prompt : str , ) -> HostAgentPrompter : \"\"\" Get the prompt for the agent. :param is_visual: The flag indicating whether the agent is visual or not. :param main_prompt: The main prompt file path. :param example_prompt: The example prompt file path. :param api_prompt: The API prompt file path. :return: The prompter instance. \"\"\" return HostAgentPrompter ( is_visual , main_prompt , example_prompt , api_prompt ) message_constructor ( image_list , os_info , plan , prev_subtask , request ) Construct the message. Parameters: image_list ( List [ str ] ) \u2013 The list of screenshot images. os_info ( str ) \u2013 The OS information. prev_subtask ( List [ Dict [ str , str ]] ) \u2013 The previous subtask. plan ( List [ str ] ) \u2013 The plan. request ( str ) \u2013 The request. Returns: List [ Dict [ str , Union [ str , List [ Dict [ str , str ]]]]] \u2013 The message. Source code in agents/agent/host_agent.py 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 def message_constructor ( self , image_list : List [ str ], os_info : str , plan : List [ str ], prev_subtask : List [ Dict [ str , str ]], request : str , ) -> List [ Dict [ str , Union [ str , List [ Dict [ str , str ]]]]]: \"\"\" Construct the message. :param image_list: The list of screenshot images. :param os_info: The OS information. :param prev_subtask: The previous subtask. :param plan: The plan. :param request: The request. :return: The message. \"\"\" hostagent_prompt_system_message = self . prompter . system_prompt_construction () hostagent_prompt_user_message = self . prompter . 
user_content_construction ( image_list = image_list , control_item = os_info , prev_subtask = prev_subtask , prev_plan = plan , user_request = request , ) if not self . blackboard . is_empty (): blackboard_prompt = self . blackboard . blackboard_to_prompt () hostagent_prompt_user_message = ( blackboard_prompt + hostagent_prompt_user_message ) hostagent_prompt_message = self . prompter . prompt_construction ( hostagent_prompt_system_message , hostagent_prompt_user_message ) return hostagent_prompt_message print_response ( response_dict ) Print the response. Parameters: response_dict ( Dict ) \u2013 The response dictionary to print. Source code in agents/agent/host_agent.py 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 def print_response ( self , response_dict : Dict ) -> None : \"\"\" Print the response. :param response_dict: The response dictionary to print. \"\"\" application = response_dict . get ( \"ControlText\" ) if not application : application = \"[The required application needs to be opened.]\" observation = response_dict . get ( \"Observation\" ) thought = response_dict . get ( \"Thought\" ) bash_command = response_dict . get ( \"Bash\" , None ) subtask = response_dict . get ( \"CurrentSubtask\" ) # Convert the message from a list to a string. message = list ( response_dict . get ( \"Message\" , \"\" )) message = \" \\n \" . join ( message ) # Concatenate the subtask with the plan and convert the plan from a list to a string. plan = list ( response_dict . get ( \"Plan\" )) plan = [ subtask ] + plan plan = \" \\n \" . join ([ f \"( { i + 1 } ) \" + str ( item ) for i , item in enumerate ( plan )]) status = response_dict . get ( \"Status\" ) comment = response_dict . get ( \"Comment\" ) utils . print_with_color ( \"Observations\ud83d\udc40: {observation} \" . 
format ( observation = observation ), \"cyan\" ) utils . print_with_color ( \"Thoughts\ud83d\udca1: {thought} \" . format ( thought = thought ), \"green\" ) if bash_command : utils . print_with_color ( \"Running Bash Command\ud83d\udd27: {bash} \" . format ( bash = bash_command ), \"yellow\" ) utils . print_with_color ( \"Plans\ud83d\udcda: {plan} \" . format ( plan = plan ), \"cyan\" , ) utils . print_with_color ( \"Next Selected application\ud83d\udcf2: {application} \" . format ( application = application ), \"yellow\" , ) utils . print_with_color ( \"Messages to AppAgent\ud83d\udce9: {message} \" . format ( message = message ), \"cyan\" ) utils . print_with_color ( \"Status\ud83d\udcca: {status} \" . format ( status = status ), \"blue\" ) utils . print_with_color ( \"Comment\ud83d\udcac: {comment} \" . format ( comment = comment ), \"green\" ) process ( context ) Process the agent. Parameters: context ( Context ) \u2013 The context. Source code in agents/agent/host_agent.py 202 203 204 205 206 207 208 209 210 211 def process ( self , context : Context ) -> None : \"\"\" Process the agent. :param context: The context. \"\"\" self . processor = HostAgentProcessor ( agent = self , context = context ) self . processor . process () # Sync the status with the processor. self . status = self . processor . status process_comfirmation () TODO: Process the confirmation. Source code in agents/agent/host_agent.py 289 290 291 292 293 def process_comfirmation ( self ) -> None : \"\"\" TODO: Process the confirmation. \"\"\" pass","title":"HostAgent"},{"location":"agents/host_agent/#hostagent","text":"The HostAgent assumes five primary responsibilities: User Engagement : The HostAgent engages with the user to understand their request and analyze their intent. It also converses with the user to gather additional information when necessary. AppAgent Management : The HostAgent manages the creation and registration of AppAgents to fulfill the user's request. 
It also orchestrates the interaction between the AppAgents and the application. Task Management : The HostAgent analyzes the user's request to decompose it into sub-tasks and distribute them among the AppAgents . It also manages the scheduling, orchestration, coordination, and monitoring of the AppAgents to ensure the successful completion of the user's request. Bash Command Execution : The HostAgent can execute bash commands to open applications or execute system commands to support the user's request and the AppAgents ' execution. Communication : The HostAgent communicates with the AppAgents to exchange information. It also manages the Blackboard to store and share information among the agents, as shown below:","title":"HostAgent \ud83e\udd16"},{"location":"agents/host_agent/#hostagent-input","text":"The HostAgent receives the following inputs: Input Description Type User Request The user's request in natural language. String Application Information Information about the existing active applications. List of Strings Desktop Screenshots Screenshots of the desktop to provide context to the HostAgent . Image Previous Sub-Tasks The previous sub-tasks and their completion status. List of Strings Previous Plan The previous plan for the following sub-tasks. List of Strings Blackboard The shared memory space for storing and sharing information among the agents. Dictionary By processing these inputs, the HostAgent determines the appropriate application to fulfill the user's request and orchestrates the AppAgents to execute the necessary actions.","title":"HostAgent Input"},{"location":"agents/host_agent/#hostagent-output","text":"With the inputs provided, the HostAgent generates the following outputs: Output Description Type Observation The observation of current desktop screenshots. String Thought The logical reasoning process of the HostAgent . String Current Sub-Task The current sub-task to be executed by the AppAgent . 
String Message The message to be sent to the AppAgent for the completion of the sub-task. String ControlLabel The index of the selected application to execute the sub-task. String ControlText The name of the selected application to execute the sub-task. String Plan The plan for the following sub-tasks after the current sub-task. List of Strings Status The status of the agent, mapped to the AgentState . String Comment Additional comments or information provided to the user. String Questions The questions to be asked to the user for additional information. List of Strings Bash The bash command to be executed by the HostAgent . It can be used to open applications or execute system commands. String Below is an example of the HostAgent output: { \"Observation\": \"Desktop screenshot\", \"Thought\": \"Logical reasoning process\", \"Current Sub-Task\": \"Sub-task description\", \"Message\": \"Message to AppAgent\", \"ControlLabel\": \"Application index\", \"ControlText\": \"Application name\", \"Plan\": [\"Sub-task 1\", \"Sub-task 2\"], \"Status\": \"AgentState\", \"Comment\": \"Additional comments\", \"Questions\": [\"Question 1\", \"Question 2\"], \"Bash\": \"Bash command\" } Info The HostAgent output is formatted as a JSON object by LLMs and can be parsed by the json.loads method in Python.","title":"HostAgent Output"},{"location":"agents/host_agent/#hostagent-state","text":"The HostAgent progresses through different states, as defined in the ufo/agents/states/host_agent_states.py module. The states include: State Description CONTINUE The HostAgent is ready to process the user's request and employ the Processor to decompose it into sub-tasks. ASSIGN The HostAgent is assigning the sub-tasks to the AppAgents for execution. FINISH The overall task is completed, and the HostAgent is ready to return the results to the user. ERROR An error occurred during the processing of the user's request, and the HostAgent is unable to proceed. 
FAIL The HostAgent believes the task is unachievable and cannot proceed further. PENDING The HostAgent is waiting for additional information from the user to proceed. The state machine diagram for the HostAgent is shown below:","title":"HostAgent State"},{"location":"agents/host_agent/#task-decomposition","text":"Upon receiving the user's request, the HostAgent decomposes it into sub-tasks and assigns each sub-task to an AppAgent for execution. The HostAgent determines the appropriate application to fulfill the user's request based on the application information and the user's request. It then orchestrates the AppAgents to execute the necessary actions to complete the sub-tasks. We show the task decomposition process in the following figure:","title":"Task Decomposition"},{"location":"agents/host_agent/#creating-and-registering-appagents","text":"When the HostAgent determines the need for a new AppAgent to fulfill a sub-task, it creates an instance of the AppAgent and registers it with the HostAgent , by calling the create_subagent method: def create_subagent( self, agent_type: str, agent_name: str, process_name: str, app_root_name: str, is_visual: bool, main_prompt: str, example_prompt: str, api_prompt: str, *args, **kwargs, ) -> BasicAgent: \"\"\" Create an SubAgent hosted by the HostAgent. :param agent_type: The type of the agent to create. :param agent_name: The name of the SubAgent. :param process_name: The process name of the app. :param app_root_name: The root name of the app. :param is_visual: The flag indicating whether the agent is visual or not. :param main_prompt: The main prompt file path. :param example_prompt: The example prompt file path. :param api_prompt: The API prompt file path. :return: The created SubAgent. 
\"\"\" app_agent = self.agent_factory.create_agent( agent_type, agent_name, process_name, app_root_name, is_visual, main_prompt, example_prompt, api_prompt, *args, **kwargs, ) self.appagent_dict[agent_name] = app_agent app_agent.host = self self._active_appagent = app_agent return app_agent The HostAgent then assigns the sub-task to the AppAgent for execution and monitors its progress.","title":"Creating and Registering AppAgents"},{"location":"agents/host_agent/#reference","text":"Bases: BasicAgent The HostAgent class is the manager of AppAgents. Initialize the HostAgent. :name: The name of the agent. Parameters: is_visual ( bool ) \u2013 The flag indicating whether the agent is visual or not. main_prompt ( str ) \u2013 The main prompt file path. example_prompt ( str ) \u2013 The example prompt file path. api_prompt ( str ) \u2013 The API prompt file path. Source code in agents/agent/host_agent.py 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 def __init__ ( self , name : str , is_visual : bool , main_prompt : str , example_prompt : str , api_prompt : str , ) -> None : \"\"\" Initialize the HostAgent. :name: The name of the agent. :param is_visual: The flag indicating whether the agent is visual or not. :param main_prompt: The main prompt file path. :param example_prompt: The example prompt file path. :param api_prompt: The API prompt file path. \"\"\" super () . __init__ ( name = name ) self . prompter = self . get_prompter ( is_visual , main_prompt , example_prompt , api_prompt ) self . offline_doc_retriever = None self . online_doc_retriever = None self . experience_retriever = None self . human_demonstration_retriever = None self . agent_factory = AgentFactory () self . appagent_dict = {} self . _active_appagent = None self . _blackboard = Blackboard () self . set_state ( ContinueHostAgentState ()) self . Puppeteer = self . 
create_puppeteer_interface ()","title":"Reference"},{"location":"agents/host_agent/#agents.agent.host_agent.HostAgent.blackboard","text":"Get the blackboard.","title":"blackboard"},{"location":"agents/host_agent/#agents.agent.host_agent.HostAgent.status_manager","text":"Get the status manager.","title":"status_manager"},{"location":"agents/host_agent/#agents.agent.host_agent.HostAgent.sub_agent_amount","text":"Get the amount of sub agents. Returns: int \u2013 The amount of sub agents.","title":"sub_agent_amount"},{"location":"agents/host_agent/#agents.agent.host_agent.HostAgent.create_app_agent","text":"Create the app agent for the host agent. Parameters: application_window_name ( str ) \u2013 The name of the application window. application_root_name ( str ) \u2013 The name of the application root. request ( str ) \u2013 The user request. mode ( str ) \u2013 The mode of the session. Returns: AppAgent \u2013 The app agent. Source code in agents/agent/host_agent.py 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 def create_app_agent ( self , application_window_name : str , application_root_name : str , request : str , mode : str , ) -> AppAgent : \"\"\" Create the app agent for the host agent. :param application_window_name: The name of the application window. :param application_root_name: The name of the application root. :param request: The user request. :param mode: The mode of the session. :return: The app agent. \"\"\" if mode == \"normal\" : agent_name = \"AppAgent/ {root} / {process} \" . format ( root = application_root_name , process = application_window_name ) app_agent : AppAgent = self . 
create_subagent ( agent_type = \"app\" , agent_name = agent_name , process_name = application_window_name , app_root_name = application_root_name , is_visual = configs [ \"APP_AGENT\" ][ \"VISUAL_MODE\" ], main_prompt = configs [ \"APPAGENT_PROMPT\" ], example_prompt = configs [ \"APPAGENT_EXAMPLE_PROMPT\" ], api_prompt = configs [ \"API_PROMPT\" ], ) elif mode == \"follower\" : # Load additional app info prompt. app_info_prompt = configs . get ( \"APP_INFO_PROMPT\" , None ) agent_name = \"FollowerAgent/ {root} / {process} \" . format ( root = application_root_name , process = application_window_name ) # Create the app agent in the follower mode. app_agent = self . create_subagent ( agent_type = \"follower\" , agent_name = agent_name , process_name = application_window_name , app_root_name = application_root_name , is_visual = configs [ \"APP_AGENT\" ][ \"VISUAL_MODE\" ], main_prompt = configs [ \"FOLLOWERAHENT_PROMPT\" ], example_prompt = configs [ \"APPAGENT_EXAMPLE_PROMPT\" ], api_prompt = configs [ \"API_PROMPT\" ], app_info_prompt = app_info_prompt , ) else : raise ValueError ( f \"The { mode } mode is not supported.\" ) # Create the COM receiver for the app agent. if configs . get ( \"USE_APIS\" , False ): app_agent . Puppeteer . receiver_manager . create_api_receiver ( application_root_name , application_window_name ) # Provision the context for the app agent, including the all retrievers. app_agent . context_provision ( request ) return app_agent","title":"create_app_agent"},{"location":"agents/host_agent/#agents.agent.host_agent.HostAgent.create_puppeteer_interface","text":"Create the Puppeteer interface to automate the app. Returns: AppPuppeteer \u2013 The Puppeteer interface. Source code in agents/agent/host_agent.py 213 214 215 216 217 218 def create_puppeteer_interface ( self ) -> puppeteer . AppPuppeteer : \"\"\" Create the Puppeteer interface to automate the app. :return: The Puppeteer interface. \"\"\" return puppeteer . 
AppPuppeteer ( \"\" , \"\" )","title":"create_puppeteer_interface"},{"location":"agents/host_agent/#agents.agent.host_agent.HostAgent.create_subagent","text":"Create a SubAgent hosted by the HostAgent. Parameters: agent_type ( str ) \u2013 The type of the agent to create. agent_name ( str ) \u2013 The name of the SubAgent. process_name ( str ) \u2013 The process name of the app. app_root_name ( str ) \u2013 The root name of the app. is_visual ( bool ) \u2013 The flag indicating whether the agent is visual or not. main_prompt ( str ) \u2013 The main prompt file path. example_prompt ( str ) \u2013 The example prompt file path. api_prompt ( str ) \u2013 The API prompt file path. Returns: BasicAgent \u2013 The created SubAgent. Source code in agents/agent/host_agent.py 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 def create_subagent ( self , agent_type : str , agent_name : str , process_name : str , app_root_name : str , is_visual : bool , main_prompt : str , example_prompt : str , api_prompt : str , * args , ** kwargs , ) -> BasicAgent : \"\"\" Create a SubAgent hosted by the HostAgent. :param agent_type: The type of the agent to create. :param agent_name: The name of the SubAgent. :param process_name: The process name of the app. :param app_root_name: The root name of the app. :param is_visual: The flag indicating whether the agent is visual or not. :param main_prompt: The main prompt file path. :param example_prompt: The example prompt file path. :param api_prompt: The API prompt file path. :return: The created SubAgent. \"\"\" app_agent = self . agent_factory . create_agent ( agent_type , agent_name , process_name , app_root_name , is_visual , main_prompt , example_prompt , api_prompt , * args , ** kwargs , ) self . appagent_dict [ agent_name ] = app_agent app_agent . host = self self . 
_active_appagent = app_agent return app_agent","title":"create_subagent"},{"location":"agents/host_agent/#agents.agent.host_agent.HostAgent.get_active_appagent","text":"Get the active app agent. Returns: AppAgent \u2013 The active app agent. Source code in agents/agent/host_agent.py 150 151 152 153 154 155 def get_active_appagent ( self ) -> AppAgent : \"\"\" Get the active app agent. :return: The active app agent. \"\"\" return self . _active_appagent","title":"get_active_appagent"},{"location":"agents/host_agent/#agents.agent.host_agent.HostAgent.get_prompter","text":"Get the prompt for the agent. Parameters: is_visual ( bool ) \u2013 The flag indicating whether the agent is visual or not. main_prompt ( str ) \u2013 The main prompt file path. example_prompt ( str ) \u2013 The example prompt file path. api_prompt ( str ) \u2013 The API prompt file path. Returns: HostAgentPrompter \u2013 The prompter instance. Source code in agents/agent/host_agent.py 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 def get_prompter ( self , is_visual : bool , main_prompt : str , example_prompt : str , api_prompt : str , ) -> HostAgentPrompter : \"\"\" Get the prompt for the agent. :param is_visual: The flag indicating whether the agent is visual or not. :param main_prompt: The main prompt file path. :param example_prompt: The example prompt file path. :param api_prompt: The API prompt file path. :return: The prompter instance. \"\"\" return HostAgentPrompter ( is_visual , main_prompt , example_prompt , api_prompt )","title":"get_prompter"},{"location":"agents/host_agent/#agents.agent.host_agent.HostAgent.message_constructor","text":"Construct the message. Parameters: image_list ( List [ str ] ) \u2013 The list of screenshot images. os_info ( str ) \u2013 The OS information. prev_subtask ( List [ Dict [ str , str ]] ) \u2013 The previous subtask. plan ( List [ str ] ) \u2013 The plan. request ( str ) \u2013 The request. 
Returns: List [ Dict [ str , Union [ str , List [ Dict [ str , str ]]]]] \u2013 The message. Source code in agents/agent/host_agent.py 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 def message_constructor ( self , image_list : List [ str ], os_info : str , plan : List [ str ], prev_subtask : List [ Dict [ str , str ]], request : str , ) -> List [ Dict [ str , Union [ str , List [ Dict [ str , str ]]]]]: \"\"\" Construct the message. :param image_list: The list of screenshot images. :param os_info: The OS information. :param prev_subtask: The previous subtask. :param plan: The plan. :param request: The request. :return: The message. \"\"\" hostagent_prompt_system_message = self . prompter . system_prompt_construction () hostagent_prompt_user_message = self . prompter . user_content_construction ( image_list = image_list , control_item = os_info , prev_subtask = prev_subtask , prev_plan = plan , user_request = request , ) if not self . blackboard . is_empty (): blackboard_prompt = self . blackboard . blackboard_to_prompt () hostagent_prompt_user_message = ( blackboard_prompt + hostagent_prompt_user_message ) hostagent_prompt_message = self . prompter . prompt_construction ( hostagent_prompt_system_message , hostagent_prompt_user_message ) return hostagent_prompt_message","title":"message_constructor"},{"location":"agents/host_agent/#agents.agent.host_agent.HostAgent.print_response","text":"Print the response. Parameters: response_dict ( Dict ) \u2013 The response dictionary to print. Source code in agents/agent/host_agent.py 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 def print_response ( self , response_dict : Dict ) -> None : \"\"\" Print the response. :param response_dict: The response dictionary to print. 
\"\"\" application = response_dict . get ( \"ControlText\" ) if not application : application = \"[The required application needs to be opened.]\" observation = response_dict . get ( \"Observation\" ) thought = response_dict . get ( \"Thought\" ) bash_command = response_dict . get ( \"Bash\" , None ) subtask = response_dict . get ( \"CurrentSubtask\" ) # Convert the message from a list to a string. message = list ( response_dict . get ( \"Message\" , \"\" )) message = \" \\n \" . join ( message ) # Concatenate the subtask with the plan and convert the plan from a list to a string. plan = list ( response_dict . get ( \"Plan\" )) plan = [ subtask ] + plan plan = \" \\n \" . join ([ f \"( { i + 1 } ) \" + str ( item ) for i , item in enumerate ( plan )]) status = response_dict . get ( \"Status\" ) comment = response_dict . get ( \"Comment\" ) utils . print_with_color ( \"Observations\ud83d\udc40: {observation} \" . format ( observation = observation ), \"cyan\" ) utils . print_with_color ( \"Thoughts\ud83d\udca1: {thought} \" . format ( thought = thought ), \"green\" ) if bash_command : utils . print_with_color ( \"Running Bash Command\ud83d\udd27: {bash} \" . format ( bash = bash_command ), \"yellow\" ) utils . print_with_color ( \"Plans\ud83d\udcda: {plan} \" . format ( plan = plan ), \"cyan\" , ) utils . print_with_color ( \"Next Selected application\ud83d\udcf2: {application} \" . format ( application = application ), \"yellow\" , ) utils . print_with_color ( \"Messages to AppAgent\ud83d\udce9: {message} \" . format ( message = message ), \"cyan\" ) utils . print_with_color ( \"Status\ud83d\udcca: {status} \" . format ( status = status ), \"blue\" ) utils . print_with_color ( \"Comment\ud83d\udcac: {comment} \" . format ( comment = comment ), \"green\" )","title":"print_response"},{"location":"agents/host_agent/#agents.agent.host_agent.HostAgent.process","text":"Process the agent. Parameters: context ( Context ) \u2013 The context. 
Source code in agents/agent/host_agent.py 202 203 204 205 206 207 208 209 210 211 def process ( self , context : Context ) -> None : \"\"\" Process the agent. :param context: The context. \"\"\" self . processor = HostAgentProcessor ( agent = self , context = context ) self . processor . process () # Sync the status with the processor. self . status = self . processor . status","title":"process"},{"location":"agents/host_agent/#agents.agent.host_agent.HostAgent.process_comfirmation","text":"TODO: Process the confirmation. Source code in agents/agent/host_agent.py 289 290 291 292 293 def process_comfirmation ( self ) -> None : \"\"\" TODO: Process the confirmation. \"\"\" pass","title":"process_comfirmation"},{"location":"agents/overview/","text":"Agents In UFO, there are four types of agents: HostAgent , AppAgent , FollowerAgent , and EvaluationAgent . Each agent has a specific role in the UFO system and is responsible for different aspects of the user interaction process: Agent Description HostAgent Decomposes the user request into sub-tasks and selects the appropriate application to fulfill the request. AppAgent Executes actions on the selected application. FollowerAgent Follows the user's instructions to complete the task. EvaluationAgent Evaluates the completeness of a session or a round. In the normal workflow, only the HostAgent and AppAgent are involved in the user interaction process. The FollowerAgent and EvaluationAgent are used for specific tasks. Please see below the orchestration of the agents in UFO: Main Components An agent in UFO is composed of the following main components to fulfill its role in the UFO system: Component Description State Represents the current state of the agent and determines the next action and agent to handle the request. Memory Stores information about the user request, application state, and other relevant data. Blackboard Stores information shared between agents. 
Prompter Generates prompts for the language model based on the user request and application state. Processor Processes the workflow of the agent, including handling user requests, executing actions, and memory management. Reference Below is the reference for the Agent class in UFO. All agents in UFO inherit from the Agent class and implement necessary methods to fulfill their roles in the UFO system. Bases: ABC The BasicAgent class is the abstract class for the agent. Initialize the BasicAgent. Parameters: name ( str ) \u2013 The name of the agent. Source code in agents/agent/basic.py 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 def __init__ ( self , name : str ) -> None : \"\"\" Initialize the BasicAgent. :param name: The name of the agent. \"\"\" self . _step = 0 self . _complete = False self . _name = name self . _status = self . status_manager . CONTINUE . value self . _register_self () self . retriever_factory = retriever . RetrieverFactory () self . _memory = Memory () self . _host = None self . _processor : Optional [ BaseProcessor ] = None self . _state = None self . Puppeteer : puppeteer . AppPuppeteer = None blackboard : Blackboard property Get the blackboard. Returns: Blackboard \u2013 The blackboard. host : HostAgent property writable Get the host of the agent. Returns: HostAgent \u2013 The host of the agent. memory : Memory property Get the memory of the agent. Returns: Memory \u2013 The memory of the agent. name : str property Get the name of the agent. Returns: str \u2013 The name of the agent. processor : BaseProcessor property writable Get the processor. Returns: BaseProcessor \u2013 The processor. state : AgentState property Get the state of the agent. Returns: AgentState \u2013 The state of the agent. status : str property writable Get the status of the agent. Returns: str \u2013 The status of the agent. status_manager : AgentStatus property Get the status manager. Returns: AgentStatus \u2013 The status manager. 
step : int property writable Get the step of the agent. Returns: int \u2013 The step of the agent. add_memory ( memory_item ) Update the memory of the agent. Parameters: memory_item ( MemoryItem ) \u2013 The memory item to add. Source code in agents/agent/basic.py 181 182 183 184 185 186 def add_memory ( self , memory_item : MemoryItem ) -> None : \"\"\" Update the memory of the agent. :param memory_item: The memory item to add. \"\"\" self . _memory . add_memory_item ( memory_item ) build_experience_retriever () Build the experience retriever. Source code in agents/agent/basic.py 323 324 325 326 327 def build_experience_retriever ( self ) -> None : \"\"\" Build the experience retriever. \"\"\" pass build_human_demonstration_retriever () Build the human demonstration retriever. Source code in agents/agent/basic.py 329 330 331 332 333 def build_human_demonstration_retriever ( self ) -> None : \"\"\" Build the human demonstration retriever. \"\"\" pass build_offline_docs_retriever () Build the offline docs retriever. Source code in agents/agent/basic.py 311 312 313 314 315 def build_offline_docs_retriever ( self ) -> None : \"\"\" Build the offline docs retriever. \"\"\" pass build_online_search_retriever () Build the online search retriever. Source code in agents/agent/basic.py 317 318 319 320 321 def build_online_search_retriever ( self ) -> None : \"\"\" Build the online search retriever. \"\"\" pass clear_memory () Clear the memory of the agent. Source code in agents/agent/basic.py 195 196 197 198 199 def clear_memory ( self ) -> None : \"\"\" Clear the memory of the agent. \"\"\" self . _memory . clear () create_puppeteer_interface () Create the puppeteer interface. Source code in agents/agent/basic.py 233 234 235 236 237 def create_puppeteer_interface ( self ) -> puppeteer . AppPuppeteer : \"\"\" Create the puppeteer interface. \"\"\" pass delete_memory ( step ) Delete the memory of the agent. 
Parameters: step ( int ) \u2013 The step of the memory item to delete. Source code in agents/agent/basic.py 188 189 190 191 192 193 def delete_memory ( self , step : int ) -> None : \"\"\" Delete the memory of the agent. :param step: The step of the memory item to delete. \"\"\" self . _memory . delete_memory_item ( step ) get_cls ( name ) classmethod Retrieves an agent class from the registry. Parameters: name ( str ) \u2013 The name of the agent class. Returns: Type ['BasicAgent'] \u2013 The agent class. Source code in agents/agent/basic.py 350 351 352 353 354 355 356 357 @classmethod def get_cls ( cls , name : str ) -> Type [ \"BasicAgent\" ]: \"\"\" Retrieves an agent class from the registry. :param name: The name of the agent class. :return: The agent class. \"\"\" return AgentRegistry () . get_cls ( name ) get_prompter () abstractmethod Get the prompt for the agent. Returns: str \u2013 The prompt. Source code in agents/agent/basic.py 124 125 126 127 128 129 130 @abstractmethod def get_prompter ( self ) -> str : \"\"\" Get the prompt for the agent. :return: The prompt. \"\"\" pass get_response ( message , namescope , use_backup_engine , configs = configs ) classmethod Get the response for the prompt. Parameters: message ( List [ dict ] ) \u2013 The message for LLMs. namescope ( str ) \u2013 The namescope for the LLMs. use_backup_engine ( bool ) \u2013 Whether to use the backup engine. Returns: str \u2013 The response. Source code in agents/agent/basic.py 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 @classmethod def get_response ( cls , message : List [ dict ], namescope : str , use_backup_engine : bool , configs = configs ) -> str : \"\"\" Get the response for the prompt. :param message: The message for LLMs. :param namescope: The namescope for the LLMs. :param use_backup_engine: Whether to use the backup engine. :return: The response. \"\"\" response_string , cost = llm_call . 
get_completion ( message , namescope , use_backup_engine = use_backup_engine , configs = configs ) return response_string , cost handle ( context ) Handle the agent. Parameters: context ( Context ) \u2013 The context for the agent. Source code in agents/agent/basic.py 220 221 222 223 224 225 def handle ( self , context : Context ) -> None : \"\"\" Handle the agent. :param context: The context for the agent. \"\"\" self . state . handle ( self , context ) message_constructor () abstractmethod Construct the message. Returns: List [ Dict [ str , Union [ str , List [ Dict [ str , str ]]]]] \u2013 The message. Source code in agents/agent/basic.py 132 133 134 135 136 137 138 @abstractmethod def message_constructor ( self ) -> List [ Dict [ str , Union [ str , List [ Dict [ str , str ]]]]]: \"\"\" Construct the message. :return: The message. \"\"\" pass print_response () Print the response. Source code in agents/agent/basic.py 335 336 337 338 339 def print_response ( self ) -> None : \"\"\" Print the response. \"\"\" pass process ( context ) Process the agent. Source code in agents/agent/basic.py 227 228 229 230 231 def process ( self , context : Context ) -> None : \"\"\" Process the agent. \"\"\" pass process_asker ( ask_user = True ) Ask for the process. Parameters: ask_user ( bool , default: True ) \u2013 Whether to ask the user for the questions. Source code in agents/agent/basic.py 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 def process_asker ( self , ask_user : bool = True ) -> None : \"\"\" Ask for the process. :param ask_user: Whether to ask the user for the questions. \"\"\" if self . processor : question_list = self . processor . question_list if ask_user : utils . 
print_with_color ( \"Could you please answer the following questions to help me understand your needs and complete the task?\" , \"yellow\" , ) for index , question in enumerate ( question_list ): if ask_user : answer = question_asker ( question , index + 1 ) if not answer . strip (): continue qa_pair = { \"question\" : question , \"answer\" : answer } utils . append_string_to_file ( configs [ \"QA_PAIR_FILE\" ], json . dumps ( qa_pair ) ) else : qa_pair = { \"question\" : question , \"answer\" : \"The answer for the question is not available, please proceed with your own knowledge or experience, or leave it as a placeholder. Do not ask the same question again.\" , } self . blackboard . add_questions ( qa_pair ) process_comfirmation () abstractmethod Confirm the process. Source code in agents/agent/basic.py 280 281 282 283 284 285 @abstractmethod def process_comfirmation ( self ) -> None : \"\"\" Confirm the process. \"\"\" pass process_resume () Resume the process. Source code in agents/agent/basic.py 239 240 241 242 243 244 def process_resume ( self ) -> None : \"\"\" Resume the process. \"\"\" if self . processor : self . processor . resume () reflection () TODO: Reflect on the action. Source code in agents/agent/basic.py 201 202 203 204 205 206 def reflection ( self ) -> None : \"\"\" TODO: Reflect on the action. \"\"\" pass response_to_dict ( response ) staticmethod Convert the response to a dictionary. Parameters: response ( str ) \u2013 The response. Returns: Dict [ str , str ] \u2013 The dictionary. Source code in agents/agent/basic.py 156 157 158 159 160 161 162 163 @staticmethod def response_to_dict ( response : str ) -> Dict [ str , str ]: \"\"\" Convert the response to a dictionary. :param response: The response. :return: The dictionary. \"\"\" return utils . json_parser ( response ) set_state ( state ) Set the state of the agent. Parameters: state ( AgentState ) \u2013 The state of the agent. 
Source code in agents/agent/basic.py 208 209 210 211 212 213 214 215 216 217 218 def set_state ( self , state : AgentState ) -> None : \"\"\" Set the state of the agent. :param state: The state of the agent. \"\"\" assert issubclass ( type ( self ), state . agent_class () ), f \"The state is only for agent type of { state . agent_class () } , but the current agent is { type ( self ) } .\" self . _state = state","title":"Overview"},{"location":"agents/overview/#agents","text":"In UFO, there are four types of agents: HostAgent , AppAgent , FollowerAgent , and EvaluationAgent . Each agent has a specific role in the UFO system and is responsible for different aspects of the user interaction process: Agent Description HostAgent Decomposes the user request into sub-tasks and selects the appropriate application to fulfill the request. AppAgent Executes actions on the selected application. FollowerAgent Follows the user's instructions to complete the task. EvaluationAgent Evaluates the completeness of a session or a round. In the normal workflow, only the HostAgent and AppAgent are involved in the user interaction process. The FollowerAgent and EvaluationAgent are used for specific tasks. Please see below the orchestration of the agents in UFO:","title":"Agents"},{"location":"agents/overview/#main-components","text":"An agent in UFO is composed of the following main components to fulfill its role in the UFO system: Component Description State Represents the current state of the agent and determines the next action and agent to handle the request. Memory Stores information about the user request, application state, and other relevant data. Blackboard Stores information shared between agents. Prompter Generates prompts for the language model based on the user request and application state. 
Processor Processes the workflow of the agent, including handling user requests, executing actions, and memory management.","title":"Main Components"},{"location":"agents/overview/#reference","text":"Below is the reference for the Agent class in UFO. All agents in UFO inherit from the Agent class and implement necessary methods to fulfill their roles in the UFO system. Bases: ABC The BasicAgent class is the abstract class for the agent. Initialize the BasicAgent. Parameters: name ( str ) \u2013 The name of the agent. Source code in agents/agent/basic.py 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 def __init__ ( self , name : str ) -> None : \"\"\" Initialize the BasicAgent. :param name: The name of the agent. \"\"\" self . _step = 0 self . _complete = False self . _name = name self . _status = self . status_manager . CONTINUE . value self . _register_self () self . retriever_factory = retriever . RetrieverFactory () self . _memory = Memory () self . _host = None self . _processor : Optional [ BaseProcessor ] = None self . _state = None self . Puppeteer : puppeteer . AppPuppeteer = None","title":"Reference"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.blackboard","text":"Get the blackboard. Returns: Blackboard \u2013 The blackboard.","title":"blackboard"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.host","text":"Get the host of the agent. Returns: HostAgent \u2013 The host of the agent.","title":"host"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.memory","text":"Get the memory of the agent. Returns: Memory \u2013 The memory of the agent.","title":"memory"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.name","text":"Get the name of the agent. Returns: str \u2013 The name of the agent.","title":"name"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.processor","text":"Get the processor. 
Returns: BaseProcessor \u2013 The processor.","title":"processor"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.state","text":"Get the state of the agent. Returns: AgentState \u2013 The state of the agent.","title":"state"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.status","text":"Get the status of the agent. Returns: str \u2013 The status of the agent.","title":"status"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.status_manager","text":"Get the status manager. Returns: AgentStatus \u2013 The status manager.","title":"status_manager"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.step","text":"Get the step of the agent. Returns: int \u2013 The step of the agent.","title":"step"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.add_memory","text":"Update the memory of the agent. Parameters: memory_item ( MemoryItem ) \u2013 The memory item to add. Source code in agents/agent/basic.py 181 182 183 184 185 186 def add_memory ( self , memory_item : MemoryItem ) -> None : \"\"\" Update the memory of the agent. :param memory_item: The memory item to add. \"\"\" self . _memory . add_memory_item ( memory_item )","title":"add_memory"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.build_experience_retriever","text":"Build the experience retriever. Source code in agents/agent/basic.py 323 324 325 326 327 def build_experience_retriever ( self ) -> None : \"\"\" Build the experience retriever. \"\"\" pass","title":"build_experience_retriever"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.build_human_demonstration_retriever","text":"Build the human demonstration retriever. Source code in agents/agent/basic.py 329 330 331 332 333 def build_human_demonstration_retriever ( self ) -> None : \"\"\" Build the human demonstration retriever. 
\"\"\" pass","title":"build_human_demonstration_retriever"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.build_offline_docs_retriever","text":"Build the offline docs retriever. Source code in agents/agent/basic.py 311 312 313 314 315 def build_offline_docs_retriever ( self ) -> None : \"\"\" Build the offline docs retriever. \"\"\" pass","title":"build_offline_docs_retriever"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.build_online_search_retriever","text":"Build the online search retriever. Source code in agents/agent/basic.py 317 318 319 320 321 def build_online_search_retriever ( self ) -> None : \"\"\" Build the online search retriever. \"\"\" pass","title":"build_online_search_retriever"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.clear_memory","text":"Clear the memory of the agent. Source code in agents/agent/basic.py 195 196 197 198 199 def clear_memory ( self ) -> None : \"\"\" Clear the memory of the agent. \"\"\" self . _memory . clear ()","title":"clear_memory"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.create_puppeteer_interface","text":"Create the puppeteer interface. Source code in agents/agent/basic.py 233 234 235 236 237 def create_puppeteer_interface ( self ) -> puppeteer . AppPuppeteer : \"\"\" Create the puppeteer interface. \"\"\" pass","title":"create_puppeteer_interface"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.delete_memory","text":"Delete the memory of the agent. Parameters: step ( int ) \u2013 The step of the memory item to delete. Source code in agents/agent/basic.py 188 189 190 191 192 193 def delete_memory ( self , step : int ) -> None : \"\"\" Delete the memory of the agent. :param step: The step of the memory item to delete. \"\"\" self . _memory . delete_memory_item ( step )","title":"delete_memory"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.get_cls","text":"Retrieves an agent class from the registry. 
Parameters: name ( str ) \u2013 The name of the agent class. Returns: Type ['BasicAgent'] \u2013 The agent class. Source code in agents/agent/basic.py 350 351 352 353 354 355 356 357 @classmethod def get_cls ( cls , name : str ) -> Type [ \"BasicAgent\" ]: \"\"\" Retrieves an agent class from the registry. :param name: The name of the agent class. :return: The agent class. \"\"\" return AgentRegistry () . get_cls ( name )","title":"get_cls"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.get_prompter","text":"Get the prompt for the agent. Returns: str \u2013 The prompt. Source code in agents/agent/basic.py 124 125 126 127 128 129 130 @abstractmethod def get_prompter ( self ) -> str : \"\"\" Get the prompt for the agent. :return: The prompt. \"\"\" pass","title":"get_prompter"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.get_response","text":"Get the response for the prompt. Parameters: message ( List [ dict ] ) \u2013 The message for LLMs. namescope ( str ) \u2013 The namescope for the LLMs. use_backup_engine ( bool ) \u2013 Whether to use the backup engine. Returns: str \u2013 The response. Source code in agents/agent/basic.py 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 @classmethod def get_response ( cls , message : List [ dict ], namescope : str , use_backup_engine : bool , configs = configs ) -> str : \"\"\" Get the response for the prompt. :param message: The message for LLMs. :param namescope: The namescope for the LLMs. :param use_backup_engine: Whether to use the backup engine. :return: The response. \"\"\" response_string , cost = llm_call . get_completion ( message , namescope , use_backup_engine = use_backup_engine , configs = configs ) return response_string , cost","title":"get_response"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.handle","text":"Handle the agent. Parameters: context ( Context ) \u2013 The context for the agent. 
Source code in agents/agent/basic.py 220 221 222 223 224 225 def handle ( self , context : Context ) -> None : \"\"\" Handle the agent. :param context: The context for the agent. \"\"\" self . state . handle ( self , context )","title":"handle"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.message_constructor","text":"Construct the message. Returns: List [ Dict [ str , Union [ str , List [ Dict [ str , str ]]]]] \u2013 The message. Source code in agents/agent/basic.py 132 133 134 135 136 137 138 @abstractmethod def message_constructor ( self ) -> List [ Dict [ str , Union [ str , List [ Dict [ str , str ]]]]]: \"\"\" Construct the message. :return: The message. \"\"\" pass","title":"message_constructor"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.print_response","text":"Print the response. Source code in agents/agent/basic.py 335 336 337 338 339 def print_response ( self ) -> None : \"\"\" Print the response. \"\"\" pass","title":"print_response"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.process","text":"Process the agent. Source code in agents/agent/basic.py 227 228 229 230 231 def process ( self , context : Context ) -> None : \"\"\" Process the agent. \"\"\" pass","title":"process"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.process_asker","text":"Ask for the process. Parameters: ask_user ( bool , default: True ) \u2013 Whether to ask the user for the questions. Source code in agents/agent/basic.py 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 def process_asker ( self , ask_user : bool = True ) -> None : \"\"\" Ask for the process. :param ask_user: Whether to ask the user for the questions. \"\"\" if self . processor : question_list = self . processor . question_list if ask_user : utils . 
print_with_color ( \"Could you please answer the following questions to help me understand your needs and complete the task?\" , \"yellow\" , ) for index , question in enumerate ( question_list ): if ask_user : answer = question_asker ( question , index + 1 ) if not answer . strip (): continue qa_pair = { \"question\" : question , \"answer\" : answer } utils . append_string_to_file ( configs [ \"QA_PAIR_FILE\" ], json . dumps ( qa_pair ) ) else : qa_pair = { \"question\" : question , \"answer\" : \"The answer for the question is not available, please proceed with your own knowledge or experience, or leave it as a placeholder. Do not ask the same question again.\" , } self . blackboard . add_questions ( qa_pair )","title":"process_asker"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.process_comfirmation","text":"Confirm the process. Source code in agents/agent/basic.py 280 281 282 283 284 285 @abstractmethod def process_comfirmation ( self ) -> None : \"\"\" Confirm the process. \"\"\" pass","title":"process_comfirmation"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.process_resume","text":"Resume the process. Source code in agents/agent/basic.py 239 240 241 242 243 244 def process_resume ( self ) -> None : \"\"\" Resume the process. \"\"\" if self . processor : self . processor . resume ()","title":"process_resume"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.reflection","text":"TODO: Reflect on the action. Source code in agents/agent/basic.py 201 202 203 204 205 206 def reflection ( self ) -> None : \"\"\" TODO: Reflect on the action. \"\"\" pass","title":"reflection"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.response_to_dict","text":"Convert the response to a dictionary. Parameters: response ( str ) \u2013 The response. Returns: Dict [ str , str ] \u2013 The dictionary. 
Source code in agents/agent/basic.py 156 157 158 159 160 161 162 163 @staticmethod def response_to_dict ( response : str ) -> Dict [ str , str ]: \"\"\" Convert the response to a dictionary. :param response: The response. :return: The dictionary. \"\"\" return utils . json_parser ( response )","title":"response_to_dict"},{"location":"agents/overview/#agents.agent.basic.BasicAgent.set_state","text":"Set the state of the agent. Parameters: state ( AgentState ) \u2013 The state of the agent. Source code in agents/agent/basic.py 208 209 210 211 212 213 214 215 216 217 218 def set_state ( self , state : AgentState ) -> None : \"\"\" Set the state of the agent. :param state: The state of the agent. \"\"\" assert issubclass ( type ( self ), state . agent_class () ), f \"The state is only for agent type of { state . agent_class () } , but the current agent is { type ( self ) } .\" self . _state = state","title":"set_state"},{"location":"agents/design/blackboard/","text":"Agent Blackboard The Blackboard is a shared memory space that is visible to all agents in the UFO framework. It stores information required for agents to interact with the user and applications at every step. The Blackboard is a key component of the UFO framework, enabling agents to share information and collaborate to fulfill user requests. The Blackboard is implemented as a class in the ufo/agents/memory/blackboard.py file. Components The Blackboard consists of the following data components: Component Description questions A list of questions that UFO asks the user, along with their corresponding answers. requests A list of historical user requests received in previous Round . trajectories A list of step-wise trajectories that record the agent's actions and decisions at each step. screenshots A list of screenshots taken by the agent when it believes the current state is important for future reference. Tip The keys stored in the trajectories are configured as HISTORY_KEYS in the config_dev.yaml file. 
You can customize the keys based on your requirements and the agent's logic. Tip Whether to save the screenshots is determined by the AppAgent . You can enable or disable screenshot capture by setting the SCREENSHOT_TO_MEMORY flag in the config_dev.yaml file. Blackboard to Prompt Data in the Blackboard is based on the MemoryItem class. It has a method blackboard_to_prompt that converts the information stored in the Blackboard to a string prompt. Agents call this method to construct the prompt for the LLM's inference. The blackboard_to_prompt method is defined as follows: def blackboard_to_prompt(self) -> List[str]: \"\"\" Convert the blackboard to a prompt. :return: The prompt. \"\"\" prefix = [ { \"type\": \"text\", \"text\": \"[Blackboard:]\", } ] blackboard_prompt = ( prefix + self.texts_to_prompt(self.questions, \"[Questions & Answers:]\") + self.texts_to_prompt(self.requests, \"[Request History:]\") + self.texts_to_prompt(self.trajectories, \"[Step Trajectories Completed Previously:]\") + self.screenshots_to_prompt() ) return blackboard_prompt Reference Class for the blackboard, which stores the data and images which are visible to all the agents. Initialize the blackboard. Source code in agents/memory/blackboard.py 41 42 43 44 45 46 47 48 49 50 51 52 53 def __init__ ( self ) -> None : \"\"\" Initialize the blackboard. \"\"\" self . _questions : Memory = Memory () self . _requests : Memory = Memory () self . _trajectories : Memory = Memory () self . _screenshots : Memory = Memory () if configs . get ( \"USE_CUSTOMIZATION\" , False ): self . load_questions ( configs . get ( \"QA_PAIR_FILE\" , \"\" ), configs . get ( \"QA_PAIR_NUM\" , - 1 ) ) questions : Memory property Get the data from the blackboard. Returns: Memory \u2013 The questions from the blackboard. requests : Memory property Get the data from the blackboard. Returns: Memory \u2013 The requests from the blackboard. screenshots : Memory property Get the images from the blackboard. 
Returns: Memory \u2013 The images from the blackboard. trajectories : Memory property Get the data from the blackboard. Returns: Memory \u2013 The trajectories from the blackboard. add_data ( data , memory ) Add the data to a memory in the blackboard. Parameters: data ( Union [ MemoryItem , Dict [ str , str ], str ] ) \u2013 The data to be added. It can be a dictionary or a MemoryItem or a string. memory ( Memory ) \u2013 The memory to add the data to. Source code in agents/memory/blackboard.py 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 def add_data ( self , data : Union [ MemoryItem , Dict [ str , str ], str ], memory : Memory ) -> None : \"\"\" Add the data to a memory in the blackboard. :param data: The data to be added. It can be a dictionary or a MemoryItem or a string. :param memory: The memory to add the data to. \"\"\" if isinstance ( data , dict ): data_memory = MemoryItem () data_memory . add_values_from_dict ( data ) memory . add_memory_item ( data_memory ) elif isinstance ( data , MemoryItem ): memory . add_memory_item ( data ) elif isinstance ( data , str ): data_memory = MemoryItem () data_memory . add_values_from_dict ({ \"text\" : data }) memory . add_memory_item ( data_memory ) add_image ( screenshot_path = '' , metadata = None ) Add the image to the blackboard. Parameters: screenshot_path ( str , default: '' ) \u2013 The path of the image. metadata ( Optional [ Dict [ str , str ]] , default: None ) \u2013 The metadata of the image. Source code in agents/memory/blackboard.py 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 def add_image ( self , screenshot_path : str = \"\" , metadata : Optional [ Dict [ str , str ]] = None , ) -> None : \"\"\" Add the image to the blackboard. :param screenshot_path: The path of the image. :param metadata: The metadata of the image. \"\"\" if os . path . 
exists ( screenshot_path ): screenshot_str = PhotographerFacade () . encode_image_from_path ( screenshot_path ) else : print ( f \"Screenshot path { screenshot_path } does not exist.\" ) screenshot_str = \"\" image_memory_item = ImageMemoryItem () image_memory_item . add_values_from_dict ( { ImageMemoryItemNames . METADATA : metadata . get ( ImageMemoryItemNames . METADATA ), ImageMemoryItemNames . IMAGE_PATH : screenshot_path , ImageMemoryItemNames . IMAGE_STR : screenshot_str , } ) self . screenshots . add_memory_item ( image_memory_item ) add_questions ( questions ) Add the data to the blackboard. Parameters: questions ( Union [ MemoryItem , Dict [ str , str ]] ) \u2013 The data to be added. It can be a dictionary or a MemoryItem or a string. Source code in agents/memory/blackboard.py 107 108 109 110 111 112 113 def add_questions ( self , questions : Union [ MemoryItem , Dict [ str , str ]]) -> None : \"\"\" Add the data to the blackboard. :param questions: The data to be added. It can be a dictionary or a MemoryItem or a string. \"\"\" self . add_data ( questions , self . questions ) add_requests ( requests ) Add the data to the blackboard. Parameters: requests ( Union [ MemoryItem , Dict [ str , str ]] ) \u2013 The data to be added. It can be a dictionary or a MemoryItem or a string. Source code in agents/memory/blackboard.py 115 116 117 118 119 120 121 def add_requests ( self , requests : Union [ MemoryItem , Dict [ str , str ]]) -> None : \"\"\" Add the data to the blackboard. :param requests: The data to be added. It can be a dictionary or a MemoryItem or a string. \"\"\" self . add_data ( requests , self . requests ) add_trajectories ( trajectories ) Add the data to the blackboard. Parameters: trajectories ( Union [ MemoryItem , Dict [ str , str ]] ) \u2013 The data to be added. It can be a dictionary or a MemoryItem or a string. 
Source code in agents/memory/blackboard.py 123 124 125 126 127 128 129 def add_trajectories ( self , trajectories : Union [ MemoryItem , Dict [ str , str ]]) -> None : \"\"\" Add the data to the blackboard. :param trajectories: The data to be added. It can be a dictionary or a MemoryItem or a string. \"\"\" self . add_data ( trajectories , self . trajectories ) blackboard_to_prompt () Convert the blackboard to a prompt. Returns: List [ str ] \u2013 The prompt. Source code in agents/memory/blackboard.py 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 def blackboard_to_prompt ( self ) -> List [ str ]: \"\"\" Convert the blackboard to a prompt. :return: The prompt. \"\"\" prefix = [ { \"type\" : \"text\" , \"text\" : \"[Blackboard:]\" , } ] blackboard_prompt = ( prefix + self . texts_to_prompt ( self . questions , \"[Questions & Answers:]\" ) + self . texts_to_prompt ( self . requests , \"[Request History:]\" ) + self . texts_to_prompt ( self . trajectories , \"[Step Trajectories Completed Previously:]\" ) + self . screenshots_to_prompt () ) return blackboard_prompt clear () Clear the blackboard. Source code in agents/memory/blackboard.py 277 278 279 280 281 282 283 284 def clear ( self ) -> None : \"\"\" Clear the blackboard. \"\"\" self . questions . clear () self . requests . clear () self . trajectories . clear () self . screenshots . clear () is_empty () Check if the blackboard is empty. Returns: bool \u2013 True if the blackboard is empty, False otherwise. Source code in agents/memory/blackboard.py 265 266 267 268 269 270 271 272 273 274 275 def is_empty ( self ) -> bool : \"\"\" Check if the blackboard is empty. :return: True if the blackboard is empty, False otherwise. \"\"\" return ( self . questions . is_empty () and self . requests . is_empty () and self . trajectories . is_empty () and self . screenshots . is_empty () ) load_questions ( file_path , last_k =- 1 ) Load the data from a file. 
Parameters: file_path ( str ) \u2013 The path of the file. last_k \u2013 The number of lines to read from the end of the file. If -1, read all lines. Source code in agents/memory/blackboard.py 192 193 194 195 196 197 198 199 200 def load_questions ( self , file_path : str , last_k =- 1 ) -> None : \"\"\" Load the data from a file. :param file_path: The path of the file. :param last_k: The number of lines to read from the end of the file. If -1, read all lines. \"\"\" qa_list = self . read_json_file ( file_path , last_k ) for qa in qa_list : self . add_questions ( qa ) questions_to_json () Convert the data to a dictionary. Returns: str \u2013 The data in the dictionary format. Source code in agents/memory/blackboard.py 164 165 166 167 168 169 def questions_to_json ( self ) -> str : \"\"\" Convert the data to a dictionary. :return: The data in the dictionary format. \"\"\" return self . questions . to_json () read_json_file ( file_path , last_k =- 1 ) staticmethod Read the json file. Parameters: file_path ( str ) \u2013 The path of the file. last_k \u2013 The number of lines to read from the end of the file. If -1, read all lines. Returns: Dict [ str , str ] \u2013 The data in the file. Source code in agents/memory/blackboard.py 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 @staticmethod def read_json_file ( file_path : str , last_k =- 1 ) -> Dict [ str , str ]: \"\"\" Read the json file. :param file_path: The path of the file. :param last_k: The number of lines to read from the end of the file. If -1, read all lines. :return: The data in the file. \"\"\" data_list = [] # Check if the file exists if os . path . exists ( file_path ): # Open the file and read the lines with open ( file_path , \"r\" , encoding = \"utf-8\" ) as file : lines = file . 
readlines () # If last_k is not -1, only read the last k lines if last_k != - 1 : lines = lines [ - last_k :] # Parse the lines as JSON for line in lines : try : data = json . loads ( line . strip ()) data_list . append ( data ) except json . JSONDecodeError : print ( f \"Warning: Unable to parse line as JSON: { line } \" ) return data_list requests_to_json () Convert the data to a dictionary. Returns: str \u2013 The data in the dictionary format. Source code in agents/memory/blackboard.py 171 172 173 174 175 176 def requests_to_json ( self ) -> str : \"\"\" Convert the data to a dictionary. :return: The data in the dictionary format. \"\"\" return self . requests . to_json () screenshots_to_json () Convert the images to a dictionary. Returns: str \u2013 The images in the dictionary format. Source code in agents/memory/blackboard.py 185 186 187 188 189 190 def screenshots_to_json ( self ) -> str : \"\"\" Convert the images to a dictionary. :return: The images in the dictionary format. \"\"\" return self . screenshots . to_json () screenshots_to_prompt () Convert the images to a prompt. Returns: List [ str ] \u2013 The prompt. Source code in agents/memory/blackboard.py 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 def screenshots_to_prompt ( self ) -> List [ str ]: \"\"\" Convert the images to a prompt. :return: The prompt. \"\"\" user_content = [] for screenshot_dict in self . screenshots . list_content : user_content . append ( { \"type\" : \"text\" , \"text\" : json . dumps ( screenshot_dict . get ( ImageMemoryItemNames . METADATA , \"\" ) ), } ) user_content . append ( { \"type\" : \"image_url\" , \"image_url\" : { \"url\" : screenshot_dict . get ( ImageMemoryItemNames . IMAGE_STR , \"\" ) }, } ) return user_content texts_to_prompt ( memory , prefix ) Convert the data to a prompt. Returns: List [ str ] \u2013 The prompt. 
Source code in agents/memory/blackboard.py 202 203 204 205 206 207 208 209 210 211 212 def texts_to_prompt ( self , memory : Memory , prefix : str ) -> List [ str ]: \"\"\" Convert the data to a prompt. :return: The prompt. \"\"\" user_content = [ { \"type\" : \"text\" , \"text\" : f \" { prefix } \\n { json . dumps ( memory . list_content ) } \" } ] return user_content trajectories_to_json () Convert the data to a dictionary. Returns: str \u2013 The data in the dictionary format. Source code in agents/memory/blackboard.py 178 179 180 181 182 183 def trajectories_to_json ( self ) -> str : \"\"\" Convert the data to a dictionary. :return: The data in the dictionary format. \"\"\" return self . trajectories . to_json () Note You can customize the class to tailor the Blackboard to your requirements.","title":"Blackboard"},{"location":"agents/design/blackboard/#agent-blackboard","text":"The Blackboard is a shared memory space that is visible to all agents in the UFO framework. It stores information required for agents to interact with the user and applications at every step. The Blackboard is a key component of the UFO framework, enabling agents to share information and collaborate to fulfill user requests. The Blackboard is implemented as a class in the ufo/agents/memory/blackboard.py file.","title":"Agent Blackboard"},{"location":"agents/design/blackboard/#components","text":"The Blackboard consists of the following data components: Component Description questions A list of questions that UFO asks the user, along with their corresponding answers. requests A list of historical user requests received in previous Round . trajectories A list of step-wise trajectories that record the agent's actions and decisions at each step. screenshots A list of screenshots taken by the agent when it believes the current state is important for future reference. Tip The keys stored in the trajectories are configured as HISTORY_KEYS in the config_dev.yaml file. 
You can customize the keys based on your requirements and the agent's logic. Tip Whether to save the screenshots is determined by the AppAgent . You can enable or disable screenshot capture by setting the SCREENSHOT_TO_MEMORY flag in the config_dev.yaml file.","title":"Components"},{"location":"agents/design/blackboard/#blackboard-to-prompt","text":"Data in the Blackboard is based on the MemoryItem class. It has a method blackboard_to_prompt that converts the information stored in the Blackboard to a string prompt. Agents call this method to construct the prompt for the LLM's inference. The blackboard_to_prompt method is defined as follows: def blackboard_to_prompt(self) -> List[str]: \"\"\" Convert the blackboard to a prompt. :return: The prompt. \"\"\" prefix = [ { \"type\": \"text\", \"text\": \"[Blackboard:]\", } ] blackboard_prompt = ( prefix + self.texts_to_prompt(self.questions, \"[Questions & Answers:]\") + self.texts_to_prompt(self.requests, \"[Request History:]\") + self.texts_to_prompt(self.trajectories, \"[Step Trajectories Completed Previously:]\") + self.screenshots_to_prompt() ) return blackboard_prompt","title":"Blackboard to Prompt"},{"location":"agents/design/blackboard/#reference","text":"Class for the blackboard, which stores the data and images which are visible to all the agents. Initialize the blackboard. Source code in agents/memory/blackboard.py 41 42 43 44 45 46 47 48 49 50 51 52 53 def __init__ ( self ) -> None : \"\"\" Initialize the blackboard. \"\"\" self . _questions : Memory = Memory () self . _requests : Memory = Memory () self . _trajectories : Memory = Memory () self . _screenshots : Memory = Memory () if configs . get ( \"USE_CUSTOMIZATION\" , False ): self . load_questions ( configs . get ( \"QA_PAIR_FILE\" , \"\" ), configs . get ( \"QA_PAIR_NUM\" , - 1 ) )","title":"Reference"},{"location":"agents/design/blackboard/#agents.memory.blackboard.Blackboard.questions","text":"Get the data from the blackboard. 
Returns: Memory \u2013 The questions from the blackboard.","title":"questions"},{"location":"agents/design/blackboard/#agents.memory.blackboard.Blackboard.requests","text":"Get the data from the blackboard. Returns: Memory \u2013 The requests from the blackboard.","title":"requests"},{"location":"agents/design/blackboard/#agents.memory.blackboard.Blackboard.screenshots","text":"Get the images from the blackboard. Returns: Memory \u2013 The images from the blackboard.","title":"screenshots"},{"location":"agents/design/blackboard/#agents.memory.blackboard.Blackboard.trajectories","text":"Get the data from the blackboard. Returns: Memory \u2013 The trajectories from the blackboard.","title":"trajectories"},{"location":"agents/design/blackboard/#agents.memory.blackboard.Blackboard.add_data","text":"Add the data to a memory in the blackboard. Parameters: data ( Union [ MemoryItem , Dict [ str , str ], str ] ) \u2013 The data to be added. It can be a dictionary or a MemoryItem or a string. memory ( Memory ) \u2013 The memory to add the data to. Source code in agents/memory/blackboard.py 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 def add_data ( self , data : Union [ MemoryItem , Dict [ str , str ], str ], memory : Memory ) -> None : \"\"\" Add the data to a memory in the blackboard. :param data: The data to be added. It can be a dictionary or a MemoryItem or a string. :param memory: The memory to add the data to. \"\"\" if isinstance ( data , dict ): data_memory = MemoryItem () data_memory . add_values_from_dict ( data ) memory . add_memory_item ( data_memory ) elif isinstance ( data , MemoryItem ): memory . add_memory_item ( data ) elif isinstance ( data , str ): data_memory = MemoryItem () data_memory . add_values_from_dict ({ \"text\" : data }) memory . add_memory_item ( data_memory )","title":"add_data"},{"location":"agents/design/blackboard/#agents.memory.blackboard.Blackboard.add_image","text":"Add the image to the blackboard. 
Parameters: screenshot_path ( str , default: '' ) \u2013 The path of the image. metadata ( Optional [ Dict [ str , str ]] , default: None ) \u2013 The metadata of the image. Source code in agents/memory/blackboard.py 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 def add_image ( self , screenshot_path : str = \"\" , metadata : Optional [ Dict [ str , str ]] = None , ) -> None : \"\"\" Add the image to the blackboard. :param screenshot_path: The path of the image. :param metadata: The metadata of the image. \"\"\" if os . path . exists ( screenshot_path ): screenshot_str = PhotographerFacade () . encode_image_from_path ( screenshot_path ) else : print ( f \"Screenshot path { screenshot_path } does not exist.\" ) screenshot_str = \"\" image_memory_item = ImageMemoryItem () image_memory_item . add_values_from_dict ( { ImageMemoryItemNames . METADATA : metadata . get ( ImageMemoryItemNames . METADATA ), ImageMemoryItemNames . IMAGE_PATH : screenshot_path , ImageMemoryItemNames . IMAGE_STR : screenshot_str , } ) self . screenshots . add_memory_item ( image_memory_item )","title":"add_image"},{"location":"agents/design/blackboard/#agents.memory.blackboard.Blackboard.add_questions","text":"Add the data to the blackboard. Parameters: questions ( Union [ MemoryItem , Dict [ str , str ]] ) \u2013 The data to be added. It can be a dictionary or a MemoryItem or a string. Source code in agents/memory/blackboard.py 107 108 109 110 111 112 113 def add_questions ( self , questions : Union [ MemoryItem , Dict [ str , str ]]) -> None : \"\"\" Add the data to the blackboard. :param questions: The data to be added. It can be a dictionary or a MemoryItem or a string. \"\"\" self . add_data ( questions , self . questions )","title":"add_questions"},{"location":"agents/design/blackboard/#agents.memory.blackboard.Blackboard.add_requests","text":"Add the data to the blackboard. 
Parameters: requests ( Union [ MemoryItem , Dict [ str , str ]] ) \u2013 The data to be added. It can be a dictionary or a MemoryItem or a string. Source code in agents/memory/blackboard.py 115 116 117 118 119 120 121 def add_requests ( self , requests : Union [ MemoryItem , Dict [ str , str ]]) -> None : \"\"\" Add the data to the blackboard. :param requests: The data to be added. It can be a dictionary or a MemoryItem or a string. \"\"\" self . add_data ( requests , self . requests )","title":"add_requests"},{"location":"agents/design/blackboard/#agents.memory.blackboard.Blackboard.add_trajectories","text":"Add the data to the blackboard. Parameters: trajectories ( Union [ MemoryItem , Dict [ str , str ]] ) \u2013 The data to be added. It can be a dictionary or a MemoryItem or a string. Source code in agents/memory/blackboard.py 123 124 125 126 127 128 129 def add_trajectories ( self , trajectories : Union [ MemoryItem , Dict [ str , str ]]) -> None : \"\"\" Add the data to the blackboard. :param trajectories: The data to be added. It can be a dictionary or a MemoryItem or a string. \"\"\" self . add_data ( trajectories , self . trajectories )","title":"add_trajectories"},{"location":"agents/design/blackboard/#agents.memory.blackboard.Blackboard.blackboard_to_prompt","text":"Convert the blackboard to a prompt. Returns: List [ str ] \u2013 The prompt. Source code in agents/memory/blackboard.py 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 def blackboard_to_prompt ( self ) -> List [ str ]: \"\"\" Convert the blackboard to a prompt. :return: The prompt. \"\"\" prefix = [ { \"type\" : \"text\" , \"text\" : \"[Blackboard:]\" , } ] blackboard_prompt = ( prefix + self . texts_to_prompt ( self . questions , \"[Questions & Answers:]\" ) + self . texts_to_prompt ( self . requests , \"[Request History:]\" ) + self . texts_to_prompt ( self . trajectories , \"[Step Trajectories Completed Previously:]\" ) + self . 
screenshots_to_prompt () ) return blackboard_prompt","title":"blackboard_to_prompt"},{"location":"agents/design/blackboard/#agents.memory.blackboard.Blackboard.clear","text":"Clear the blackboard. Source code in agents/memory/blackboard.py 277 278 279 280 281 282 283 284 def clear ( self ) -> None : \"\"\" Clear the blackboard. \"\"\" self . questions . clear () self . requests . clear () self . trajectories . clear () self . screenshots . clear ()","title":"clear"},{"location":"agents/design/blackboard/#agents.memory.blackboard.Blackboard.is_empty","text":"Check if the blackboard is empty. Returns: bool \u2013 True if the blackboard is empty, False otherwise. Source code in agents/memory/blackboard.py 265 266 267 268 269 270 271 272 273 274 275 def is_empty ( self ) -> bool : \"\"\" Check if the blackboard is empty. :return: True if the blackboard is empty, False otherwise. \"\"\" return ( self . questions . is_empty () and self . requests . is_empty () and self . trajectories . is_empty () and self . screenshots . is_empty () )","title":"is_empty"},{"location":"agents/design/blackboard/#agents.memory.blackboard.Blackboard.load_questions","text":"Load the data from a file. Parameters: file_path ( str ) \u2013 The path of the file. last_k \u2013 The number of lines to read from the end of the file. If -1, read all lines. Source code in agents/memory/blackboard.py 192 193 194 195 196 197 198 199 200 def load_questions ( self , file_path : str , last_k =- 1 ) -> None : \"\"\" Load the data from a file. :param file_path: The path of the file. :param last_k: The number of lines to read from the end of the file. If -1, read all lines. \"\"\" qa_list = self . read_json_file ( file_path , last_k ) for qa in qa_list : self . add_questions ( qa )","title":"load_questions"},{"location":"agents/design/blackboard/#agents.memory.blackboard.Blackboard.questions_to_json","text":"Convert the data to a dictionary. Returns: str \u2013 The data in the dictionary format. 
Source code in agents/memory/blackboard.py 164 165 166 167 168 169 def questions_to_json ( self ) -> str : \"\"\" Convert the data to a dictionary. :return: The data in the dictionary format. \"\"\" return self . questions . to_json ()","title":"questions_to_json"},{"location":"agents/design/blackboard/#agents.memory.blackboard.Blackboard.read_json_file","text":"Read the json file. Parameters: file_path ( str ) \u2013 The path of the file. last_k \u2013 The number of lines to read from the end of the file. If -1, read all lines. Returns: Dict [ str , str ] \u2013 The data in the file. Source code in agents/memory/blackboard.py 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 @staticmethod def read_json_file ( file_path : str , last_k =- 1 ) -> Dict [ str , str ]: \"\"\" Read the json file. :param file_path: The path of the file. :param last_k: The number of lines to read from the end of the file. If -1, read all lines. :return: The data in the file. \"\"\" data_list = [] # Check if the file exists if os . path . exists ( file_path ): # Open the file and read the lines with open ( file_path , \"r\" , encoding = \"utf-8\" ) as file : lines = file . readlines () # If last_k is not -1, only read the last k lines if last_k != - 1 : lines = lines [ - last_k :] # Parse the lines as JSON for line in lines : try : data = json . loads ( line . strip ()) data_list . append ( data ) except json . JSONDecodeError : print ( f \"Warning: Unable to parse line as JSON: { line } \" ) return data_list","title":"read_json_file"},{"location":"agents/design/blackboard/#agents.memory.blackboard.Blackboard.requests_to_json","text":"Convert the data to a dictionary. Returns: str \u2013 The data in the dictionary format. Source code in agents/memory/blackboard.py 171 172 173 174 175 176 def requests_to_json ( self ) -> str : \"\"\" Convert the data to a dictionary. :return: The data in the dictionary format. 
\"\"\" return self . requests . to_json ()","title":"requests_to_json"},{"location":"agents/design/blackboard/#agents.memory.blackboard.Blackboard.screenshots_to_json","text":"Convert the images to a dictionary. Returns: str \u2013 The images in the dictionary format. Source code in agents/memory/blackboard.py 185 186 187 188 189 190 def screenshots_to_json ( self ) -> str : \"\"\" Convert the images to a dictionary. :return: The images in the dictionary format. \"\"\" return self . screenshots . to_json ()","title":"screenshots_to_json"},{"location":"agents/design/blackboard/#agents.memory.blackboard.Blackboard.screenshots_to_prompt","text":"Convert the images to a prompt. Returns: List [ str ] \u2013 The prompt. Source code in agents/memory/blackboard.py 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 def screenshots_to_prompt ( self ) -> List [ str ]: \"\"\" Convert the images to a prompt. :return: The prompt. \"\"\" user_content = [] for screenshot_dict in self . screenshots . list_content : user_content . append ( { \"type\" : \"text\" , \"text\" : json . dumps ( screenshot_dict . get ( ImageMemoryItemNames . METADATA , \"\" ) ), } ) user_content . append ( { \"type\" : \"image_url\" , \"image_url\" : { \"url\" : screenshot_dict . get ( ImageMemoryItemNames . IMAGE_STR , \"\" ) }, } ) return user_content","title":"screenshots_to_prompt"},{"location":"agents/design/blackboard/#agents.memory.blackboard.Blackboard.texts_to_prompt","text":"Convert the data to a prompt. Returns: List [ str ] \u2013 The prompt. Source code in agents/memory/blackboard.py 202 203 204 205 206 207 208 209 210 211 212 def texts_to_prompt ( self , memory : Memory , prefix : str ) -> List [ str ]: \"\"\" Convert the data to a prompt. :return: The prompt. \"\"\" user_content = [ { \"type\" : \"text\" , \"text\" : f \" { prefix } \\n { json . dumps ( memory . 
list_content ) } \" } ] return user_content","title":"texts_to_prompt"},{"location":"agents/design/blackboard/#agents.memory.blackboard.Blackboard.trajectories_to_json","text":"Convert the data to a dictionary. Returns: str \u2013 The data in the dictionary format. Source code in agents/memory/blackboard.py 178 179 180 181 182 183 def trajectories_to_json ( self ) -> str : \"\"\" Convert the data to a dictionary. :return: The data in the dictionary format. \"\"\" return self . trajectories . to_json () Note You can customize the class to tailor the Blackboard to your requirements.","title":"trajectories_to_json"},{"location":"agents/design/memory/","text":"Agent Memory The Memory manages the memory of the agent and stores the information required for the agent to interact with the user and applications at every step. Parts of elements in the Memory will be visible to the agent for decision-making. MemoryItem A MemoryItem is a dataclass that represents a single step in the agent's memory. The fields of a MemoryItem is flexible and can be customized based on the requirements of the agent. The MemoryItem class is defined as follows: This data class represents a memory item of an agent at one step. attributes : List [ str ] property Get the attributes of the memory item. Returns: List [ str ] \u2013 The attributes. add_values_from_dict ( values ) Add fields to the memory item. Parameters: values ( Dict [ str , Any ] ) \u2013 The values of the fields. Source code in agents/memory/memory.py 57 58 59 60 61 62 63 def add_values_from_dict ( self , values : Dict [ str , Any ]) -> None : \"\"\" Add fields to the memory item. :param values: The values of the fields. \"\"\" for key , value in values . items (): self . set_value ( key , value ) filter ( keys = []) Fetch the memory item. Parameters: keys ( List [ str ] , default: [] ) \u2013 The keys to fetch. Returns: None \u2013 The filtered memory item. 
Source code in agents/memory/memory.py 37 38 39 40 41 42 43 44 def filter ( self , keys : List [ str ] = []) -> None : \"\"\" Fetch the memory item. :param keys: The keys to fetch. :return: The filtered memory item. \"\"\" return { key : value for key , value in self . to_dict () . items () if key in keys } get_value ( key ) Get the value of the field. Parameters: key ( str ) \u2013 The key of the field. Returns: Optional [ str ] \u2013 The value of the field. Source code in agents/memory/memory.py 65 66 67 68 69 70 71 72 def get_value ( self , key : str ) -> Optional [ str ]: \"\"\" Get the value of the field. :param key: The key of the field. :return: The value of the field. \"\"\" return getattr ( self , key , None ) get_values ( keys ) Get the values of the fields. Parameters: keys ( List [ str ] ) \u2013 The keys of the fields. Returns: dict \u2013 The values of the fields. Source code in agents/memory/memory.py 74 75 76 77 78 79 80 def get_values ( self , keys : List [ str ]) -> dict : \"\"\" Get the values of the fields. :param keys: The keys of the fields. :return: The values of the fields. \"\"\" return { key : self . get_value ( key ) for key in keys } set_value ( key , value ) Add a field to the memory item. Parameters: key ( str ) \u2013 The key of the field. value ( str ) \u2013 The value of the field. Source code in agents/memory/memory.py 46 47 48 49 50 51 52 53 54 55 def set_value ( self , key : str , value : str ) -> None : \"\"\" Add a field to the memory item. :param key: The key of the field. :param value: The value of the field. \"\"\" setattr ( self , key , value ) if key not in self . _memory_attributes : self . _memory_attributes . append ( key ) to_dict () Convert the MemoryItem to a dictionary. Returns: Dict [ str , str ] \u2013 The dictionary. Source code in agents/memory/memory.py 19 20 21 22 23 24 25 26 27 28 def to_dict ( self ) -> Dict [ str , str ]: \"\"\" Convert the MemoryItem to a dictionary. :return: The dictionary. 
\"\"\" return { key : value for key , value in self . __dict__ . items () if key in self . _memory_attributes } to_json () Convert the memory item to a JSON string. Returns: str \u2013 The JSON string. Source code in agents/memory/memory.py 30 31 32 33 34 35 def to_json ( self ) -> str : \"\"\" Convert the memory item to a JSON string. :return: The JSON string. \"\"\" return json . dumps ( self . to_dict ()) Info At each step, an instance of MemoryItem is created and stored in the Memory to record the information of the agent's interaction with the user and applications. Memory The Memory class is responsible for managing the memory of the agent. It stores a list of MemoryItem instances that represent the agent's memory at each step. The Memory class is defined as follows: This data class represents a memory of an agent. content : List [ MemoryItem ] property Get the content of the memory. Returns: List [ MemoryItem ] \u2013 The content of the memory. length : int property Get the length of the memory. Returns: int \u2013 The length of the memory. list_content : List [ Dict [ str , str ]] property List the content of the memory. Returns: List [ Dict [ str , str ]] \u2013 The content of the memory. add_memory_item ( memory_item ) Add a memory item to the memory. Parameters: memory_item ( MemoryItem ) \u2013 The memory item to add. Source code in agents/memory/memory.py 122 123 124 125 126 127 def add_memory_item ( self , memory_item : MemoryItem ) -> None : \"\"\" Add a memory item to the memory. :param memory_item: The memory item to add. \"\"\" self . _content . append ( memory_item ) clear () Clear the memory. Source code in agents/memory/memory.py 129 130 131 132 133 def clear ( self ) -> None : \"\"\" Clear the memory. \"\"\" self . _content = [] delete_memory_item ( step ) Delete a memory item from the memory. Parameters: step ( int ) \u2013 The step of the memory item to delete. 
Source code in agents/memory/memory.py 143 144 145 146 147 148 def delete_memory_item ( self , step : int ) -> None : \"\"\" Delete a memory item from the memory. :param step: The step of the memory item to delete. \"\"\" self . _content = [ item for item in self . _content if item . step != step ] filter_memory_from_keys ( keys ) Filter the memory from the keys. If an item does not have the key, the key will be ignored. Parameters: keys ( List [ str ] ) \u2013 The keys to filter. Returns: List [ Dict [ str , str ]] \u2013 The filtered memory. Source code in agents/memory/memory.py 114 115 116 117 118 119 120 def filter_memory_from_keys ( self , keys : List [ str ]) -> List [ Dict [ str , str ]]: \"\"\" Filter the memory from the keys. If an item does not have the key, the key will be ignored. :param keys: The keys to filter. :return: The filtered memory. \"\"\" return [ item . filter ( keys ) for item in self . _content ] filter_memory_from_steps ( steps ) Filter the memory from the steps. Parameters: steps ( List [ int ] ) \u2013 The steps to filter. Returns: List [ Dict [ str , str ]] \u2013 The filtered memory. Source code in agents/memory/memory.py 106 107 108 109 110 111 112 def filter_memory_from_steps ( self , steps : List [ int ]) -> List [ Dict [ str , str ]]: \"\"\" Filter the memory from the steps. :param steps: The steps to filter. :return: The filtered memory. \"\"\" return [ item . to_dict () for item in self . _content if item . step in steps ] get_latest_item () Get the latest memory item. Returns: MemoryItem \u2013 The latest memory item. Source code in agents/memory/memory.py 160 161 162 163 164 165 166 167 def get_latest_item ( self ) -> MemoryItem : \"\"\" Get the latest memory item. :return: The latest memory item. \"\"\" if self . length == 0 : return None return self . _content [ - 1 ] is_empty () Check if the memory is empty. Returns: bool \u2013 The boolean value indicating if the memory is empty. 
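The bookkeeping methods documented here (add_memory_item, filter_memory_from_steps, is_empty, to_json) can be exercised with a minimal sketch. SimpleMemory below is an illustrative stand-in that mirrors the documented behavior with plain dicts as items; it is not the actual UFO Memory class.

```python
# Minimal sketch of the Memory bookkeeping pattern described above.
# 'SimpleMemory' and its plain-dict items are illustrative, not the UFO classes.
import json
from typing import Any, Dict, List


class SimpleMemory:
    def __init__(self) -> None:
        self._content: List[Dict[str, Any]] = []

    def add_memory_item(self, item: Dict[str, Any]) -> None:
        self._content.append(item)

    def filter_memory_from_steps(self, steps: List[int]) -> List[Dict[str, Any]]:
        # Keep only the items whose 'step' field is in the requested steps.
        return [item for item in self._content if item.get('step') in steps]

    def is_empty(self) -> bool:
        return len(self._content) == 0

    def to_json(self) -> str:
        return json.dumps(self._content)


memory = SimpleMemory()
memory.add_memory_item({'step': 0, 'action': 'click'})
memory.add_memory_item({'step': 1, 'action': 'type'})
```

Filtering by step lets an agent reconstruct only the slice of history relevant to its current decision, rather than replaying the whole trajectory.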
Source code in agents/memory/memory.py 185 186 187 188 189 190 def is_empty ( self ) -> bool : \"\"\" Check if the memory is empty. :return: The boolean value indicating if the memory is empty. \"\"\" return self . length == 0 load ( content ) Load the data from the memory. Parameters: content ( List [ MemoryItem ] ) \u2013 The content to load. Source code in agents/memory/memory.py 99 100 101 102 103 104 def load ( self , content : List [ MemoryItem ]) -> None : \"\"\" Load the data from the memory. :param content: The content to load. \"\"\" self . _content = content to_json () Convert the memory to a JSON string. Returns: str \u2013 The JSON string. Source code in agents/memory/memory.py 150 151 152 153 154 155 156 157 158 def to_json ( self ) -> str : \"\"\" Convert the memory to a JSON string. :return: The JSON string. \"\"\" return json . dumps ( [ item . to_dict () for item in self . _content if item is not None ] ) Info Each agent has its own Memory instance to store its information. Info Not all information in the Memory is provided to the agent for decision-making. The agent can access parts of the memory based on the requirements of the agent's logic.","title":"Memory"},{"location":"agents/design/memory/#agent-memory","text":"The Memory manages the memory of the agent and stores the information required for the agent to interact with the user and applications at every step. Parts of the Memory are visible to the agent for decision-making.","title":"Agent Memory"},{"location":"agents/design/memory/#memoryitem","text":"A MemoryItem is a dataclass that represents a single step in the agent's memory. The fields of a MemoryItem are flexible and can be customized based on the requirements of the agent. 
The MemoryItem class is defined as follows: This data class represents a memory item of an agent at one step.","title":"MemoryItem"},{"location":"agents/design/memory/#agents.memory.memory.MemoryItem.attributes","text":"Get the attributes of the memory item. Returns: List [ str ] \u2013 The attributes.","title":"attributes"},{"location":"agents/design/memory/#agents.memory.memory.MemoryItem.add_values_from_dict","text":"Add fields to the memory item. Parameters: values ( Dict [ str , Any ] ) \u2013 The values of the fields. Source code in agents/memory/memory.py 57 58 59 60 61 62 63 def add_values_from_dict ( self , values : Dict [ str , Any ]) -> None : \"\"\" Add fields to the memory item. :param values: The values of the fields. \"\"\" for key , value in values . items (): self . set_value ( key , value )","title":"add_values_from_dict"},{"location":"agents/design/memory/#agents.memory.memory.MemoryItem.filter","text":"Fetch the memory item. Parameters: keys ( List [ str ] , default: [] ) \u2013 The keys to fetch. Returns: None \u2013 The filtered memory item. Source code in agents/memory/memory.py 37 38 39 40 41 42 43 44 def filter ( self , keys : List [ str ] = []) -> None : \"\"\" Fetch the memory item. :param keys: The keys to fetch. :return: The filtered memory item. \"\"\" return { key : value for key , value in self . to_dict () . items () if key in keys }","title":"filter"},{"location":"agents/design/memory/#agents.memory.memory.MemoryItem.get_value","text":"Get the value of the field. Parameters: key ( str ) \u2013 The key of the field. Returns: Optional [ str ] \u2013 The value of the field. Source code in agents/memory/memory.py 65 66 67 68 69 70 71 72 def get_value ( self , key : str ) -> Optional [ str ]: \"\"\" Get the value of the field. :param key: The key of the field. :return: The value of the field. 
\"\"\" return getattr ( self , key , None )","title":"get_value"},{"location":"agents/design/memory/#agents.memory.memory.MemoryItem.get_values","text":"Get the values of the fields. Parameters: keys ( List [ str ] ) \u2013 The keys of the fields. Returns: dict \u2013 The values of the fields. Source code in agents/memory/memory.py 74 75 76 77 78 79 80 def get_values ( self , keys : List [ str ]) -> dict : \"\"\" Get the values of the fields. :param keys: The keys of the fields. :return: The values of the fields. \"\"\" return { key : self . get_value ( key ) for key in keys }","title":"get_values"},{"location":"agents/design/memory/#agents.memory.memory.MemoryItem.set_value","text":"Add a field to the memory item. Parameters: key ( str ) \u2013 The key of the field. value ( str ) \u2013 The value of the field. Source code in agents/memory/memory.py 46 47 48 49 50 51 52 53 54 55 def set_value ( self , key : str , value : str ) -> None : \"\"\" Add a field to the memory item. :param key: The key of the field. :param value: The value of the field. \"\"\" setattr ( self , key , value ) if key not in self . _memory_attributes : self . _memory_attributes . append ( key )","title":"set_value"},{"location":"agents/design/memory/#agents.memory.memory.MemoryItem.to_dict","text":"Convert the MemoryItem to a dictionary. Returns: Dict [ str , str ] \u2013 The dictionary. Source code in agents/memory/memory.py 19 20 21 22 23 24 25 26 27 28 def to_dict ( self ) -> Dict [ str , str ]: \"\"\" Convert the MemoryItem to a dictionary. :return: The dictionary. \"\"\" return { key : value for key , value in self . __dict__ . items () if key in self . _memory_attributes }","title":"to_dict"},{"location":"agents/design/memory/#agents.memory.memory.MemoryItem.to_json","text":"Convert the memory item to a JSON string. Returns: str \u2013 The JSON string. Source code in agents/memory/memory.py 30 31 32 33 34 35 def to_json ( self ) -> str : \"\"\" Convert the memory item to a JSON string. 
:return: The JSON string. \"\"\" return json . dumps ( self . to_dict ()) Info At each step, an instance of MemoryItem is created and stored in the Memory to record the information of the agent's interaction with the user and applications.","title":"to_json"},{"location":"agents/design/memory/#memory","text":"The Memory class is responsible for managing the memory of the agent. It stores a list of MemoryItem instances that represent the agent's memory at each step. The Memory class is defined as follows: This data class represents a memory of an agent.","title":"Memory"},{"location":"agents/design/memory/#agents.memory.memory.Memory.content","text":"Get the content of the memory. Returns: List [ MemoryItem ] \u2013 The content of the memory.","title":"content"},{"location":"agents/design/memory/#agents.memory.memory.Memory.length","text":"Get the length of the memory. Returns: int \u2013 The length of the memory.","title":"length"},{"location":"agents/design/memory/#agents.memory.memory.Memory.list_content","text":"List the content of the memory. Returns: List [ Dict [ str , str ]] \u2013 The content of the memory.","title":"list_content"},{"location":"agents/design/memory/#agents.memory.memory.Memory.add_memory_item","text":"Add a memory item to the memory. Parameters: memory_item ( MemoryItem ) \u2013 The memory item to add. Source code in agents/memory/memory.py 122 123 124 125 126 127 def add_memory_item ( self , memory_item : MemoryItem ) -> None : \"\"\" Add a memory item to the memory. :param memory_item: The memory item to add. \"\"\" self . _content . append ( memory_item )","title":"add_memory_item"},{"location":"agents/design/memory/#agents.memory.memory.Memory.clear","text":"Clear the memory. Source code in agents/memory/memory.py 129 130 131 132 133 def clear ( self ) -> None : \"\"\" Clear the memory. \"\"\" self . 
_content = []","title":"clear"},{"location":"agents/design/memory/#agents.memory.memory.Memory.delete_memory_item","text":"Delete a memory item from the memory. Parameters: step ( int ) \u2013 The step of the memory item to delete. Source code in agents/memory/memory.py 143 144 145 146 147 148 def delete_memory_item ( self , step : int ) -> None : \"\"\" Delete a memory item from the memory. :param step: The step of the memory item to delete. \"\"\" self . _content = [ item for item in self . _content if item . step != step ]","title":"delete_memory_item"},{"location":"agents/design/memory/#agents.memory.memory.Memory.filter_memory_from_keys","text":"Filter the memory from the keys. If an item does not have the key, the key will be ignored. Parameters: keys ( List [ str ] ) \u2013 The keys to filter. Returns: List [ Dict [ str , str ]] \u2013 The filtered memory. Source code in agents/memory/memory.py 114 115 116 117 118 119 120 def filter_memory_from_keys ( self , keys : List [ str ]) -> List [ Dict [ str , str ]]: \"\"\" Filter the memory from the keys. If an item does not have the key, the key will be ignored. :param keys: The keys to filter. :return: The filtered memory. \"\"\" return [ item . filter ( keys ) for item in self . _content ]","title":"filter_memory_from_keys"},{"location":"agents/design/memory/#agents.memory.memory.Memory.filter_memory_from_steps","text":"Filter the memory from the steps. Parameters: steps ( List [ int ] ) \u2013 The steps to filter. Returns: List [ Dict [ str , str ]] \u2013 The filtered memory. Source code in agents/memory/memory.py 106 107 108 109 110 111 112 def filter_memory_from_steps ( self , steps : List [ int ]) -> List [ Dict [ str , str ]]: \"\"\" Filter the memory from the steps. :param steps: The steps to filter. :return: The filtered memory. \"\"\" return [ item . to_dict () for item in self . _content if item . 
step in steps ]","title":"filter_memory_from_steps"},{"location":"agents/design/memory/#agents.memory.memory.Memory.get_latest_item","text":"Get the latest memory item. Returns: MemoryItem \u2013 The latest memory item. Source code in agents/memory/memory.py 160 161 162 163 164 165 166 167 def get_latest_item ( self ) -> MemoryItem : \"\"\" Get the latest memory item. :return: The latest memory item. \"\"\" if self . length == 0 : return None return self . _content [ - 1 ]","title":"get_latest_item"},{"location":"agents/design/memory/#agents.memory.memory.Memory.is_empty","text":"Check if the memory is empty. Returns: bool \u2013 The boolean value indicating if the memory is empty. Source code in agents/memory/memory.py 185 186 187 188 189 190 def is_empty ( self ) -> bool : \"\"\" Check if the memory is empty. :return: The boolean value indicating if the memory is empty. \"\"\" return self . length == 0","title":"is_empty"},{"location":"agents/design/memory/#agents.memory.memory.Memory.load","text":"Load the data from the memory. Parameters: content ( List [ MemoryItem ] ) \u2013 The content to load. Source code in agents/memory/memory.py 99 100 101 102 103 104 def load ( self , content : List [ MemoryItem ]) -> None : \"\"\" Load the data from the memory. :param content: The content to load. \"\"\" self . _content = content","title":"load"},{"location":"agents/design/memory/#agents.memory.memory.Memory.to_json","text":"Convert the memory to a JSON string. Returns: str \u2013 The JSON string. Source code in agents/memory/memory.py 150 151 152 153 154 155 156 157 158 def to_json ( self ) -> str : \"\"\" Convert the memory to a JSON string. :return: The JSON string. \"\"\" return json . dumps ( [ item . to_dict () for item in self . _content if item is not None ] ) Info Each agent has its own Memory instance to store its information. Info Not all information in the Memory is provided to the agent for decision-making. 
The agent can access parts of the memory based on the requirements of the agent's logic.","title":"to_json"},{"location":"agents/design/processor/","text":"Agents Processor The Processor is a key component of the agent that executes its core logic to process the user's request. The Processor is implemented as a class in the ufo/agents/processors folder. Each agent has its own Processor class within the folder. Core Process Once called, an agent follows a series of steps, defined in the Processor class, to process the user's request by calling the process method. The workflow of the process is as follows: Step Description Function 1 Print the step information. print_step_info 2 Capture the screenshot of the application. capture_screenshot 3 Get the control information of the application. get_control_info 4 Get the prompt message for the LLM. get_prompt_message 5 Generate the response from the LLM. get_response 6 Update the cost of the step. update_cost 7 Parse the response from the LLM. parse_response 8 Execute the action based on the response. execute_action 9 Update the memory and blackboard. update_memory 10 Update the status of the agent. update_status At each step, the Processor handles the user's request by invoking the corresponding methods sequentially to execute the necessary actions. The process may be paused; it can be resumed with the resume method, based on the agent's logic and the user's request. Reference Below is the basic structure of the Processor class: Bases: ABC The base processor for the session. A session consists of multiple rounds of conversation with the user, completing a task. At each round, the HostAgent and AppAgent interact with the user and the application with the processor. Each processor is responsible for processing the user request and updating the HostAgent and AppAgent at a single step in a round. Initialize the processor. Parameters: context ( Context ) \u2013 The context of the session. 
agent ( BasicAgent ) \u2013 The agent who executes the processor. Source code in agents/processors/basic.py 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 def __init__ ( self , agent : BasicAgent , context : Context ) -> None : \"\"\" Initialize the processor. :param context: The context of the session. :param agent: The agent who executes the processor. \"\"\" self . _context = context self . _agent = agent self . photographer = PhotographerFacade () self . control_inspector = ControlInspectorFacade ( BACKEND ) self . _prompt_message = None self . _status = None self . _response = None self . _cost = 0 self . _control_label = None self . _control_text = None self . _response_json = {} self . _memory_data = MemoryItem () self . _results = None self . _question_list = [] self . _agent_status_manager = self . agent . status_manager self . _is_resumed = False self . _action = None self . _plan = None self . _control_log = { \"control_class\" : None , \"control_type\" : None , \"control_automation_id\" : None , } self . _total_time_cost = 0 self . _time_cost = {} self . _exeception_traceback = {} action : str property writable Get the action. Returns: str \u2013 The action. agent : BasicAgent property Get the agent. Returns: BasicAgent \u2013 The agent. app_root : str property writable Get the application root. Returns: str \u2013 The application root. application_process_name : str property writable Get the application process name. Returns: str \u2013 The application process name. application_window : UIAWrapper property writable Get the active window. Returns: UIAWrapper \u2013 The active window. context : Context property Get the context. Returns: Context \u2013 The context. control_label : str property writable Get the control label. Returns: str \u2013 The control label. control_reannotate : List [ str ] property writable Get the control reannotation. 
Returns: List [ str ] \u2013 The control reannotation. control_text : str property writable Get the active application. Returns: str \u2013 The active application. cost : float property writable Get the cost of the processor. Returns: float \u2013 The cost of the processor. host_message : List [ str ] property writable Get the host message. Returns: List [ str ] \u2013 The host message. log_path : str property Get the log path. Returns: str \u2013 The log path. logger : str property Get the logger. Returns: str \u2013 The logger. name : str property Get the name of the processor. Returns: str \u2013 The name of the processor. plan : str property writable Get the plan of the agent. Returns: str \u2013 The plan. prev_plan : List [ str ] property Get the previous plan. Returns: List [ str ] \u2013 The previous plan of the agent. previous_subtasks : List [ str ] property writable Get the previous subtasks. Returns: List [ str ] \u2013 The previous subtasks. question_list : List [ str ] property writable Get the question list. Returns: List [ str ] \u2013 The question list. request : str property Get the request. Returns: str \u2013 The request. request_logger : str property Get the request logger. Returns: str \u2013 The request logger. round_cost : float property writable Get the round cost. Returns: float \u2013 The round cost. round_num : int property Get the round number. Returns: int \u2013 The round number. round_step : int property writable Get the round step. Returns: int \u2013 The round step. round_subtask_amount : int property Get the round subtask amount. Returns: int \u2013 The round subtask amount. session_cost : float property writable Get the session cost. Returns: float \u2013 The session cost. session_step : int property writable Get the session step. Returns: int \u2013 The session step. status : str property writable Get the status of the processor. Returns: str \u2013 The status of the processor. subtask : str property writable Get the subtask. 
Returns: str \u2013 The subtask. ui_tree_path : str property Get the UI tree path. Returns: str \u2013 The UI tree path. add_to_memory ( data_dict ) Add the data to the memory. Parameters: data_dict ( Dict [ str , Any ] ) \u2013 The data dictionary to be added to the memory. Source code in agents/processors/basic.py 297 298 299 300 301 302 def add_to_memory ( self , data_dict : Dict [ str , Any ]) -> None : \"\"\" Add the data to the memory. :param data_dict: The data dictionary to be added to the memory. \"\"\" self . _memory_data . add_values_from_dict ( data_dict ) capture_screenshot () abstractmethod Capture the screenshot. Source code in agents/processors/basic.py 235 236 237 238 239 240 @abstractmethod def capture_screenshot ( self ) -> None : \"\"\" Capture the screenshot. \"\"\" pass exception_capture ( func ) classmethod Decorator to capture the exception of the method. Parameters: func \u2013 The method to be decorated. Returns: \u2013 The decorated method. Source code in agents/processors/basic.py 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 @classmethod def exception_capture ( cls , func ): \"\"\" Decorator to capture the exception of the method. :param func: The method to be decorated. :return: The decorated method. \"\"\" @wraps ( func ) def wrapper ( self , * args , ** kwargs ): try : func ( self , * args , ** kwargs ) except Exception as e : self . _exeception_traceback [ func . __name__ ] = { \"type\" : str ( type ( e ) . __name__ ), \"message\" : str ( e ), \"traceback\" : traceback . format_exc (), } utils . print_with_color ( f \"Error Occurs at { func . __name__ } \" , \"red\" ) utils . print_with_color ( self . _exeception_traceback [ func . __name__ ][ \"traceback\" ], \"red\" ) if self . _response is not None : utils . print_with_color ( \"Response: \" , \"red\" ) utils . print_with_color ( self . _response , \"red\" ) self . _status = self . 
_agent_status_manager . ERROR . value self . sync_memory () self . add_to_memory ({ \"error\" : self . _exeception_traceback }) self . add_to_memory ({ \"Status\" : self . _status }) self . log_save () raise StopIteration ( \"Error occurred during step.\" ) return wrapper execute_action () abstractmethod Execute the action. Source code in agents/processors/basic.py 270 271 272 273 274 275 @abstractmethod def execute_action ( self ) -> None : \"\"\" Execute the action. \"\"\" pass get_control_info () abstractmethod Get the control information. Source code in agents/processors/basic.py 242 243 244 245 246 247 @abstractmethod def get_control_info ( self ) -> None : \"\"\" Get the control information. \"\"\" pass get_prompt_message () abstractmethod Get the prompt message. Source code in agents/processors/basic.py 249 250 251 252 253 254 @abstractmethod def get_prompt_message ( self ) -> None : \"\"\" Get the prompt message. \"\"\" pass get_response () abstractmethod Get the response from the LLM. Source code in agents/processors/basic.py 256 257 258 259 260 261 @abstractmethod def get_response ( self ) -> None : \"\"\" Get the response from the LLM. \"\"\" pass is_confirm () Check if the process is confirm. Returns: bool \u2013 The boolean value indicating if the process is confirm. Source code in agents/processors/basic.py 736 737 738 739 740 741 742 743 744 def is_confirm ( self ) -> bool : \"\"\" Check if the process is confirm. :return: The boolean value indicating if the process is confirm. \"\"\" self . agent . status = self . status return self . status == self . _agent_status_manager . CONFIRM . value is_error () Check if the process is in error. Returns: bool \u2013 The boolean value indicating if the process is in error. Source code in agents/processors/basic.py 704 705 706 707 708 709 710 711 def is_error ( self ) -> bool : \"\"\" Check if the process is in error. :return: The boolean value indicating if the process is in error. \"\"\" self . agent . 
status = self . status return self . status == self . _agent_status_manager . ERROR . value is_paused () Check if the process is paused. Returns: bool \u2013 The boolean value indicating if the process is paused. Source code in agents/processors/basic.py 713 714 715 716 717 718 719 720 721 722 723 724 def is_paused ( self ) -> bool : \"\"\" Check if the process is paused. :return: The boolean value indicating if the process is paused. \"\"\" self . agent . status = self . status return ( self . status == self . _agent_status_manager . PENDING . value or self . status == self . _agent_status_manager . CONFIRM . value ) is_pending () Check if the process is pending. Returns: bool \u2013 The boolean value indicating if the process is pending. Source code in agents/processors/basic.py 726 727 728 729 730 731 732 733 734 def is_pending ( self ) -> bool : \"\"\" Check if the process is pending. :return: The boolean value indicating if the process is pending. \"\"\" self . agent . status = self . status return self . status == self . _agent_status_manager . PENDING . value log ( response_json ) Set the result of the session, and log the result. result: The result of the session. response_json: The response json. return: The response json. Source code in agents/processors/basic.py 746 747 748 749 750 751 752 753 754 def log ( self , response_json : Dict [ str , Any ]) -> None : \"\"\" Set the result of the session, and log the result. result: The result of the session. response_json: The response json. return: The response json. \"\"\" self . logger . info ( json . dumps ( response_json )) log_save () Save the log. Source code in agents/processors/basic.py 304 305 306 307 308 309 310 311 312 def log_save ( self ) -> None : \"\"\" Save the log. \"\"\" self . _memory_data . add_values_from_dict ( { \"total_time_cost\" : self . _total_time_cost } ) self . log ( self . _memory_data . 
to_dict ()) method_timer ( func ) classmethod Decorator to calculate the time cost of the method. Parameters: func \u2013 The method to be decorated. Returns: \u2013 The decorated method. Source code in agents/processors/basic.py 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 @classmethod def method_timer ( cls , func ): \"\"\" Decorator to calculate the time cost of the method. :param func: The method to be decorated. :return: The decorated method. \"\"\" @wraps ( func ) def wrapper ( self , * args , ** kwargs ): start_time = time . time () result = func ( self , * args , ** kwargs ) end_time = time . time () self . _time_cost [ func . __name__ ] = end_time - start_time return result return wrapper parse_response () abstractmethod Parse the response. Source code in agents/processors/basic.py 263 264 265 266 267 268 @abstractmethod def parse_response ( self ) -> None : \"\"\" Parse the response. \"\"\" pass print_step_info () abstractmethod Print the step information. Source code in agents/processors/basic.py 228 229 230 231 232 233 @abstractmethod def print_step_info ( self ) -> None : \"\"\" Print the step information. \"\"\" pass process () Process a single step in a round. The process includes the following steps: 1. Print the step information. 2. Capture the screenshot. 3. Get the control information. 4. Get the prompt message. 5. Get the response. 6. Update the cost. 7. Parse the response. 8. Execute the action. 9. Update the memory. 10. Update the step and status. 11. Save the log. Source code in agents/processors/basic.py 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 def process ( self ) -> None : \"\"\" Process a single step in a round. The process includes the following steps: 1. Print the step information. 2. Capture the screenshot. 3. 
Get the control information. 4. Get the prompt message. 5. Get the response. 6. Update the cost. 7. Parse the response. 8. Execute the action. 9. Update the memory. 10. Update the step and status. 11. Save the log. \"\"\" start_time = time . time () try : # Step 1: Print the step information. self . print_step_info () # Step 2: Capture the screenshot. self . capture_screenshot () # Step 3: Get the control information. self . get_control_info () # Step 4: Get the prompt message. self . get_prompt_message () # Step 5: Get the response. self . get_response () # Step 6: Update the context. self . update_cost () # Step 7: Parse the response, if there is no error. self . parse_response () if self . is_pending () or self . is_paused (): # If the session is pending, update the step and memory, and return. if self . is_pending (): self . update_status () self . update_memory () return # Step 8: Execute the action. self . execute_action () # Step 9: Update the memory. self . update_memory () # Step 10: Update the status. self . update_status () self . _total_time_cost = time . time () - start_time # Step 11: Save the log. self . log_save () except StopIteration : # Error was handled and logged in the exception capture decorator. # Simply return here to stop the process early. return resume () Resume the process of action execution after the session is paused. Source code in agents/processors/basic.py 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 def resume ( self ) -> None : \"\"\" Resume the process of action execution after the session is paused. \"\"\" self . _is_resumed = True try : # Step 1: Execute the action. self . execute_action () # Step 2: Update the memory. self . update_memory () # Step 3: Update the status. self . update_status () except StopIteration : # Error was handled and logged in the exception capture decorator. # Simply return here to stop the process early. pass finally : self . 
_is_resumed = False string2list ( string ) staticmethod Convert a string to a list of string if the input is a string. Parameters: string ( Any ) \u2013 The string. Returns: List [ str ] \u2013 The list. Source code in agents/processors/basic.py 764 765 766 767 768 769 770 771 772 773 774 @staticmethod def string2list ( string : Any ) -> List [ str ]: \"\"\" Convert a string to a list of string if the input is a string. :param string: The string. :return: The list. \"\"\" if isinstance ( string , str ): return [ string ] else : return string sync_memory () abstractmethod Sync the memory of the Agent. Source code in agents/processors/basic.py 221 222 223 224 225 226 @abstractmethod def sync_memory ( self ) -> None : \"\"\" Sync the memory of the Agent. \"\"\" pass update_cost () Update the cost. Source code in agents/processors/basic.py 322 323 324 325 326 327 328 def update_cost ( self ) -> None : \"\"\" Update the cost. \"\"\" self . round_cost += self . cost self . session_cost += self . cost update_memory () abstractmethod Update the memory of the Agent. Source code in agents/processors/basic.py 277 278 279 280 281 282 @abstractmethod def update_memory ( self ) -> None : \"\"\" Update the memory of the Agent. \"\"\" pass update_status () Update the status of the session. Source code in agents/processors/basic.py 284 285 286 287 288 289 290 291 292 293 294 295 def update_status ( self ) -> None : \"\"\" Update the status of the session. \"\"\" self . agent . step += 1 self . agent . status = self . status if self . status != self . _agent_status_manager . FINISH . value : time . sleep ( configs [ \"SLEEP_TIME\" ]) self . round_step += 1 self . session_step += 1","title":"Processor"},{"location":"agents/design/processor/#agents-processor","text":"The Processor is a key component of the agent, responsible for the core logic of processing the user's request. The Processor is implemented as a class in the ufo/agents/processors folder.
Each agent has its own Processor class within the folder.","title":"Agents Processor"},{"location":"agents/design/processor/#core-process","text":"Once called, an agent follows a series of steps, defined in the Processor class, to process the user's request by calling the process method. The workflow of the process is as follows: Step Description Function 1 Print the step information. print_step_info 2 Capture the screenshot of the application. capture_screenshot 3 Get the control information of the application. get_control_info 4 Get the prompt message for the LLM. get_prompt_message 5 Generate the response from the LLM. get_response 6 Update the cost of the step. update_cost 7 Parse the response from the LLM. parse_response 8 Execute the action based on the response. execute_action 9 Update the memory and blackboard. update_memory 10 Update the status of the agent. update_status At each step, the Processor processes the user's request by invoking the corresponding method sequentially to execute the necessary actions. The process may be paused. It can be resumed using the resume method, based on the agent's logic and the user's request.","title":"Core Process"},{"location":"agents/design/processor/#reference","text":"Below is the basic structure of the Processor class: Bases: ABC The base processor for the session. A session consists of multiple rounds of conversation with the user, completing a task. At each round, the HostAgent and AppAgent interact with the user and the application with the processor. Each processor is responsible for processing the user request and updating the HostAgent and AppAgent at a single step in a round. Initialize the processor. Parameters: context ( Context ) \u2013 The context of the session. agent ( BasicAgent ) \u2013 The agent who executes the processor.
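The ordered workflow above can be sketched as a minimal, self-contained driver. `MiniProcessor` and `EchoProcessor` below are hypothetical stand-ins for illustration only; the real `BaseProcessor` in `ufo/agents/processors/basic.py` additionally handles cost tracking, memory updates, pause handling, and error capture.

```python
import time
from abc import ABC, abstractmethod


class MiniProcessor(ABC):
    """A stripped-down, hypothetical stand-in for BaseProcessor."""

    def __init__(self) -> None:
        self._total_time_cost = 0.0
        self.calls = []  # records the order in which the hooks run

    # Abstract hooks, mirroring the step methods named in the workflow table.
    @abstractmethod
    def print_step_info(self) -> None: ...
    @abstractmethod
    def capture_screenshot(self) -> None: ...
    @abstractmethod
    def get_control_info(self) -> None: ...
    @abstractmethod
    def get_prompt_message(self) -> None: ...
    @abstractmethod
    def get_response(self) -> None: ...
    @abstractmethod
    def parse_response(self) -> None: ...
    @abstractmethod
    def execute_action(self) -> None: ...

    def process(self) -> None:
        """Run one step of the round, invoking each hook in the documented order."""
        start = time.time()
        for hook in (
            self.print_step_info,
            self.capture_screenshot,
            self.get_control_info,
            self.get_prompt_message,
            self.get_response,
            self.parse_response,
            self.execute_action,
        ):
            hook()
        self._total_time_cost = time.time() - start


class EchoProcessor(MiniProcessor):
    """Toy subclass that just logs each hook's name."""

    def print_step_info(self): self.calls.append("print_step_info")
    def capture_screenshot(self): self.calls.append("capture_screenshot")
    def get_control_info(self): self.calls.append("get_control_info")
    def get_prompt_message(self): self.calls.append("get_prompt_message")
    def get_response(self): self.calls.append("get_response")
    def parse_response(self): self.calls.append("parse_response")
    def execute_action(self): self.calls.append("execute_action")


p = EchoProcessor()
p.process()
print(p.calls[0], "->", p.calls[-1])  # print_step_info -> execute_action
```

A concrete agent plugs its own behavior into the hooks; the base class owns the ordering, just as `BaseProcessor.process` does.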
Source code in agents/processors/basic.py 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 def __init__ ( self , agent : BasicAgent , context : Context ) -> None : \"\"\" Initialize the processor. :param context: The context of the session. :param agent: The agent who executes the processor. \"\"\" self . _context = context self . _agent = agent self . photographer = PhotographerFacade () self . control_inspector = ControlInspectorFacade ( BACKEND ) self . _prompt_message = None self . _status = None self . _response = None self . _cost = 0 self . _control_label = None self . _control_text = None self . _response_json = {} self . _memory_data = MemoryItem () self . _results = None self . _question_list = [] self . _agent_status_manager = self . agent . status_manager self . _is_resumed = False self . _action = None self . _plan = None self . _control_log = { \"control_class\" : None , \"control_type\" : None , \"control_automation_id\" : None , } self . _total_time_cost = 0 self . _time_cost = {} self . _exeception_traceback = {}","title":"Reference"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.action","text":"Get the action. Returns: str \u2013 The action.","title":"action"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.agent","text":"Get the agent. Returns: BasicAgent \u2013 The agent.","title":"agent"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.app_root","text":"Get the application root. Returns: str \u2013 The application root.","title":"app_root"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.application_process_name","text":"Get the application process name. Returns: str \u2013 The application process name.","title":"application_process_name"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.application_window","text":"Get the active window. 
Returns: UIAWrapper \u2013 The active window.","title":"application_window"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.context","text":"Get the context. Returns: Context \u2013 The context.","title":"context"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.control_label","text":"Get the control label. Returns: str \u2013 The control label.","title":"control_label"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.control_reannotate","text":"Get the control reannotation. Returns: List [ str ] \u2013 The control reannotation.","title":"control_reannotate"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.control_text","text":"Get the active application. Returns: str \u2013 The active application.","title":"control_text"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.cost","text":"Get the cost of the processor. Returns: float \u2013 The cost of the processor.","title":"cost"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.host_message","text":"Get the host message. Returns: List [ str ] \u2013 The host message.","title":"host_message"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.log_path","text":"Get the log path. Returns: str \u2013 The log path.","title":"log_path"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.logger","text":"Get the logger. Returns: str \u2013 The logger.","title":"logger"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.name","text":"Get the name of the processor. Returns: str \u2013 The name of the processor.","title":"name"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.plan","text":"Get the plan of the agent. 
Returns: str \u2013 The plan.","title":"plan"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.prev_plan","text":"Get the previous plan. Returns: List [ str ] \u2013 The previous plan of the agent.","title":"prev_plan"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.previous_subtasks","text":"Get the previous subtasks. Returns: List [ str ] \u2013 The previous subtasks.","title":"previous_subtasks"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.question_list","text":"Get the question list. Returns: List [ str ] \u2013 The question list.","title":"question_list"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.request","text":"Get the request. Returns: str \u2013 The request.","title":"request"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.request_logger","text":"Get the request logger. Returns: str \u2013 The request logger.","title":"request_logger"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.round_cost","text":"Get the round cost. Returns: float \u2013 The round cost.","title":"round_cost"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.round_num","text":"Get the round number. Returns: int \u2013 The round number.","title":"round_num"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.round_step","text":"Get the round step. Returns: int \u2013 The round step.","title":"round_step"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.round_subtask_amount","text":"Get the round subtask amount. Returns: int \u2013 The round subtask amount.","title":"round_subtask_amount"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.session_cost","text":"Get the session cost. 
Returns: float \u2013 The session cost.","title":"session_cost"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.session_step","text":"Get the session step. Returns: int \u2013 The session step.","title":"session_step"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.status","text":"Get the status of the processor. Returns: str \u2013 The status of the processor.","title":"status"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.subtask","text":"Get the subtask. Returns: str \u2013 The subtask.","title":"subtask"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.ui_tree_path","text":"Get the UI tree path. Returns: str \u2013 The UI tree path.","title":"ui_tree_path"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.add_to_memory","text":"Add the data to the memory. Parameters: data_dict ( Dict [ str , Any ] ) \u2013 The data dictionary to be added to the memory. Source code in agents/processors/basic.py 297 298 299 300 301 302 def add_to_memory ( self , data_dict : Dict [ str , Any ]) -> None : \"\"\" Add the data to the memory. :param data_dict: The data dictionary to be added to the memory. \"\"\" self . _memory_data . add_values_from_dict ( data_dict )","title":"add_to_memory"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.capture_screenshot","text":"Capture the screenshot. Source code in agents/processors/basic.py 235 236 237 238 239 240 @abstractmethod def capture_screenshot ( self ) -> None : \"\"\" Capture the screenshot. \"\"\" pass","title":"capture_screenshot"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.exception_capture","text":"Decorator to capture the exception of the method. Parameters: func \u2013 The method to be decorated. Returns: \u2013 The decorated method. 
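The two decorators described in this reference, method_timer and exception_capture, can be sketched together as plain functions (the real versions are classmethods on BaseProcessor, and the attribute names here are simplified; for instance, the real traceback store is spelled `_exeception_traceback`):

```python
import time
import traceback
from functools import wraps


def method_timer(func):
    """Record the wall-clock cost of a method in self._time_cost (sketch)."""
    @wraps(func)
    def wrapper(self, *args, **kwargs):
        start = time.time()
        result = func(self, *args, **kwargs)
        self._time_cost[func.__name__] = time.time() - start
        return result
    return wrapper


def exception_capture(func):
    """Trap an exception, stash its traceback, and stop the step (sketch)."""
    @wraps(func)
    def wrapper(self, *args, **kwargs):
        try:
            return func(self, *args, **kwargs)
        except Exception as e:
            self._exception_traceback[func.__name__] = {
                "type": type(e).__name__,
                "message": str(e),
                "traceback": traceback.format_exc(),
            }
            raise StopIteration("Error occurred during step.")
    return wrapper


class Step:
    """Hypothetical host class for the two decorators."""

    def __init__(self) -> None:
        self._time_cost = {}
        self._exception_traceback = {}

    @exception_capture
    @method_timer
    def get_response(self):
        raise ValueError("LLM call failed")


s = Step()
try:
    s.get_response()
except StopIteration:
    pass  # process() catches this and ends the step early
print(s._exception_traceback["get_response"]["type"])  # ValueError
```

Because the exception propagates out of the timing wrapper, no time cost is recorded for the failed call; only the traceback entry survives, which matches how `process` relies on `StopIteration` to stop early.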
Source code in agents/processors/basic.py 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 @classmethod def exception_capture ( cls , func ): \"\"\" Decorator to capture the exception of the method. :param func: The method to be decorated. :return: The decorated method. \"\"\" @wraps ( func ) def wrapper ( self , * args , ** kwargs ): try : func ( self , * args , ** kwargs ) except Exception as e : self . _exeception_traceback [ func . __name__ ] = { \"type\" : str ( type ( e ) . __name__ ), \"message\" : str ( e ), \"traceback\" : traceback . format_exc (), } utils . print_with_color ( f \"Error Occurs at { func . __name__ } \" , \"red\" ) utils . print_with_color ( self . _exeception_traceback [ func . __name__ ][ \"traceback\" ], \"red\" ) if self . _response is not None : utils . print_with_color ( \"Response: \" , \"red\" ) utils . print_with_color ( self . _response , \"red\" ) self . _status = self . _agent_status_manager . ERROR . value self . sync_memory () self . add_to_memory ({ \"error\" : self . _exeception_traceback }) self . add_to_memory ({ \"Status\" : self . _status }) self . log_save () raise StopIteration ( \"Error occurred during step.\" ) return wrapper","title":"exception_capture"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.execute_action","text":"Execute the action. Source code in agents/processors/basic.py 270 271 272 273 274 275 @abstractmethod def execute_action ( self ) -> None : \"\"\" Execute the action. \"\"\" pass","title":"execute_action"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.get_control_info","text":"Get the control information. Source code in agents/processors/basic.py 242 243 244 245 246 247 @abstractmethod def get_control_info ( self ) -> None : \"\"\" Get the control information. 
\"\"\" pass","title":"get_control_info"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.get_prompt_message","text":"Get the prompt message. Source code in agents/processors/basic.py 249 250 251 252 253 254 @abstractmethod def get_prompt_message ( self ) -> None : \"\"\" Get the prompt message. \"\"\" pass","title":"get_prompt_message"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.get_response","text":"Get the response from the LLM. Source code in agents/processors/basic.py 256 257 258 259 260 261 @abstractmethod def get_response ( self ) -> None : \"\"\" Get the response from the LLM. \"\"\" pass","title":"get_response"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.is_confirm","text":"Check if the process is confirm. Returns: bool \u2013 The boolean value indicating if the process is confirm. Source code in agents/processors/basic.py 736 737 738 739 740 741 742 743 744 def is_confirm ( self ) -> bool : \"\"\" Check if the process is confirm. :return: The boolean value indicating if the process is confirm. \"\"\" self . agent . status = self . status return self . status == self . _agent_status_manager . CONFIRM . value","title":"is_confirm"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.is_error","text":"Check if the process is in error. Returns: bool \u2013 The boolean value indicating if the process is in error. Source code in agents/processors/basic.py 704 705 706 707 708 709 710 711 def is_error ( self ) -> bool : \"\"\" Check if the process is in error. :return: The boolean value indicating if the process is in error. \"\"\" self . agent . status = self . status return self . status == self . _agent_status_manager . ERROR . value","title":"is_error"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.is_paused","text":"Check if the process is paused. 
Returns: bool \u2013 The boolean value indicating if the process is paused. Source code in agents/processors/basic.py 713 714 715 716 717 718 719 720 721 722 723 724 def is_paused ( self ) -> bool : \"\"\" Check if the process is paused. :return: The boolean value indicating if the process is paused. \"\"\" self . agent . status = self . status return ( self . status == self . _agent_status_manager . PENDING . value or self . status == self . _agent_status_manager . CONFIRM . value )","title":"is_paused"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.is_pending","text":"Check if the process is pending. Returns: bool \u2013 The boolean value indicating if the process is pending. Source code in agents/processors/basic.py 726 727 728 729 730 731 732 733 734 def is_pending ( self ) -> bool : \"\"\" Check if the process is pending. :return: The boolean value indicating if the process is pending. \"\"\" self . agent . status = self . status return self . status == self . _agent_status_manager . PENDING . value","title":"is_pending"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.log","text":"Set the result of the session, and log the result. result: The result of the session. response_json: The response json. return: The response json. Source code in agents/processors/basic.py 746 747 748 749 750 751 752 753 754 def log ( self , response_json : Dict [ str , Any ]) -> None : \"\"\" Set the result of the session, and log the result. result: The result of the session. response_json: The response json. return: The response json. \"\"\" self . logger . info ( json . dumps ( response_json ))","title":"log"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.log_save","text":"Save the log. Source code in agents/processors/basic.py 304 305 306 307 308 309 310 311 312 def log_save ( self ) -> None : \"\"\" Save the log. \"\"\" self . _memory_data . 
add_values_from_dict ( { \"total_time_cost\" : self . _total_time_cost } ) self . log ( self . _memory_data . to_dict ())","title":"log_save"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.method_timer","text":"Decorator to calculate the time cost of the method. Parameters: func \u2013 The method to be decorated. Returns: \u2013 The decorated method. Source code in agents/processors/basic.py 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 @classmethod def method_timer ( cls , func ): \"\"\" Decorator to calculate the time cost of the method. :param func: The method to be decorated. :return: The decorated method. \"\"\" @wraps ( func ) def wrapper ( self , * args , ** kwargs ): start_time = time . time () result = func ( self , * args , ** kwargs ) end_time = time . time () self . _time_cost [ func . __name__ ] = end_time - start_time return result return wrapper","title":"method_timer"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.parse_response","text":"Parse the response. Source code in agents/processors/basic.py 263 264 265 266 267 268 @abstractmethod def parse_response ( self ) -> None : \"\"\" Parse the response. \"\"\" pass","title":"parse_response"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.print_step_info","text":"Print the step information. Source code in agents/processors/basic.py 228 229 230 231 232 233 @abstractmethod def print_step_info ( self ) -> None : \"\"\" Print the step information. \"\"\" pass","title":"print_step_info"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.process","text":"Process a single step in a round. The process includes the following steps: 1. Print the step information. 2. Capture the screenshot. 3. Get the control information. 4. Get the prompt message. 5. Get the response. 6. Update the cost. 7. Parse the response. 8. Execute the action. 9. Update the memory. 10. 
Update the step and status. 11. Save the log. Source code in agents/processors/basic.py 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 def process ( self ) -> None : \"\"\" Process a single step in a round. The process includes the following steps: 1. Print the step information. 2. Capture the screenshot. 3. Get the control information. 4. Get the prompt message. 5. Get the response. 6. Update the cost. 7. Parse the response. 8. Execute the action. 9. Update the memory. 10. Update the step and status. 11. Save the log. \"\"\" start_time = time . time () try : # Step 1: Print the step information. self . print_step_info () # Step 2: Capture the screenshot. self . capture_screenshot () # Step 3: Get the control information. self . get_control_info () # Step 4: Get the prompt message. self . get_prompt_message () # Step 5: Get the response. self . get_response () # Step 6: Update the context. self . update_cost () # Step 7: Parse the response, if there is no error. self . parse_response () if self . is_pending () or self . is_paused (): # If the session is pending, update the step and memory, and return. if self . is_pending (): self . update_status () self . update_memory () return # Step 8: Execute the action. self . execute_action () # Step 9: Update the memory. self . update_memory () # Step 10: Update the status. self . update_status () self . _total_time_cost = time . time () - start_time # Step 11: Save the log. self . log_save () except StopIteration : # Error was handled and logged in the exception capture decorator. # Simply return here to stop the process early. return","title":"process"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.resume","text":"Resume the process of action execution after the session is paused. 
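The pause-and-resume split around execute_action can be illustrated with a toy example. The Status values and the PausableStep class are illustrative only; UFO's real status manager and processor define their own states and behavior.

```python
from enum import Enum


class Status(Enum):
    # Illustrative status values; UFO's status manager defines its own set.
    CONTINUE = "CONTINUE"
    PENDING = "PENDING"
    CONFIRM = "CONFIRM"


class PausableStep:
    """Sketch of the pause/resume split: process() stops before the action
    when the status is PENDING or CONFIRM; resume() finishes the step."""

    def __init__(self, status: Status) -> None:
        self.status = status
        self.executed = False
        self._is_resumed = False

    def is_paused(self) -> bool:
        # Mirrors BaseProcessor.is_paused: paused while awaiting user
        # input (PENDING) or confirmation (CONFIRM).
        return self.status in (Status.PENDING, Status.CONFIRM)

    def execute_action(self) -> None:
        self.executed = True

    def process(self) -> None:
        if self.is_paused():
            return  # defer execute_action until the user responds
        self.execute_action()

    def resume(self) -> None:
        self._is_resumed = True
        try:
            self.execute_action()
        finally:
            self._is_resumed = False


step = PausableStep(Status.CONFIRM)
step.process()
print(step.executed)  # False: paused before acting
step.resume()
print(step.executed)  # True: the deferred action ran on resume
```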
Source code in agents/processors/basic.py 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 def resume ( self ) -> None : \"\"\" Resume the process of action execution after the session is paused. \"\"\" self . _is_resumed = True try : # Step 1: Execute the action. self . execute_action () # Step 2: Update the memory. self . update_memory () # Step 3: Update the status. self . update_status () except StopIteration : # Error was handled and logged in the exception capture decorator. # Simply return here to stop the process early. pass finally : self . _is_resumed = False","title":"resume"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.string2list","text":"Convert a string to a list of string if the input is a string. Parameters: string ( Any ) \u2013 The string. Returns: List [ str ] \u2013 The list. Source code in agents/processors/basic.py 764 765 766 767 768 769 770 771 772 773 774 @staticmethod def string2list ( string : Any ) -> List [ str ]: \"\"\" Convert a string to a list of string if the input is a string. :param string: The string. :return: The list. \"\"\" if isinstance ( string , str ): return [ string ] else : return string","title":"string2list"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.sync_memory","text":"Sync the memory of the Agent. Source code in agents/processors/basic.py 221 222 223 224 225 226 @abstractmethod def sync_memory ( self ) -> None : \"\"\" Sync the memory of the Agent. \"\"\" pass","title":"sync_memory"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.update_cost","text":"Update the cost. Source code in agents/processors/basic.py 322 323 324 325 326 327 328 def update_cost ( self ) -> None : \"\"\" Update the cost. \"\"\" self . round_cost += self . cost self . session_cost += self . 
cost","title":"update_cost"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.update_memory","text":"Update the memory of the Agent. Source code in agents/processors/basic.py 277 278 279 280 281 282 @abstractmethod def update_memory ( self ) -> None : \"\"\" Update the memory of the Agent. \"\"\" pass","title":"update_memory"},{"location":"agents/design/processor/#agents.processors.basic.BaseProcessor.update_status","text":"Update the status of the session. Source code in agents/processors/basic.py 284 285 286 287 288 289 290 291 292 293 294 295 def update_status ( self ) -> None : \"\"\" Update the status of the session. \"\"\" self . agent . step += 1 self . agent . status = self . status if self . status != self . _agent_status_manager . FINISH . value : time . sleep ( configs [ \"SLEEP_TIME\" ]) self . round_step += 1 self . session_step += 1","title":"update_status"},{"location":"agents/design/prompter/","text":"Agent Prompter The Prompter is a key component of the UFO framework, responsible for constructing prompts for the LLM to generate responses. The Prompter is implemented in the ufo/prompts folder. Each agent has its own Prompter class that defines the structure of the prompt and the information to be fed to the LLM. Components A prompt fed to the LLM is usually a list of dictionaries, where each dictionary contains the following keys: Key Description role The role of the text in the prompt, which can be system , user , or assistant . content The content of the text for the specific role. Tip You may find the official documentation helpful for constructing the prompt. In the __init__ method of the Prompter class, you can define the template of the prompt for each component, and the final prompt message is constructed by combining the templates of each component using the prompt_construction method. System Prompt The system prompt uses the template configured in the config_dev.yaml file for each agent.
It usually contains the instructions for the agent's role, action, tips, response format, etc. You need to use the system_prompt_construction method to construct the system prompt. Prompts for the API instructions and demonstration examples are also included in the system prompt; they are constructed by the api_prompt_helper and examples_prompt_helper methods, respectively. Below are the sub-components of the system prompt: Component Description Method apis The API instructions for the agent. api_prompt_helper examples The demonstration examples for the agent. examples_prompt_helper User Prompt The user prompt is constructed based on the information from the agent's observation, external knowledge, and Blackboard . You can use the user_prompt_construction method to construct the user prompt. Below are the sub-components of the user prompt: Component Description Method observation The observation of the agent. user_content_construction retrieved_docs The knowledge retrieved from the external knowledge base. retrived_documents_prompt_helper blackboard The information stored in the Blackboard . blackboard_to_prompt Reference You can find the implementation of the Prompter in the ufo/prompts folder. Below is the basic structure of the Prompter class: Bases: ABC The BasicPrompter class is the abstract class for the prompter. Initialize the BasicPrompter. Parameters: is_visual ( bool ) \u2013 Whether the request is for visual model. prompt_template ( str ) \u2013 The path of the prompt template. example_prompt_template ( str ) \u2013 The path of the example prompt template.
:param example_prompt_template: The path of the example prompt template. \"\"\" self . is_visual = is_visual if prompt_template : self . prompt_template = self . load_prompt_template ( prompt_template , is_visual ) else : self . prompt_template = \"\" if example_prompt_template : self . example_prompt_template = self . load_prompt_template ( example_prompt_template , is_visual ) else : self . example_prompt_template = \"\" api_prompt_helper () A helper function to construct the API list and descriptions for the prompt. Source code in prompter/basic.py 139 140 141 142 143 144 def api_prompt_helper ( self ) -> str : \"\"\" A helper function to construct the API list and descriptions for the prompt. \"\"\" pass examples_prompt_helper () A helper function to construct the examples prompt for in-context learning. Source code in prompter/basic.py 132 133 134 135 136 137 def examples_prompt_helper ( self ) -> str : \"\"\" A helper function to construct the examples prompt for in-context learning. \"\"\" pass load_prompt_template ( template_path , is_visual = None ) staticmethod Load the prompt template. Returns: Dict [ str , str ] \u2013 The prompt template. Source code in prompter/basic.py 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 @staticmethod def load_prompt_template ( template_path : str , is_visual = None ) -> Dict [ str , str ]: \"\"\" Load the prompt template. :return: The prompt template. \"\"\" if is_visual == None : path = template_path else : path = template_path . format ( mode = \"visual\" if is_visual == True else \"nonvisual\" ) if not path : return {} if os . path . exists ( path ): try : prompt = yaml . safe_load ( open ( path , \"r\" , encoding = \"utf-8\" )) except yaml . 
YAMLError as exc : print_with_color ( f \"Error loading prompt template: { exc } \" , \"yellow\" ) else : raise FileNotFoundError ( f \"Prompt template not found at { path } \" ) return prompt prompt_construction ( system_prompt , user_content ) staticmethod Construct the prompt for summarizing the experience into an example. Parameters: user_content ( List [ Dict [ str , str ]] ) \u2013 The user content. return: The prompt for summarizing the experience into an example. Source code in prompter/basic.py 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 @staticmethod def prompt_construction ( system_prompt : str , user_content : List [ Dict [ str , str ]] ) -> List : \"\"\" Construct the prompt for summarizing the experience into an example. :param user_content: The user content. return: The prompt for summarizing the experience into an example. \"\"\" system_message = { \"role\" : \"system\" , \"content\" : system_prompt } user_message = { \"role\" : \"user\" , \"content\" : user_content } prompt_message = [ system_message , user_message ] return prompt_message retrived_documents_prompt_helper ( header , separator , documents ) staticmethod Construct the prompt for retrieved documents. Parameters: header ( str ) \u2013 The header of the prompt. separator ( str ) \u2013 The separator of the prompt. documents ( List [ str ] ) \u2013 The retrieved documents. return: The prompt for retrieved documents. Source code in prompter/basic.py 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 @staticmethod def retrived_documents_prompt_helper ( header : str , separator : str , documents : List [ str ] ) -> str : \"\"\" Construct the prompt for retrieved documents. :param header: The header of the prompt. :param separator: The separator of the prompt. :param documents: The retrieved documents. return: The prompt for retrieved documents. \"\"\" if header : prompt = \" \\n < {header} :> \\n \" . 
format ( header = header ) else : prompt = \"\" for i , document in enumerate ( documents ): if separator : prompt += \"[ {separator} {i} :]\" . format ( separator = separator , i = i + 1 ) prompt += \" \\n \" prompt += document prompt += \" \\n\\n \" return prompt system_prompt_construction () abstractmethod Construct the system prompt for LLM. Source code in prompter/basic.py 108 109 110 111 112 113 114 @abstractmethod def system_prompt_construction ( self ) -> str : \"\"\" Construct the system prompt for LLM. \"\"\" pass user_content_construction () abstractmethod Construct the full user content for LLM, including the user prompt and images. Source code in prompter/basic.py 124 125 126 127 128 129 130 @abstractmethod def user_content_construction ( self ) -> str : \"\"\" Construct the full user content for LLM, including the user prompt and images. \"\"\" pass user_prompt_construction () abstractmethod Construct the textual user prompt for LLM based on the user field in the prompt template. Source code in prompter/basic.py 116 117 118 119 120 121 122 @abstractmethod def user_prompt_construction ( self ) -> str : \"\"\" Construct the textual user prompt for LLM based on the `user` field in the prompt template. \"\"\" pass Tip You can customize the Prompter class to tailor the prompt to your requirements.","title":"Prompter"},{"location":"agents/design/prompter/#agent-prompter","text":"The Prompter is a key component of the UFO framework, responsible for constructing prompts for the LLM to generate responses. The Prompter is implemented in the ufo/prompts folder. Each agent has its own Prompter class that defines the structure of the prompt and the information to be fed to the LLM.","title":"Agent Prompter"},{"location":"agents/design/prompter/#components","text":"A prompt fed to the LLM is usually a list of dictionaries, where each dictionary contains the following keys: Key Description role The role of the text in the prompt, can be system , user , or assistant .
content The content of the text for the specific role. Tip You may find the official documentation helpful for constructing the prompt. In the __init__ method of the Prompter class, you can define the template of the prompt for each component, and the final prompt message is constructed by combining the templates of each component using the prompt_construction method.","title":"Components"},{"location":"agents/design/prompter/#system-prompt","text":"The system prompt uses the template configured in the config_dev.yaml file for each agent. It usually contains the instructions for the agent's role, action, tips, response format, etc. You need to use the system_prompt_construction method to construct the system prompt. Prompts for the API instructions and demonstration examples are also included in the system prompt, which are constructed by the api_prompt_helper and examples_prompt_helper methods respectively. Below are the sub-components of the system prompt: Component Description Method apis The API instructions for the agent. api_prompt_helper examples The demonstration examples for the agent. examples_prompt_helper","title":"System Prompt"},{"location":"agents/design/prompter/#user-prompt","text":"The user prompt is constructed based on the information from the agent's observation, external knowledge, and Blackboard . You can use the user_prompt_construction method to construct the user prompt. Below are the sub-components of the user prompt: Component Description Method observation The observation of the agent. user_content_construction retrieved_docs The knowledge retrieved from the external knowledge base. retrived_documents_prompt_helper blackboard The information stored in the Blackboard . blackboard_to_prompt","title":"User Prompt"},{"location":"agents/design/prompter/#reference","text":"You can find the implementation of the Prompter in the ufo/prompts folder.
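As a concrete illustration of the role/content message format described in the Components section, here is a minimal, self-contained sketch of how prompt_construction assembles the final message list; the prompt strings are illustrative assumptions, not UFO's actual templates:

```python
# Minimal sketch of the role/content message-list format; the prompt
# strings below are illustrative assumptions, not UFO's real templates.
from typing import Dict, List


def prompt_construction(system_prompt: str, user_content: List[Dict[str, str]]) -> List[Dict]:
    # Combine the system prompt and the user content into the final
    # role/content message list fed to the LLM.
    system_message = {'role': 'system', 'content': system_prompt}
    user_message = {'role': 'user', 'content': user_content}
    return [system_message, user_message]


messages = prompt_construction(
    'You are an agent that operates Windows applications.',
    [{'type': 'text', 'text': 'Open the Settings app.'}],
)
print([m['role'] for m in messages])  # -> ['system', 'user']
```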
Below is the basic structure of the Prompter class: Bases: ABC The BasicPrompter class is the abstract class for the prompter. Initialize the BasicPrompter. Parameters: is_visual ( bool ) \u2013 Whether the request is for visual model. prompt_template ( str ) \u2013 The path of the prompt template. example_prompt_template ( str ) \u2013 The path of the example prompt template. Source code in prompter/basic.py 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 def __init__ ( self , is_visual : bool , prompt_template : str , example_prompt_template : str ): \"\"\" Initialize the BasicPrompter. :param is_visual: Whether the request is for visual model. :param prompt_template: The path of the prompt template. :param example_prompt_template: The path of the example prompt template. \"\"\" self . is_visual = is_visual if prompt_template : self . prompt_template = self . load_prompt_template ( prompt_template , is_visual ) else : self . prompt_template = \"\" if example_prompt_template : self . example_prompt_template = self . load_prompt_template ( example_prompt_template , is_visual ) else : self . example_prompt_template = \"\"","title":"Reference"},{"location":"agents/design/prompter/#prompter.basic.BasicPrompter.api_prompt_helper","text":"A helper function to construct the API list and descriptions for the prompt. Source code in prompter/basic.py 139 140 141 142 143 144 def api_prompt_helper ( self ) -> str : \"\"\" A helper function to construct the API list and descriptions for the prompt. \"\"\" pass","title":"api_prompt_helper"},{"location":"agents/design/prompter/#prompter.basic.BasicPrompter.examples_prompt_helper","text":"A helper function to construct the examples prompt for in-context learning. Source code in prompter/basic.py 132 133 134 135 136 137 def examples_prompt_helper ( self ) -> str : \"\"\" A helper function to construct the examples prompt for in-context learning. 
\"\"\" pass","title":"examples_prompt_helper"},{"location":"agents/design/prompter/#prompter.basic.BasicPrompter.load_prompt_template","text":"Load the prompt template. Returns: Dict [ str , str ] \u2013 The prompt template. Source code in prompter/basic.py 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 @staticmethod def load_prompt_template ( template_path : str , is_visual = None ) -> Dict [ str , str ]: \"\"\" Load the prompt template. :return: The prompt template. \"\"\" if is_visual == None : path = template_path else : path = template_path . format ( mode = \"visual\" if is_visual == True else \"nonvisual\" ) if not path : return {} if os . path . exists ( path ): try : prompt = yaml . safe_load ( open ( path , \"r\" , encoding = \"utf-8\" )) except yaml . YAMLError as exc : print_with_color ( f \"Error loading prompt template: { exc } \" , \"yellow\" ) else : raise FileNotFoundError ( f \"Prompt template not found at { path } \" ) return prompt","title":"load_prompt_template"},{"location":"agents/design/prompter/#prompter.basic.BasicPrompter.prompt_construction","text":"Construct the prompt for summarizing the experience into an example. Parameters: user_content ( List [ Dict [ str , str ]] ) \u2013 The user content. return: The prompt for summarizing the experience into an example. Source code in prompter/basic.py 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 @staticmethod def prompt_construction ( system_prompt : str , user_content : List [ Dict [ str , str ]] ) -> List : \"\"\" Construct the prompt for summarizing the experience into an example. :param user_content: The user content. return: The prompt for summarizing the experience into an example. 
\"\"\" system_message = { \"role\" : \"system\" , \"content\" : system_prompt } user_message = { \"role\" : \"user\" , \"content\" : user_content } prompt_message = [ system_message , user_message ] return prompt_message","title":"prompt_construction"},{"location":"agents/design/prompter/#prompter.basic.BasicPrompter.retrived_documents_prompt_helper","text":"Construct the prompt for retrieved documents. Parameters: header ( str ) \u2013 The header of the prompt. separator ( str ) \u2013 The separator of the prompt. documents ( List [ str ] ) \u2013 The retrieved documents. return: The prompt for retrieved documents. Source code in prompter/basic.py 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 @staticmethod def retrived_documents_prompt_helper ( header : str , separator : str , documents : List [ str ] ) -> str : \"\"\" Construct the prompt for retrieved documents. :param header: The header of the prompt. :param separator: The separator of the prompt. :param documents: The retrieved documents. return: The prompt for retrieved documents. \"\"\" if header : prompt = \" \\n < {header} :> \\n \" . format ( header = header ) else : prompt = \"\" for i , document in enumerate ( documents ): if separator : prompt += \"[ {separator} {i} :]\" . format ( separator = separator , i = i + 1 ) prompt += \" \\n \" prompt += document prompt += \" \\n\\n \" return prompt","title":"retrived_documents_prompt_helper"},{"location":"agents/design/prompter/#prompter.basic.BasicPrompter.system_prompt_construction","text":"Construct the system prompt for LLM. Source code in prompter/basic.py 108 109 110 111 112 113 114 @abstractmethod def system_prompt_construction ( self ) -> str : \"\"\" Construct the system prompt for LLM. \"\"\" pass","title":"system_prompt_construction"},{"location":"agents/design/prompter/#prompter.basic.BasicPrompter.user_content_construction","text":"Construct the full user content for LLM, including the user prompt and images. 
Source code in prompter/basic.py 124 125 126 127 128 129 130 @abstractmethod def user_content_construction ( self ) -> str : \"\"\" Construct the full user content for LLM, including the user prompt and images. \"\"\" pass","title":"user_content_construction"},{"location":"agents/design/prompter/#prompter.basic.BasicPrompter.user_prompt_construction","text":"Construct the textual user prompt for LLM based on the user field in the prompt template. Source code in prompter/basic.py 116 117 118 119 120 121 122 @abstractmethod def user_prompt_construction ( self ) -> str : \"\"\" Construct the textual user prompt for LLM based on the `user` field in the prompt template. \"\"\" pass Tip You can customize the Prompter class to tailor the prompt to your requirements.","title":"user_prompt_construction"},{"location":"agents/design/state/","text":"Agent State The State class is a fundamental component of the UFO agent framework. It represents the current state of the agent and determines the next action and agent to handle the request. Each agent has a specific set of states that define the agent's behavior and workflow. AgentStatus The set of states for an agent is defined in the AgentStatus class: class AgentStatus(Enum): \"\"\" The status class for the agent. \"\"\" ERROR = \"ERROR\" FINISH = \"FINISH\" CONTINUE = \"CONTINUE\" FAIL = \"FAIL\" PENDING = \"PENDING\" CONFIRM = \"CONFIRM\" SCREENSHOT = \"SCREENSHOT\" Each agent implements its own set of AgentStatus to define the states of the agent. AgentStateManager The class AgentStateManager manages the state mapping from a string to the corresponding state class. Each state class is registered with the AgentStateManager using the register decorator to associate the state class with a specific agent, e.g., @AgentStateManager.register class SomeAgentState(AgentState): \"\"\" The state class for the some agent. 
\"\"\" Tip You can find examples on how to register the state class for the AppAgent in the ufo/agents/states/app_agent_state.py file. Below is the basic structure of the AgentStateManager class: class AgentStateManager(ABC, metaclass=SingletonABCMeta): \"\"\" A abstract class to manage the states of the agent. \"\"\" _state_mapping: Dict[str, Type[AgentState]] = {} def __init__(self): \"\"\" Initialize the state manager. \"\"\" self._state_instance_mapping: Dict[str, AgentState] = {} def get_state(self, status: str) -> AgentState: \"\"\" Get the state for the status. :param status: The status string. :return: The state object. \"\"\" # Lazy load the state class if status not in self._state_instance_mapping: state_class = self._state_mapping.get(status) if state_class: self._state_instance_mapping[status] = state_class() else: self._state_instance_mapping[status] = self.none_state state = self._state_instance_mapping.get(status, self.none_state) return state def add_state(self, status: str, state: AgentState) -> None: \"\"\" Add a new state to the state mapping. :param status: The status string. :param state: The state object. \"\"\" self.state_map[status] = state @property def state_map(self) -> Dict[str, AgentState]: \"\"\" The state mapping of status to state. :return: The state mapping. \"\"\" return self._state_instance_mapping @classmethod def register(cls, state_class: Type[AgentState]) -> Type[AgentState]: \"\"\" Decorator to register the state class to the state manager. :param state_class: The state class to be registered. :return: The state class. \"\"\" cls._state_mapping[state_class.name()] = state_class return state_class @property @abstractmethod def none_state(self) -> AgentState: \"\"\" The none state of the state manager. \"\"\" pass AgentState Each state class inherits from the AgentState class and must implement the method of handle to process the action in the state. 
In addition, the next_state and next_agent methods are used to determine the next state and agent to handle the transition. Please find below the reference for the State class in UFO. Bases: ABC The abstract class for the agent state. agent_class () abstractmethod classmethod The class of the agent. Returns: Type [ BasicAgent ] \u2013 The class of the agent. Source code in agents/states/basic.py 165 166 167 168 169 170 171 172 @classmethod @abstractmethod def agent_class ( cls ) -> Type [ BasicAgent ]: \"\"\" The class of the agent. :return: The class of the agent. \"\"\" pass handle ( agent , context = None ) abstractmethod Handle the agent for the current step. Parameters: agent ( BasicAgent ) \u2013 The agent to handle. context ( Optional ['Context'] , default: None ) \u2013 The context for the agent and session. Source code in agents/states/basic.py 122 123 124 125 126 127 128 129 @abstractmethod def handle ( self , agent : BasicAgent , context : Optional [ \"Context\" ] = None ) -> None : \"\"\" Handle the agent for the current step. :param agent: The agent to handle. :param context: The context for the agent and session. \"\"\" pass is_round_end () abstractmethod Check if the round ends. Returns: bool \u2013 True if the round ends, False otherwise. Source code in agents/states/basic.py 149 150 151 152 153 154 155 @abstractmethod def is_round_end ( self ) -> bool : \"\"\" Check if the round ends. :return: True if the round ends, False otherwise. \"\"\" pass is_subtask_end () abstractmethod Check if the subtask ends. Returns: bool \u2013 True if the subtask ends, False otherwise. Source code in agents/states/basic.py 157 158 159 160 161 162 163 @abstractmethod def is_subtask_end ( self ) -> bool : \"\"\" Check if the subtask ends. :return: True if the subtask ends, False otherwise. \"\"\" pass name () abstractmethod classmethod The class name of the state. Returns: str \u2013 The class name of the state. 
Source code in agents/states/basic.py 174 175 176 177 178 179 180 181 @classmethod @abstractmethod def name ( cls ) -> str : \"\"\" The class name of the state. :return: The class name of the state. \"\"\" return \"\" next_agent ( agent ) abstractmethod Get the agent for the next step. Parameters: agent ( BasicAgent ) \u2013 The agent for the current step. Returns: BasicAgent \u2013 The agent for the next step. Source code in agents/states/basic.py 131 132 133 134 135 136 137 138 @abstractmethod def next_agent ( self , agent : BasicAgent ) -> BasicAgent : \"\"\" Get the agent for the next step. :param agent: The agent for the current step. :return: The agent for the next step. \"\"\" return agent next_state ( agent ) abstractmethod Get the state for the next step. Parameters: agent ( BasicAgent ) \u2013 The agent for the current step. Returns: AgentState \u2013 The state for the next step. Source code in agents/states/basic.py 140 141 142 143 144 145 146 147 @abstractmethod def next_state ( self , agent : BasicAgent ) -> AgentState : \"\"\" Get the state for the next step. :param agent: The agent for the current step. :return: The state for the next step. \"\"\" pass Tip The state machine diagrams for the HostAgent and AppAgent are shown in their respective documents. Tip A Round calls the handle , next_state , and next_agent methods of the current state to process the user request and determine the next state and agent to handle the request, and orchestrates the agents to execute the necessary actions.","title":"State"},{"location":"agents/design/state/#agent-state","text":"The State class is a fundamental component of the UFO agent framework. It represents the current state of the agent and determines the next action and agent to handle the request. 
Each agent has a specific set of states that define the agent's behavior and workflow.","title":"Agent State"},{"location":"agents/design/state/#agentstatus","text":"The set of states for an agent is defined in the AgentStatus class: class AgentStatus(Enum): \"\"\" The status class for the agent. \"\"\" ERROR = \"ERROR\" FINISH = \"FINISH\" CONTINUE = \"CONTINUE\" FAIL = \"FAIL\" PENDING = \"PENDING\" CONFIRM = \"CONFIRM\" SCREENSHOT = \"SCREENSHOT\" Each agent implements its own set of AgentStatus to define the states of the agent.","title":"AgentStatus"},{"location":"agents/design/state/#agentstatemanager","text":"The class AgentStateManager manages the state mapping from a string to the corresponding state class. Each state class is registered with the AgentStateManager using the register decorator to associate the state class with a specific agent, e.g., @AgentStateManager.register class SomeAgentState(AgentState): \"\"\" The state class for the some agent. \"\"\" Tip You can find examples on how to register the state class for the AppAgent in the ufo/agents/states/app_agent_state.py file. Below is the basic structure of the AgentStateManager class: class AgentStateManager(ABC, metaclass=SingletonABCMeta): \"\"\" A abstract class to manage the states of the agent. \"\"\" _state_mapping: Dict[str, Type[AgentState]] = {} def __init__(self): \"\"\" Initialize the state manager. \"\"\" self._state_instance_mapping: Dict[str, AgentState] = {} def get_state(self, status: str) -> AgentState: \"\"\" Get the state for the status. :param status: The status string. :return: The state object. 
\"\"\" # Lazy load the state class if status not in self._state_instance_mapping: state_class = self._state_mapping.get(status) if state_class: self._state_instance_mapping[status] = state_class() else: self._state_instance_mapping[status] = self.none_state state = self._state_instance_mapping.get(status, self.none_state) return state def add_state(self, status: str, state: AgentState) -> None: \"\"\" Add a new state to the state mapping. :param status: The status string. :param state: The state object. \"\"\" self.state_map[status] = state @property def state_map(self) -> Dict[str, AgentState]: \"\"\" The state mapping of status to state. :return: The state mapping. \"\"\" return self._state_instance_mapping @classmethod def register(cls, state_class: Type[AgentState]) -> Type[AgentState]: \"\"\" Decorator to register the state class to the state manager. :param state_class: The state class to be registered. :return: The state class. \"\"\" cls._state_mapping[state_class.name()] = state_class return state_class @property @abstractmethod def none_state(self) -> AgentState: \"\"\" The none state of the state manager. \"\"\" pass","title":"AgentStateManager"},{"location":"agents/design/state/#agentstate","text":"Each state class inherits from the AgentState class and must implement the method of handle to process the action in the state. In addition, the next_state and next_agent methods are used to determine the next state and agent to handle the transition. Please find below the reference for the State class in UFO. Bases: ABC The abstract class for the agent state.","title":"AgentState"},{"location":"agents/design/state/#agents.states.basic.AgentState.agent_class","text":"The class of the agent. Returns: Type [ BasicAgent ] \u2013 The class of the agent. Source code in agents/states/basic.py 165 166 167 168 169 170 171 172 @classmethod @abstractmethod def agent_class ( cls ) -> Type [ BasicAgent ]: \"\"\" The class of the agent. :return: The class of the agent. 
\"\"\" pass","title":"agent_class"},{"location":"agents/design/state/#agents.states.basic.AgentState.handle","text":"Handle the agent for the current step. Parameters: agent ( BasicAgent ) \u2013 The agent to handle. context ( Optional ['Context'] , default: None ) \u2013 The context for the agent and session. Source code in agents/states/basic.py 122 123 124 125 126 127 128 129 @abstractmethod def handle ( self , agent : BasicAgent , context : Optional [ \"Context\" ] = None ) -> None : \"\"\" Handle the agent for the current step. :param agent: The agent to handle. :param context: The context for the agent and session. \"\"\" pass","title":"handle"},{"location":"agents/design/state/#agents.states.basic.AgentState.is_round_end","text":"Check if the round ends. Returns: bool \u2013 True if the round ends, False otherwise. Source code in agents/states/basic.py 149 150 151 152 153 154 155 @abstractmethod def is_round_end ( self ) -> bool : \"\"\" Check if the round ends. :return: True if the round ends, False otherwise. \"\"\" pass","title":"is_round_end"},{"location":"agents/design/state/#agents.states.basic.AgentState.is_subtask_end","text":"Check if the subtask ends. Returns: bool \u2013 True if the subtask ends, False otherwise. Source code in agents/states/basic.py 157 158 159 160 161 162 163 @abstractmethod def is_subtask_end ( self ) -> bool : \"\"\" Check if the subtask ends. :return: True if the subtask ends, False otherwise. \"\"\" pass","title":"is_subtask_end"},{"location":"agents/design/state/#agents.states.basic.AgentState.name","text":"The class name of the state. Returns: str \u2013 The class name of the state. Source code in agents/states/basic.py 174 175 176 177 178 179 180 181 @classmethod @abstractmethod def name ( cls ) -> str : \"\"\" The class name of the state. :return: The class name of the state. 
\"\"\" return \"\"","title":"name"},{"location":"agents/design/state/#agents.states.basic.AgentState.next_agent","text":"Get the agent for the next step. Parameters: agent ( BasicAgent ) \u2013 The agent for the current step. Returns: BasicAgent \u2013 The agent for the next step. Source code in agents/states/basic.py 131 132 133 134 135 136 137 138 @abstractmethod def next_agent ( self , agent : BasicAgent ) -> BasicAgent : \"\"\" Get the agent for the next step. :param agent: The agent for the current step. :return: The agent for the next step. \"\"\" return agent","title":"next_agent"},{"location":"agents/design/state/#agents.states.basic.AgentState.next_state","text":"Get the state for the next step. Parameters: agent ( BasicAgent ) \u2013 The agent for the current step. Returns: AgentState \u2013 The state for the next step. Source code in agents/states/basic.py 140 141 142 143 144 145 146 147 @abstractmethod def next_state ( self , agent : BasicAgent ) -> AgentState : \"\"\" Get the state for the next step. :param agent: The agent for the current step. :return: The state for the next step. \"\"\" pass Tip The state machine diagrams for the HostAgent and AppAgent are shown in their respective documents. Tip A Round calls the handle , next_state , and next_agent methods of the current state to process the user request and determine the next state and agent to handle the request, and orchestrates the agents to execute the necessary actions.","title":"next_state"},{"location":"automator/ai_tool_automator/","text":"AI Tool Automator The AI Tool Automator is a component of the UFO framework that enables the agent to interact with AI tools based on large language models (LLMs). The AI Tool Automator is designed to facilitate the integration of LLM-based AI tools into the UFO framework, enabling the agent to leverage the capabilities of these tools to perform complex tasks. 
Note UFO can also call in-app AI tools, such as Copilot , to assist with the automation process. This is achieved by using either UI Automation or API to interact with the in-app AI tool. These in-app AI tools differ from the AI Tool Automator, which is designed to interact with external AI tools based on LLMs that are not integrated into the application. Configuration The AI Tool Automator shares the same prompt configuration options as the UI Automator: Configuration Option Description Type Default Value API_PROMPT The prompt for the UI automation API. String \"ufo/prompts/share/base/api.yaml\" Receiver The AI Tool Automator shares the same receiver structure as the UI Automator. Please refer to the UI Automator Receiver section for more details. Command The command of the AI Tool Automator shares the same structure as the UI Automator. Please refer to the UI Automator Command section for more details. The list of available commands in the AI Tool Automator is shown below: Command Name Function Name Description AnnotationCommand annotation Annotate the control items on the screenshot. SummaryCommand summary Summarize the observation of the current application window.","title":"AI Tool"},{"location":"automator/ai_tool_automator/#ai-tool-automator","text":"The AI Tool Automator is a component of the UFO framework that enables the agent to interact with AI tools based on large language models (LLMs). The AI Tool Automator is designed to facilitate the integration of LLM-based AI tools into the UFO framework, enabling the agent to leverage the capabilities of these tools to perform complex tasks. Note UFO can also call in-app AI tools, such as Copilot , to assist with the automation process. This is achieved by using either UI Automation or API to interact with the in-app AI tool. 
These in-app AI tools differ from the AI Tool Automator, which is designed to interact with external AI tools based on LLMs that are not integrated into the application.","title":"AI Tool Automator"},{"location":"automator/ai_tool_automator/#configuration","text":"The AI Tool Automator shares the same prompt configuration options as the UI Automator: Configuration Option Description Type Default Value API_PROMPT The prompt for the UI automation API. String \"ufo/prompts/share/base/api.yaml\"","title":"Configuration"},{"location":"automator/ai_tool_automator/#receiver","text":"The AI Tool Automator shares the same receiver structure as the UI Automator. Please refer to the UI Automator Receiver section for more details.","title":"Receiver"},{"location":"automator/ai_tool_automator/#command","text":"The command of the AI Tool Automator shares the same structure as the UI Automator. Please refer to the UI Automator Command section for more details. The list of available commands in the AI Tool Automator is shown below: Command Name Function Name Description AnnotationCommand annotation Annotate the control items on the screenshot. SummaryCommand summary Summarize the observation of the current application window.","title":"Command"},{"location":"automator/bash_automator/","text":"Bash Automator UFO allows the HostAgent to execute bash commands on the host machine. The bash commands can be used to open applications or execute system commands. The Bash Automator is implemented in the ufo/automator/app_apis/shell module. Note Only HostAgent is currently supported by the Bash Automator. Receiver The Bash Automator receiver is the ShellReceiver class defined in the ufo/automator/app_apis/shell/shell_client.py file. Bases: ReceiverBasic The base class for the shell client. Initialize the shell client.
\"\"\" run_shell ( params ) Run the command. Parameters: params ( Dict [ str , Any ] ) \u2013 The parameters of the command. Returns: Any \u2013 The result content. Source code in automator/app_apis/shell/shell_client.py 24 25 26 27 28 29 30 31 32 33 34 def run_shell ( self , params : Dict [ str , Any ]) -> Any : \"\"\" Run the command. :param params: The parameters of the command. :return: The result content. \"\"\" bash_command = params . get ( \"command\" ) result = subprocess . run ( bash_command , shell = True , capture_output = True , text = True ) return result . stdout Command We now only support one command in the Bash Automator to execute a bash command on the host machine. @ShellReceiver.register class RunShellCommand(ShellCommand): \"\"\" The command to run the crawler with various options. \"\"\" def execute(self): \"\"\" Execute the command to run the crawler. :return: The result content. \"\"\" return self.receiver.run_shell(params=self.params) @classmethod def name(cls) -> str: \"\"\" The name of the command. \"\"\" return \"run_shell\" Below is the list of available commands in the Web Automator that are currently supported by UFO: Command Name Function Name Description RunShellCommand run_shell Get the content of a web page into a markdown format.","title":"Bash Automator"},{"location":"automator/bash_automator/#bash-automator","text":"UFO allows the HostAgent to execute bash commands on the host machine. The bash commands can be used to open applications or execute system commands. The Bash Automator is implemented in the ufo/automator/app_apis/shell module. Note Only HostAgent is currently supported by the Bash Automator.","title":"Bash Automator"},{"location":"automator/bash_automator/#receiver","text":"The Web Automator receiver is the ShellReceiver class defined in the ufo/automator/app_apis/shell/shell_client.py file. Bases: ReceiverBasic The base class for Web COM client using crawl4ai. Initialize the shell client. 
Source code in automator/app_apis/shell/shell_client.py 19 20 21 22 def __init__ ( self ) -> None : \"\"\" Initialize the shell client. \"\"\"","title":"Receiver"},{"location":"automator/bash_automator/#automator.app_apis.shell.shell_client.ShellReceiver.run_shell","text":"Run the command. Parameters: params ( Dict [ str , Any ] ) \u2013 The parameters of the command. Returns: Any \u2013 The result content. Source code in automator/app_apis/shell/shell_client.py 24 25 26 27 28 29 30 31 32 33 34 def run_shell ( self , params : Dict [ str , Any ]) -> Any : \"\"\" Run the command. :param params: The parameters of the command. :return: The result content. \"\"\" bash_command = params . get ( \"command\" ) result = subprocess . run ( bash_command , shell = True , capture_output = True , text = True ) return result . stdout","title":"run_shell"},{"location":"automator/bash_automator/#command","text":"We now only support one command in the Bash Automator to execute a bash command on the host machine. @ShellReceiver.register class RunShellCommand(ShellCommand): \"\"\" The command to run a shell command on the host machine. \"\"\" def execute(self): \"\"\" Execute the shell command. :return: The result content. \"\"\" return self.receiver.run_shell(params=self.params) @classmethod def name(cls) -> str: \"\"\" The name of the command. \"\"\" return \"run_shell\" Below is the list of available commands in the Bash Automator that are currently supported by UFO: Command Name Function Name Description RunShellCommand run_shell Run a bash command on the host machine and return its output.","title":"Command"},{"location":"automator/overview/","text":"Application Automator The Automator application is a tool that allows UFO to automate and take actions on applications. Currently, UFO supports several types of actions, including UI Automation and API . Note UFO can also call in-app AI tools, such as Copilot , to assist with the automation process.
This is achieved by using either UI Automation or API to interact with the in-app AI tool. UI Automator - This action type is used to interact with the application's UI controls, such as buttons, text boxes, and menus. UFO uses the UIA or Win32 APIs to interact with the application's UI controls. API - This action type is used to interact with the application's native API. Users and app developers can create their own API actions to interact with specific applications. Web - This action type is used to interact with web applications. UFO uses the crawl4ai library to extract information from web pages. Bash - This action type is used to interact with the command line interface (CLI) of an application. AI Tool - This action type is used to interact with the LLM-based AI tools. Action Design Patterns Actions in UFO are implemented using the command design pattern, which encapsulates a receiver, a command, and an invoker. The receiver is the object that performs the action, the command is the object that encapsulates the action, and the invoker is the object that triggers the action. The basic classes for implementing actions in UFO are as follows: Role Class Description Receiver ufo.automator.basic.ReceiverBasic The base class for all receivers in UFO. Receivers are objects that perform actions on applications. Command ufo.automator.basic.CommandBasic The base class for all commands in UFO. Commands are objects that encapsulate actions to be performed by receivers. Invoker ufo.automator.puppeteer.AppPuppeteer The base class for the invoker in UFO. Invokers are objects that trigger commands to be executed by receivers. The advantage of using the command design pattern in the agent framework is that it allows for the decoupling of the sender and receiver of the action. 
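The receiver, command, and invoker roles described here can be sketched as a few lines of standalone Python. The names below ( Receiver , RunShell , Invoker ) are illustrative stand-ins, not UFO's actual ReceiverBasic , CommandBasic , and AppPuppeteer classes:

```python
# Minimal, hypothetical sketch of the command design pattern described above.
from abc import ABC, abstractmethod
from collections import deque

class Receiver:
    # The receiver knows how to perform the concrete action.
    def run_shell(self, command: str) -> str:
        return f'ran: {command}'

class Command(ABC):
    # The command bundles a receiver with the parameters of one action.
    def __init__(self, receiver: Receiver, params: dict = None):
        self.receiver = receiver
        self.params = params if params is not None else {}

    @abstractmethod
    def execute(self): ...

class RunShell(Command):
    def execute(self):
        return self.receiver.run_shell(self.params.get('command', ''))

class Invoker:
    # The invoker queues and triggers commands without knowing how
    # each receiver carries them out.
    def __init__(self):
        self.command_queue = deque()

    def add_command(self, command: Command) -> None:
        self.command_queue.append(command)

    def execute_all_commands(self) -> list:
        results = []
        while self.command_queue:
            results.append(self.command_queue.popleft().execute())
        return results

invoker = Invoker()
invoker.add_command(RunShell(Receiver(), {'command': 'echo hi'}))
print(invoker.execute_all_commands())  # prints ['ran: echo hi']
```

Because the invoker only ever calls execute() , new commands and receivers can be added without changing the invoker at all.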
This decoupling enables the agent to execute actions on different objects without knowing the details of the object or the action being performed, making the agent more flexible and extensible for new actions. Receiver The Receiver is a central component in the Automator application that performs actions on the application. It provides functionalities to interact with the application and execute the action. All available actions are registered with the ReceiverManager class. You can find the reference for a basic Receiver class below: Bases: ABC The abstract receiver interface. command_registry : Dict [ str , Type [ CommandBasic ]] property Get the command registry. supported_command_names : List [ str ] property Get the command name list. register ( command_class ) classmethod Decorator to register the command class to the command registry. Parameters: command_class ( Type [ CommandBasic ] ) \u2013 The command class to be registered. Returns: Type [ CommandBasic ] \u2013 The command class. Source code in automator/basic.py 46 47 48 49 50 51 52 53 54 @classmethod def register ( cls , command_class : Type [ CommandBasic ]) -> Type [ CommandBasic ]: \"\"\" Decorator to register the command class to the command registry. :param command_class: The command class to be registered. :return: The command class. \"\"\" cls . _command_registry [ command_class . name ()] = command_class return command_class register_command ( command_name , command ) Add to the command registry. Parameters: command_name ( str ) \u2013 The command name. command ( CommandBasic ) \u2013 The command. Source code in automator/basic.py 24 25 26 27 28 29 30 31 def register_command ( self , command_name : str , command : CommandBasic ) -> None : \"\"\" Add to the command registry. :param command_name: The command name. :param command: The command. \"\"\" self . command_registry [ command_name ] = command self_command_mapping () Get the command-receiver mapping. 
Source code in automator/basic.py 40 41 42 43 44 def self_command_mapping ( self ) -> Dict [ str , CommandBasic ]: \"\"\" Get the command-receiver mapping. \"\"\" return { command_name : self for command_name in self . supported_command_names } Command The Command is a specific action that the Receiver can perform on the application. It encapsulates the function and parameters required to execute the action. The Command class is a base class for all commands in the Automator application. You can find the reference for a basic Command class below: Bases: ABC The abstract command interface. Initialize the command. Parameters: receiver ( ReceiverBasic ) \u2013 The receiver of the command. Source code in automator/basic.py 67 68 69 70 71 72 73 def __init__ ( self , receiver : ReceiverBasic , params : Dict = None ) -> None : \"\"\" Initialize the command. :param receiver: The receiver of the command. \"\"\" self . receiver = receiver self . params = params if params is not None else {} execute () abstractmethod Execute the command. Source code in automator/basic.py 75 76 77 78 79 80 @abstractmethod def execute ( self ): \"\"\" Execute the command. \"\"\" pass redo () Redo the command. Source code in automator/basic.py 88 89 90 91 92 def redo ( self ): \"\"\" Redo the command. \"\"\" self . execute () undo () Undo the command. Source code in automator/basic.py 82 83 84 85 86 def undo ( self ): \"\"\" Undo the command. \"\"\" pass Note Each command must be registered with a specific Receiver using the register decorator before it can be executed. For example: @ReceiverExample.register class CommandExample(CommandBasic): ... Invoker (AppPuppeteer) The AppPuppeteer plays the role of the invoker in the Automator application. It triggers the commands to be executed by the receivers. The AppPuppeteer equips the AppAgent with the capability to interact with the application's UI controls. It provides functionalities to translate action strings into specific actions and execute them. 
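The registration decorator shown in the Note above can be reproduced as a standalone sketch; ReceiverExample and CommandExample are hypothetical names echoing the example in the Note, not UFO's real classes:

```python
# Hypothetical, self-contained sketch of the register decorator pattern.
class ReceiverExample:
    _command_registry = {}

    @classmethod
    def register(cls, command_class):
        # Store the command class under its declared name, then return it
        # unchanged so the decorator is transparent to the class definition.
        cls._command_registry[command_class.name()] = command_class
        return command_class

@ReceiverExample.register
class CommandExample:
    @classmethod
    def name(cls):
        return 'command_example'

print('command_example' in ReceiverExample._command_registry)  # prints True
```

Registering at class-definition time is what lets a receiver enumerate its supported commands without any explicit wiring code.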
All available actions are registered in the Puppeteer with the ReceiverManager class. You can find the implementation of the AppPuppeteer class in the ufo/automator/puppeteer.py file, and its reference is shown below. The class for the app puppeteer to automate the app in the Windows environment. Initialize the app puppeteer. Parameters: process_name ( str ) \u2013 The process name of the app. app_root_name ( str ) \u2013 The app root name, e.g., WINWORD.EXE. Source code in automator/puppeteer.py 22 23 24 25 26 27 28 29 30 31 32 def __init__ ( self , process_name : str , app_root_name : str ) -> None : \"\"\" Initialize the app puppeteer. :param process_name: The process name of the app. :param app_root_name: The app root name, e.g., WINWORD.EXE. \"\"\" self . _process_name = process_name self . _app_root_name = app_root_name self . command_queue : Deque [ CommandBasic ] = deque () self . receiver_manager = ReceiverManager () full_path : str property Get the full path of the process. Only works for COM receiver. Returns: str \u2013 The full path of the process. add_command ( command_name , params , * args , ** kwargs ) Add the command to the command queue. Parameters: command_name ( str ) \u2013 The command name. params ( Dict [ str , Any ] ) \u2013 The arguments. Source code in automator/puppeteer.py 94 95 96 97 98 99 100 101 102 103 def add_command ( self , command_name : str , params : Dict [ str , Any ], * args , ** kwargs ) -> None : \"\"\" Add the command to the command queue. :param command_name: The command name. :param params: The arguments. \"\"\" command = self . create_command ( command_name , params , * args , ** kwargs ) self . command_queue . append ( command ) close () Close the app. Only works for COM receiver. Source code in automator/puppeteer.py 145 146 147 148 149 150 151 def close ( self ) -> None : \"\"\" Close the app. Only works for COM receiver. \"\"\" com_receiver = self . receiver_manager . 
com_receiver if com_receiver is not None : com_receiver . close () create_command ( command_name , params , * args , ** kwargs ) Create the command. Parameters: command_name ( str ) \u2013 The command name. params ( Dict [ str , Any ] ) \u2013 The arguments for the command. Source code in automator/puppeteer.py 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 def create_command ( self , command_name : str , params : Dict [ str , Any ], * args , ** kwargs ) -> Optional [ CommandBasic ]: \"\"\" Create the command. :param command_name: The command name. :param params: The arguments for the command. \"\"\" receiver = self . receiver_manager . get_receiver_from_command_name ( command_name ) command = receiver . command_registry . get ( command_name . lower (), None ) if receiver is None : raise ValueError ( f \"Receiver for command { command_name } is not found.\" ) if command is None : raise ValueError ( f \"Command { command_name } is not supported.\" ) return command ( receiver , params , * args , ** kwargs ) execute_all_commands () Execute all the commands in the command queue. Returns: List [ Any ] \u2013 The execution results. Source code in automator/puppeteer.py 82 83 84 85 86 87 88 89 90 91 92 def execute_all_commands ( self ) -> List [ Any ]: \"\"\" Execute all the commands in the command queue. :return: The execution results. \"\"\" results = [] while self . command_queue : command = self . command_queue . popleft () results . append ( command . execute ()) return results execute_command ( command_name , params , * args , ** kwargs ) Execute the command. Parameters: command_name ( str ) \u2013 The command name. params ( Dict [ str , Any ] ) \u2013 The arguments. Returns: str \u2013 The execution result. Source code in automator/puppeteer.py 68 69 70 71 72 73 74 75 76 77 78 79 80 def execute_command ( self , command_name : str , params : Dict [ str , Any ], * args , ** kwargs ) -> str : \"\"\" Execute the command. :param command_name: The command name. 
:param params: The arguments. :return: The execution result. \"\"\" command = self . create_command ( command_name , params , * args , ** kwargs ) return command . execute () get_command_queue_length () Get the length of the command queue. Returns: int \u2013 The length of the command queue. Source code in automator/puppeteer.py 105 106 107 108 109 110 def get_command_queue_length ( self ) -> int : \"\"\" Get the length of the command queue. :return: The length of the command queue. \"\"\" return len ( self . command_queue ) get_command_string ( command_name , params ) staticmethod Generate a function call string. Parameters: command_name ( str ) \u2013 The function name. params ( Dict [ str , str ] ) \u2013 The arguments as a dictionary. Returns: str \u2013 The function call string. Source code in automator/puppeteer.py 153 154 155 156 157 158 159 160 161 162 163 164 165 @staticmethod def get_command_string ( command_name : str , params : Dict [ str , str ]) -> str : \"\"\" Generate a function call string. :param command_name: The function name. :param params: The arguments as a dictionary. :return: The function call string. \"\"\" # Format the arguments args_str = \", \" . join ( f \" { k } = { v !r} \" for k , v in params . items ()) # Return the function call string return f \" { command_name } ( { args_str } )\" get_command_types ( command_name ) Get the command types. Parameters: command_name ( str ) \u2013 The command name. Returns: str \u2013 The command types. Source code in automator/puppeteer.py 53 54 55 56 57 58 59 60 61 62 63 64 65 66 def get_command_types ( self , command_name : str ) -> str : \"\"\" Get the command types. :param command_name: The command name. :return: The command types. \"\"\" try : receiver = self . receiver_manager . get_receiver_from_command_name ( command_name ) return receiver . type_name except : return \"\" save () Save the current state of the app. Only works for COM receiver. 
Source code in automator/puppeteer.py 124 125 126 127 128 129 130 def save ( self ) -> None : \"\"\" Save the current state of the app. Only works for COM receiver. \"\"\" com_receiver = self . receiver_manager . com_receiver if com_receiver is not None : com_receiver . save () save_to_xml ( file_path ) Save the current state of the app to XML. Only works for COM receiver. Parameters: file_path ( str ) \u2013 The file path to save the XML. Source code in automator/puppeteer.py 132 133 134 135 136 137 138 139 140 141 142 143 def save_to_xml ( self , file_path : str ) -> None : \"\"\" Save the current state of the app to XML. Only works for COM receiver. :param file_path: The file path to save the XML. \"\"\" com_receiver = self . receiver_manager . com_receiver dir_path = os . path . dirname ( file_path ) if not os . path . exists ( dir_path ): os . makedirs ( dir_path ) if com_receiver is not None : com_receiver . save_to_xml ( file_path ) Receiver Manager The ReceiverManager manages all the receivers and commands in the Automator application. It provides functionalities to register and retrieve receivers and commands. It is a complementary component to the AppPuppeteer . The class for the receiver manager. Initialize the receiver manager. Source code in automator/puppeteer.py 175 176 177 178 179 180 181 182 183 def __init__ ( self ): \"\"\" Initialize the receiver manager. \"\"\" self . receiver_registry = {} self . ui_control_receiver : Optional [ ControlReceiver ] = None self . _receiver_list : List [ ReceiverBasic ] = [] com_receiver : WinCOMReceiverBasic property Get the COM receiver. Returns: WinCOMReceiverBasic \u2013 The COM receiver. receiver_factory_registry : Dict [ str , Dict [ str , Union [ str , ReceiverFactory ]]] property Get the receiver factory registry. Returns: Dict [ str , Dict [ str , Union [ str , ReceiverFactory ]]] \u2013 The receiver factory registry. receiver_list : List [ ReceiverBasic ] property Get the receiver list. 
Returns: List [ ReceiverBasic ] \u2013 The receiver list. create_api_receiver ( app_root_name , process_name ) Get the API receiver. Parameters: app_root_name ( str ) \u2013 The app root name. process_name ( str ) \u2013 The process name. Source code in automator/puppeteer.py 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 def create_api_receiver ( self , app_root_name : str , process_name : str ) -> None : \"\"\" Get the API receiver. :param app_root_name: The app root name. :param process_name: The process name. \"\"\" for receiver_factory_dict in self . receiver_factory_registry . values (): # Check if the receiver is API if receiver_factory_dict . get ( \"is_api\" ): receiver = receiver_factory_dict . get ( \"factory\" ) . create_receiver ( app_root_name , process_name ) if receiver is not None : self . receiver_list . append ( receiver ) self . _update_receiver_registry () create_ui_control_receiver ( control , application ) Build the UI controller. Parameters: control ( UIAWrapper ) \u2013 The control element. application ( UIAWrapper ) \u2013 The application window. Returns: ControlReceiver \u2013 The UI controller receiver. Source code in automator/puppeteer.py 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 def create_ui_control_receiver ( self , control : UIAWrapper , application : UIAWrapper ) -> \"ControlReceiver\" : \"\"\" Build the UI controller. :param control: The control element. :param application: The application window. :return: The UI controller receiver. \"\"\" # control can be None if not application : return None factory : ReceiverFactory = self . receiver_factory_registry . get ( \"UIControl\" ) . get ( \"factory\" ) self . ui_control_receiver = factory . create_receiver ( control , application ) self . receiver_list . append ( self . ui_control_receiver ) self . _update_receiver_registry () return self . 
ui_control_receiver get_receiver_from_command_name ( command_name ) Get the receiver from the command name. Parameters: command_name ( str ) \u2013 The command name. Returns: ReceiverBasic \u2013 The mapped receiver. Source code in automator/puppeteer.py 235 236 237 238 239 240 241 242 243 244 def get_receiver_from_command_name ( self , command_name : str ) -> ReceiverBasic : \"\"\" Get the receiver from the command name. :param command_name: The command name. :return: The mapped receiver. \"\"\" receiver = self . receiver_registry . get ( command_name , None ) if receiver is None : raise ValueError ( f \"Receiver for command { command_name } is not found.\" ) return receiver register ( receiver_factory_class ) classmethod Decorator to register the receiver factory class to the receiver manager. Parameters: receiver_factory_class ( Type [ ReceiverFactory ] ) \u2013 The receiver factory class to be registered. Returns: ReceiverFactory \u2013 The receiver factory class instance. Source code in automator/puppeteer.py 276 277 278 279 280 281 282 283 284 285 286 287 288 289 @classmethod def register ( cls , receiver_factory_class : Type [ ReceiverFactory ]) -> ReceiverFactory : \"\"\" Decorator to register the receiver factory class to the receiver manager. :param receiver_factory_class: The receiver factory class to be registered. :return: The receiver factory class instance. \"\"\" cls . _receiver_factory_registry [ receiver_factory_class . name ()] = { \"factory\" : receiver_factory_class (), \"is_api\" : receiver_factory_class . is_api (), } return receiver_factory_class () For further details, refer to the specific documentation for each component and class in the Automator module.","title":"Overview"},{"location":"automator/overview/#application-automator","text":"The Automator application is a tool that allows UFO to automate and take actions on applications. Currently, UFO supports several types of actions, including UI Automation and API . 
Note UFO can also call in-app AI tools, such as Copilot , to assist with the automation process. This is achieved by using either UI Automation or API to interact with the in-app AI tool. UI Automator - This action type is used to interact with the application's UI controls, such as buttons, text boxes, and menus. UFO uses the UIA or Win32 APIs to interact with the application's UI controls. API - This action type is used to interact with the application's native API. Users and app developers can create their own API actions to interact with specific applications. Web - This action type is used to interact with web applications. UFO uses the crawl4ai library to extract information from web pages. Bash - This action type is used to interact with the command line interface (CLI) of an application. AI Tool - This action type is used to interact with the LLM-based AI tools.","title":"Application Automator"},{"location":"automator/overview/#action-design-patterns","text":"Actions in UFO are implemented using the command design pattern, which encapsulates a receiver, a command, and an invoker. The receiver is the object that performs the action, the command is the object that encapsulates the action, and the invoker is the object that triggers the action. The basic classes for implementing actions in UFO are as follows: Role Class Description Receiver ufo.automator.basic.ReceiverBasic The base class for all receivers in UFO. Receivers are objects that perform actions on applications. Command ufo.automator.basic.CommandBasic The base class for all commands in UFO. Commands are objects that encapsulate actions to be performed by receivers. Invoker ufo.automator.puppeteer.AppPuppeteer The base class for the invoker in UFO. Invokers are objects that trigger commands to be executed by receivers. The advantage of using the command design pattern in the agent framework is that it allows for the decoupling of the sender and receiver of the action. 
This decoupling enables the agent to execute actions on different objects without knowing the details of the object or the action being performed, making the agent more flexible and extensible for new actions.","title":"Action Design Patterns"},{"location":"automator/overview/#receiver","text":"The Receiver is a central component in the Automator application that performs actions on the application. It provides functionalities to interact with the application and execute the action. All available actions are registered with the ReceiverManager class. You can find the reference for a basic Receiver class below: Bases: ABC The abstract receiver interface.","title":"Receiver"},{"location":"automator/overview/#automator.basic.ReceiverBasic.command_registry","text":"Get the command registry.","title":"command_registry"},{"location":"automator/overview/#automator.basic.ReceiverBasic.supported_command_names","text":"Get the command name list.","title":"supported_command_names"},{"location":"automator/overview/#automator.basic.ReceiverBasic.register","text":"Decorator to register the command class to the command registry. Parameters: command_class ( Type [ CommandBasic ] ) \u2013 The command class to be registered. Returns: Type [ CommandBasic ] \u2013 The command class. Source code in automator/basic.py 46 47 48 49 50 51 52 53 54 @classmethod def register ( cls , command_class : Type [ CommandBasic ]) -> Type [ CommandBasic ]: \"\"\" Decorator to register the command class to the command registry. :param command_class: The command class to be registered. :return: The command class. \"\"\" cls . _command_registry [ command_class . name ()] = command_class return command_class","title":"register"},{"location":"automator/overview/#automator.basic.ReceiverBasic.register_command","text":"Add to the command registry. Parameters: command_name ( str ) \u2013 The command name. command ( CommandBasic ) \u2013 The command. 
Source code in automator/basic.py 24 25 26 27 28 29 30 31 def register_command ( self , command_name : str , command : CommandBasic ) -> None : \"\"\" Add to the command registry. :param command_name: The command name. :param command: The command. \"\"\" self . command_registry [ command_name ] = command","title":"register_command"},{"location":"automator/overview/#automator.basic.ReceiverBasic.self_command_mapping","text":"Get the command-receiver mapping. Source code in automator/basic.py 40 41 42 43 44 def self_command_mapping ( self ) -> Dict [ str , CommandBasic ]: \"\"\" Get the command-receiver mapping. \"\"\" return { command_name : self for command_name in self . supported_command_names }","title":"self_command_mapping"},{"location":"automator/overview/#command","text":"The Command is a specific action that the Receiver can perform on the application. It encapsulates the function and parameters required to execute the action. The Command class is a base class for all commands in the Automator application. You can find the reference for a basic Command class below: Bases: ABC The abstract command interface. Initialize the command. Parameters: receiver ( ReceiverBasic ) \u2013 The receiver of the command. Source code in automator/basic.py 67 68 69 70 71 72 73 def __init__ ( self , receiver : ReceiverBasic , params : Dict = None ) -> None : \"\"\" Initialize the command. :param receiver: The receiver of the command. \"\"\" self . receiver = receiver self . params = params if params is not None else {}","title":"Command"},{"location":"automator/overview/#automator.basic.CommandBasic.execute","text":"Execute the command. Source code in automator/basic.py 75 76 77 78 79 80 @abstractmethod def execute ( self ): \"\"\" Execute the command. \"\"\" pass","title":"execute"},{"location":"automator/overview/#automator.basic.CommandBasic.redo","text":"Redo the command. Source code in automator/basic.py 88 89 90 91 92 def redo ( self ): \"\"\" Redo the command. 
\"\"\" self . execute ()","title":"redo"},{"location":"automator/overview/#automator.basic.CommandBasic.undo","text":"Undo the command. Source code in automator/basic.py 82 83 84 85 86 def undo ( self ): \"\"\" Undo the command. \"\"\" pass Note Each command must be registered with a specific Receiver using the register decorator before it can be executed. For example: @ReceiverExample.register class CommandExample(CommandBasic): ...","title":"undo"},{"location":"automator/overview/#invoker-apppuppeteer","text":"The AppPuppeteer plays the role of the invoker in the Automator application. It triggers the commands to be executed by the receivers. The AppPuppeteer equips the AppAgent with the capability to interact with the application's UI controls. It provides functionalities to translate action strings into specific actions and execute them. All available actions are registered in the Puppeteer with the ReceiverManager class. You can find the implementation of the AppPuppeteer class in the ufo/automator/puppeteer.py file, and its reference is shown below. The class for the app puppeteer to automate the app in the Windows environment. Initialize the app puppeteer. Parameters: process_name ( str ) \u2013 The process name of the app. app_root_name ( str ) \u2013 The app root name, e.g., WINWORD.EXE. Source code in automator/puppeteer.py 22 23 24 25 26 27 28 29 30 31 32 def __init__ ( self , process_name : str , app_root_name : str ) -> None : \"\"\" Initialize the app puppeteer. :param process_name: The process name of the app. :param app_root_name: The app root name, e.g., WINWORD.EXE. \"\"\" self . _process_name = process_name self . _app_root_name = app_root_name self . command_queue : Deque [ CommandBasic ] = deque () self . receiver_manager = ReceiverManager ()","title":"Invoker (AppPuppeteer)"},{"location":"automator/overview/#automator.puppeteer.AppPuppeteer.full_path","text":"Get the full path of the process. Only works for COM receiver. 
Returns: str \u2013 The full path of the process.","title":"full_path"},{"location":"automator/overview/#automator.puppeteer.AppPuppeteer.add_command","text":"Add the command to the command queue. Parameters: command_name ( str ) \u2013 The command name. params ( Dict [ str , Any ] ) \u2013 The arguments. Source code in automator/puppeteer.py 94 95 96 97 98 99 100 101 102 103 def add_command ( self , command_name : str , params : Dict [ str , Any ], * args , ** kwargs ) -> None : \"\"\" Add the command to the command queue. :param command_name: The command name. :param params: The arguments. \"\"\" command = self . create_command ( command_name , params , * args , ** kwargs ) self . command_queue . append ( command )","title":"add_command"},{"location":"automator/overview/#automator.puppeteer.AppPuppeteer.close","text":"Close the app. Only works for COM receiver. Source code in automator/puppeteer.py 145 146 147 148 149 150 151 def close ( self ) -> None : \"\"\" Close the app. Only works for COM receiver. \"\"\" com_receiver = self . receiver_manager . com_receiver if com_receiver is not None : com_receiver . close ()","title":"close"},{"location":"automator/overview/#automator.puppeteer.AppPuppeteer.create_command","text":"Create the command. Parameters: command_name ( str ) \u2013 The command name. params ( Dict [ str , Any ] ) \u2013 The arguments for the command. Source code in automator/puppeteer.py 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 def create_command ( self , command_name : str , params : Dict [ str , Any ], * args , ** kwargs ) -> Optional [ CommandBasic ]: \"\"\" Create the command. :param command_name: The command name. :param params: The arguments for the command. \"\"\" receiver = self . receiver_manager . get_receiver_from_command_name ( command_name ) command = receiver . command_registry . get ( command_name . 
lower (), None ) if receiver is None : raise ValueError ( f \"Receiver for command { command_name } is not found.\" ) if command is None : raise ValueError ( f \"Command { command_name } is not supported.\" ) return command ( receiver , params , * args , ** kwargs )","title":"create_command"},{"location":"automator/overview/#automator.puppeteer.AppPuppeteer.execute_all_commands","text":"Execute all the commands in the command queue. Returns: List [ Any ] \u2013 The execution results. Source code in automator/puppeteer.py 82 83 84 85 86 87 88 89 90 91 92 def execute_all_commands ( self ) -> List [ Any ]: \"\"\" Execute all the commands in the command queue. :return: The execution results. \"\"\" results = [] while self . command_queue : command = self . command_queue . popleft () results . append ( command . execute ()) return results","title":"execute_all_commands"},{"location":"automator/overview/#automator.puppeteer.AppPuppeteer.execute_command","text":"Execute the command. Parameters: command_name ( str ) \u2013 The command name. params ( Dict [ str , Any ] ) \u2013 The arguments. Returns: str \u2013 The execution result. Source code in automator/puppeteer.py 68 69 70 71 72 73 74 75 76 77 78 79 80 def execute_command ( self , command_name : str , params : Dict [ str , Any ], * args , ** kwargs ) -> str : \"\"\" Execute the command. :param command_name: The command name. :param params: The arguments. :return: The execution result. \"\"\" command = self . create_command ( command_name , params , * args , ** kwargs ) return command . execute ()","title":"execute_command"},{"location":"automator/overview/#automator.puppeteer.AppPuppeteer.get_command_queue_length","text":"Get the length of the command queue. Returns: int \u2013 The length of the command queue. Source code in automator/puppeteer.py 105 106 107 108 109 110 def get_command_queue_length ( self ) -> int : \"\"\" Get the length of the command queue. :return: The length of the command queue. 
\"\"\" return len ( self . command_queue )","title":"get_command_queue_length"},{"location":"automator/overview/#automator.puppeteer.AppPuppeteer.get_command_string","text":"Generate a function call string. Parameters: command_name ( str ) \u2013 The function name. params ( Dict [ str , str ] ) \u2013 The arguments as a dictionary. Returns: str \u2013 The function call string. Source code in automator/puppeteer.py 153 154 155 156 157 158 159 160 161 162 163 164 165 @staticmethod def get_command_string ( command_name : str , params : Dict [ str , str ]) -> str : \"\"\" Generate a function call string. :param command_name: The function name. :param params: The arguments as a dictionary. :return: The function call string. \"\"\" # Format the arguments args_str = \", \" . join ( f \" { k } = { v !r} \" for k , v in params . items ()) # Return the function call string return f \" { command_name } ( { args_str } )\"","title":"get_command_string"},{"location":"automator/overview/#automator.puppeteer.AppPuppeteer.get_command_types","text":"Get the command types. Parameters: command_name ( str ) \u2013 The command name. Returns: str \u2013 The command types. Source code in automator/puppeteer.py 53 54 55 56 57 58 59 60 61 62 63 64 65 66 def get_command_types ( self , command_name : str ) -> str : \"\"\" Get the command types. :param command_name: The command name. :return: The command types. \"\"\" try : receiver = self . receiver_manager . get_receiver_from_command_name ( command_name ) return receiver . type_name except : return \"\"","title":"get_command_types"},{"location":"automator/overview/#automator.puppeteer.AppPuppeteer.save","text":"Save the current state of the app. Only works for COM receiver. Source code in automator/puppeteer.py 124 125 126 127 128 129 130 def save ( self ) -> None : \"\"\" Save the current state of the app. Only works for COM receiver. \"\"\" com_receiver = self . receiver_manager . com_receiver if com_receiver is not None : com_receiver . 
save ()","title":"save"},{"location":"automator/overview/#automator.puppeteer.AppPuppeteer.save_to_xml","text":"Save the current state of the app to XML. Only works for COM receiver. Parameters: file_path ( str ) \u2013 The file path to save the XML. Source code in automator/puppeteer.py 132 133 134 135 136 137 138 139 140 141 142 143 def save_to_xml ( self , file_path : str ) -> None : \"\"\" Save the current state of the app to XML. Only works for COM receiver. :param file_path: The file path to save the XML. \"\"\" com_receiver = self . receiver_manager . com_receiver dir_path = os . path . dirname ( file_path ) if not os . path . exists ( dir_path ): os . makedirs ( dir_path ) if com_receiver is not None : com_receiver . save_to_xml ( file_path )","title":"save_to_xml"},{"location":"automator/overview/#receiver-manager","text":"The ReceiverManager manages all the receivers and commands in the Automator application. It provides functionalities to register and retrieve receivers and commands. It is a complementary component to the AppPuppeteer . The class for the receiver manager. Initialize the receiver manager. Source code in automator/puppeteer.py 175 176 177 178 179 180 181 182 183 def __init__ ( self ): \"\"\" Initialize the receiver manager. \"\"\" self . receiver_registry = {} self . ui_control_receiver : Optional [ ControlReceiver ] = None self . _receiver_list : List [ ReceiverBasic ] = []","title":"Receiver Manager"},{"location":"automator/overview/#automator.puppeteer.ReceiverManager.com_receiver","text":"Get the COM receiver. Returns: WinCOMReceiverBasic \u2013 The COM receiver.","title":"com_receiver"},{"location":"automator/overview/#automator.puppeteer.ReceiverManager.receiver_factory_registry","text":"Get the receiver factory registry. 
Returns: Dict [ str , Dict [ str , Union [ str , ReceiverFactory ]]] \u2013 The receiver factory registry.","title":"receiver_factory_registry"},{"location":"automator/overview/#automator.puppeteer.ReceiverManager.receiver_list","text":"Get the receiver list. Returns: List [ ReceiverBasic ] \u2013 The receiver list.","title":"receiver_list"},{"location":"automator/overview/#automator.puppeteer.ReceiverManager.create_api_receiver","text":"Get the API receiver. Parameters: app_root_name ( str ) \u2013 The app root name. process_name ( str ) \u2013 The process name. Source code in automator/puppeteer.py 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 def create_api_receiver ( self , app_root_name : str , process_name : str ) -> None : \"\"\" Get the API receiver. :param app_root_name: The app root name. :param process_name: The process name. \"\"\" for receiver_factory_dict in self . receiver_factory_registry . values (): # Check if the receiver is API if receiver_factory_dict . get ( \"is_api\" ): receiver = receiver_factory_dict . get ( \"factory\" ) . create_receiver ( app_root_name , process_name ) if receiver is not None : self . receiver_list . append ( receiver ) self . _update_receiver_registry ()","title":"create_api_receiver"},{"location":"automator/overview/#automator.puppeteer.ReceiverManager.create_ui_control_receiver","text":"Build the UI controller. Parameters: control ( UIAWrapper ) \u2013 The control element. application ( UIAWrapper ) \u2013 The application window. Returns: ControlReceiver \u2013 The UI controller receiver. Source code in automator/puppeteer.py 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 def create_ui_control_receiver ( self , control : UIAWrapper , application : UIAWrapper ) -> \"ControlReceiver\" : \"\"\" Build the UI controller. :param control: The control element. :param application: The application window. :return: The UI controller receiver. 
\"\"\" # control can be None if not application : return None factory : ReceiverFactory = self . receiver_factory_registry . get ( \"UIControl\" ) . get ( \"factory\" ) self . ui_control_receiver = factory . create_receiver ( control , application ) self . receiver_list . append ( self . ui_control_receiver ) self . _update_receiver_registry () return self . ui_control_receiver","title":"create_ui_control_receiver"},{"location":"automator/overview/#automator.puppeteer.ReceiverManager.get_receiver_from_command_name","text":"Get the receiver from the command name. Parameters: command_name ( str ) \u2013 The command name. Returns: ReceiverBasic \u2013 The mapped receiver. Source code in automator/puppeteer.py 235 236 237 238 239 240 241 242 243 244 def get_receiver_from_command_name ( self , command_name : str ) -> ReceiverBasic : \"\"\" Get the receiver from the command name. :param command_name: The command name. :return: The mapped receiver. \"\"\" receiver = self . receiver_registry . get ( command_name , None ) if receiver is None : raise ValueError ( f \"Receiver for command { command_name } is not found.\" ) return receiver","title":"get_receiver_from_command_name"},{"location":"automator/overview/#automator.puppeteer.ReceiverManager.register","text":"Decorator to register the receiver factory class to the receiver manager. Parameters: receiver_factory_class ( Type [ ReceiverFactory ] ) \u2013 The receiver factory class to be registered. Returns: ReceiverFactory \u2013 The receiver factory class instance. Source code in automator/puppeteer.py 276 277 278 279 280 281 282 283 284 285 286 287 288 289 @classmethod def register ( cls , receiver_factory_class : Type [ ReceiverFactory ]) -> ReceiverFactory : \"\"\" Decorator to register the receiver factory class to the receiver manager. :param receiver_factory_class: The receiver factory class to be registered. :return: The receiver factory class instance. \"\"\" cls . 
_receiver_factory_registry [ receiver_factory_class . name ()] = { \"factory\" : receiver_factory_class (), \"is_api\" : receiver_factory_class . is_api (), } return receiver_factory_class () For further details, refer to the specific documentation for each component and class in the Automator module.","title":"register"},{"location":"automator/ui_automator/","text":"UI Automator The UI Automator mimics mouse and keyboard operations on the application's UI controls. UFO uses the UIA or Win32 APIs to interact with the application's UI controls, such as buttons, edit boxes, and menus. Configuration There are several configurations that need to be set up before using the UI Automator in the config_dev.yaml file. Below is the list of configurations related to the UI Automator: Configuration Option Description Type Default Value CONTROL_BACKEND The backend for control action, currently supporting uia and win32 . String \"uia\" CONTROL_LIST The list of widgets allowed to be selected. List [\"Button\", \"Edit\", \"TabItem\", \"Document\", \"ListItem\", \"MenuItem\", \"ScrollBar\", \"TreeItem\", \"Hyperlink\", \"ComboBox\", \"RadioButton\", \"DataItem\"] ANNOTATION_COLORS The colors assigned to different control types for annotation. Dictionary {\"Button\": \"#FFF68F\", \"Edit\": \"#A5F0B5\", \"TabItem\": \"#A5E7F0\", \"Document\": \"#FFD18A\", \"ListItem\": \"#D9C3FE\", \"MenuItem\": \"#E7FEC3\", \"ScrollBar\": \"#FEC3F8\", \"TreeItem\": \"#D6D6D6\", \"Hyperlink\": \"#91FFEB\", \"ComboBox\": \"#D8B6D4\"} API_PROMPT The prompt for the UI automation API. String \"ufo/prompts/share/base/api.yaml\" CLICK_API The API used for click action, can be click_input or click . String \"click_input\" INPUT_TEXT_API The API used for input text action, can be type_keys or set_text . String \"type_keys\" INPUT_TEXT_ENTER Whether to press enter after typing the text.
Boolean False Receiver The receiver of the UI Automator is the ControlReceiver class defined in the ufo/automator/ui_control/controller/control_receiver module. It is initialized with the application's window handle and control wrapper that executes the actions. The ControlReceiver provides functionalities to interact with the application's UI controls. Below is the reference for the ControlReceiver class: Bases: ReceiverBasic The control receiver class. Initialize the control receiver. Parameters: control ( Optional [ UIAWrapper ] ) \u2013 The control element. application ( Optional [ UIAWrapper ] ) \u2013 The application element. Source code in automator/ui_control/controller.py 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 def __init__ ( self , control : Optional [ UIAWrapper ], application : Optional [ UIAWrapper ] ) -> None : \"\"\" Initialize the control receiver. :param control: The control element. :param application: The application element. \"\"\" self . control = control self . application = application if control : self . control . set_focus () self . wait_enabled () elif application : self . application . set_focus () annotation ( params , annotation_dict ) Take a screenshot of the current application window and annotate the control item on the screenshot. Parameters: params ( Dict [ str , str ] ) \u2013 The arguments of the annotation method. annotation_dict ( Dict [ str , UIAWrapper ] ) \u2013 The dictionary of the control labels. Source code in automator/ui_control/controller.py 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 def annotation ( self , params : Dict [ str , str ], annotation_dict : Dict [ str , UIAWrapper ] ) -> List [ str ]: \"\"\" Take a screenshot of the current application window and annotate the control item on the screenshot. :param params: The arguments of the annotation method. :param annotation_dict: The dictionary of the control labels. \"\"\" selected_controls_labels = params . 
get ( \"control_labels\" , []) control_reannotate = [ annotation_dict [ str ( label )] for label in selected_controls_labels ] return control_reannotate atomic_execution ( method_name , params ) Atomic execution of the action on the control elements. Parameters: method_name ( str ) \u2013 The name of the method to execute. params ( Dict [ str , Any ] ) \u2013 The arguments of the method. Returns: str \u2013 The result of the action. Source code in automator/ui_control/controller.py 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 def atomic_execution ( self , method_name : str , params : Dict [ str , Any ]) -> str : \"\"\" Atomic execution of the action on the control elements. :param method_name: The name of the method to execute. :param params: The arguments of the method. :return: The result of the action. \"\"\" import traceback try : method = getattr ( self . control , method_name ) result = method ( ** params ) except AttributeError : message = f \" { self . control } doesn't have a method named { method_name } \" print_with_color ( f \"Warning: { message } \" , \"yellow\" ) result = message except Exception as e : full_traceback = traceback . format_exc () message = f \"An error occurred: { full_traceback } \" print_with_color ( f \"Warning: { message } \" , \"yellow\" ) result = message return result click_input ( params ) Click the control element. Parameters: params ( Dict [ str , Union [ str , bool ]] ) \u2013 The arguments of the click method. Returns: str \u2013 The result of the click action. Source code in automator/ui_control/controller.py 79 80 81 82 83 84 85 86 87 88 89 90 91 def click_input ( self , params : Dict [ str , Union [ str , bool ]]) -> str : \"\"\" Click the control element. :param params: The arguments of the click method. :return: The result of the click action. \"\"\" api_name = configs . get ( \"CLICK_API\" , \"click_input\" ) if api_name == \"click\" : return self . 
atomic_execution ( \"click\" , params ) else : return self . atomic_execution ( \"click_input\" , params ) click_on_coordinates ( params ) Click on the coordinates of the control element. Parameters: params ( Dict [ str , str ] ) \u2013 The arguments of the click on coordinates method. Returns: str \u2013 The result of the click on coordinates action. Source code in automator/ui_control/controller.py 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 def click_on_coordinates ( self , params : Dict [ str , str ]) -> str : \"\"\" Click on the coordinates of the control element. :param params: The arguments of the click on coordinates method. :return: The result of the click on coordinates action. \"\"\" # Get the relative coordinates fraction of the application window. x = float ( params . get ( \"x\" , 0 )) y = float ( params . get ( \"y\" , 0 )) button = params . get ( \"button\" , \"left\" ) double = params . get ( \"double\" , False ) # Get the absolute coordinates of the application window. tranformed_x , tranformed_y = self . transform_point ( x , y ) self . application . set_focus () pyautogui . click ( tranformed_x , tranformed_y , button = button , clicks = 2 if double else 1 ) return \"\" drag_on_coordinates ( params ) Drag on the coordinates of the control element. Parameters: params ( Dict [ str , str ] ) \u2013 The arguments of the drag on coordinates method. Returns: str \u2013 The result of the drag on coordinates action. Source code in automator/ui_control/controller.py 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 def drag_on_coordinates ( self , params : Dict [ str , str ]) -> str : \"\"\" Drag on the coordinates of the control element. :param params: The arguments of the drag on coordinates method. :return: The result of the drag on coordinates action. \"\"\" start = self . transform_point ( float ( params . get ( \"start_x\" , 0 )), float ( params . 
get ( \"start_y\" , 0 )) ) end = self . transform_point ( float ( params . get ( \"end_x\" , 0 )), float ( params . get ( \"end_y\" , 0 )) ) button = params . get ( \"button\" , \"left\" ) self . application . set_focus () pyautogui . moveTo ( start [ 0 ], start [ 1 ]) pyautogui . dragTo ( end [ 0 ], end [ 1 ], button = button ) return \"\" keyboard_input ( params ) Keyboard input on the control element. Parameters: params ( Dict [ str , str ] ) \u2013 The arguments of the keyboard input method. Returns: str \u2013 The result of the keyboard input action. Source code in automator/ui_control/controller.py 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 def keyboard_input ( self , params : Dict [ str , str ]) -> str : \"\"\" Keyboard input on the control element. :param params: The arguments of the keyboard input method. :return: The result of the keyboard input action. \"\"\" control_focus = params . get ( \"control_focus\" , True ) keys = params . get ( \"keys\" , \"\" ) if control_focus : self . atomic_execution ( \"type_keys\" , { \"keys\" : keys }) else : pyautogui . typewrite ( keys ) return keys no_action () No action on the control element. Returns: \u2013 The result of the no action. Source code in automator/ui_control/controller.py 232 233 234 235 236 237 238 def no_action ( self ): \"\"\" No action on the control element. :return: The result of the no action. \"\"\" return \"\" set_edit_text ( params ) Set the edit text of the control element. Parameters: params ( Dict [ str , str ] ) \u2013 The arguments of the set edit text method. Returns: str \u2013 The result of the set edit text action. 
Source code in automator/ui_control/controller.py 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 def set_edit_text ( self , params : Dict [ str , str ]) -> str : \"\"\" Set the edit text of the control element. :param params: The arguments of the set edit text method. :return: The result of the set edit text action. \"\"\" text = params . get ( \"text\" , \"\" ) inter_key_pause = configs . get ( \"INPUT_TEXT_INTER_KEY_PAUSE\" , 0.1 ) if configs [ \"INPUT_TEXT_API\" ] == \"set_text\" : method_name = \"set_edit_text\" args = { \"text\" : text } else : method_name = \"type_keys\" # Transform the text according to the tags. text = TextTransformer . transform_text ( text , \"all\" ) args = { \"keys\" : text , \"pause\" : inter_key_pause , \"with_spaces\" : True } try : result = self . atomic_execution ( method_name , args ) if ( method_name == \"set_text\" and args [ \"text\" ] not in self . control . window_text () ): raise Exception ( f \"Failed to use set_text: { args [ 'text' ] } \" ) if configs [ \"INPUT_TEXT_ENTER\" ] and method_name in [ \"type_keys\" , \"set_text\" ]: self . atomic_execution ( \"type_keys\" , params = { \"keys\" : \" {ENTER} \" }) return result except Exception as e : if method_name == \"set_text\" : print_with_color ( f \" { self . control } doesn't have a method named { method_name } , trying default input method\" , \"yellow\" , ) method_name = \"type_keys\" clear_text_keys = \"^a {BACKSPACE} \" text_to_type = args [ \"text\" ] keys_to_send = clear_text_keys + text_to_type method_name = \"type_keys\" args = { \"keys\" : keys_to_send , \"pause\" : inter_key_pause , \"with_spaces\" : True , } return self . atomic_execution ( method_name , args ) else : return f \"An error occurred: { e } \" summary ( params ) Visual summary of the control element. 
Parameters: params ( Dict [ str , str ] ) \u2013 The arguments of the visual summary method. should contain a key \"text\" with the text summary. Returns: str \u2013 The result of the visual summary action. Source code in automator/ui_control/controller.py 141 142 143 144 145 146 147 148 def summary ( self , params : Dict [ str , str ]) -> str : \"\"\" Visual summary of the control element. :param params: The arguments of the visual summary method. should contain a key \"text\" with the text summary. :return: The result of the visual summary action. \"\"\" return params . get ( \"text\" ) texts () Get the text of the control element. Returns: str \u2013 The text of the control element. Source code in automator/ui_control/controller.py 217 218 219 220 221 222 def texts ( self ) -> str : \"\"\" Get the text of the control element. :return: The text of the control element. \"\"\" return self . control . texts () transform_point ( fraction_x , fraction_y ) Transform the relative coordinates to the absolute coordinates. Parameters: fraction_x ( float ) \u2013 The relative x coordinate. fraction_y ( float ) \u2013 The relative y coordinate. Returns: Tuple [ int , int ] \u2013 The absolute coordinates. Source code in automator/ui_control/controller.py 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 def transform_point ( self , fraction_x : float , fraction_y : float ) -> Tuple [ int , int ]: \"\"\" Transform the relative coordinates to the absolute coordinates. :param fraction_x: The relative x coordinate. :param fraction_y: The relative y coordinate. :return: The absolute coordinates. \"\"\" application_rect : RECT = self . application . rectangle () application_x = application_rect . left application_y = application_rect . top application_width = application_rect . width () application_height = application_rect . 
height () x = application_x + int ( application_width * fraction_x ) y = application_y + int ( application_height * fraction_y ) return x , y wait_enabled ( timeout = 10 , retry_interval = 0.5 ) Wait until the control is enabled. Parameters: timeout ( int , default: 10 ) \u2013 The timeout to wait. retry_interval ( int , default: 0.5 ) \u2013 The retry interval to wait. Source code in automator/ui_control/controller.py 256 257 258 259 260 261 262 263 264 265 266 267 def wait_enabled ( self , timeout : int = 10 , retry_interval : int = 0.5 ) -> None : \"\"\" Wait until the control is enabled. :param timeout: The timeout to wait. :param retry_interval: The retry interval to wait. \"\"\" while not self . control . is_enabled (): time . sleep ( retry_interval ) timeout -= retry_interval if timeout <= 0 : warnings . warn ( f \"Timeout: { self . control } is not enabled.\" ) break wait_visible ( timeout = 10 , retry_interval = 0.5 ) Wait until the control is visible. Parameters: timeout ( int , default: 10 ) \u2013 The timeout to wait. retry_interval ( int , default: 0.5 ) \u2013 The retry interval to wait. Source code in automator/ui_control/controller.py 269 270 271 272 273 274 275 276 277 278 279 280 def wait_visible ( self , timeout : int = 10 , retry_interval : int = 0.5 ) -> None : \"\"\" Wait until the control is visible. :param timeout: The timeout to wait. :param retry_interval: The retry interval to wait. \"\"\" while not self . control . is_visible (): time . sleep ( retry_interval ) timeout -= retry_interval if timeout <= 0 : warnings . warn ( f \"Timeout: { self . control } is not visible.\" ) break wheel_mouse_input ( params ) Wheel mouse input on the control element. Parameters: params ( Dict [ str , str ] ) \u2013 The arguments of the wheel mouse input method. Returns: \u2013 The result of the wheel mouse input action.
Source code in automator/ui_control/controller.py 224 225 226 227 228 229 230 def wheel_mouse_input ( self , params : Dict [ str , str ]): \"\"\" Wheel mouse input on the control element. :param params: The arguments of the wheel mouse input method. :return: The result of the wheel mouse input action. \"\"\" return self . atomic_execution ( \"wheel_mouse_input\" , params ) Command The command of the UI Automator is the ControlCommand class defined in the ufo/automator/ui_control/controller/ControlCommand module. It encapsulates the function and parameters required to execute the action. The ControlCommand class is a base class for all commands in the UI Automator application. Below is an example of a ClickInputCommand class that inherits from the ControlCommand class: @ControlReceiver.register class ClickInputCommand(ControlCommand): \"\"\" The click input command class. \"\"\" def execute(self) -> str: \"\"\" Execute the click input command. :return: The result of the click input command. \"\"\" return self.receiver.click_input(self.params) @classmethod def name(cls) -> str: \"\"\" Get the name of the atomic command. :return: The name of the atomic command. \"\"\" return \"click_input\" Note The concrete command classes must implement the execute method to execute the action and the name method to return the name of the atomic command. Note Each command must register with a specific ControlReceiver to be executed using the @ControlReceiver.register decorator. Below is the list of available commands in the UI Automator that are currently supported by UFO: Command Name Function Name Description ClickInputCommand click_input Click the control item with the mouse. ClickOnCoordinatesCommand click_on_coordinates Click on the specific fractional coordinates of the application window. DragOnCoordinatesCommand drag_on_coordinates Drag the mouse on the specific fractional coordinates of the application window. 
SetEditTextCommand set_edit_text Add new text to the control item. GetTextsCommand texts Get the text of the control item. WheelMouseInputCommand wheel_mouse_input Scroll the control item. KeyboardInputCommand keyboard_input Simulate the keyboard input. Tip Please refer to the ufo/prompts/share/base/api.yaml file for the detailed API documentation of the UI Automator. Tip You can customize the commands by adding new command classes to the ufo/automator/ui_control/controller/ControlCommand module.","title":"UI Automator"},{"location":"automator/ui_automator/#ui-automator","text":"The UI Automator mimics mouse and keyboard operations on the application's UI controls. UFO uses the UIA or Win32 APIs to interact with the application's UI controls, such as buttons, edit boxes, and menus.","title":"UI Automator"},{"location":"automator/ui_automator/#configuration","text":"There are several configurations that need to be set up before using the UI Automator in the config_dev.yaml file. Below is the list of configurations related to the UI Automator: Configuration Option Description Type Default Value CONTROL_BACKEND The backend for control action, currently supporting uia and win32 . String \"uia\" CONTROL_LIST The list of widgets allowed to be selected. List [\"Button\", \"Edit\", \"TabItem\", \"Document\", \"ListItem\", \"MenuItem\", \"ScrollBar\", \"TreeItem\", \"Hyperlink\", \"ComboBox\", \"RadioButton\", \"DataItem\"] ANNOTATION_COLORS The colors assigned to different control types for annotation. Dictionary {\"Button\": \"#FFF68F\", \"Edit\": \"#A5F0B5\", \"TabItem\": \"#A5E7F0\", \"Document\": \"#FFD18A\", \"ListItem\": \"#D9C3FE\", \"MenuItem\": \"#E7FEC3\", \"ScrollBar\": \"#FEC3F8\", \"TreeItem\": \"#D6D6D6\", \"Hyperlink\": \"#91FFEB\", \"ComboBox\": \"#D8B6D4\"} API_PROMPT The prompt for the UI automation API. String \"ufo/prompts/share/base/api.yaml\" CLICK_API The API used for click action, can be click_input or click .
String \"click_input\" INPUT_TEXT_API The API used for input text action, can be type_keys or set_text . String \"type_keys\" INPUT_TEXT_ENTER Whether to press enter after typing the text. Boolean False","title":"Configuration"},{"location":"automator/ui_automator/#receiver","text":"The receiver of the UI Automator is the ControlReceiver class defined in the ufo/automator/ui_control/controller/control_receiver module. It is initialized with the application's window handle and control wrapper that executes the actions. The ControlReceiver provides functionalities to interact with the application's UI controls. Below is the reference for the ControlReceiver class: Bases: ReceiverBasic The control receiver class. Initialize the control receiver. Parameters: control ( Optional [ UIAWrapper ] ) \u2013 The control element. application ( Optional [ UIAWrapper ] ) \u2013 The application element. Source code in automator/ui_control/controller.py 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 def __init__ ( self , control : Optional [ UIAWrapper ], application : Optional [ UIAWrapper ] ) -> None : \"\"\" Initialize the control receiver. :param control: The control element. :param application: The application element. \"\"\" self . control = control self . application = application if control : self . control . set_focus () self . wait_enabled () elif application : self . application . set_focus ()","title":"Receiver"},{"location":"automator/ui_automator/#automator.ui_control.controller.ControlReceiver.annotation","text":"Take a screenshot of the current application window and annotate the control item on the screenshot. Parameters: params ( Dict [ str , str ] ) \u2013 The arguments of the annotation method. annotation_dict ( Dict [ str , UIAWrapper ] ) \u2013 The dictionary of the control labels. 
Source code in automator/ui_control/controller.py 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 def annotation ( self , params : Dict [ str , str ], annotation_dict : Dict [ str , UIAWrapper ] ) -> List [ str ]: \"\"\" Take a screenshot of the current application window and annotate the control item on the screenshot. :param params: The arguments of the annotation method. :param annotation_dict: The dictionary of the control labels. \"\"\" selected_controls_labels = params . get ( \"control_labels\" , []) control_reannotate = [ annotation_dict [ str ( label )] for label in selected_controls_labels ] return control_reannotate","title":"annotation"},{"location":"automator/ui_automator/#automator.ui_control.controller.ControlReceiver.atomic_execution","text":"Atomic execution of the action on the control elements. Parameters: method_name ( str ) \u2013 The name of the method to execute. params ( Dict [ str , Any ] ) \u2013 The arguments of the method. Returns: str \u2013 The result of the action. Source code in automator/ui_control/controller.py 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 def atomic_execution ( self , method_name : str , params : Dict [ str , Any ]) -> str : \"\"\" Atomic execution of the action on the control elements. :param method_name: The name of the method to execute. :param params: The arguments of the method. :return: The result of the action. \"\"\" import traceback try : method = getattr ( self . control , method_name ) result = method ( ** params ) except AttributeError : message = f \" { self . control } doesn't have a method named { method_name } \" print_with_color ( f \"Warning: { message } \" , \"yellow\" ) result = message except Exception as e : full_traceback = traceback . 
format_exc () message = f \"An error occurred: { full_traceback } \" print_with_color ( f \"Warning: { message } \" , \"yellow\" ) result = message return result","title":"atomic_execution"},{"location":"automator/ui_automator/#automator.ui_control.controller.ControlReceiver.click_input","text":"Click the control element. Parameters: params ( Dict [ str , Union [ str , bool ]] ) \u2013 The arguments of the click method. Returns: str \u2013 The result of the click action. Source code in automator/ui_control/controller.py 79 80 81 82 83 84 85 86 87 88 89 90 91 def click_input ( self , params : Dict [ str , Union [ str , bool ]]) -> str : \"\"\" Click the control element. :param params: The arguments of the click method. :return: The result of the click action. \"\"\" api_name = configs . get ( \"CLICK_API\" , \"click_input\" ) if api_name == \"click\" : return self . atomic_execution ( \"click\" , params ) else : return self . atomic_execution ( \"click_input\" , params )","title":"click_input"},{"location":"automator/ui_automator/#automator.ui_control.controller.ControlReceiver.click_on_coordinates","text":"Click on the coordinates of the control element. Parameters: params ( Dict [ str , str ] ) \u2013 The arguments of the click on coordinates method. Returns: str \u2013 The result of the click on coordinates action. Source code in automator/ui_control/controller.py 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 def click_on_coordinates ( self , params : Dict [ str , str ]) -> str : \"\"\" Click on the coordinates of the control element. :param params: The arguments of the click on coordinates method. :return: The result of the click on coordinates action. \"\"\" # Get the relative coordinates fraction of the application window. x = float ( params . get ( \"x\" , 0 )) y = float ( params . get ( \"y\" , 0 )) button = params . get ( \"button\" , \"left\" ) double = params . 
get ( \"double\" , False ) # Get the absolute coordinates of the application window. tranformed_x , tranformed_y = self . transform_point ( x , y ) self . application . set_focus () pyautogui . click ( tranformed_x , tranformed_y , button = button , clicks = 2 if double else 1 ) return \"\"","title":"click_on_coordinates"},{"location":"automator/ui_automator/#automator.ui_control.controller.ControlReceiver.drag_on_coordinates","text":"Drag on the coordinates of the control element. Parameters: params ( Dict [ str , str ] ) \u2013 The arguments of the drag on coordinates method. Returns: str \u2013 The result of the drag on coordinates action. Source code in automator/ui_control/controller.py 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 def drag_on_coordinates ( self , params : Dict [ str , str ]) -> str : \"\"\" Drag on the coordinates of the control element. :param params: The arguments of the drag on coordinates method. :return: The result of the drag on coordinates action. \"\"\" start = self . transform_point ( float ( params . get ( \"start_x\" , 0 )), float ( params . get ( \"start_y\" , 0 )) ) end = self . transform_point ( float ( params . get ( \"end_x\" , 0 )), float ( params . get ( \"end_y\" , 0 )) ) button = params . get ( \"button\" , \"left\" ) self . application . set_focus () pyautogui . moveTo ( start [ 0 ], start [ 1 ]) pyautogui . dragTo ( end [ 0 ], end [ 1 ], button = button ) return \"\"","title":"drag_on_coordinates"},{"location":"automator/ui_automator/#automator.ui_control.controller.ControlReceiver.keyboard_input","text":"Keyboard input on the control element. Parameters: params ( Dict [ str , str ] ) \u2013 The arguments of the keyboard input method. Returns: str \u2013 The result of the keyboard input action. 
Source code in automator/ui_control/controller.py 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 def keyboard_input ( self , params : Dict [ str , str ]) -> str : \"\"\" Keyboard input on the control element. :param params: The arguments of the keyboard input method. :return: The result of the keyboard input action. \"\"\" control_focus = params . get ( \"control_focus\" , True ) keys = params . get ( \"keys\" , \"\" ) if control_focus : self . atomic_execution ( \"type_keys\" , { \"keys\" : keys }) else : pyautogui . typewrite ( keys ) return keys","title":"keyboard_input"},{"location":"automator/ui_automator/#automator.ui_control.controller.ControlReceiver.no_action","text":"No action on the control element. Returns: \u2013 The result of the no action. Source code in automator/ui_control/controller.py 232 233 234 235 236 237 238 def no_action ( self ): \"\"\" No action on the control element. :return: The result of the no action. \"\"\" return \"\"","title":"no_action"},{"location":"automator/ui_automator/#automator.ui_control.controller.ControlReceiver.set_edit_text","text":"Set the edit text of the control element. Parameters: params ( Dict [ str , str ] ) \u2013 The arguments of the set edit text method. Returns: str \u2013 The result of the set edit text action. Source code in automator/ui_control/controller.py 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 def set_edit_text ( self , params : Dict [ str , str ]) -> str : \"\"\" Set the edit text of the control element. :param params: The arguments of the set edit text method. :return: The result of the set edit text action. \"\"\" text = params . get ( \"text\" , \"\" ) inter_key_pause = configs . 
get ( \"INPUT_TEXT_INTER_KEY_PAUSE\" , 0.1 ) if configs [ \"INPUT_TEXT_API\" ] == \"set_text\" : method_name = \"set_edit_text\" args = { \"text\" : text } else : method_name = \"type_keys\" # Transform the text according to the tags. text = TextTransformer . transform_text ( text , \"all\" ) args = { \"keys\" : text , \"pause\" : inter_key_pause , \"with_spaces\" : True } try : result = self . atomic_execution ( method_name , args ) if ( method_name == \"set_text\" and args [ \"text\" ] not in self . control . window_text () ): raise Exception ( f \"Failed to use set_text: { args [ 'text' ] } \" ) if configs [ \"INPUT_TEXT_ENTER\" ] and method_name in [ \"type_keys\" , \"set_text\" ]: self . atomic_execution ( \"type_keys\" , params = { \"keys\" : \" {ENTER} \" }) return result except Exception as e : if method_name == \"set_text\" : print_with_color ( f \" { self . control } doesn't have a method named { method_name } , trying default input method\" , \"yellow\" , ) method_name = \"type_keys\" clear_text_keys = \"^a {BACKSPACE} \" text_to_type = args [ \"text\" ] keys_to_send = clear_text_keys + text_to_type method_name = \"type_keys\" args = { \"keys\" : keys_to_send , \"pause\" : inter_key_pause , \"with_spaces\" : True , } return self . atomic_execution ( method_name , args ) else : return f \"An error occurred: { e } \"","title":"set_edit_text"},{"location":"automator/ui_automator/#automator.ui_control.controller.ControlReceiver.summary","text":"Visual summary of the control element. Parameters: params ( Dict [ str , str ] ) \u2013 The arguments of the visual summary method. should contain a key \"text\" with the text summary. Returns: str \u2013 The result of the visual summary action. Source code in automator/ui_control/controller.py 141 142 143 144 145 146 147 148 def summary ( self , params : Dict [ str , str ]) -> str : \"\"\" Visual summary of the control element. :param params: The arguments of the visual summary method. 
should contain a key \"text\" with the text summary. :return: The result of the visual summary action. \"\"\" return params . get ( \"text\" )","title":"summary"},{"location":"automator/ui_automator/#automator.ui_control.controller.ControlReceiver.texts","text":"Get the text of the control element. Returns: str \u2013 The text of the control element. Source code in automator/ui_control/controller.py 217 218 219 220 221 222 def texts ( self ) -> str : \"\"\" Get the text of the control element. :return: The text of the control element. \"\"\" return self . control . texts ()","title":"texts"},{"location":"automator/ui_automator/#automator.ui_control.controller.ControlReceiver.transform_point","text":"Transform the relative coordinates to the absolute coordinates. Parameters: fraction_x ( float ) \u2013 The relative x coordinate. fraction_y ( float ) \u2013 The relative y coordinate. Returns: Tuple [ int , int ] \u2013 The absolute coordinates. Source code in automator/ui_control/controller.py 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 def transform_point ( self , fraction_x : float , fraction_y : float ) -> Tuple [ int , int ]: \"\"\" Transform the relative coordinates to the absolute coordinates. :param fraction_x: The relative x coordinate. :param fraction_y: The relative y coordinate. :return: The absolute coordinates. \"\"\" application_rect : RECT = self . application . rectangle () application_x = application_rect . left application_y = application_rect . top application_width = application_rect . width () application_height = application_rect . height () x = application_x + int ( application_width * fraction_x ) y = application_y + int ( application_height * fraction_y ) return x , y","title":"transform_point"},{"location":"automator/ui_automator/#automator.ui_control.controller.ControlReceiver.wait_enabled","text":"Wait until the control is enabled. Parameters: timeout ( int , default: 10 ) \u2013 The timeout to wait. 
retry_interval ( int , default: 0.5 ) \u2013 The retry interval to wait. Source code in automator/ui_control/controller.py 256 257 258 259 260 261 262 263 264 265 266 267 def wait_enabled ( self , timeout : int = 10 , retry_interval : int = 0.5 ) -> None : \"\"\" Wait until the control is enabled. :param timeout: The timeout to wait. :param retry_interval: The retry interval to wait. \"\"\" while not self . control . is_enabled (): time . sleep ( retry_interval ) timeout -= retry_interval if timeout <= 0 : warnings . warn ( f \"Timeout: { self . control } is not enabled.\" ) break","title":"wait_enabled"},{"location":"automator/ui_automator/#automator.ui_control.controller.ControlReceiver.wait_visible","text":"Wait until the control is visible. Parameters: timeout ( int , default: 10 ) \u2013 The timeout to wait. retry_interval ( int , default: 0.5 ) \u2013 The retry interval to wait. Source code in automator/ui_control/controller.py 269 270 271 272 273 274 275 276 277 278 279 280 def wait_visible ( self , timeout : int = 10 , retry_interval : int = 0.5 ) -> None : \"\"\" Wait until the control is visible. :param timeout: The timeout to wait. :param retry_interval: The retry interval to wait. \"\"\" while not self . control . is_visible (): time . sleep ( retry_interval ) timeout -= retry_interval if timeout <= 0 : warnings . warn ( f \"Timeout: { self . control } is not visible.\" ) break","title":"wait_visible"},{"location":"automator/ui_automator/#automator.ui_control.controller.ControlReceiver.wheel_mouse_input","text":"Wheel mouse input on the control element. Parameters: params ( Dict [ str , str ] ) \u2013 The arguments of the wheel mouse input method. Returns: \u2013 The result of the wheel mouse input action. Source code in automator/ui_control/controller.py 224 225 226 227 228 229 230 def wheel_mouse_input ( self , params : Dict [ str , str ]): \"\"\" Wheel mouse input on the control element. :param params: The arguments of the wheel mouse input method. 
:return: The result of the wheel mouse input action. \"\"\" return self . atomic_execution ( \"wheel_mouse_input\" , params )","title":"wheel_mouse_input"},{"location":"automator/ui_automator/#command","text":"The command of the UI Automator is the ControlCommand class defined in the ufo/automator/ui_control/controller/ControlCommand module. It encapsulates the function and parameters required to execute the action. The ControlCommand class is a base class for all commands in the UI Automator application. Below is an example of a ClickInputCommand class that inherits from the ControlCommand class: @ControlReceiver.register class ClickInputCommand(ControlCommand): \"\"\" The click input command class. \"\"\" def execute(self) -> str: \"\"\" Execute the click input command. :return: The result of the click input command. \"\"\" return self.receiver.click_input(self.params) @classmethod def name(cls) -> str: \"\"\" Get the name of the atomic command. :return: The name of the atomic command. \"\"\" return \"click_input\" Note The concrete command classes must implement the execute method to execute the action and the name method to return the name of the atomic command. Note Each command must register with a specific ControlReceiver to be executed using the @ControlReceiver.register decorator. Below is the list of available commands in the UI Automator that are currently supported by UFO: Command Name Function Name Description ClickInputCommand click_input Click the control item with the mouse. ClickOnCoordinatesCommand click_on_coordinates Click on the specific fractional coordinates of the application window. DragOnCoordinatesCommand drag_on_coordinates Drag the mouse on the specific fractional coordinates of the application window. SetEditTextCommand set_edit_text Add new text to the control item. GetTextsCommand texts Get the text of the control item. WheelMouseInputCommand wheel_mouse_input Scroll the control item. 
KeyboardInputCommand keyboard_input Simulate the keyboard input. Tip Please refer to the ufo/prompts/share/base/api.yaml file for the detailed API documentation of the UI Automator. Tip You can customize the commands by adding new command classes to the ufo/automator/ui_control/controller/ControlCommand module.","title":"Command"},{"location":"automator/web_automator/","text":"Web Automator We also support the use of the Web Automator to get the content of a web page. The Web Automator is implemented in the ufo/automator/app_apis/web module. Configuration There are several configurations that need to be set up before using the API Automator in the config_dev.yaml file. Below is the list of configurations related to the API Automator: Configuration Option Description Type Default Value USE_APIS Whether to allow the use of application APIs. Boolean True APP_API_PROMPT_ADDRESS The prompt address for the application API. Dict {\"WINWORD.EXE\": \"ufo/prompts/apps/word/api.yaml\", \"EXCEL.EXE\": \"ufo/prompts/apps/excel/api.yaml\", \"msedge.exe\": \"ufo/prompts/apps/web/api.yaml\", \"chrome.exe\": \"ufo/prompts/apps/web/api.yaml\"} Note Only msedge.exe and chrome.exe are currently supported by the Web Automator. Receiver The Web Automator receiver is the WebReceiver class defined in the ufo/automator/app_apis/web/webclient.py module: Bases: ReceiverBasic The base class for Web COM client using crawl4ai. Initialize the Web COM client. Source code in automator/app_apis/web/webclient.py 21 22 23 24 25 26 27 def __init__ ( self ) -> None : \"\"\" Initialize the Web COM client. \"\"\" self . _headers = { \"User-Agent\" : \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3\" } web_crawler ( url , ignore_link ) Run the crawler with various options. Parameters: url ( str ) \u2013 The URL of the webpage. ignore_link ( bool ) \u2013 Whether to ignore the links. Returns: str \u2013 The result markdown content. 
Source code in automator/app_apis/web/webclient.py 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 def web_crawler ( self , url : str , ignore_link : bool ) -> str : \"\"\" Run the crawler with various options. :param url: The URL of the webpage. :param ignore_link: Whether to ignore the links. :return: The result markdown content. \"\"\" try : # Get the HTML content of the webpage response = requests . get ( url , headers = self . _headers ) response . raise_for_status () html_content = response . text # Convert the HTML content to markdown h = html2text . HTML2Text () h . ignore_links = ignore_link markdown_content = h . handle ( html_content ) return markdown_content except requests . RequestException as e : print ( f \"Error fetching the URL: { e } \" ) return f \"Error fetching the URL: { e } \" Command We now only support one command in the Web Automator to get the content of a web page into a markdown format. More commands will be added in the future for the Web Automator. @WebReceiver.register class WebCrawlerCommand(WebCommand): \"\"\" The command to run the crawler with various options. \"\"\" def execute(self): \"\"\" Execute the command to run the crawler. :return: The result content. \"\"\" return self.receiver.web_crawler( url=self.params.get(\"url\"), ignore_link=self.params.get(\"ignore_link\", False), ) @classmethod def name(cls) -> str: \"\"\" The name of the command. \"\"\" return \"web_crawler\" Below is the list of available commands in the Web Automator that are currently supported by UFO: Command Name Function Name Description WebCrawlerCommand web_crawler Get the content of a web page into a markdown format. Tip Please refer to the ufo/prompts/apps/web/api.yaml file for the prompt details for the WebCrawlerCommand command.","title":"Web Automator"},{"location":"automator/web_automator/#web-automator","text":"We also support the use of the Web Automator to get the content of a web page. 
The Web Automator is implemented in the ufo/automator/app_apis/web module.","title":"Web Automator"},{"location":"automator/web_automator/#configuration","text":"There are several configurations that need to be set up before using the API Automator in the config_dev.yaml file. Below is the list of configurations related to the API Automator: Configuration Option Description Type Default Value USE_APIS Whether to allow the use of application APIs. Boolean True APP_API_PROMPT_ADDRESS The prompt address for the application API. Dict {\"WINWORD.EXE\": \"ufo/prompts/apps/word/api.yaml\", \"EXCEL.EXE\": \"ufo/prompts/apps/excel/api.yaml\", \"msedge.exe\": \"ufo/prompts/apps/web/api.yaml\", \"chrome.exe\": \"ufo/prompts/apps/web/api.yaml\"} Note Only msedge.exe and chrome.exe are currently supported by the Web Automator.","title":"Configuration"},{"location":"automator/web_automator/#receiver","text":"The Web Automator receiver is the WebReceiver class defined in the ufo/automator/app_apis/web/webclient.py module: Bases: ReceiverBasic The base class for Web COM client using crawl4ai. Initialize the Web COM client. Source code in automator/app_apis/web/webclient.py 21 22 23 24 25 26 27 def __init__ ( self ) -> None : \"\"\" Initialize the Web COM client. \"\"\" self . _headers = { \"User-Agent\" : \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3\" }","title":"Receiver"},{"location":"automator/web_automator/#automator.app_apis.web.webclient.WebReceiver.web_crawler","text":"Run the crawler with various options. Parameters: url ( str ) \u2013 The URL of the webpage. ignore_link ( bool ) \u2013 Whether to ignore the links. Returns: str \u2013 The result markdown content. Source code in automator/app_apis/web/webclient.py 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 def web_crawler ( self , url : str , ignore_link : bool ) -> str : \"\"\" Run the crawler with various options. 
:param url: The URL of the webpage. :param ignore_link: Whether to ignore the links. :return: The result markdown content. \"\"\" try : # Get the HTML content of the webpage response = requests . get ( url , headers = self . _headers ) response . raise_for_status () html_content = response . text # Convert the HTML content to markdown h = html2text . HTML2Text () h . ignore_links = ignore_link markdown_content = h . handle ( html_content ) return markdown_content except requests . RequestException as e : print ( f \"Error fetching the URL: { e } \" ) return f \"Error fetching the URL: { e } \"","title":"web_crawler"},{"location":"automator/web_automator/#command","text":"We now only support one command in the Web Automator to get the content of a web page into a markdown format. More commands will be added in the future for the Web Automator. @WebReceiver.register class WebCrawlerCommand(WebCommand): \"\"\" The command to run the crawler with various options. \"\"\" def execute(self): \"\"\" Execute the command to run the crawler. :return: The result content. \"\"\" return self.receiver.web_crawler( url=self.params.get(\"url\"), ignore_link=self.params.get(\"ignore_link\", False), ) @classmethod def name(cls) -> str: \"\"\" The name of the command. \"\"\" return \"web_crawler\" Below is the list of available commands in the Web Automator that are currently supported by UFO: Command Name Function Name Description WebCrawlerCommand web_crawler Get the content of a web page into a markdown format. Tip Please refer to the ufo/prompts/apps/web/api.yaml file for the prompt details for the WebCrawlerCommand command.","title":"Command"},{"location":"automator/wincom_automator/","text":"API Automator UFO currently supports the use of the Win32 API automator to interact with the application's native API. We implement it in Python using the pywin32 library. 
The API automator now supports Word and Excel applications, and we are working on extending the support to other applications. Configuration There are several configurations that need to be set up before using the API Automator in the config_dev.yaml file. Below is the list of configurations related to the API Automator: Configuration Option Description Type Default Value USE_APIS Whether to allow the use of application APIs. Boolean True APP_API_PROMPT_ADDRESS The prompt address for the application API. Dict {\"WINWORD.EXE\": \"ufo/prompts/apps/word/api.yaml\", \"EXCEL.EXE\": \"ufo/prompts/apps/excel/api.yaml\", \"msedge.exe\": \"ufo/prompts/apps/web/api.yaml\", \"chrome.exe\": \"ufo/prompts/apps/web/api.yaml\"} Note Only WINWORD.EXE and EXCEL.EXE are currently supported by the API Automator. Receiver The base class for the receiver of the API Automator is the WinCOMReceiverBasic class defined in the ufo/automator/app_apis/basic module. It is initialized with the application's win32 com object and provides functionalities to interact with the application's native API. Below is the reference for the WinCOMReceiverBasic class: Bases: ReceiverBasic The base class for Windows COM client. Initialize the Windows COM client. Parameters: app_root_name ( str ) \u2013 The app root name. process_name ( str ) \u2013 The process name. clsid ( str ) \u2013 The CLSID of the COM object. Source code in automator/app_apis/basic.py 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 def __init__ ( self , app_root_name : str , process_name : str , clsid : str ) -> None : \"\"\" Initialize the Windows COM client. :param app_root_name: The app root name. :param process_name: The process name. :param clsid: The CLSID of the COM object. \"\"\" self . app_root_name = app_root_name self . process_name = process_name self . clsid = clsid self . client = win32com . client . Dispatch ( self . clsid ) self . com_object = self . 
get_object_from_process_name () full_path : str property Get the full path of the process. Returns: str \u2013 The full path of the process. app_match ( object_name_list ) Check if the process name matches the app root. Parameters: object_name_list ( List [ str ] ) \u2013 The list of object name. Returns: str \u2013 The matched object name. Source code in automator/app_apis/basic.py 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 def app_match ( self , object_name_list : List [ str ]) -> str : \"\"\" Check if the process name matches the app root. :param object_name_list: The list of object name. :return: The matched object name. \"\"\" suffix = self . get_suffix_mapping () if self . process_name . endswith ( suffix ): clean_process_name = self . process_name [: - len ( suffix )] else : clean_process_name = self . process_name if not object_name_list : return \"\" return max ( object_name_list , key = lambda x : self . longest_common_substring_length ( clean_process_name , x ), ) close () Close the app. Source code in automator/app_apis/basic.py 110 111 112 113 114 115 116 117 def close ( self ) -> None : \"\"\" Close the app. \"\"\" try : self . com_object . Close () except : pass get_object_from_process_name () abstractmethod Get the object from the process name. Source code in automator/app_apis/basic.py 36 37 38 39 40 41 @abstractmethod def get_object_from_process_name ( self ) -> win32com . client . CDispatch : \"\"\" Get the object from the process name. \"\"\" pass get_suffix_mapping () Get the suffix mapping. Returns: Dict [ str , str ] \u2013 The suffix mapping. Source code in automator/app_apis/basic.py 43 44 45 46 47 48 49 50 51 52 53 54 55 def get_suffix_mapping ( self ) -> Dict [ str , str ]: \"\"\" Get the suffix mapping. :return: The suffix mapping. \"\"\" suffix_mapping = { \"WINWORD.EXE\" : \"docx\" , \"EXCEL.EXE\" : \"xlsx\" , \"POWERPNT.EXE\" : \"pptx\" , \"olk.exe\" : \"msg\" , } return suffix_mapping . get ( self . 
app_root_name , None ) longest_common_substring_length ( str1 , str2 ) staticmethod Get the longest common substring of two strings. Parameters: str1 ( str ) \u2013 The first string. str2 ( str ) \u2013 The second string. Returns: int \u2013 The length of the longest common substring. Source code in automator/app_apis/basic.py 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 @staticmethod def longest_common_substring_length ( str1 : str , str2 : str ) -> int : \"\"\" Get the longest common substring of two strings. :param str1: The first string. :param str2: The second string. :return: The length of the longest common substring. \"\"\" m = len ( str1 ) n = len ( str2 ) dp = [[ 0 ] * ( n + 1 ) for _ in range ( m + 1 )] max_length = 0 for i in range ( 1 , m + 1 ): for j in range ( 1 , n + 1 ): if str1 [ i - 1 ] == str2 [ j - 1 ]: dp [ i ][ j ] = dp [ i - 1 ][ j - 1 ] + 1 if dp [ i ][ j ] > max_length : max_length = dp [ i ][ j ] else : dp [ i ][ j ] = 0 return max_length save () Save the current state of the app. Source code in automator/app_apis/basic.py 91 92 93 94 95 96 97 98 def save ( self ) -> None : \"\"\" Save the current state of the app. \"\"\" try : self . com_object . Save () except : pass save_to_xml ( file_path ) Save the current state of the app to XML. Parameters: file_path ( str ) \u2013 The file path to save the XML. Source code in automator/app_apis/basic.py 100 101 102 103 104 105 106 107 108 def save_to_xml ( self , file_path : str ) -> None : \"\"\" Save the current state of the app to XML. :param file_path: The file path to save the XML. \"\"\" try : self . com_object . SaveAs ( file_path , self . xml_format_code ) except : pass The receivers of the Word and Excel applications inherit from the WinCOMReceiverBasic class. 
The WordReceiver and ExcelReceiver classes are defined in the ufo/automator/app_apis/word and ufo/automator/app_apis/excel modules, respectively: Command The command of the API Automator for the Word and Excel applications is located in the client module in the ufo/automator/app_apis/{app_name} folder, inheriting from the WinCOMCommand class. It encapsulates the function and parameters required to execute the action. Below is an example of the SelectTextCommand class, which inherits from the WinCOMCommand class: @WordWinCOMReceiver.register class SelectTextCommand(WinCOMCommand): \"\"\" The command to select text. \"\"\" def execute(self): \"\"\" Execute the command to select text. :return: The selected text. \"\"\" return self.receiver.select_text(self.params.get(\"text\")) @classmethod def name(cls) -> str: \"\"\" The name of the command. \"\"\" return \"select_text\" Note The concrete command classes must implement the execute method to execute the action and the name method to return the name of the atomic command. Note Each command must register with a concrete WinCOMReceiver to be executed using the register decorator. Below is the list of available commands in the API Automator that are currently supported by UFO: Word API Commands Command Name Function Name Description InsertTableCommand insert_table Insert a table into a Word document. SelectTextCommand select_text Select the text in a Word document. SelectTableCommand select_table Select a table in a Word document. Excel API Commands Command Name Function Name Description GetSheetContentCommand get_sheet_content Get the content of a sheet in the Excel app. Table2MarkdownCommand table2markdown Convert the table content in a sheet of the Excel app to markdown format. InsertExcelTableCommand insert_excel_table Insert a table into the Excel sheet. Tip Please refer to the ufo/prompts/apps/{app_name}/api.yaml file for the prompt details for the commands. 
Tip You can customize the commands by adding new command classes to the ufo/automator/app_apis/{app_name}/ module.","title":"API Automator"},{"location":"automator/wincom_automator/#api-automator","text":"UFO currently supports the use of the Win32 API automator to interact with the application's native API. We implement it in Python using the pywin32 library. The API automator now supports Word and Excel applications, and we are working on extending the support to other applications.","title":"API Automator"},{"location":"automator/wincom_automator/#configuration","text":"There are several configurations that need to be set up before using the API Automator in the config_dev.yaml file. Below is the list of configurations related to the API Automator: Configuration Option Description Type Default Value USE_APIS Whether to allow the use of application APIs. Boolean True APP_API_PROMPT_ADDRESS The prompt address for the application API. Dict {\"WINWORD.EXE\": \"ufo/prompts/apps/word/api.yaml\", \"EXCEL.EXE\": \"ufo/prompts/apps/excel/api.yaml\", \"msedge.exe\": \"ufo/prompts/apps/web/api.yaml\", \"chrome.exe\": \"ufo/prompts/apps/web/api.yaml\"} Note Only WINWORD.EXE and EXCEL.EXE are currently supported by the API Automator.","title":"Configuration"},{"location":"automator/wincom_automator/#receiver","text":"The base class for the receiver of the API Automator is the WinCOMReceiverBasic class defined in the ufo/automator/app_apis/basic module. It is initialized with the application's win32 com object and provides functionalities to interact with the application's native API. Below is the reference for the WinCOMReceiverBasic class: Bases: ReceiverBasic The base class for Windows COM client. Initialize the Windows COM client. Parameters: app_root_name ( str ) \u2013 The app root name. process_name ( str ) \u2013 The process name. clsid ( str ) \u2013 The CLSID of the COM object. 
Source code in automator/app_apis/basic.py 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 def __init__ ( self , app_root_name : str , process_name : str , clsid : str ) -> None : \"\"\" Initialize the Windows COM client. :param app_root_name: The app root name. :param process_name: The process name. :param clsid: The CLSID of the COM object. \"\"\" self . app_root_name = app_root_name self . process_name = process_name self . clsid = clsid self . client = win32com . client . Dispatch ( self . clsid ) self . com_object = self . get_object_from_process_name ()","title":"Receiver"},{"location":"automator/wincom_automator/#automator.app_apis.basic.WinCOMReceiverBasic.full_path","text":"Get the full path of the process. Returns: str \u2013 The full path of the process.","title":"full_path"},{"location":"automator/wincom_automator/#automator.app_apis.basic.WinCOMReceiverBasic.app_match","text":"Check if the process name matches the app root. Parameters: object_name_list ( List [ str ] ) \u2013 The list of object name. Returns: str \u2013 The matched object name. Source code in automator/app_apis/basic.py 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 def app_match ( self , object_name_list : List [ str ]) -> str : \"\"\" Check if the process name matches the app root. :param object_name_list: The list of object name. :return: The matched object name. \"\"\" suffix = self . get_suffix_mapping () if self . process_name . endswith ( suffix ): clean_process_name = self . process_name [: - len ( suffix )] else : clean_process_name = self . process_name if not object_name_list : return \"\" return max ( object_name_list , key = lambda x : self . longest_common_substring_length ( clean_process_name , x ), )","title":"app_match"},{"location":"automator/wincom_automator/#automator.app_apis.basic.WinCOMReceiverBasic.close","text":"Close the app. 
Source code in automator/app_apis/basic.py 110 111 112 113 114 115 116 117 def close ( self ) -> None : \"\"\" Close the app. \"\"\" try : self . com_object . Close () except : pass","title":"close"},{"location":"automator/wincom_automator/#automator.app_apis.basic.WinCOMReceiverBasic.get_object_from_process_name","text":"Get the object from the process name. Source code in automator/app_apis/basic.py 36 37 38 39 40 41 @abstractmethod def get_object_from_process_name ( self ) -> win32com . client . CDispatch : \"\"\" Get the object from the process name. \"\"\" pass","title":"get_object_from_process_name"},{"location":"automator/wincom_automator/#automator.app_apis.basic.WinCOMReceiverBasic.get_suffix_mapping","text":"Get the suffix mapping. Returns: Dict [ str , str ] \u2013 The suffix mapping. Source code in automator/app_apis/basic.py 43 44 45 46 47 48 49 50 51 52 53 54 55 def get_suffix_mapping ( self ) -> Dict [ str , str ]: \"\"\" Get the suffix mapping. :return: The suffix mapping. \"\"\" suffix_mapping = { \"WINWORD.EXE\" : \"docx\" , \"EXCEL.EXE\" : \"xlsx\" , \"POWERPNT.EXE\" : \"pptx\" , \"olk.exe\" : \"msg\" , } return suffix_mapping . get ( self . app_root_name , None )","title":"get_suffix_mapping"},{"location":"automator/wincom_automator/#automator.app_apis.basic.WinCOMReceiverBasic.longest_common_substring_length","text":"Get the longest common substring of two strings. Parameters: str1 ( str ) \u2013 The first string. str2 ( str ) \u2013 The second string. Returns: int \u2013 The length of the longest common substring. Source code in automator/app_apis/basic.py 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 @staticmethod def longest_common_substring_length ( str1 : str , str2 : str ) -> int : \"\"\" Get the longest common substring of two strings. :param str1: The first string. :param str2: The second string. :return: The length of the longest common substring. 
\"\"\" m = len ( str1 ) n = len ( str2 ) dp = [[ 0 ] * ( n + 1 ) for _ in range ( m + 1 )] max_length = 0 for i in range ( 1 , m + 1 ): for j in range ( 1 , n + 1 ): if str1 [ i - 1 ] == str2 [ j - 1 ]: dp [ i ][ j ] = dp [ i - 1 ][ j - 1 ] + 1 if dp [ i ][ j ] > max_length : max_length = dp [ i ][ j ] else : dp [ i ][ j ] = 0 return max_length","title":"longest_common_substring_length"},{"location":"automator/wincom_automator/#automator.app_apis.basic.WinCOMReceiverBasic.save","text":"Save the current state of the app. Source code in automator/app_apis/basic.py 91 92 93 94 95 96 97 98 def save ( self ) -> None : \"\"\" Save the current state of the app. \"\"\" try : self . com_object . Save () except : pass","title":"save"},{"location":"automator/wincom_automator/#automator.app_apis.basic.WinCOMReceiverBasic.save_to_xml","text":"Save the current state of the app to XML. Parameters: file_path ( str ) \u2013 The file path to save the XML. Source code in automator/app_apis/basic.py 100 101 102 103 104 105 106 107 108 def save_to_xml ( self , file_path : str ) -> None : \"\"\" Save the current state of the app to XML. :param file_path: The file path to save the XML. \"\"\" try : self . com_object . SaveAs ( file_path , self . xml_format_code ) except : pass The receivers of the Word and Excel applications inherit from the WinCOMReceiverBasic class. The WordReceiver and ExcelReceiver classes are defined in the ufo/automator/app_apis/word and ufo/automator/app_apis/excel modules, respectively:","title":"save_to_xml"},{"location":"automator/wincom_automator/#command","text":"The command of the API Automator for the Word and Excel applications is located in the client module in the ufo/automator/app_apis/{app_name} folder, inheriting from the WinCOMCommand class. It encapsulates the function and parameters required to execute the action. 
Below is an example of the SelectTextCommand class, which inherits from the WinCOMCommand class: @WordWinCOMReceiver.register class SelectTextCommand(WinCOMCommand): \"\"\" The command to select text. \"\"\" def execute(self): \"\"\" Execute the command to select text. :return: The selected text. \"\"\" return self.receiver.select_text(self.params.get(\"text\")) @classmethod def name(cls) -> str: \"\"\" The name of the command. \"\"\" return \"select_text\" Note The concrete command classes must implement the execute method to execute the action and the name method to return the name of the atomic command. Note Each command must register with a concrete WinCOMReceiver to be executed using the register decorator. Below is the list of available commands in the API Automator that are currently supported by UFO:","title":"Command"},{"location":"automator/wincom_automator/#word-api-commands","text":"Command Name Function Name Description InsertTableCommand insert_table Insert a table into a Word document. SelectTextCommand select_text Select the text in a Word document. SelectTableCommand select_table Select a table in a Word document.","title":"Word API Commands"},{"location":"automator/wincom_automator/#excel-api-commands","text":"Command Name Function Name Description GetSheetContentCommand get_sheet_content Get the content of a sheet in the Excel app. Table2MarkdownCommand table2markdown Convert the table content in a sheet of the Excel app to markdown format. InsertExcelTableCommand insert_excel_table Insert a table into the Excel sheet. Tip Please refer to the ufo/prompts/apps/{app_name}/api.yaml file for the prompt details for the commands. Tip You can customize the commands by adding new command classes to the ufo/automator/app_apis/{app_name}/ module.","title":"Excel API Commands"},{"location":"configurations/developer_configuration/","text":"Developer Configuration This section provides detailed information on how to configure the UFO agent for developers. 
The configuration file config_dev.yaml is located in the ufo/config directory and contains various settings and switches to customize the UFO agent for development purposes. System Configuration The following parameters are included in the system configuration of the UFO agent: Configuration Option Description Type Default Value CONTROL_BACKEND The backend for control action, currently supporting uia and win32 . String \"uia\" MAX_STEP The maximum step limit for completing the user request in a session. Integer 100 SLEEP_TIME The sleep time in seconds between each step to wait for the window to be ready. Integer 5 RECTANGLE_TIME The time in seconds for the rectangle display around the selected control. Integer 1 SAFE_GUARD Whether to use the safe guard to ask for user confirmation before performing sensitive operations. Boolean True CONTROL_LIST The list of widgets allowed to be selected. List [\"Button\", \"Edit\", \"TabItem\", \"Document\", \"ListItem\", \"MenuItem\", \"ScrollBar\", \"TreeItem\", \"Hyperlink\", \"ComboBox\", \"RadioButton\", \"DataItem\"] HISTORY_KEYS The keys of the step history added to the Blackboard for agent decision-making. List [\"Step\", \"Thought\", \"ControlText\", \"Subtask\", \"Action\", \"Comment\", \"Results\", \"UserConfirm\"] ANNOTATION_COLORS The colors assigned to different control types for annotation. Dictionary {\"Button\": \"#FFF68F\", \"Edit\": \"#A5F0B5\", \"TabItem\": \"#A5E7F0\", \"Document\": \"#FFD18A\", \"ListItem\": \"#D9C3FE\", \"MenuItem\": \"#E7FEC3\", \"ScrollBar\": \"#FEC3F8\", \"TreeItem\": \"#D6D6D6\", \"Hyperlink\": \"#91FFEB\", \"ComboBox\": \"#D8B6D4\"} PRINT_LOG Whether to print the log in the console. Boolean False CONCAT_SCREENSHOT Whether to concatenate the screenshots into a single image for the LLM input. Boolean False INCLUDE_LAST_SCREENSHOT Whether to include the screenshot from the last step in the observation. Boolean True LOG_LEVEL The log level for the UFO agent. 
String \"DEBUG\" REQUEST_TIMEOUT The call timeout in seconds for the LLM model. Integer 250 USE_APIS Whether to allow the use of application APIs. Boolean True LOG_XML Whether to log the XML file at every step. Boolean False SCREENSHOT_TO_MEMORY Whether to allow the screenshot to Blackboard for the agent's decision making. Boolean True SAVE_UI_TREE Whether to save the UI tree in the log. Boolean False Main Prompt Configuration Main Prompt Templates The main prompt templates include the prompts in the UFO agent for both system and user roles. Configuration Option Description Type Default Value HOSTAGENT_PROMPT The main prompt template for the HostAgent . String \"ufo/prompts/share/base/host_agent.yaml\" APPAGENT_PROMPT The main prompt template for the AppAgent . String \"ufo/prompts/share/base/app_agent.yaml\" FOLLOWERAGENT_PROMPT The main prompt template for the FollowerAgent . String \"ufo/prompts/share/base/app_agent.yaml\" EVALUATION_PROMPT The prompt template for the evaluation. String \"ufo/prompts/evaluation/evaluate.yaml\" Lite versions of the main prompt templates can be found in the ufo/prompts/share/lite directory to reduce the input size for specific token limits. Example Prompt Templates Example prompt templates are used for demonstration purposes in the UFO agent. Configuration Option Description Type Default Value HOSTAGENT_EXAMPLE_PROMPT The example prompt template for the HostAgent used for demonstration. String \"ufo/prompts/examples/{mode}/host_agent_example.yaml\" APPAGENT_EXAMPLE_PROMPT The example prompt template for the AppAgent used for demonstration. String \"ufo/prompts/examples/{mode}/app_agent_example.yaml\" Lite versions of the example prompt templates can be found in the ufo/prompts/examples/lite/{mode} directory to reduce the input size for demonstration purposes. Experience and Demonstration Learning These configuration parameters are used for experience and demonstration learning in the UFO agent. 
Configuration Option Description Type Default Value EXPERIENCE_PROMPT The prompt for self-experience learning. String \"ufo/prompts/experience/experience_summary.yaml\" EXPERIENCE_SAVED_PATH The path to save the experience learning data. String \"vectordb/experience/\" DEMONSTRATION_PROMPT The prompt for user demonstration learning. String \"ufo/prompts/demonstration/demonstration_summary.yaml\" DEMONSTRATION_SAVED_PATH The path to save the demonstration learning data. String \"vectordb/demonstration/\" Application API Configuration These prompt configuration parameters are used for the application and control APIs in the UFO agent. Configuration Option Description Type Default Value API_PROMPT The prompt for the UI automation API. String \"ufo/prompts/share/base/api.yaml\" APP_API_PROMPT_ADDRESS The prompt address for the application API. Dict {\"WINWORD.EXE\": \"ufo/prompts/apps/word/api.yaml\", \"EXCEL.EXE\": \"ufo/prompts/apps/excel/api.yaml\", \"msedge.exe\": \"ufo/prompts/apps/web/api.yaml\", \"chrome.exe\": \"ufo/prompts/apps/web/api.yaml\"} pywinauto Configuration The API configuration parameters are used for the pywinauto API in the UFO agent. Configuration Option Description Type Default Value CLICK_API The API used for click action, can be click_input or click . String \"click_input\" INPUT_TEXT_API The API used for input text action, can be type_keys or set_text . String \"type_keys\" INPUT_TEXT_ENTER Whether to press enter after typing the text. Boolean False Control Filtering The control filtering configuration parameters are used for control filtering in the agent's observation. Configuration Option Description Type Default Value CONTROL_FILTER The control filter type, can be TEXT , SEMANTIC , or ICON . List [] CONTROL_FILTER_TOP_K_PLAN The control filter effect on top k plans from the agent. Integer 2 CONTROL_FILTER_TOP_K_SEMANTIC The control filter top k for semantic similarity. 
Integer 15 CONTROL_FILTER_TOP_K_ICON The control filter top k for icon similarity. Integer 15 CONTROL_FILTER_MODEL_SEMANTIC_NAME The control filter model name for semantic similarity. String \"all-MiniLM-L6-v2\" CONTROL_FILTER_MODEL_ICON_NAME The control filter model name for icon similarity. String \"clip-ViT-B-32\" Customizations The customization configuration parameters are used for customizations in the UFO agent. Configuration Option Description Type Default Value ASK_QUESTION Whether to ask the user for a question. Boolean True USE_CUSTOMIZATION Whether to enable the customization. Boolean True QA_PAIR_FILE The path for the historical QA pairs. String \"customization/historical_qa.txt\" QA_PAIR_NUM The number of QA pairs for the customization. Integer 20 Evaluation The evaluation configuration parameters are used for the evaluation in the UFO agent. Configuration Option Description Type Default Value EVA_SESSION Whether to include the session in the evaluation. Boolean True EVA_ROUND Whether to include the round in the evaluation. Boolean False EVA_ALL_SCREENSHOTS Whether to include all the screenshots in the evaluation. Boolean True You can customize the configuration parameters in the config_dev.yaml file to suit your development needs and enhance the functionality of the UFO agent.","title":"Developer Configuration"},{"location":"configurations/developer_configuration/#developer-configuration","text":"This section provides detailed information on how to configure the UFO agent for developers. 
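A minimal sketch of how defaults like those in the tables above might be combined with user overrides; the dicts below are illustrative subsets, not UFO's actual loading code:

```python
# Documented defaults (subset of the system configuration table).
DEFAULTS = {
    "CONTROL_BACKEND": "uia",
    "MAX_STEP": 100,
    "SLEEP_TIME": 5,
    "SAFE_GUARD": True,
}

def load_config(overrides: dict) -> dict:
    """Start from the defaults, then let user-provided settings win."""
    config = dict(DEFAULTS)
    config.update(overrides)
    return config

cfg = load_config({"MAX_STEP": 50})
print(cfg["MAX_STEP"], cfg["CONTROL_BACKEND"])  # → 50 uia
```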
The configuration file config_dev.yaml is located in the ufo/config directory and contains various settings and switches to customize the UFO agent for development purposes.","title":"Developer Configuration"},{"location":"configurations/developer_configuration/#system-configuration","text":"The following parameters are included in the system configuration of the UFO agent: Configuration Option Description Type Default Value CONTROL_BACKEND The backend for control action, currently supporting uia and win32 . String \"uia\" MAX_STEP The maximum step limit for completing the user request in a session. Integer 100 SLEEP_TIME The sleep time in seconds between each step to wait for the window to be ready. Integer 5 RECTANGLE_TIME The time in seconds for the rectangle display around the selected control. Integer 1 SAFE_GUARD Whether to use the safe guard to ask for user confirmation before performing sensitive operations. Boolean True CONTROL_LIST The list of widgets allowed to be selected. List [\"Button\", \"Edit\", \"TabItem\", \"Document\", \"ListItem\", \"MenuItem\", \"ScrollBar\", \"TreeItem\", \"Hyperlink\", \"ComboBox\", \"RadioButton\", \"DataItem\"] HISTORY_KEYS The keys of the step history added to the Blackboard for agent decision-making. List [\"Step\", \"Thought\", \"ControlText\", \"Subtask\", \"Action\", \"Comment\", \"Results\", \"UserConfirm\"] ANNOTATION_COLORS The colors assigned to different control types for annotation. Dictionary {\"Button\": \"#FFF68F\", \"Edit\": \"#A5F0B5\", \"TabItem\": \"#A5E7F0\", \"Document\": \"#FFD18A\", \"ListItem\": \"#D9C3FE\", \"MenuItem\": \"#E7FEC3\", \"ScrollBar\": \"#FEC3F8\", \"TreeItem\": \"#D6D6D6\", \"Hyperlink\": \"#91FFEB\", \"ComboBox\": \"#D8B6D4\"} PRINT_LOG Whether to print the log in the console. Boolean False CONCAT_SCREENSHOT Whether to concatenate the screenshots into a single image for the LLM input. 
Boolean False INCLUDE_LAST_SCREENSHOT Whether to include the screenshot from the last step in the observation. Boolean True LOG_LEVEL The log level for the UFO agent. String \"DEBUG\" REQUEST_TIMEOUT The call timeout in seconds for the LLM model. Integer 250 USE_APIS Whether to allow the use of application APIs. Boolean True LOG_XML Whether to log the XML file at every step. Boolean False SCREENSHOT_TO_MEMORY Whether to allow the screenshot to Blackboard for the agent's decision making. Boolean True SAVE_UI_TREE Whether to save the UI tree in the log. Boolean False","title":"System Configuration"},{"location":"configurations/developer_configuration/#main-prompt-configuration","text":"","title":"Main Prompt Configuration"},{"location":"configurations/developer_configuration/#main-prompt-templates","text":"The main prompt templates include the prompts in the UFO agent for both system and user roles. Configuration Option Description Type Default Value HOSTAGENT_PROMPT The main prompt template for the HostAgent . String \"ufo/prompts/share/base/host_agent.yaml\" APPAGENT_PROMPT The main prompt template for the AppAgent . String \"ufo/prompts/share/base/app_agent.yaml\" FOLLOWERAGENT_PROMPT The main prompt template for the FollowerAgent . String \"ufo/prompts/share/base/app_agent.yaml\" EVALUATION_PROMPT The prompt template for the evaluation. String \"ufo/prompts/evaluation/evaluate.yaml\" Lite versions of the main prompt templates can be found in the ufo/prompts/share/lite directory to reduce the input size for specific token limits.","title":"Main Prompt Templates"},{"location":"configurations/developer_configuration/#example-prompt-templates","text":"Example prompt templates are used for demonstration purposes in the UFO agent. Configuration Option Description Type Default Value HOSTAGENT_EXAMPLE_PROMPT The example prompt template for the HostAgent used for demonstration. 
String \"ufo/prompts/examples/{mode}/host_agent_example.yaml\" APPAGENT_EXAMPLE_PROMPT The example prompt template for the AppAgent used for demonstration. String \"ufo/prompts/examples/{mode}/app_agent_example.yaml\" Lite versions of the example prompt templates can be found in the ufo/prompts/examples/lite/{mode} directory to reduce the input size for demonstration purposes.","title":"Example Prompt Templates"},{"location":"configurations/developer_configuration/#experience-and-demonstration-learning","text":"These configuration parameters are used for experience and demonstration learning in the UFO agent. Configuration Option Description Type Default Value EXPERIENCE_PROMPT The prompt for self-experience learning. String \"ufo/prompts/experience/experience_summary.yaml\" EXPERIENCE_SAVED_PATH The path to save the experience learning data. String \"vectordb/experience/\" DEMONSTRATION_PROMPT The prompt for user demonstration learning. String \"ufo/prompts/demonstration/demonstration_summary.yaml\" DEMONSTRATION_SAVED_PATH The path to save the demonstration learning data. String \"vectordb/demonstration/\"","title":"Experience and Demonstration Learning"},{"location":"configurations/developer_configuration/#application-api-configuration","text":"These prompt configuration parameters are used for the application and control APIs in the UFO agent. Configuration Option Description Type Default Value API_PROMPT The prompt for the UI automation API. String \"ufo/prompts/share/base/api.yaml\" APP_API_PROMPT_ADDRESS The prompt address for the application API. 
Dict {\"WINWORD.EXE\": \"ufo/prompts/apps/word/api.yaml\", \"EXCEL.EXE\": \"ufo/prompts/apps/excel/api.yaml\", \"msedge.exe\": \"ufo/prompts/apps/web/api.yaml\", \"chrome.exe\": \"ufo/prompts/apps/web/api.yaml\"}","title":"Application API Configuration"},{"location":"configurations/developer_configuration/#pywinauto-configuration","text":"The API configuration parameters are used for the pywinauto API in the UFO agent. Configuration Option Description Type Default Value CLICK_API The API used for click action, can be click_input or click . String \"click_input\" INPUT_TEXT_API The API used for input text action, can be type_keys or set_text . String \"type_keys\" INPUT_TEXT_ENTER Whether to press enter after typing the text. Boolean False","title":"pywinauto Configuration"},{"location":"configurations/developer_configuration/#control-filtering","text":"The control filtering configuration parameters are used for control filtering in the agent's observation. Configuration Option Description Type Default Value CONTROL_FILTER The control filter type, can be TEXT , SEMANTIC , or ICON . List [] CONTROL_FILTER_TOP_K_PLAN The control filter effect on top k plans from the agent. Integer 2 CONTROL_FILTER_TOP_K_SEMANTIC The control filter top k for semantic similarity. Integer 15 CONTROL_FILTER_TOP_K_ICON The control filter top k for icon similarity. Integer 15 CONTROL_FILTER_MODEL_SEMANTIC_NAME The control filter model name for semantic similarity. String \"all-MiniLM-L6-v2\" CONTROL_FILTER_MODEL_ICON_NAME The control filter model name for icon similarity. String \"clip-ViT-B-32\"","title":"Control Filtering"},{"location":"configurations/developer_configuration/#customizations","text":"The customization configuration parameters are used for customizations in the UFO agent. Configuration Option Description Type Default Value ASK_QUESTION Whether to ask the user for a question. Boolean True USE_CUSTOMIZATION Whether to enable the customization. 
Boolean True QA_PAIR_FILE The path for the historical QA pairs. String \"customization/historical_qa.txt\" QA_PAIR_NUM The number of QA pairs for the customization. Integer 20","title":"Customizations"},{"location":"configurations/developer_configuration/#evaluation","text":"The evaluation configuration parameters are used for the evaluation in the UFO agent. Configuration Option Description Type Default Value EVA_SESSION Whether to include the session in the evaluation. Boolean True EVA_ROUND Whether to include the round in the evaluation. Boolean False EVA_ALL_SCREENSHOTS Whether to include all the screenshots in the evaluation. Boolean True You can customize the configuration parameters in the config_dev.yaml file to suit your development needs and enhance the functionality of the UFO agent.","title":"Evaluation"},{"location":"configurations/pricing_configuration/","text":"Pricing Configuration We provide a configuration file pricing_config.yaml to calculate the pricing of the UFO agent using different LLM APIs. The pricing configuration file is located in the ufo/config directory. Note that the pricing configuration file is only used for reference and may not be up-to-date. Please refer to the official pricing documentation of the respective LLM API provider for the most accurate pricing information. You can also customize the pricing configuration file based on the configured model names and their respective input and output prices by adding or modifying the pricing information in the pricing_config.yaml file. 
Below is the default pricing configuration: # Prices in $ per 1000 tokens # Last updated: 2024-05-13 PRICES: { \"openai/gpt-4-0613\": {\"input\": 0.03, \"output\": 0.06}, \"openai/gpt-3.5-turbo-0613\": {\"input\": 0.0015, \"output\": 0.002}, \"openai/gpt-4-0125-preview\": {\"input\": 0.01, \"output\": 0.03}, \"openai/gpt-4-1106-preview\": {\"input\": 0.01, \"output\": 0.03}, \"openai/gpt-4-1106-vision-preview\": {\"input\": 0.01, \"output\": 0.03}, \"openai/gpt-4\": {\"input\": 0.03, \"output\": 0.06}, \"openai/gpt-4-32k\": {\"input\": 0.06, \"output\": 0.12}, \"openai/gpt-4-turbo\": {\"input\":0.01,\"output\": 0.03}, \"openai/gpt-4o\": {\"input\": 0.005,\"output\": 0.015}, \"openai/gpt-4o-2024-05-13\": {\"input\": 0.005, \"output\": 0.015}, \"openai/gpt-3.5-turbo-0125\": {\"input\": 0.0005, \"output\": 0.0015}, \"openai/gpt-3.5-turbo-1106\": {\"input\": 0.001, \"output\": 0.002}, \"openai/gpt-3.5-turbo-instruct\": {\"input\": 0.0015, \"output\": 0.002}, \"openai/gpt-3.5-turbo-16k-0613\": {\"input\": 0.003, \"output\": 0.004}, \"openai/whisper-1\": {\"input\": 0.006, \"output\": 0.006}, \"openai/tts-1\": {\"input\": 0.015, \"output\": 0.015}, \"openai/tts-hd-1\": {\"input\": 0.03, \"output\": 0.03}, \"openai/text-embedding-ada-002-v2\": {\"input\": 0.0001, \"output\": 0.0001}, \"openai/text-davinci:003\": {\"input\": 0.02, \"output\": 0.02}, \"openai/text-ada-001\": {\"input\": 0.0004, \"output\": 0.0004}, \"azure/gpt-35-turbo-20220309\":{\"input\": 0.0015, \"output\": 0.002}, \"azure/gpt-35-turbo-20230613\":{\"input\": 0.0015, \"output\": 0.002}, \"azure/gpt-35-turbo-16k-20230613\":{\"input\": 0.003, \"output\": 0.004}, \"azure/gpt-35-turbo-1106\":{\"input\": 0.001, \"output\": 0.002}, \"azure/gpt-4-20230321\":{\"input\": 0.03, \"output\": 0.06}, \"azure/gpt-4-32k-20230321\":{\"input\": 0.06, \"output\": 0.12}, \"azure/gpt-4-1106-preview\": {\"input\": 0.01, \"output\": 0.03}, \"azure/gpt-4-0125-preview\": {\"input\": 0.01, \"output\": 0.03}, 
\"azure/gpt-4-visual-preview\": {\"input\": 0.01, \"output\": 0.03}, \"azure/gpt-4-turbo-20240409\": {\"input\":0.01,\"output\": 0.03}, \"azure/gpt-4o\": {\"input\": 0.005,\"output\": 0.015}, \"azure/gpt-4o-20240513\": {\"input\": 0.005, \"output\": 0.015}, \"qwen/qwen-vl-plus\": {\"input\": 0.008, \"output\": 0.008}, \"qwen/qwen-vl-max\": {\"input\": 0.02, \"output\": 0.02}, \"gemini/gemini-1.5-flash\": {\"input\": 0.00035, \"output\": 0.00105}, \"gemini/gemini-1.5-pro\": {\"input\": 0.0035, \"output\": 0.0105}, \"gemini/gemini-1.0-pro\": {\"input\": 0.0005, \"output\": 0.0015}, } Please refer to the official pricing documentation of the respective LLM API provider for the most accurate pricing information.","title":"Model Pricing"},{"location":"configurations/pricing_configuration/#pricing-configuration","text":"We provide a configuration file pricing_config.yaml to calculate the pricing of the UFO agent using different LLM APIs. The pricing configuration file is located in the ufo/config directory. Note that the pricing configuration file is only used for reference and may not be up-to-date. Please refer to the official pricing documentation of the respective LLM API provider for the most accurate pricing information. You can also customize the pricing configuration file based on the configured model names and their respective input and output prices by adding or modifying the pricing information in the pricing_config.yaml file. 
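As a sketch of how per-1,000-token prices like these translate into a session cost (the model names match entries in the pricing table; the token counts are illustrative):

```python
# Prices are $ per 1000 tokens, as in pricing_config.yaml.
PRICES = {
    "openai/gpt-4o": {"input": 0.005, "output": 0.015},
    "openai/gpt-4-turbo": {"input": 0.01, "output": 0.03},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of one call from its token counts."""
    price = PRICES[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1000

print(round(estimate_cost("openai/gpt-4o", 10_000, 2_000), 4))  # → 0.08
```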
Below is the default pricing configuration: # Prices in $ per 1000 tokens # Last updated: 2024-05-13 PRICES: { \"openai/gpt-4-0613\": {\"input\": 0.03, \"output\": 0.06}, \"openai/gpt-3.5-turbo-0613\": {\"input\": 0.0015, \"output\": 0.002}, \"openai/gpt-4-0125-preview\": {\"input\": 0.01, \"output\": 0.03}, \"openai/gpt-4-1106-preview\": {\"input\": 0.01, \"output\": 0.03}, \"openai/gpt-4-1106-vision-preview\": {\"input\": 0.01, \"output\": 0.03}, \"openai/gpt-4\": {\"input\": 0.03, \"output\": 0.06}, \"openai/gpt-4-32k\": {\"input\": 0.06, \"output\": 0.12}, \"openai/gpt-4-turbo\": {\"input\":0.01,\"output\": 0.03}, \"openai/gpt-4o\": {\"input\": 0.005,\"output\": 0.015}, \"openai/gpt-4o-2024-05-13\": {\"input\": 0.005, \"output\": 0.015}, \"openai/gpt-3.5-turbo-0125\": {\"input\": 0.0005, \"output\": 0.0015}, \"openai/gpt-3.5-turbo-1106\": {\"input\": 0.001, \"output\": 0.002}, \"openai/gpt-3.5-turbo-instruct\": {\"input\": 0.0015, \"output\": 0.002}, \"openai/gpt-3.5-turbo-16k-0613\": {\"input\": 0.003, \"output\": 0.004}, \"openai/whisper-1\": {\"input\": 0.006, \"output\": 0.006}, \"openai/tts-1\": {\"input\": 0.015, \"output\": 0.015}, \"openai/tts-hd-1\": {\"input\": 0.03, \"output\": 0.03}, \"openai/text-embedding-ada-002-v2\": {\"input\": 0.0001, \"output\": 0.0001}, \"openai/text-davinci:003\": {\"input\": 0.02, \"output\": 0.02}, \"openai/text-ada-001\": {\"input\": 0.0004, \"output\": 0.0004}, \"azure/gpt-35-turbo-20220309\":{\"input\": 0.0015, \"output\": 0.002}, \"azure/gpt-35-turbo-20230613\":{\"input\": 0.0015, \"output\": 0.002}, \"azure/gpt-35-turbo-16k-20230613\":{\"input\": 0.003, \"output\": 0.004}, \"azure/gpt-35-turbo-1106\":{\"input\": 0.001, \"output\": 0.002}, \"azure/gpt-4-20230321\":{\"input\": 0.03, \"output\": 0.06}, \"azure/gpt-4-32k-20230321\":{\"input\": 0.06, \"output\": 0.12}, \"azure/gpt-4-1106-preview\": {\"input\": 0.01, \"output\": 0.03}, \"azure/gpt-4-0125-preview\": {\"input\": 0.01, \"output\": 0.03}, 
\"azure/gpt-4-visual-preview\": {\"input\": 0.01, \"output\": 0.03}, \"azure/gpt-4-turbo-20240409\": {\"input\":0.01,\"output\": 0.03}, \"azure/gpt-4o\": {\"input\": 0.005,\"output\": 0.015}, \"azure/gpt-4o-20240513\": {\"input\": 0.005, \"output\": 0.015}, \"qwen/qwen-vl-plus\": {\"input\": 0.008, \"output\": 0.008}, \"qwen/qwen-vl-max\": {\"input\": 0.02, \"output\": 0.02}, \"gemini/gemini-1.5-flash\": {\"input\": 0.00035, \"output\": 0.00105}, \"gemini/gemini-1.5-pro\": {\"input\": 0.0035, \"output\": 0.0105}, \"gemini/gemini-1.0-pro\": {\"input\": 0.0005, \"output\": 0.0015}, } Please refer to the official pricing documentation of the respective LLM API provider for the most accurate pricing information.","title":"Pricing Configuration"},{"location":"configurations/user_configuration/","text":"User Configuration An overview of the user configuration options available in UFO. You need to rename the config.yaml.template in the folder ufo/config to config.yaml to configure the LLMs and other custom settings. LLM Configuration You can configure the LLMs for the HOST_AGENT and APP_AGENT separately in the config.yaml file. The FollowerAgent and EvaluationAgent share the same LLM configuration as the APP_AGENT . Additionally, you can configure a backup LLM engine in the BACKUP_AGENT field to handle cases where the primary engines fail during inference. Below are the configuration options for the LLMs, using OpenAI and Azure OpenAI (AOAI) as examples. You can find the settings for other LLM API configurations and usage in the Supported Models section of the documentation. Configuration Option Description Type Default Value VISUAL_MODE Whether to use visual mode to understand screenshots and take actions Boolean True API_TYPE The API type: \"openai\" for the OpenAI API, \"aoai\" for the AOAI API. 
String \"openai\" API_BASE The API endpoint for the LLM String \"https://api.openai.com/v1/chat/completions\" API_KEY The API key for the LLM String \"sk-\" API_VERSION The version of the API String \"2024-02-15-preview\" API_MODEL The LLM model name String \"gpt-4-vision-preview\" For Azure OpenAI (AOAI) API The following additional configuration option is available for the AOAI API: Configuration Option Description Type Default Value API_DEPLOYMENT_ID The deployment ID, only available for the AOAI API String \"\" Ensure to fill in the necessary API details for both the HOST_AGENT and APP_AGENT to enable UFO to interact with the LLMs effectively. LLM Parameters You can also configure additional parameters for the LLMs in the config.yaml file: Configuration Option Description Type Default Value MAX_TOKENS The maximum token limit for the response completion Integer 2000 MAX_RETRY The maximum retry limit for the response completion Integer 3 TEMPERATURE The temperature of the model: the lower the value, the more consistent the output of the model Float 0.0 TOP_P The top_p of the model: the lower the value, the more conservative the output of the model Float 0.0 TIMEOUT The call timeout in seconds Integer 60 For RAG Configuration to Enhance the UFO Agent You can configure the RAG parameters in the config.yaml file to enhance the UFO agent with additional knowledge sources: RAG Configuration for the Offline Docs Configure the following parameters to allow UFO to use offline documents for the decision-making process: Configuration Option Description Type Default Value RAG_OFFLINE_DOCS Whether to use the offline RAG Boolean False RAG_OFFLINE_DOCS_RETRIEVED_TOPK The topk for the offline retrieved documents Integer 1 RAG Configuration for the Bing search Configure the following parameters to allow UFO to use online Bing search for the decision-making process: Configuration Option Description Type Default Value RAG_ONLINE_SEARCH Whether to use the Bing search Boolean False 
BING_API_KEY The Bing search API key String \"\" RAG_ONLINE_SEARCH_TOPK The topk for the online search Integer 5 RAG_ONLINE_RETRIEVED_TOPK The topk for the online retrieved searched results Integer 1 RAG Configuration for experience Configure the following parameters to allow UFO to use the RAG from its self-experience: Configuration Option Description Type Default Value RAG_EXPERIENCE Whether to use the RAG from its self-experience Boolean False RAG_EXPERIENCE_RETRIEVED_TOPK The topk for the offline retrieved documents Integer 5 RAG Configuration for demonstration Configure the following parameters to allow UFO to use the RAG from user demonstration: Configuration Option Description Type Default Value RAG_DEMONSTRATION Whether to use the RAG from its user demonstration Boolean False RAG_DEMONSTRATION_RETRIEVED_TOPK The topk for the offline retrieved documents Integer 5 RAG_DEMONSTRATION_COMPLETION_N The number of completion choices for the demonstration result Integer 3 Explore the various RAG configurations to enhance the UFO agent with additional knowledge sources and improve its decision-making capabilities.","title":"User Configuration"},{"location":"configurations/user_configuration/#user-configuration","text":"An overview of the user configuration options available in UFO. You need to rename the config.yaml.template in the folder ufo/config to config.yaml to configure the LLMs and other custom settings.","title":"User Configuration"},{"location":"configurations/user_configuration/#llm-configuration","text":"You can configure the LLMs for the HOST_AGENT and APP_AGENT separately in the config.yaml file. The FollowerAgent and EvaluationAgent share the same LLM configuration as the APP_AGENT . Additionally, you can configure a backup LLM engine in the BACKUP_AGENT field to handle cases where the primary engines fail during inference. Below are the configuration options for the LLMs, using OpenAI and Azure OpenAI (AOAI) as examples. 
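The BACKUP_AGENT behaviour described above can be sketched as a simple fallback wrapper; the engine callables below are illustrative stand-ins, not UFO's actual API:

```python
def call_with_backup(primary, backup, prompt):
    """Try the primary LLM engine; fall back to the backup on failure."""
    try:
        return primary(prompt)
    except Exception:
        return backup(prompt)

def flaky_primary(prompt):
    # Simulates a primary engine failing during inference.
    raise TimeoutError("primary engine unavailable")

def backup_engine(prompt):
    return f"backup answer to: {prompt}"

print(call_with_backup(flaky_primary, backup_engine, "hello"))  # → backup answer to: hello
```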
You can find the settings for other LLM API configurations and usage in the Supported Models section of the documentation. Configuration Option Description Type Default Value VISUAL_MODE Whether to use visual mode to understand screenshots and take actions Boolean True API_TYPE The API type: \"openai\" for the OpenAI API, \"aoai\" for the AOAI API. String \"openai\" API_BASE The API endpoint for the LLM String \"https://api.openai.com/v1/chat/completions\" API_KEY The API key for the LLM String \"sk-\" API_VERSION The version of the API String \"2024-02-15-preview\" API_MODEL The LLM model name String \"gpt-4-vision-preview\"","title":"LLM Configuration"},{"location":"configurations/user_configuration/#for-azure-openai-aoai-api","text":"The following additional configuration option is available for the AOAI API: Configuration Option Description Type Default Value API_DEPLOYMENT_ID The deployment ID, only available for the AOAI API String \"\" Ensure to fill in the necessary API details for both the HOST_AGENT and APP_AGENT to enable UFO to interact with the LLMs effectively.","title":"For Azure OpenAI (AOAI) API"},{"location":"configurations/user_configuration/#llm-parameters","text":"You can also configure additional parameters for the LLMs in the config.yaml file: Configuration Option Description Type Default Value MAX_TOKENS The maximum token limit for the response completion Integer 2000 MAX_RETRY The maximum retry limit for the response completion Integer 3 TEMPERATURE The temperature of the model: the lower the value, the more consistent the output of the model Float 0.0 TOP_P The top_p of the model: the lower the value, the more conservative the output of the model Float 0.0 TIMEOUT The call timeout in seconds Integer 60","title":"LLM Parameters"},{"location":"configurations/user_configuration/#for-rag-configuration-to-enhance-the-ufo-agent","text":"You can configure the RAG parameters in the config.yaml file to enhance the UFO agent with additional 
knowledge sources:","title":"For RAG Configuration to Enhance the UFO Agent"},{"location":"configurations/user_configuration/#rag-configuration-for-the-offline-docs","text":"Configure the following parameters to allow UFO to use offline documents for the decision-making process: Configuration Option Description Type Default Value RAG_OFFLINE_DOCS Whether to use the offline RAG Boolean False RAG_OFFLINE_DOCS_RETRIEVED_TOPK The topk for the offline retrieved documents Integer 1","title":"RAG Configuration for the Offline Docs"},{"location":"configurations/user_configuration/#rag-configuration-for-the-bing-search","text":"Configure the following parameters to allow UFO to use online Bing search for the decision-making process: Configuration Option Description Type Default Value RAG_ONLINE_SEARCH Whether to use the Bing search Boolean False BING_API_KEY The Bing search API key String \"\" RAG_ONLINE_SEARCH_TOPK The topk for the online search Integer 5 RAG_ONLINE_RETRIEVED_TOPK The topk for the online retrieved searched results Integer 1","title":"RAG Configuration for the Bing search"},{"location":"configurations/user_configuration/#rag-configuration-for-experience","text":"Configure the following parameters to allow UFO to use the RAG from its self-experience: Configuration Option Description Type Default Value RAG_EXPERIENCE Whether to use the RAG from its self-experience Boolean False RAG_EXPERIENCE_RETRIEVED_TOPK The topk for the offline retrieved documents Integer 5","title":"RAG Configuration for experience"},{"location":"configurations/user_configuration/#rag-configuration-for-demonstration","text":"Configure the following parameters to allow UFO to use the RAG from user demonstration: Configuration Option Description Type Default Value RAG_DEMONSTRATION Whether to use the RAG from its user demonstration Boolean False RAG_DEMONSTRATION_RETRIEVED_TOPK The topk for the offline retrieved documents Integer 5 RAG_DEMONSTRATION_COMPLETION_N The number of completion 
choices for the demonstration result Integer 3 Explore the various RAG configurations to enhance the UFO agent with additional knowledge sources and improve its decision-making capabilities.","title":"RAG Configuration for demonstration"},{"location":"creating_app_agent/demonstration_provision/","text":"Provide Human Demonstrations to the AppAgent Users or application developers can provide human demonstrations to the AppAgent to guide it in executing similar tasks in the future. The AppAgent uses these demonstrations to understand the context of the task and the steps required to execute it, effectively becoming an expert in the application. How to Prepare Human Demonstrations for the AppAgent? Currently, UFO supports learning from user trajectories recorded by Steps Recorder integrated within Windows. More tools will be supported in the future. Step 1: Recording User Demonstrations Follow the official guidance to use Steps Recorder to record user demonstrations. Step 2: Add Additional Information or Comments as Needed Include any specific details or instructions for UFO to notice by adding comments. Since Steps Recorder doesn't capture typed text, include any necessary typed content in the comments as well. Step 3: Review and Save the Recorded Demonstrations Review the recorded steps and save them to a ZIP file. Refer to the sample_record.zip for an example of recorded steps for a specific request, such as \"sending an email to example@gmail.com to say hi.\" Step 4: Create an Action Trajectory Indexer Once you have your demonstration record ZIP file ready, you can parse it as an example to support RAG for UFO. Follow these steps: # Assume you are in the cloned UFO folder python -m record_processor -r \"\" -p \"\" Replace with the specific request, such as \"sending an email to example@gmail.com to say hi.\" Replace with the full path to the ZIP file you just created. This command will parse the record and summarize it into an execution plan. 
You'll see a confirmation message similar to the following: Here are the plans summarized from your demonstration: Plan [1] (1) Input the email address 'example@gmail.com' in the 'To' field. (2) Input the subject of the email. I need to input 'Greetings'. (3) Input the content of the email. I need to input 'Hello,\\nI hope this message finds you well. I am writing to send you a warm greeting and to wish you a great day.\\nBest regards.' (4) Click the Send button to send the email. Plan [2] (1) *** (2) *** (3) *** Plan [3] (1) *** (2) *** (3) *** Would you like to save any one of them as a future reference for the agent? Press [1] [2] [3] to save the corresponding plan, or press any other key to skip. Press 1 to save the plan into its memory for future reference. A sample can be found here . You can view a demonstration video below: How to Use Human Demonstrations to Enhance the AppAgent? After creating the offline indexer, refer to the Learning from User Demonstrations section for guidance on how to use human demonstrations to enhance the AppAgent.","title":"Demonstration Provision"},{"location":"creating_app_agent/demonstration_provision/#provide-human-demonstrations-to-the-appagent","text":"Users or application developers can provide human demonstrations to the AppAgent to guide it in executing similar tasks in the future. The AppAgent uses these demonstrations to understand the context of the task and the steps required to execute it, effectively becoming an expert in the application.","title":"Provide Human Demonstrations to the AppAgent"},{"location":"creating_app_agent/demonstration_provision/#how-to-prepare-human-demonstrations-for-the-appagent","text":"Currently, UFO supports learning from user trajectories recorded by Steps Recorder integrated within Windows. 
More tools will be supported in the future.","title":"How to Prepare Human Demonstrations for the AppAgent?"},{"location":"creating_app_agent/demonstration_provision/#step-1-recording-user-demonstrations","text":"Follow the official guidance to use Steps Recorder to record user demonstrations.","title":"Step 1: Recording User Demonstrations"},{"location":"creating_app_agent/demonstration_provision/#step-2-add-additional-information-or-comments-as-needed","text":"Include any specific details or instructions for UFO to notice by adding comments. Since Steps Recorder doesn't capture typed text, include any necessary typed content in the comments as well.","title":"Step 2: Add Additional Information or Comments as Needed"},{"location":"creating_app_agent/demonstration_provision/#step-3-review-and-save-the-recorded-demonstrations","text":"Review the recorded steps and save them to a ZIP file. Refer to the sample_record.zip for an example of recorded steps for a specific request, such as \"sending an email to example@gmail.com to say hi.\"","title":"Step 3: Review and Save the Recorded Demonstrations"},{"location":"creating_app_agent/demonstration_provision/#step-4-create-an-action-trajectory-indexer","text":"Once you have your demonstration record ZIP file ready, you can parse it as an example to support RAG for UFO. Follow these steps: # Assume you are in the cloned UFO folder python -m record_processor -r \"\" -p \"\" Replace with the specific request, such as \"sending an email to example@gmail.com to say hi.\" Replace with the full path to the ZIP file you just created. This command will parse the record and summarize it into an execution plan. You'll see a confirmation message similar to the following: Here are the plans summarized from your demonstration: Plan [1] (1) Input the email address 'example@gmail.com' in the 'To' field. (2) Input the subject of the email. I need to input 'Greetings'. (3) Input the content of the email. 
I need to input 'Hello,\\nI hope this message finds you well. I am writing to send you a warm greeting and to wish you a great day.\\nBest regards.' (4) Click the Send button to send the email. Plan [2] (1) *** (2) *** (3) *** Plan [3] (1) *** (2) *** (3) *** Would you like to save any one of them as a future reference for the agent? Press [1] [2] [3] to save the corresponding plan, or press any other key to skip. Press 1 to save the plan into its memory for future reference. A sample can be found here . You can view a demonstration video below:","title":"Step 4: Create an Action Trajectory Indexer"},{"location":"creating_app_agent/demonstration_provision/#how-to-use-human-demonstrations-to-enhance-the-appagent","text":"After creating the offline indexer, refer to the Learning from User Demonstrations section for guidance on how to use human demonstrations to enhance the AppAgent.","title":"How to Use Human Demonstrations to Enhance the AppAgent?"},{"location":"creating_app_agent/help_document_provision/","text":"Providing Help Documents to the AppAgent Help documents provide guidance to the AppAgent in executing specific tasks. The AppAgent uses these documents to understand the context of the task and the steps required to execute it, effectively becoming an expert in the application. How to Provide Help Documents to the AppAgent? Step 1: Prepare Help Documents and Metadata Currently, UFO supports processing help documents in XML format, which is the default format for official help documents of Microsoft apps. More formats will be supported in the future. To create a dedicated document for a specific task of an app, save it in a file named, for example, task.xml . This document should be accompanied by a metadata file with the same prefix but with the .meta extension, such as task.xml.meta . The metadata file should include: title : Describes the task at a high level. Content-Summary : Summarizes the content of the help document. 
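For illustration, a task.xml.meta file carrying the two fields above might look like the following; the task topic and wording are hypothetical, and the exact serialization UFO expects is not specified in this section:

```yaml
# Hypothetical metadata accompanying a help document named task.xml
title: |-
  Create a drop-down list of items in a cell in Excel
Content-Summary: |-
  Explains how to add a drop-down list to worksheet cells with
  Data Validation so users can pick a value from a fixed set.
```

Because these fields drive the similarity search against user requests, phrasing the title close to how a user would state the task should help retrieval.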
These two files are used for similarity search with user requests, so it is important to write them carefully. Examples of a help document and its metadata can be found here and here . Step 2: Place Help Documents in the AppAgent Directory Once you have prepared all help documents and their metadata, place them into a folder. Sub-folders for the help documents are allowed, but ensure that each help document and its corresponding metadata are placed in the same directory. Step 3: Create a Help Document Indexer After organizing your documents in a folder named path_of_the_docs , you can create an offline indexer to support RAG for UFO. Follow these steps: # Assume you are in the cloned UFO folder python -m learner --app --docs Replace with the name of the application, such as PowerPoint or WeChat. Replace with the full path to the folder containing all your documents. This command will create an offline indexer for all documents in the path_of_the_docs folder using Faiss and embedding with sentence transformer (additional embeddings will be supported soon). By default, the created index will be placed here . Note Ensure the app_name is accurately defined, as it is used to match the offline indexer in online RAG. How to Use Help Documents to Enhance the AppAgent? After creating the offline indexer, you can find the guidance on how to use the help documents to enhance the AppAgent in the Learning from Help Documents section.","title":"Help Document Provision"},{"location":"creating_app_agent/help_document_provision/#providing-help-documents-to-the-appagent","text":"Help documents provide guidance to the AppAgent in executing specific tasks. 
The AppAgent uses these documents to understand the context of the task and the steps required to execute it, effectively becoming an expert in the application.","title":"Providing Help Documents to the AppAgent"},{"location":"creating_app_agent/help_document_provision/#how-to-provide-help-documents-to-the-appagent","text":"","title":"How to Provide Help Documents to the AppAgent?"},{"location":"creating_app_agent/help_document_provision/#step-1-prepare-help-documents-and-metadata","text":"Currently, UFO supports processing help documents in XML format, which is the default format for official help documents of Microsoft apps. More formats will be supported in the future. To create a dedicated document for a specific task of an app, save it in a file named, for example, task.xml . This document should be accompanied by a metadata file with the same prefix but with the .meta extension, such as task.xml.meta . The metadata file should include: title : Describes the task at a high level. Content-Summary : Summarizes the content of the help document. These two files are used for similarity search with user requests, so it is important to write them carefully. Examples of a help document and its metadata can be found here and here .","title":"Step 1: Prepare Help Documents and Metadata"},{"location":"creating_app_agent/help_document_provision/#step-2-place-help-documents-in-the-appagent-directory","text":"Once you have prepared all help documents and their metadata, place them into a folder. Sub-folders for the help documents are allowed, but ensure that each help document and its corresponding metadata are placed in the same directory.","title":"Step 2: Place Help Documents in the AppAgent Directory"},{"location":"creating_app_agent/help_document_provision/#step-3-create-a-help-document-indexer","text":"After organizing your documents in a folder named path_of_the_docs , you can create an offline indexer to support RAG for UFO. 
Follow these steps: # Assume you are in the cloned UFO folder python -m learner --app --docs Replace with the name of the application, such as PowerPoint or WeChat. Replace with the full path to the folder containing all your documents. This command will create an offline indexer for all documents in the path_of_the_docs folder using Faiss and embedding with sentence transformer (additional embeddings will be supported soon). By default, the created index will be placed here . Note Ensure the app_name is accurately defined, as it is used to match the offline indexer in online RAG.","title":"Step 3: Create a Help Document Indexer"},{"location":"creating_app_agent/help_document_provision/#how-to-use-help-documents-to-enhance-the-appagent","text":"After creating the offline indexer, you can find the guidance on how to use the help documents to enhance the AppAgent in the Learning from Help Documents section.","title":"How to Use Help Documents to Enhance the AppAgent?"},{"location":"creating_app_agent/overview/","text":"Creating Your AppAgent UFO provides a flexible framework and SDK for application developers to empower their applications with AI capabilities by wrapping them into an AppAgent . By creating an AppAgent , you can leverage the power of UFO to interact with your application and automate tasks. To create an AppAgent , you can provide the following components: Component Description Usage Documentation Help Documents The help documents for the application to guide the AppAgent in executing tasks. Learning from Help Documents User Demonstrations The user demonstrations for the application to guide the AppAgent in executing tasks. Learning from User Demonstrations Native API Wrappers The native API wrappers for the application to interact with the application. 
Automator","title":"Overview"},{"location":"creating_app_agent/overview/#creating-your-appagent","text":"UFO provides a flexible framework and SDK for application developers to empower their applications with AI capabilities by wrapping them into an AppAgent . By creating an AppAgent , you can leverage the power of UFO to interact with your application and automate tasks. To create an AppAgent , you can provide the following components: Component Description Usage Documentation Help Documents The help documents for the application to guide the AppAgent in executing tasks. Learning from Help Documents User Demonstrations The user demonstrations for the application to guide the AppAgent in executing tasks. Learning from User Demonstrations Native API Wrappers The native API wrappers for the application to interact with the application. Automator","title":"Creating Your AppAgent"},{"location":"creating_app_agent/warpping_app_native_api/","text":"Wrapping Your App's Native API UFO takes actions on applications based on UI controls, but providing native API to its toolboxes can enhance the efficiency and accuracy of the actions. This document provides guidance on how to wrap your application's native API into UFO's toolboxes. How to Wrap Your App's Native API? Before developing the native API wrappers, we strongly recommend that you read the design of the Automator . Step 1: Create a Receiver for the Native API The Receiver is a class that receives the native API calls from the AppAgent and executes them. To wrap your application's native API, you need to create a Receiver class that contains the methods to execute the native API calls. To create a Receiver class, follow these steps: 1. Create a Folder for Your Application Navigate to the ufo/automator/app_api/ directory. Create a folder named after your application. 2. Create a Python File Inside the folder you just created, add a Python file named after your application, for example, {your_application}_client.py . 3. 
Define the Receiver Class In the Python file, define a class named {Your_Receiver} , inheriting from the ReceiverBasic class located in ufo/automator/basic.py . Initialize the Your_Receiver class with the object that executes the native API calls. For example, if your API is based on a com object, initialize the com object in the __init__ method of the Your_Receiver class. Example of WinCOMReceiverBasic class: class WinCOMReceiverBasic(ReceiverBasic): \"\"\" The base class for Windows COM client. \"\"\" _command_registry: Dict[str, Type[CommandBasic]] = {} def __init__(self, app_root_name: str, process_name: str, clsid: str) -> None: \"\"\" Initialize the Windows COM client. :param app_root_name: The app root name. :param process_name: The process name. :param clsid: The CLSID of the COM object. \"\"\" self.app_root_name = app_root_name self.process_name = process_name self.clsid = clsid self.client = win32com.client.Dispatch(self.clsid) self.com_object = self.get_object_from_process_name() 4. Define Methods to Execute Native API Calls Define the methods in the Your_Receiver class to execute the native API calls. Example of ExcelWinCOMReceiver class: def table2markdown(self, sheet_name: str) -> str: \"\"\" Convert the table in the sheet to a markdown table string. :param sheet_name: The sheet name. :return: The markdown table string. \"\"\" sheet = self.com_object.Sheets(sheet_name) data = sheet.UsedRange() df = pd.DataFrame(data[1:], columns=data[0]) df = df.dropna(axis=0, how=\"all\") df = df.applymap(self.format_value) return df.to_markdown(index=False) 5. Create a Factory Class Create your Factory class inheriting from the APIReceiverFactory class to manage multiple Receiver classes that share the same API type. Implement the create_receiver and name methods in the ReceiverFactory class. The create_receiver method should return the Receiver class. 
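To make the factory contract concrete, here is a minimal, self-contained sketch. ReceiverBasic and APIReceiverFactory below are simplified stand-ins for the classes in ufo/automator, and NotepadReceiver / DemoReceiverFactory are hypothetical names, not part of UFO:

```python
from typing import Optional


class ReceiverBasic:
    """Simplified stand-in for the ReceiverBasic class in ufo/automator/basic.py."""

    def __init__(self, app_root_name: str, process_name: str) -> None:
        self.app_root_name = app_root_name
        self.process_name = process_name


class NotepadReceiver(ReceiverBasic):
    """Hypothetical receiver for a single supported application."""


class APIReceiverFactory:
    """Simplified stand-in for UFO's APIReceiverFactory base class."""


class DemoReceiverFactory(APIReceiverFactory):
    # Map supported app root names to their receiver classes.
    _receiver_mapper = {"notepad.exe": NotepadReceiver}

    def create_receiver(
        self, app_root_name: str, process_name: str
    ) -> Optional[ReceiverBasic]:
        receiver_cls = self._receiver_mapper.get(app_root_name)
        if receiver_cls is None:
            # Unsupported application: return None, as the docs require.
            return None
        return receiver_cls(app_root_name, process_name)

    @classmethod
    def name(cls) -> str:
        return "DEMO"


factory = DemoReceiverFactory()
receiver = factory.create_receiver("notepad.exe", "notepad")
print(type(receiver).__name__, factory.create_receiver("unknown.exe", "x"))
# NotepadReceiver None
```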
By default, the create_receiver takes the app_root_name and process_name as parameters and returns the Receiver class. Register the ReceiverFactory class with the decorator @ReceiverManager.register . Example of the COMReceiverFactory class: from ufo.automator.puppeteer import ReceiverManager @ReceiverManager.register class COMReceiverFactory(APIReceiverFactory): \"\"\" The factory class for the COM receiver. \"\"\" def create_receiver(self, app_root_name: str, process_name: str) -> WinCOMReceiverBasic: \"\"\" Create the wincom receiver. :param app_root_name: The app root name. :param process_name: The process name. :return: The receiver. \"\"\" com_receiver = self.__com_client_mapper(app_root_name) clsid = self.__app_root_mappping(app_root_name) if clsid is None or com_receiver is None: # print_with_color(f\"Warning: Win32COM API is not supported for {process_name}.\", \"yellow\") return None return com_receiver(app_root_name, process_name, clsid) @classmethod def name(cls) -> str: \"\"\" Get the name of the receiver factory. :return: The name of the receiver factory. \"\"\" return \"COM\" Note The create_receiver method should return None if the application is not supported. Note You must register your ReceiverFactory with the decorator @ReceiverManager.register for the ReceiverManager to manage the ReceiverFactory . The Receiver class is now ready to receive the native API calls from the AppAgent . Step 2: Create a Command for the Native API Commands are the actions that the AppAgent can execute on the application. To create a command for the native API, you need to create a Command class that contains the method to execute the native API calls. 1. Create a Command Class Create a Command class in the same Python file where the Receiver class is located. The Command class should inherit from the CommandBasic class located in ufo/automator/basic.py . Example: class WinCOMCommand(CommandBasic): \"\"\" The abstract command interface. 
\"\"\" def __init__(self, receiver: WinCOMReceiverBasic, params=None) -> None: \"\"\" Initialize the command. :param receiver: The receiver of the command. \"\"\" self.receiver = receiver self.params = params if params is not None else {} @abstractmethod def execute(self): pass @classmethod def name(cls) -> str: \"\"\" Get the name of the command. :return: The name of the command. \"\"\" return cls.__name__ 2. Define the Execute Method Define the execute method in the Command class to call the receiver to execute the native API calls. Example: def execute(self): \"\"\" Execute the command to insert a table. :return: The inserted table. \"\"\" return self.receiver.insert_excel_table( sheet_name=self.params.get(\"sheet_name\", 1), table=self.params.get(\"table\"), start_row=self.params.get(\"start_row\", 1), start_col=self.params.get(\"start_col\", 1), ) 3. Register the Command Class: Register the Command class in the corresponding Receiver class using the @your_receiver.register decorator. Example: @ExcelWinCOMReceiver.register class InsertExcelTable(WinCOMCommand): ... The Command class is now registered in the Receiver class and available for the AppAgent to execute the native API calls. Step 3: Provide Prompt Descriptions for the Native API To let the AppAgent know the usage of the native API calls, you need to provide prompt descriptions. 1. Create an api.yaml File - Create an `api.yaml` file in the `ufo/prompts/apps/{your_app_name}` directory. 2. Define Prompt Descriptions Define the prompt descriptions for the native API calls in the api.yaml file. Example: table2markdown: summary: |- \"table2markdown\" is to get the table content in a sheet of the Excel app and convert it to markdown format. class_name: |- GetSheetContent usage: |- [1] API call: table2markdown(sheet_name: str) [2] Args: - sheet_name: The name of the sheet in the Excel app. [3] Example: table2markdown(sheet_name=\"Sheet1\") [4] Available control item: Any control item in the Excel app. 
[5] Return: the markdown format string of the table content of the sheet. Note The table2markdown is the name of the native API call. It MUST match the name() defined in the corresponding Command class! 3. Register the Prompt Address in config_dev.yaml Register the prompt address by adding to the APP_API_PROMPT_ADDRESS field of the config_dev.yaml file with the application program name as the key and the prompt file address as the value. Example: APP_API_PROMPT_ADDRESS: { \"WINWORD.EXE\": \"ufo/prompts/apps/word/api.yaml\", \"EXCEL.EXE\": \"ufo/prompts/apps/excel/api.yaml\", \"msedge.exe\": \"ufo/prompts/apps/web/api.yaml\", \"chrome.exe\": \"ufo/prompts/apps/web/api.yaml\", \"your_application_program_name\": \"YOUR_APPLICATION_API_PROMPT\" } Note The your_application_program_name must match the name of the application program. The AppAgent can now use the prompt descriptions to understand the usage of the native API calls. By following these steps, you will have successfully wrapped the native API of your application into UFO's toolboxes, allowing the AppAgent to execute the native API calls on the application!","title":"Wrapping App-Native API"},{"location":"creating_app_agent/warpping_app_native_api/#wrapping-your-apps-native-api","text":"UFO takes actions on applications based on UI controls, but providing native API to its toolboxes can enhance the efficiency and accuracy of the actions.
This document provides guidance on how to wrap your application's native API into UFO's toolboxes.","title":"Wrapping Your App's Native API"},{"location":"creating_app_agent/warpping_app_native_api/#how-to-wrap-your-apps-native-api","text":"Before developing the native API wrappers, we strongly recommend that you read the design of the Automator .","title":"How to Wrap Your App's Native API?"},{"location":"creating_app_agent/warpping_app_native_api/#step-1-create-a-receiver-for-the-native-api","text":"The Receiver is a class that receives the native API calls from the AppAgent and executes them. To wrap your application's native API, you need to create a Receiver class that contains the methods to execute the native API calls. To create a Receiver class, follow these steps:","title":"Step 1: Create a Receiver for the Native API"},{"location":"creating_app_agent/warpping_app_native_api/#1-create-a-folder-for-your-application","text":"Navigate to the ufo/automator/app_api/ directory. Create a folder named after your application.","title":"1. Create a Folder for Your Application"},{"location":"creating_app_agent/warpping_app_native_api/#2-create-a-python-file","text":"Inside the folder you just created, add a Python file named after your application, for example, {your_application}_client.py .","title":"2. Create a Python File"},{"location":"creating_app_agent/warpping_app_native_api/#3-define-the-receiver-class","text":"In the Python file, define a class named {Your_Receiver} , inheriting from the ReceiverBasic class located in ufo/automator/basic.py . Initialize the Your_Receiver class with the object that executes the native API calls. For example, if your API is based on a com object, initialize the com object in the __init__ method of the Your_Receiver class. Example of WinCOMReceiverBasic class: class WinCOMReceiverBasic(ReceiverBasic): \"\"\" The base class for Windows COM client. 
\"\"\" _command_registry: Dict[str, Type[CommandBasic]] = {} def __init__(self, app_root_name: str, process_name: str, clsid: str) -> None: \"\"\" Initialize the Windows COM client. :param app_root_name: The app root name. :param process_name: The process name. :param clsid: The CLSID of the COM object. \"\"\" self.app_root_name = app_root_name self.process_name = process_name self.clsid = clsid self.client = win32com.client.Dispatch(self.clsid) self.com_object = self.get_object_from_process_name()","title":"3. Define the Receiver Class"},{"location":"creating_app_agent/warpping_app_native_api/#4-define-methods-to-execute-native-api-calls","text":"Define the methods in the Your_Receiver class to execute the native API calls. Example of ExcelWinCOMReceiver class: def table2markdown(self, sheet_name: str) -> str: \"\"\" Convert the table in the sheet to a markdown table string. :param sheet_name: The sheet name. :return: The markdown table string. \"\"\" sheet = self.com_object.Sheets(sheet_name) data = sheet.UsedRange() df = pd.DataFrame(data[1:], columns=data[0]) df = df.dropna(axis=0, how=\"all\") df = df.applymap(self.format_value) return df.to_markdown(index=False)","title":"4. Define Methods to Execute Native API Calls"},{"location":"creating_app_agent/warpping_app_native_api/#5-create-a-factory-class","text":"Create your Factory class inheriting from the APIReceiverFactory class to manage multiple Receiver classes that share the same API type. Implement the create_receiver and name methods in the ReceiverFactory class. The create_receiver method should return the Receiver class. By default, the create_receiver takes the app_root_name and process_name as parameters and returns the Receiver class. Register the ReceiverFactory class with the decorator @ReceiverManager.register . 
Example of the COMReceiverFactory class: from ufo.automator.puppeteer import ReceiverManager @ReceiverManager.register class COMReceiverFactory(APIReceiverFactory): \"\"\" The factory class for the COM receiver. \"\"\" def create_receiver(self, app_root_name: str, process_name: str) -> WinCOMReceiverBasic: \"\"\" Create the wincom receiver. :param app_root_name: The app root name. :param process_name: The process name. :return: The receiver. \"\"\" com_receiver = self.__com_client_mapper(app_root_name) clsid = self.__app_root_mappping(app_root_name) if clsid is None or com_receiver is None: # print_with_color(f\"Warning: Win32COM API is not supported for {process_name}.\", \"yellow\") return None return com_receiver(app_root_name, process_name, clsid) @classmethod def name(cls) -> str: \"\"\" Get the name of the receiver factory. :return: The name of the receiver factory. \"\"\" return \"COM\" Note The create_receiver method should return None if the application is not supported. Note You must register your ReceiverFactory with the decorator @ReceiverManager.register for the ReceiverManager to manage the ReceiverFactory . The Receiver class is now ready to receive the native API calls from the AppAgent .","title":"5. Create a Factory Class"},{"location":"creating_app_agent/warpping_app_native_api/#step-2-create-a-command-for-the-native-api","text":"Commands are the actions that the AppAgent can execute on the application. To create a command for the native API, you need to create a Command class that contains the method to execute the native API calls.","title":"Step 2: Create a Command for the Native API"},{"location":"creating_app_agent/warpping_app_native_api/#1-create-a-command-class","text":"Create a Command class in the same Python file where the Receiver class is located. The Command class should inherit from the CommandBasic class located in ufo/automator/basic.py . Example: class WinCOMCommand(CommandBasic): \"\"\" The abstract command interface. 
\"\"\" def __init__(self, receiver: WinCOMReceiverBasic, params=None) -> None: \"\"\" Initialize the command. :param receiver: The receiver of the command. \"\"\" self.receiver = receiver self.params = params if params is not None else {} @abstractmethod def execute(self): pass @classmethod def name(cls) -> str: \"\"\" Get the name of the command. :return: The name of the command. \"\"\" return cls.__name__","title":"1. Create a Command Class"},{"location":"creating_app_agent/warpping_app_native_api/#2-define-the-execute-method","text":"Define the execute method in the Command class to call the receiver to execute the native API calls. Example: def execute(self): \"\"\" Execute the command to insert a table. :return: The inserted table. \"\"\" return self.receiver.insert_excel_table( sheet_name=self.params.get(\"sheet_name\", 1), table=self.params.get(\"table\"), start_row=self.params.get(\"start_row\", 1), start_col=self.params.get(\"start_col\", 1), ) 3. Register the Command Class: Register the Command class in the corresponding Receiver class using the @your_receiver.register decorator. Example: @ExcelWinCOMReceiver.register class InsertExcelTable(WinCOMCommand): ... The Command class is now registered in the Receiver class and available for the AppAgent to execute the native API calls.","title":"2. Define the Execute Method"},{"location":"creating_app_agent/warpping_app_native_api/#step-3-provide-prompt-descriptions-for-the-native-api","text":"To let the AppAgent know the usage of the native API calls, you need to provide prompt descriptions.","title":"Step 3: Provide Prompt Descriptions for the Native API"},{"location":"creating_app_agent/warpping_app_native_api/#1-create-an-apiyaml-file","text":"- Create an `api.yaml` file in the `ufo/prompts/apps/{your_app_name}` directory.","title":"1. 
Create an api.yaml File"},{"location":"creating_app_agent/warpping_app_native_api/#2-define-prompt-descriptions","text":"Define the prompt descriptions for the native API calls in the api.yaml file. Example: table2markdown: summary: |- \"table2markdown\" is to get the table content in a sheet of the Excel app and convert it to markdown format. class_name: |- GetSheetContent usage: |- [1] API call: table2markdown(sheet_name: str) [2] Args: - sheet_name: The name of the sheet in the Excel app. [3] Example: table2markdown(sheet_name=\"Sheet1\") [4] Available control item: Any control item in the Excel app. [5] Return: the markdown format string of the table content of the sheet. Note The table2markdown is the name of the native API call. It MUST match the name() defined in the corresponding Command class!","title":"2. Define Prompt Descriptions"},{"location":"creating_app_agent/warpping_app_native_api/#3-register-the-prompt-address-in-config_devyaml","text":"Register the prompt address by adding to the APP_API_PROMPT_ADDRESS field of the config_dev.yaml file with the application program name as the key and the prompt file address as the value. Example: APP_API_PROMPT_ADDRESS: { \"WINWORD.EXE\": \"ufo/prompts/apps/word/api.yaml\", \"EXCEL.EXE\": \"ufo/prompts/apps/excel/api.yaml\", \"msedge.exe\": \"ufo/prompts/apps/web/api.yaml\", \"chrome.exe\": \"ufo/prompts/apps/web/api.yaml\", \"your_application_program_name\": \"YOUR_APPLICATION_API_PROMPT\" } Note The your_application_program_name must match the name of the application program. The AppAgent can now use the prompt descriptions to understand the usage of the native API calls. By following these steps, you will have successfully wrapped the native API of your application into UFO's toolboxes, allowing the AppAgent to execute the native API calls on the application!","title":"3. 
Register the Prompt Address in config_dev.yaml"},{"location":"dataflow/execution/","text":"Execution The instantiated plans will be executed by an execution task. After execution, an evaluation agent will evaluate the quality of the entire execution process. In this phase, given the task-action data, the execution process will match the real controller based on the Windows environment and execute the plan step by step. ExecuteFlow The ExecuteFlow class is designed to facilitate the execution and evaluation of tasks in a Windows application environment. It provides functionality to interact with the application's UI, execute predefined tasks, capture screenshots, and evaluate the results of the execution. The class also handles logging and error management for the tasks. Task Execution The task execution in the ExecuteFlow class follows a structured sequence to ensure accurate and traceable task performance: Initialization : Load configuration settings and log paths. Find the application window matching the task. Retrieve or create an ExecuteAgent for executing the task. Plan Execution : Loop through each step in the instantiated_plan . Parse the step to extract information like subtasks, control text, and the required operation. Action Execution : Find the control in the application window that matches the specified control text. If no matching control is found, raise an error. Perform the specified action (e.g., click, input text) using the agent's Puppeteer framework. Capture screenshots of the application window and selected controls for logging and debugging. Result Logging : Log details of the step execution, including control information, performed action, and results. Finalization : Save the final state of the application window. Quit the application client gracefully. Input of ExecuteAgent Parameter Type Description name str The name of the agent. Used for identification and logging purposes. 
process_name str The name of the application process that the agent interacts with. app_root_name str The name of the root application window or main UI component being targeted. --- Evaluation The evaluation process in the ExecuteFlow class is designed to assess the performance of the executed task based on predefined prompts: Start Evaluation : Evaluation begins immediately after task execution. It uses an ExecuteEvalAgent initialized during class construction. Perform Evaluation : The ExecuteEvalAgent evaluates the task using a combination of input prompts (e.g., main prompt and API prompt) and logs generated during task execution. The evaluation process outputs a result summary (e.g., quality flag, comments, and task type). Log and Output Results : Display the evaluation results in the console. Return the evaluation summary alongside the executed plan for further analysis or reporting. Reference ExecuteFlow Bases: AppAgentProcessor ExecuteFlow class for executing the task and saving the result. Initialize the execute flow for a task. Parameters: task_file_name ( str ) \u2013 Name of the task file being processed. context ( Context ) \u2013 Context object for the current session. environment ( WindowsAppEnv ) \u2013 Environment object for the application being processed. Source code in execution/workflow/execute_flow.py 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 def __init__ ( self , task_file_name : str , context : Context , environment : WindowsAppEnv ) -> None : \"\"\" Initialize the execute flow for a task. :param task_file_name: Name of the task file being processed. :param context: Context object for the current session. :param environment: Environment object for the application being processed. \"\"\" super () . __init__ ( agent = ExecuteAgent , context = context ) self . execution_time = None self . eval_time = None self . _app_env = environment self . _task_file_name = task_file_name self . _app_name = self . 
_app_env . app_name log_path = _configs [ \"EXECUTE_LOG_PATH\" ] . format ( task = task_file_name ) self . _initialize_logs ( log_path ) self . application_window = self . _app_env . find_matching_window ( task_file_name ) self . app_agent = self . _get_or_create_execute_agent () self . eval_agent = self . _get_or_create_evaluation_agent () self . _matched_control = None # Matched control for the current step. execute ( request , instantiated_plan ) Execute the execute flow: Execute the task and save the result. Parameters: request ( str ) \u2013 Original request to be executed. instantiated_plan ( List [ Dict [ str , Any ]] ) \u2013 Instantiated plan containing steps to execute. Returns: Tuple [ List [ Dict [ str , Any ]], Dict [ str , str ]] \u2013 Tuple containing task quality flag, comment, and task type. Source code in execution/workflow/execute_flow.py 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 def execute ( self , request : str , instantiated_plan : List [ Dict [ str , Any ]] ) -> Tuple [ List [ Dict [ str , Any ]], Dict [ str , str ]]: \"\"\" Execute the execute flow: Execute the task and save the result. :param request: Original request to be executed. :param instantiated_plan: Instantiated plan containing steps to execute. :return: Tuple containing task quality flag, comment, and task type. \"\"\" start_time = time . time () try : executed_plan = self . execute_plan ( instantiated_plan ) except Exception as error : raise RuntimeError ( f \"Execution failed. { error } \" ) finally : self . execution_time = round ( time . time () - start_time , 3 ) start_time = time . time () try : result , _ = self . eval_agent . evaluate ( request = request , log_path = self . log_path ) utils . print_with_color ( f \"Result: { result } \" , \"green\" ) except Exception as error : raise RuntimeError ( f \"Evaluation failed. { error } \" ) finally : self . eval_time = round ( time . 
time () - start_time , 3 ) return executed_plan , result execute_action () Execute the action. Source code in execution/workflow/execute_flow.py 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 def execute_action ( self ) -> None : \"\"\" Execute the action. \"\"\" control_selected = None # Find the matching window and control. self . application_window = self . _app_env . find_matching_window ( self . _task_file_name ) if self . control_text == \"\" : control_selected = self . application_window else : self . _control_label , control_selected = ( self . _app_env . find_matching_controller ( self . filtered_annotation_dict , self . control_text ) ) self . _matched_control = control_selected . window_text () if not control_selected : # If the control is not found, raise an error. raise RuntimeError ( f \"Control with text ' { self . control_text } ' not found.\" ) try : # Get the selected control item from the annotation dictionary and LLM response. # The LLM response is a number index corresponding to the key in the annotation dictionary. if control_selected : if _ufo_configs . get ( \"SHOW_VISUAL_OUTLINE_ON_SCREEN\" , True ): control_selected . draw_outline ( colour = \"red\" , thickness = 3 ) time . sleep ( _ufo_configs . get ( \"RECTANGLE_TIME\" , 0 )) control_coordinates = PhotographerDecorator . coordinate_adjusted ( self . application_window . rectangle (), control_selected . rectangle () ) self . _control_log = { \"control_class\" : control_selected . element_info . class_name , \"control_type\" : control_selected . element_info . control_type , \"control_automation_id\" : control_selected . element_info . automation_id , \"control_friendly_class_name\" : control_selected . 
friendly_class_name (), \"control_coordinates\" : { \"left\" : control_coordinates [ 0 ], \"top\" : control_coordinates [ 1 ], \"right\" : control_coordinates [ 2 ], \"bottom\" : control_coordinates [ 3 ], }, } self . app_agent . Puppeteer . receiver_manager . create_ui_control_receiver ( control_selected , self . application_window ) # Save the screenshot of the tagged selected control. self . capture_control_screenshot ( control_selected ) self . _results = self . app_agent . Puppeteer . execute_command ( self . _operation , self . _args ) self . control_reannotate = None if not utils . is_json_serializable ( self . _results ): self . _results = \"\" return except Exception : self . general_error_handler () execute_plan ( instantiated_plan ) Get the executed result from the execute agent. Parameters: instantiated_plan ( List [ Dict [ str , Any ]] ) \u2013 Plan containing steps to execute. Returns: List [ Dict [ str , Any ]] \u2013 List of executed steps. Source code in execution/workflow/execute_flow.py 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 def execute_plan ( self , instantiated_plan : List [ Dict [ str , Any ]] ) -> List [ Dict [ str , Any ]]: \"\"\" Get the executed result from the execute agent. :param instantiated_plan: Plan containing steps to execute. :return: List of executed steps. \"\"\" # Initialize the step counter and capture the initial screenshot. self . session_step = 0 try : time . sleep ( 1 ) # Initialize the API receiver self . app_agent . Puppeteer . receiver_manager . create_api_receiver ( self . app_agent . _app_root_name , self . app_agent . _process_name ) # Initialize the control receiver current_receiver = self . app_agent . 
Puppeteer . receiver_manager . receiver_list [ - 1 ] if current_receiver is not None : self . application_window = self . _app_env . find_matching_window ( self . _task_file_name ) current_receiver . com_object = ( current_receiver . get_object_from_process_name () ) self . init_and_final_capture_screenshot () except Exception as error : raise RuntimeError ( f \"Execution initialization failed. { error } \" ) # Initialize the success flag for each step. for index , step_plan in enumerate ( instantiated_plan ): instantiated_plan [ index ][ \"Success\" ] = None instantiated_plan [ index ][ \"MatchedControlText\" ] = None for index , step_plan in enumerate ( instantiated_plan ): try : self . session_step += 1 # Check if the maximum steps have been exceeded. if self . session_step > _configs [ \"MAX_STEPS\" ]: raise RuntimeError ( \"Maximum steps exceeded.\" ) self . _parse_step_plan ( step_plan ) try : self . process () instantiated_plan [ index ][ \"Success\" ] = True instantiated_plan [ index ][ \"ControlLabel\" ] = self . _control_label instantiated_plan [ index ][ \"MatchedControlText\" ] = self . _matched_control except Exception as ControllerNotFoundError : instantiated_plan [ index ][ \"Success\" ] = False raise ControllerNotFoundError except Exception as error : err_info = RuntimeError ( f \"Step { self . session_step } execution failed. { error } \" ) raise err_info # capture the final screenshot self . session_step += 1 time . sleep ( 1 ) self . init_and_final_capture_screenshot () # save the final state of the app win_com_receiver = None for receiver in reversed ( self . app_agent . Puppeteer . receiver_manager . receiver_list ): if isinstance ( receiver , WinCOMReceiverBasic ): if receiver . client is not None : win_com_receiver = receiver break if win_com_receiver is not None : win_com_receiver . save () time . sleep ( 1 ) win_com_receiver . client . 
Quit () print ( \"Execution complete.\" ) return instantiated_plan general_error_handler () Handle general errors. Source code in execution/workflow/execute_flow.py 375 376 377 378 379 380 def general_error_handler ( self ) -> None : \"\"\" Handle general errors. \"\"\" pass init_and_final_capture_screenshot () Capture the screenshot. Source code in execution/workflow/execute_flow.py 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 def init_and_final_capture_screenshot ( self ) -> None : \"\"\" Capture the screenshot. \"\"\" # Define the paths for the screenshots saved. screenshot_save_path = self . log_path + f \"action_step { self . session_step } .png\" self . _memory_data . add_values_from_dict ( { \"CleanScreenshot\" : screenshot_save_path , } ) self . photographer . capture_app_window_screenshot ( self . application_window , save_path = screenshot_save_path ) # Capture the control screenshot. control_selected = self . _app_env . app_window self . capture_control_screenshot ( control_selected ) log_save () Log the constructed prompt message for the PrefillAgent. Source code in execution/workflow/execute_flow.py 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 def log_save ( self ) -> None : \"\"\" Log the constructed prompt message for the PrefillAgent. \"\"\" step_memory = { \"Step\" : self . session_step , \"Subtask\" : self . subtask , \"ControlLabel\" : self . _control_label , \"ControlText\" : self . control_text , \"Action\" : self . action , \"ActionType\" : self . app_agent . Puppeteer . get_command_types ( self . _operation ), \"Results\" : self . _results , \"Application\" : self . app_agent . _app_root_name , \"TimeCost\" : self . time_cost , } self . _memory_data . add_values_from_dict ( step_memory ) self . log ( self . _memory_data . to_dict ()) print_step_info () Print the step information. 
Source code in execution/workflow/execute_flow.py 233 234 235 236 237 238 239 240 241 242 243 244 def print_step_info ( self ) -> None : \"\"\" Print the step information. \"\"\" utils . print_with_color ( \"Step {step} : {subtask} \" . format ( step = self . session_step , subtask = self . subtask , ), \"magenta\" , ) process () Process the current step. Source code in execution/workflow/execute_flow.py 221 222 223 224 225 226 227 228 229 230 231 def process ( self ) -> None : \"\"\" Process the current step. \"\"\" step_start_time = time . time () self . print_step_info () self . capture_screenshot () self . execute_action () self . time_cost = round ( time . time () - step_start_time , 3 ) self . log_save () ExecuteAgent Bases: AppAgent The Agent for task execution. Initialize the ExecuteAgent. Parameters: name ( str ) \u2013 The name of the agent. process_name ( str ) \u2013 The name of the process. app_root_name ( str ) \u2013 The name of the app root. Source code in execution/agent/execute_agent.py 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 def __init__ ( self , name : str , process_name : str , app_root_name : str , ): \"\"\" Initialize the ExecuteAgent. :param name: The name of the agent. :param process_name: The name of the process. :param app_root_name: The name of the app root. \"\"\" self . _step = 0 self . _complete = False self . _name = name self . _status = None self . _process_name = process_name self . _app_root_name = app_root_name self . Puppeteer = self . create_puppeteer_interface () ExecuteEvalAgent Bases: EvaluationAgent The Agent for task execution evaluation. Initialize the ExecuteEvalAgent. Parameters: name ( str ) \u2013 The name of the agent. app_root_name ( str ) \u2013 The name of the app root. is_visual ( bool ) \u2013 The flag indicating whether the agent is visual or not. main_prompt ( str ) \u2013 The main prompt. example_prompt ( str ) \u2013 The example prompt. api_prompt ( str ) \u2013 The API prompt. 
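The ExecuteEvalAgent is constructed from three prompt templates (main, example, and API) that its prompter later combines with the execution logs. Purely as an illustration of that combination — the actual layout is an implementation detail of ExecuteEvalAgentPrompter and the function below is a hypothetical sketch, not UFO's code:

```python
def build_eval_input(
    main_prompt: str, example_prompt: str, api_prompt: str, log_text: str
) -> str:
    # Hypothetical assembly order: main prompt, then examples, then API
    # documentation, then the logs to be judged. Empty sections are skipped.
    sections = [main_prompt, example_prompt, api_prompt, log_text]
    return "\n\n".join(s for s in sections if s)
```

A prompter of this shape keeps each template independently configurable, which matches how the agent receives them as separate constructor arguments.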
Source code in execution/agent/execute_eval_agent.py 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 def __init__ ( self , name : str , app_root_name : str , is_visual : bool , main_prompt : str , example_prompt : str , api_prompt : str , ): \"\"\" Initialize the ExecuteEvalAgent. :param name: The name of the agent. :param app_root_name: The name of the app root. :param is_visual: The flag indicating whether the agent is visual or not. :param main_prompt: The main prompt. :param example_prompt: The example prompt. :param api_prompt: The API prompt. \"\"\" super () . __init__ ( name = name , app_root_name = app_root_name , is_visual = is_visual , main_prompt = main_prompt , example_prompt = example_prompt , api_prompt = api_prompt , ) get_prompter ( is_visual , prompt_template , example_prompt_template , api_prompt_template , root_name = None ) Get the prompter for the agent. Parameters: is_visual ( bool ) \u2013 The flag indicating whether the agent is visual or not. prompt_template ( str ) \u2013 The prompt template. example_prompt_template ( str ) \u2013 The example prompt template. api_prompt_template ( str ) \u2013 The API prompt template. root_name ( Optional [ str ] , default: None ) \u2013 The name of the root. Returns: ExecuteEvalAgentPrompter \u2013 The prompter. Source code in execution/agent/execute_eval_agent.py 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 def get_prompter ( self , is_visual : bool , prompt_template : str , example_prompt_template : str , api_prompt_template : str , root_name : Optional [ str ] = None , ) -> ExecuteEvalAgentPrompter : \"\"\" Get the prompter for the agent. :param is_visual: The flag indicating whether the agent is visual or not. :param prompt_template: The prompt template. :param example_prompt_template: The example prompt template. :param api_prompt_template: The API prompt template. :param root_name: The name of the root. :return: The prompter. 
\"\"\" return ExecuteEvalAgentPrompter ( is_visual = is_visual , prompt_template = prompt_template , example_prompt_template = example_prompt_template , api_prompt_template = api_prompt_template , root_name = root_name , )","title":"Execution"},{"location":"dataflow/execution/#execution","text":"The instantiated plans will be executed by a execute task. After execution, evalution agent will evaluation the quality of the entire execution process. In this phase, given the task-action data, the execution process will match the real controller based on word environment and execute the plan step by step.","title":"Execution"},{"location":"dataflow/execution/#executeflow","text":"The ExecuteFlow class is designed to facilitate the execution and evaluation of tasks in a Windows application environment. It provides functionality to interact with the application's UI, execute predefined tasks, capture screenshots, and evaluate the results of the execution. The class also handles logging and error management for the tasks.","title":"ExecuteFlow"},{"location":"dataflow/execution/#task-execution","text":"The task execution in the ExecuteFlow class follows a structured sequence to ensure accurate and traceable task performance: Initialization : Load configuration settings and log paths. Find the application window matching the task. Retrieve or create an ExecuteAgent for executing the task. Plan Execution : Loop through each step in the instantiated_plan . Parse the step to extract information like subtasks, control text, and the required operation. Action Execution : Find the control in the application window that matches the specified control text. If no matching control is found, raise an error. Perform the specified action (e.g., click, input text) using the agent's Puppeteer framework. Capture screenshots of the application window and selected controls for logging and debugging. 
Result Logging : Log details of the step execution, including control information, performed action, and results. Finalization : Save the final state of the application window. Quit the application client gracefully. Input of ExecuteAgent Parameter Type Description name str The name of the agent. Used for identification and logging purposes. process_name str The name of the application process that the agent interacts with. app_root_name str The name of the root application window or main UI component being targeted. ---","title":"Task Execution"},{"location":"dataflow/execution/#evaluation","text":"The evaluation process in the ExecuteFlow class is designed to assess the performance of the executed task based on predefined prompts: Start Evaluation : Evaluation begins immediately after task execution. It uses an ExecuteEvalAgent initialized during class construction. Perform Evaluation : The ExecuteEvalAgent evaluates the task using a combination of input prompts (e.g., main prompt and API prompt) and logs generated during task execution. The evaluation process outputs a result summary (e.g., quality flag, comments, and task type). Log and Output Results : Display the evaluation results in the console. Return the evaluation summary alongside the executed plan for further analysis or reporting.","title":"Evaluation"},{"location":"dataflow/execution/#reference","text":"","title":"Reference"},{"location":"dataflow/execution/#executeflow_1","text":"Bases: AppAgentProcessor ExecuteFlow class for executing the task and saving the result. Initialize the execute flow for a task. Parameters: task_file_name ( str ) \u2013 Name of the task file being processed. context ( Context ) \u2013 Context object for the current session. environment ( WindowsAppEnv ) \u2013 Environment object for the application being processed. 
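ExecuteFlow also times its work: execute() wraps both the plan execution and the evaluation in try/finally blocks, so execution_time and eval_time are recorded even when a step fails. A minimal self-contained sketch of that pattern (the class name TimedCall is illustrative, not part of UFO):

```python
import time

class TimedCall:
    """Sketch of the timing pattern in ExecuteFlow.execute: the elapsed
    time is assigned in a finally block, so it is set even when the
    wrapped call raises."""

    def __init__(self) -> None:
        self.elapsed = None

    def run(self, fn, *args, **kwargs):
        start = time.time()
        try:
            return fn(*args, **kwargs)
        finally:
            # round(..., 3) matches the precision used in ExecuteFlow.
            self.elapsed = round(time.time() - start, 3)
```

Because the assignment sits in finally, a caller that catches the re-raised RuntimeError still finds a valid duration for logging.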
Source code in execution/workflow/execute_flow.py 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 def __init__ ( self , task_file_name : str , context : Context , environment : WindowsAppEnv ) -> None : \"\"\" Initialize the execute flow for a task. :param task_file_name: Name of the task file being processed. :param context: Context object for the current session. :param environment: Environment object for the application being processed. \"\"\" super () . __init__ ( agent = ExecuteAgent , context = context ) self . execution_time = None self . eval_time = None self . _app_env = environment self . _task_file_name = task_file_name self . _app_name = self . _app_env . app_name log_path = _configs [ \"EXECUTE_LOG_PATH\" ] . format ( task = task_file_name ) self . _initialize_logs ( log_path ) self . application_window = self . _app_env . find_matching_window ( task_file_name ) self . app_agent = self . _get_or_create_execute_agent () self . eval_agent = self . _get_or_create_evaluation_agent () self . _matched_control = None # Matched control for the current step.","title":"ExecuteFlow"},{"location":"dataflow/execution/#execution.workflow.execute_flow.ExecuteFlow.execute","text":"Execute the execute flow: Execute the task and save the result. Parameters: request ( str ) \u2013 Original request to be executed. instantiated_plan ( List [ Dict [ str , Any ]] ) \u2013 Instantiated plan containing steps to execute. Returns: Tuple [ List [ Dict [ str , Any ]], Dict [ str , str ]] \u2013 Tuple containing task quality flag, comment, and task type. Source code in execution/workflow/execute_flow.py 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 def execute ( self , request : str , instantiated_plan : List [ Dict [ str , Any ]] ) -> Tuple [ List [ Dict [ str , Any ]], Dict [ str , str ]]: \"\"\" Execute the execute flow: Execute the task and save the result. 
:param request: Original request to be executed. :param instantiated_plan: Instantiated plan containing steps to execute. :return: Tuple containing task quality flag, comment, and task type. \"\"\" start_time = time . time () try : executed_plan = self . execute_plan ( instantiated_plan ) except Exception as error : raise RuntimeError ( f \"Execution failed. { error } \" ) finally : self . execution_time = round ( time . time () - start_time , 3 ) start_time = time . time () try : result , _ = self . eval_agent . evaluate ( request = request , log_path = self . log_path ) utils . print_with_color ( f \"Result: { result } \" , \"green\" ) except Exception as error : raise RuntimeError ( f \"Evaluation failed. { error } \" ) finally : self . eval_time = round ( time . time () - start_time , 3 ) return executed_plan , result","title":"execute"},{"location":"dataflow/execution/#execution.workflow.execute_flow.ExecuteFlow.execute_action","text":"Execute the action. Source code in execution/workflow/execute_flow.py 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 def execute_action ( self ) -> None : \"\"\" Execute the action. \"\"\" control_selected = None # Find the matching window and control. self . application_window = self . _app_env . find_matching_window ( self . _task_file_name ) if self . control_text == \"\" : control_selected = self . application_window else : self . _control_label , control_selected = ( self . _app_env . find_matching_controller ( self . filtered_annotation_dict , self . control_text ) ) self . _matched_control = control_selected . window_text () if not control_selected : # If the control is not found, raise an error. raise RuntimeError ( f \"Control with text ' { self . 
control_text } ' not found.\" ) try : # Get the selected control item from the annotation dictionary and LLM response. # The LLM response is a number index corresponding to the key in the annotation dictionary. if control_selected : if _ufo_configs . get ( \"SHOW_VISUAL_OUTLINE_ON_SCREEN\" , True ): control_selected . draw_outline ( colour = \"red\" , thickness = 3 ) time . sleep ( _ufo_configs . get ( \"RECTANGLE_TIME\" , 0 )) control_coordinates = PhotographerDecorator . coordinate_adjusted ( self . application_window . rectangle (), control_selected . rectangle () ) self . _control_log = { \"control_class\" : control_selected . element_info . class_name , \"control_type\" : control_selected . element_info . control_type , \"control_automation_id\" : control_selected . element_info . automation_id , \"control_friendly_class_name\" : control_selected . friendly_class_name (), \"control_coordinates\" : { \"left\" : control_coordinates [ 0 ], \"top\" : control_coordinates [ 1 ], \"right\" : control_coordinates [ 2 ], \"bottom\" : control_coordinates [ 3 ], }, } self . app_agent . Puppeteer . receiver_manager . create_ui_control_receiver ( control_selected , self . application_window ) # Save the screenshot of the tagged selected control. self . capture_control_screenshot ( control_selected ) self . _results = self . app_agent . Puppeteer . execute_command ( self . _operation , self . _args ) self . control_reannotate = None if not utils . is_json_serializable ( self . _results ): self . _results = \"\" return except Exception : self . general_error_handler ()","title":"execute_action"},{"location":"dataflow/execution/#execution.workflow.execute_flow.ExecuteFlow.execute_plan","text":"Get the executed result from the execute agent. Parameters: instantiated_plan ( List [ Dict [ str , Any ]] ) \u2013 Plan containing steps to execute. Returns: List [ Dict [ str , Any ]] \u2013 List of executed steps. 
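Before the step loop runs, execute_plan pre-initializes bookkeeping fields on every step, so even a run that fails partway yields fully shaped records. A small self-contained sketch of that initialization (the step dictionary contents are illustrative):

```python
from typing import Any, Dict, List

def init_step_flags(instantiated_plan: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    # Mirrors the pre-loop bookkeeping in ExecuteFlow.execute_plan:
    # every step gets explicit placeholders before execution starts,
    # to be overwritten with the real outcome per step.
    for step in instantiated_plan:
        step["Success"] = None
        step["MatchedControlText"] = None
    return instantiated_plan

plan = init_step_flags(
    [{"Step": 1, "Subtask": "type text", "ControlText": "Edit"}]
)
```

On success, each step is then updated with Success, ControlLabel, and MatchedControlText; on failure, Success is set to False before the error is re-raised.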
Source code in execution/workflow/execute_flow.py 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 def execute_plan ( self , instantiated_plan : List [ Dict [ str , Any ]] ) -> List [ Dict [ str , Any ]]: \"\"\" Get the executed result from the execute agent. :param instantiated_plan: Plan containing steps to execute. :return: List of executed steps. \"\"\" # Initialize the step counter and capture the initial screenshot. self . session_step = 0 try : time . sleep ( 1 ) # Initialize the API receiver self . app_agent . Puppeteer . receiver_manager . create_api_receiver ( self . app_agent . _app_root_name , self . app_agent . _process_name ) # Initialize the control receiver current_receiver = self . app_agent . Puppeteer . receiver_manager . receiver_list [ - 1 ] if current_receiver is not None : self . application_window = self . _app_env . find_matching_window ( self . _task_file_name ) current_receiver . com_object = ( current_receiver . get_object_from_process_name () ) self . init_and_final_capture_screenshot () except Exception as error : raise RuntimeError ( f \"Execution initialization failed. { error } \" ) # Initialize the success flag for each step. for index , step_plan in enumerate ( instantiated_plan ): instantiated_plan [ index ][ \"Success\" ] = None instantiated_plan [ index ][ \"MatchedControlText\" ] = None for index , step_plan in enumerate ( instantiated_plan ): try : self . session_step += 1 # Check if the maximum steps have been exceeded. if self . session_step > _configs [ \"MAX_STEPS\" ]: raise RuntimeError ( \"Maximum steps exceeded.\" ) self . _parse_step_plan ( step_plan ) try : self . 
process () instantiated_plan [ index ][ \"Success\" ] = True instantiated_plan [ index ][ \"ControlLabel\" ] = self . _control_label instantiated_plan [ index ][ \"MatchedControlText\" ] = self . _matched_control except Exception as ControllerNotFoundError : instantiated_plan [ index ][ \"Success\" ] = False raise ControllerNotFoundError except Exception as error : err_info = RuntimeError ( f \"Step { self . session_step } execution failed. { error } \" ) raise err_info # capture the final screenshot self . session_step += 1 time . sleep ( 1 ) self . init_and_final_capture_screenshot () # save the final state of the app win_com_receiver = None for receiver in reversed ( self . app_agent . Puppeteer . receiver_manager . receiver_list ): if isinstance ( receiver , WinCOMReceiverBasic ): if receiver . client is not None : win_com_receiver = receiver break if win_com_receiver is not None : win_com_receiver . save () time . sleep ( 1 ) win_com_receiver . client . Quit () print ( \"Execution complete.\" ) return instantiated_plan","title":"execute_plan"},{"location":"dataflow/execution/#execution.workflow.execute_flow.ExecuteFlow.general_error_handler","text":"Handle general errors. Source code in execution/workflow/execute_flow.py 375 376 377 378 379 380 def general_error_handler ( self ) -> None : \"\"\" Handle general errors. \"\"\" pass","title":"general_error_handler"},{"location":"dataflow/execution/#execution.workflow.execute_flow.ExecuteFlow.init_and_final_capture_screenshot","text":"Capture the screenshot. Source code in execution/workflow/execute_flow.py 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 def init_and_final_capture_screenshot ( self ) -> None : \"\"\" Capture the screenshot. \"\"\" # Define the paths for the screenshots saved. screenshot_save_path = self . log_path + f \"action_step { self . session_step } .png\" self . _memory_data . add_values_from_dict ( { \"CleanScreenshot\" : screenshot_save_path , } ) self . 
photographer . capture_app_window_screenshot ( self . application_window , save_path = screenshot_save_path ) # Capture the control screenshot. control_selected = self . _app_env . app_window self . capture_control_screenshot ( control_selected )","title":"init_and_final_capture_screenshot"},{"location":"dataflow/execution/#execution.workflow.execute_flow.ExecuteFlow.log_save","text":"Log the constructed prompt message for the PrefillAgent. Source code in execution/workflow/execute_flow.py 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 def log_save ( self ) -> None : \"\"\" Log the constructed prompt message for the PrefillAgent. \"\"\" step_memory = { \"Step\" : self . session_step , \"Subtask\" : self . subtask , \"ControlLabel\" : self . _control_label , \"ControlText\" : self . control_text , \"Action\" : self . action , \"ActionType\" : self . app_agent . Puppeteer . get_command_types ( self . _operation ), \"Results\" : self . _results , \"Application\" : self . app_agent . _app_root_name , \"TimeCost\" : self . time_cost , } self . _memory_data . add_values_from_dict ( step_memory ) self . log ( self . _memory_data . to_dict ())","title":"log_save"},{"location":"dataflow/execution/#execution.workflow.execute_flow.ExecuteFlow.print_step_info","text":"Print the step information. Source code in execution/workflow/execute_flow.py 233 234 235 236 237 238 239 240 241 242 243 244 def print_step_info ( self ) -> None : \"\"\" Print the step information. \"\"\" utils . print_with_color ( \"Step {step} : {subtask} \" . format ( step = self . session_step , subtask = self . subtask , ), \"magenta\" , )","title":"print_step_info"},{"location":"dataflow/execution/#execution.workflow.execute_flow.ExecuteFlow.process","text":"Process the current step. Source code in execution/workflow/execute_flow.py 221 222 223 224 225 226 227 228 229 230 231 def process ( self ) -> None : \"\"\" Process the current step. \"\"\" step_start_time = time . 
time () self . print_step_info () self . capture_screenshot () self . execute_action () self . time_cost = round ( time . time () - step_start_time , 3 ) self . log_save ()","title":"process"},{"location":"dataflow/execution/#executeagent","text":"Bases: AppAgent The Agent for task execution. Initialize the ExecuteAgent. Parameters: name ( str ) \u2013 The name of the agent. process_name ( str ) \u2013 The name of the process. app_root_name ( str ) \u2013 The name of the app root. Source code in execution/agent/execute_agent.py 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 def __init__ ( self , name : str , process_name : str , app_root_name : str , ): \"\"\" Initialize the ExecuteAgent. :param name: The name of the agent. :param process_name: The name of the process. :param app_root_name: The name of the app root. \"\"\" self . _step = 0 self . _complete = False self . _name = name self . _status = None self . _process_name = process_name self . _app_root_name = app_root_name self . Puppeteer = self . create_puppeteer_interface ()","title":"ExecuteAgent"},{"location":"dataflow/execution/#executeevalagent","text":"Bases: EvaluationAgent The Agent for task execution evaluation. Initialize the ExecuteEvalAgent. Parameters: name ( str ) \u2013 The name of the agent. app_root_name ( str ) \u2013 The name of the app root. is_visual ( bool ) \u2013 The flag indicating whether the agent is visual or not. main_prompt ( str ) \u2013 The main prompt. example_prompt ( str ) \u2013 The example prompt. api_prompt ( str ) \u2013 The API prompt. Source code in execution/agent/execute_eval_agent.py 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 def __init__ ( self , name : str , app_root_name : str , is_visual : bool , main_prompt : str , example_prompt : str , api_prompt : str , ): \"\"\" Initialize the ExecuteEvalAgent. :param name: The name of the agent. :param app_root_name: The name of the app root. 
:param is_visual: The flag indicating whether the agent is visual or not. :param main_prompt: The main prompt. :param example_prompt: The example prompt. :param api_prompt: The API prompt. \"\"\" super () . __init__ ( name = name , app_root_name = app_root_name , is_visual = is_visual , main_prompt = main_prompt , example_prompt = example_prompt , api_prompt = api_prompt , )","title":"ExecuteEvalAgent"},{"location":"dataflow/execution/#execution.agent.execute_eval_agent.ExecuteEvalAgent.get_prompter","text":"Get the prompter for the agent. Parameters: is_visual ( bool ) \u2013 The flag indicating whether the agent is visual or not. prompt_template ( str ) \u2013 The prompt template. example_prompt_template ( str ) \u2013 The example prompt template. api_prompt_template ( str ) \u2013 The API prompt template. root_name ( Optional [ str ] , default: None ) \u2013 The name of the root. Returns: ExecuteEvalAgentPrompter \u2013 The prompter. Source code in execution/agent/execute_eval_agent.py 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 def get_prompter ( self , is_visual : bool , prompt_template : str , example_prompt_template : str , api_prompt_template : str , root_name : Optional [ str ] = None , ) -> ExecuteEvalAgentPrompter : \"\"\" Get the prompter for the agent. :param is_visual: The flag indicating whether the agent is visual or not. :param prompt_template: The prompt template. :param example_prompt_template: The example prompt template. :param api_prompt_template: The API prompt template. :param root_name: The name of the root. :return: The prompter. 
\"\"\" return ExecuteEvalAgentPrompter ( is_visual = is_visual , prompt_template = prompt_template , example_prompt_template = example_prompt_template , api_prompt_template = api_prompt_template , root_name = root_name , )","title":"get_prompter"},{"location":"dataflow/instantiation/","text":"Instantiation There are three key steps in the instantiation process: Choose a template file according to the specified app and instruction. Prefill the task using the current screenshot. Filter the established task. Given the initial task, the dataflow first chooses a template ( Phase 1 ), then prefills the initial task based on the Word environment to obtain task-action data ( Phase 2 ). Finally, it will filter the established task to evaluate the quality of task-action data. 1. Choose Template File Templates for your app must be defined and described in dataflow/templates/app . For instance, if you want to instantiate tasks for the Word application, place the relevant .docx files in dataflow/templates/word , along with a description.json file. The appropriate template will be selected based on how well its description matches the instruction. The ChooseTemplateFlow uses semantic matching, where task descriptions are compared with template descriptions using embeddings and FAISS for efficient nearest neighbor search. If semantic matching fails, a random template is chosen from the available files. ChooseTemplateFlow Class to select and copy the most relevant template file based on the given task context. Initialize the flow with the given task context. Parameters: app_name ( str ) \u2013 The name of the application. file_extension ( str ) \u2013 The file extension of the template. task_file_name ( str ) \u2013 The name of the task file. 
Source code in instantiation/workflow/choose_template_flow.py 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 def __init__ ( self , app_name : str , task_file_name : str , file_extension : str ): \"\"\" Initialize the flow with the given task context. :param app_name: The name of the application. :param file_extension: The file extension of the template. :param task_file_name: The name of the task file. \"\"\" self . _app_name = app_name self . _file_extension = file_extension self . _task_file_name = task_file_name self . execution_time = None self . _embedding_model = self . _load_embedding_model ( model_name = _configs [ \"CONTROL_FILTER_MODEL_SEMANTIC_NAME\" ] ) execute () Execute the flow and return the copied template path. Returns: str \u2013 The path to the copied template file. Source code in instantiation/workflow/choose_template_flow.py 43 44 45 46 47 48 49 50 51 52 53 54 55 56 def execute ( self ) -> str : \"\"\" Execute the flow and return the copied template path. :return: The path to the copied template file. \"\"\" start_time = time . time () try : template_copied_path = self . _choose_template_and_copy () except Exception as e : raise e finally : self . execution_time = round ( time . time () - start_time , 3 ) return template_copied_path 2. Prefill the Task The PrefillFlow class orchestrates the refinement of task plans and UI interactions by leveraging PrefillAgent for task planning and action generation. It automates UI control updates, captures screenshots, and manages logs for messages and responses during execution. PrefillFlow Bases: AppAgentProcessor Class to manage the prefill process by refining planning steps and automating UI interactions Initialize the prefill flow with the application context. Parameters: app_name ( str ) \u2013 The name of the application. task_file_name ( str ) \u2013 The name of the task file for logging and tracking. environment ( WindowsAppEnv ) \u2013 The environment of the app. 
Source code in instantiation/workflow/prefill_flow.py 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 def __init__ ( self , app_name : str , task_file_name : str , environment : WindowsAppEnv , ) -> None : \"\"\" Initialize the prefill flow with the application context. :param app_name: The name of the application. :param task_file_name: The name of the task file for logging and tracking. :param environment: The environment of the app. \"\"\" self . execution_time = None self . _app_name = app_name self . _task_file_name = task_file_name self . _app_env = environment # Create or reuse a PrefillAgent for the app if self . _app_name not in PrefillFlow . _app_prefill_agent_dict : PrefillFlow . _app_prefill_agent_dict [ self . _app_name ] = PrefillAgent ( \"prefill\" , self . _app_name , is_visual = True , main_prompt = _configs [ \"PREFILL_PROMPT\" ], example_prompt = _configs [ \"PREFILL_EXAMPLE_PROMPT\" ], api_prompt = _configs [ \"API_PROMPT\" ], ) self . _prefill_agent = PrefillFlow . _app_prefill_agent_dict [ self . _app_name ] # Initialize execution step and UI control tools self . _execute_step = 0 self . _control_inspector = ControlInspectorFacade ( _BACKEND ) self . _photographer = PhotographerFacade () # Set default states self . _status = \"\" # Initialize loggers for messages and responses self . _log_path_configs = _configs [ \"PREFILL_LOG_PATH\" ] . format ( task = self . _task_file_name ) os . makedirs ( self . _log_path_configs , exist_ok = True ) # Set up loggers self . _message_logger = BaseSession . initialize_logger ( self . _log_path_configs , \"prefill_messages.json\" , \"w\" , _configs ) self . _response_logger = BaseSession . initialize_logger ( self . _log_path_configs , \"prefill_responses.json\" , \"w\" , _configs ) execute ( template_copied_path , original_task , refined_steps ) Start the execution by retrieving the instantiated result. 
Parameters: template_copied_path ( str ) \u2013 The path of the copied template to use. original_task ( str ) \u2013 The original task to refine. refined_steps ( List [ str ] ) \u2013 The steps to guide the refinement process. Returns: Dict [ str , Any ] \u2013 The refined task and corresponding action plans. Source code in instantiation/workflow/prefill_flow.py 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 def execute ( self , template_copied_path : str , original_task : str , refined_steps : List [ str ] ) -> Dict [ str , Any ]: \"\"\" Start the execution by retrieving the instantiated result. :param template_copied_path: The path of the copied template to use. :param original_task: The original task to refine. :param refined_steps: The steps to guide the refinement process. :return: The refined task and corresponding action plans. \"\"\" start_time = time . time () try : instantiated_request , instantiated_plan = self . _instantiate_task ( template_copied_path , original_task , refined_steps ) except Exception as e : raise e finally : self . execution_time = round ( time . time () - start_time , 3 ) return { \"instantiated_request\" : instantiated_request , \"instantiated_plan\" : instantiated_plan , } PrefillAgent The PrefillAgent class facilitates task instantiation and action sequence generation by constructing tailored prompt messages using the PrefillPrompter . It integrates system, user, and dynamic context to generate actionable inputs for automation workflows. Bases: BasicAgent The Agent for task instantiation and action sequence generation. Initialize the PrefillAgent. Parameters: name ( str ) \u2013 The name of the agent. process_name ( str ) \u2013 The name of the process. is_visual ( bool ) \u2013 The flag indicating whether the agent is visual or not. main_prompt ( str ) \u2013 The main prompt. example_prompt ( str ) \u2013 The example prompt. api_prompt ( str ) \u2013 The API prompt. 
Source code in instantiation/agent/prefill_agent.py 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 def __init__ ( self , name : str , process_name : str , is_visual : bool , main_prompt : str , example_prompt : str , api_prompt : str , ): \"\"\" Initialize the PrefillAgent. :param name: The name of the agent. :param process_name: The name of the process. :param is_visual: The flag indicating whether the agent is visual or not. :param main_prompt: The main prompt. :param example_prompt: The example prompt. :param api_prompt: The API prompt. \"\"\" self . _step = 0 self . _complete = False self . _name = name self . _status = None self . prompter : PrefillPrompter = self . get_prompter ( is_visual , main_prompt , example_prompt , api_prompt ) self . _process_name = process_name get_prompter ( is_visual , main_prompt , example_prompt , api_prompt ) Get the prompt for the agent. This is the abstract method from BasicAgent that needs to be implemented. Parameters: is_visual ( bool ) \u2013 The flag indicating whether the agent is visual or not. main_prompt ( str ) \u2013 The main prompt. example_prompt ( str ) \u2013 The example prompt. api_prompt ( str ) \u2013 The API prompt. Returns: str \u2013 The prompt string. Source code in instantiation/agent/prefill_agent.py 44 45 46 47 48 49 50 51 52 53 54 55 def get_prompter ( self , is_visual : bool , main_prompt : str , example_prompt : str , api_prompt : str ) -> str : \"\"\" Get the prompt for the agent. This is the abstract method from BasicAgent that needs to be implemented. :param is_visual: The flag indicating whether the agent is visual or not. :param main_prompt: The main prompt. :param example_prompt: The example prompt. :param api_prompt: The API prompt. :return: The prompt string. 
\"\"\" return PrefillPrompter ( is_visual , main_prompt , example_prompt , api_prompt ) message_constructor ( dynamic_examples , given_task , reference_steps , doc_control_state , log_path ) Construct the prompt message for the PrefillAgent. Parameters: dynamic_examples ( str ) \u2013 The dynamic examples retrieved from the self-demonstration and human demonstration. given_task ( str ) \u2013 The given task. reference_steps ( List [ str ] ) \u2013 The reference steps. doc_control_state ( Dict [ str , str ] ) \u2013 The document control state. log_path ( str ) \u2013 The path of the log. Returns: List [ str ] \u2013 The prompt message. Source code in instantiation/agent/prefill_agent.py 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 def message_constructor ( self , dynamic_examples : str , given_task : str , reference_steps : List [ str ], doc_control_state : Dict [ str , str ], log_path : str , ) -> List [ str ]: \"\"\" Construct the prompt message for the PrefillAgent. :param dynamic_examples: The dynamic examples retrieved from the self-demonstration and human demonstration. :param given_task: The given task. :param reference_steps: The reference steps. :param doc_control_state: The document control state. :param log_path: The path of the log. :return: The prompt message. \"\"\" prefill_agent_prompt_system_message = self . prompter . system_prompt_construction ( dynamic_examples ) prefill_agent_prompt_user_message = self . prompter . user_content_construction ( given_task , reference_steps , doc_control_state , log_path ) appagent_prompt_message = self . prompter . prompt_construction ( prefill_agent_prompt_system_message , prefill_agent_prompt_user_message , ) return appagent_prompt_message process_comfirmation () Confirm the process. This is the abstract method from BasicAgent that needs to be implemented. 
Source code in instantiation/agent/prefill_agent.py 88 89 90 91 92 93 94 def process_comfirmation ( self ) -> None : \"\"\" Confirm the process. This is the abstract method from BasicAgent that needs to be implemented. \"\"\" pass 3. Filter Task The FilterFlow class is designed to process and refine task plans by leveraging a FilterAgent . FilterFlow Class to refine the plan steps and prefill the file based on filtering criteria. Initialize the filter flow for a task. Parameters: app_name ( str ) \u2013 Name of the application being processed. task_file_name ( str ) \u2013 Name of the task file being processed. Source code in instantiation/workflow/filter_flow.py 21 22 23 24 25 26 27 28 29 30 31 32 def __init__ ( self , app_name : str , task_file_name : str ) -> None : \"\"\" Initialize the filter flow for a task. :param app_name: Name of the application being processed. :param task_file_name: Name of the task file being processed. \"\"\" self . execution_time = None self . _app_name = app_name self . _log_path_configs = _configs [ \"FILTER_LOG_PATH\" ] . format ( task = task_file_name ) self . _filter_agent = self . _get_or_create_filter_agent () self . _initialize_logs () execute ( instantiated_request ) Execute the filter flow: Filter the task and save the result. Parameters: instantiated_request ( str ) \u2013 Request object to be filtered. Returns: Dict [ str , Any ] \u2013 Tuple containing task quality flag, comment, and task type. Source code in instantiation/workflow/filter_flow.py 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 def execute ( self , instantiated_request : str ) -> Dict [ str , Any ]: \"\"\" Execute the filter flow: Filter the task and save the result. :param instantiated_request: Request object to be filtered. :return: Tuple containing task quality flag, comment, and task type. \"\"\" start_time = time . time () try : judge , thought , request_type = self . 
_get_filtered_result ( instantiated_request ) except Exception as e : raise e finally : self . execution_time = round ( time . time () - start_time , 3 ) return { \"judge\" : judge , \"thought\" : thought , \"request_type\" : request_type , }","title":"execute"},{"location":"dataflow/instantiation/#filteragent","text":"Bases: BasicAgent The Agent to evaluate whether the instantiated task is correct or not. Initialize the FilterAgent. Parameters: name ( str ) \u2013 The name of the agent. process_name ( str ) \u2013 The name of the process. is_visual ( bool ) \u2013 The flag indicating whether the agent is visual or not. main_prompt ( str ) \u2013 The main prompt. example_prompt ( str ) \u2013 The example prompt. api_prompt ( str ) \u2013 The API prompt. Source code in instantiation/agent/filter_agent.py 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 def __init__ ( self , name : str , process_name : str , is_visual : bool , main_prompt : str , example_prompt : str , api_prompt : str , ): \"\"\" Initialize the FilterAgent. :param name: The name of the agent. :param process_name: The name of the process. :param is_visual: The flag indicating whether the agent is visual or not. :param main_prompt: The main prompt. :param example_prompt: The example prompt. :param api_prompt: The API prompt. \"\"\" self . _step = 0 self . _complete = False self . _name = name self . _status = None self . prompter : FilterPrompter = self . get_prompter ( is_visual , main_prompt , example_prompt , api_prompt ) self . _process_name = process_name get_prompter ( is_visual , main_prompt , example_prompt , api_prompt ) Get the prompt for the agent. Parameters: is_visual ( bool ) \u2013 The flag indicating whether the agent is visual or not. main_prompt ( str ) \u2013 The main prompt. example_prompt ( str ) \u2013 The example prompt. api_prompt ( str ) \u2013 The API prompt. Returns: FilterPrompter \u2013 The prompt string. 
Source code in instantiation/agent/filter_agent.py 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 def get_prompter ( self , is_visual : bool , main_prompt : str , example_prompt : str , api_prompt : str ) -> FilterPrompter : \"\"\" Get the prompt for the agent. :param is_visual: The flag indicating whether the agent is visual or not. :param main_prompt: The main prompt. :param example_prompt: The example prompt. :param api_prompt: The API prompt. :return: The prompt string. \"\"\" return FilterPrompter ( is_visual , main_prompt , example_prompt , api_prompt ) message_constructor ( request , app ) Construct the prompt message for the FilterAgent. Parameters: request ( str ) \u2013 The request sentence. app ( str ) \u2013 The name of the operated app. Returns: List [ str ] \u2013 The prompt message. Source code in instantiation/agent/filter_agent.py 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 def message_constructor ( self , request : str , app : str ) -> List [ str ]: \"\"\" Construct the prompt message for the FilterAgent. :param request: The request sentence. :param app: The name of the operated app. :return: The prompt message. \"\"\" filter_agent_prompt_system_message = self . prompter . system_prompt_construction ( app = app ) filter_agent_prompt_user_message = self . prompter . user_content_construction ( request ) filter_agent_prompt_message = self . prompter . prompt_construction ( filter_agent_prompt_system_message , filter_agent_prompt_user_message ) return filter_agent_prompt_message process_comfirmation () Confirm the process. This is the abstract method from BasicAgent that needs to be implemented. Source code in instantiation/agent/filter_agent.py 80 81 82 83 84 85 86 def process_comfirmation ( self ) -> None : \"\"\" Confirm the process. This is the abstract method from BasicAgent that needs to be implemented. 
\"\"\" pass","title":"Instantiation"},{"location":"dataflow/instantiation/#instantiation","text":"There are three key steps in the instantiation process: Choose a template file according to the specified app and instruction. Prefill the task using the current screenshot. Filter the established task. Given the initial task, the dataflow first chooses a template ( Phase 1 ), then prefills the initial task based on the Word environment to obtain task-action data ( Phase 2 ). Finally, it will filter the established task to evaluate the quality of task-action data.","title":"Instantiation"},{"location":"dataflow/instantiation/#1-choose-template-file","text":"Templates for your app must be defined and described in dataflow/templates/app . For instance, if you want to instantiate tasks for the Word application, place the relevant .docx files in dataflow/templates/word , along with a description.json file. The appropriate template will be selected based on how well its description matches the instruction. The ChooseTemplateFlow uses semantic matching, where task descriptions are compared with template descriptions using embeddings and FAISS for efficient nearest neighbor search. If semantic matching fails, a random template is chosen from the available files.","title":"1. Choose Template File"},{"location":"dataflow/instantiation/#choosetemplateflow","text":"Class to select and copy the most relevant template file based on the given task context. Initialize the flow with the given task context. Parameters: app_name ( str ) \u2013 The name of the application. file_extension ( str ) \u2013 The file extension of the template. task_file_name ( str ) \u2013 The name of the task file. Source code in instantiation/workflow/choose_template_flow.py 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 def __init__ ( self , app_name : str , task_file_name : str , file_extension : str ): \"\"\" Initialize the flow with the given task context. :param app_name: The name of the application. 
:param file_extension: The file extension of the template. :param task_file_name: The name of the task file. \"\"\" self . _app_name = app_name self . _file_extension = file_extension self . _task_file_name = task_file_name self . execution_time = None self . _embedding_model = self . _load_embedding_model ( model_name = _configs [ \"CONTROL_FILTER_MODEL_SEMANTIC_NAME\" ] )","title":"ChooseTemplateFlow"},{"location":"dataflow/instantiation/#instantiation.workflow.choose_template_flow.ChooseTemplateFlow.execute","text":"Execute the flow and return the copied template path. Returns: str \u2013 The path to the copied template file. Source code in instantiation/workflow/choose_template_flow.py 43 44 45 46 47 48 49 50 51 52 53 54 55 56 def execute ( self ) -> str : \"\"\" Execute the flow and return the copied template path. :return: The path to the copied template file. \"\"\" start_time = time . time () try : template_copied_path = self . _choose_template_and_copy () except Exception as e : raise e finally : self . execution_time = round ( time . time () - start_time , 3 ) return template_copied_path","title":"execute"},{"location":"dataflow/instantiation/#2-prefill-the-task","text":"The PrefillFlow class orchestrates the refinement of task plans and UI interactions by leveraging PrefillAgent for task planning and action generation. It automates UI control updates, captures screenshots, and manages logs for messages and responses during execution.","title":"2. Prefill the Task"},{"location":"dataflow/instantiation/#prefillflow","text":"Bases: AppAgentProcessor Class to manage the prefill process by refining planning steps and automating UI interactions Initialize the prefill flow with the application context. Parameters: app_name ( str ) \u2013 The name of the application. task_file_name ( str ) \u2013 The name of the task file for logging and tracking. environment ( WindowsAppEnv ) \u2013 The environment of the app. 
Source code in instantiation/workflow/prefill_flow.py 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 def __init__ ( self , app_name : str , task_file_name : str , environment : WindowsAppEnv , ) -> None : \"\"\" Initialize the prefill flow with the application context. :param app_name: The name of the application. :param task_file_name: The name of the task file for logging and tracking. :param environment: The environment of the app. \"\"\" self . execution_time = None self . _app_name = app_name self . _task_file_name = task_file_name self . _app_env = environment # Create or reuse a PrefillAgent for the app if self . _app_name not in PrefillFlow . _app_prefill_agent_dict : PrefillFlow . _app_prefill_agent_dict [ self . _app_name ] = PrefillAgent ( \"prefill\" , self . _app_name , is_visual = True , main_prompt = _configs [ \"PREFILL_PROMPT\" ], example_prompt = _configs [ \"PREFILL_EXAMPLE_PROMPT\" ], api_prompt = _configs [ \"API_PROMPT\" ], ) self . _prefill_agent = PrefillFlow . _app_prefill_agent_dict [ self . _app_name ] # Initialize execution step and UI control tools self . _execute_step = 0 self . _control_inspector = ControlInspectorFacade ( _BACKEND ) self . _photographer = PhotographerFacade () # Set default states self . _status = \"\" # Initialize loggers for messages and responses self . _log_path_configs = _configs [ \"PREFILL_LOG_PATH\" ] . format ( task = self . _task_file_name ) os . makedirs ( self . _log_path_configs , exist_ok = True ) # Set up loggers self . _message_logger = BaseSession . initialize_logger ( self . _log_path_configs , \"prefill_messages.json\" , \"w\" , _configs ) self . _response_logger = BaseSession . initialize_logger ( self . 
_log_path_configs , \"prefill_responses.json\" , \"w\" , _configs )","title":"PrefillFlow"},{"location":"dataflow/instantiation/#instantiation.workflow.prefill_flow.PrefillFlow.execute","text":"Start the execution by retrieving the instantiated result. Parameters: template_copied_path ( str ) \u2013 The path of the copied template to use. original_task ( str ) \u2013 The original task to refine. refined_steps ( List [ str ] ) \u2013 The steps to guide the refinement process. Returns: Dict [ str , Any ] \u2013 The refined task and corresponding action plans. Source code in instantiation/workflow/prefill_flow.py 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 def execute ( self , template_copied_path : str , original_task : str , refined_steps : List [ str ] ) -> Dict [ str , Any ]: \"\"\" Start the execution by retrieving the instantiated result. :param template_copied_path: The path of the copied template to use. :param original_task: The original task to refine. :param refined_steps: The steps to guide the refinement process. :return: The refined task and corresponding action plans. \"\"\" start_time = time . time () try : instantiated_request , instantiated_plan = self . _instantiate_task ( template_copied_path , original_task , refined_steps ) except Exception as e : raise e finally : self . execution_time = round ( time . time () - start_time , 3 ) return { \"instantiated_request\" : instantiated_request , \"instantiated_plan\" : instantiated_plan , }","title":"execute"},{"location":"dataflow/instantiation/#prefillagent","text":"The PrefillAgent class facilitates task instantiation and action sequence generation by constructing tailored prompt messages using the PrefillPrompter . It integrates system, user, and dynamic context to generate actionable inputs for automation workflows. Bases: BasicAgent The Agent for task instantiation and action sequence generation. Initialize the PrefillAgent. 
Parameters: name ( str ) \u2013 The name of the agent. process_name ( str ) \u2013 The name of the process. is_visual ( bool ) \u2013 The flag indicating whether the agent is visual or not. main_prompt ( str ) \u2013 The main prompt. example_prompt ( str ) \u2013 The example prompt. api_prompt ( str ) \u2013 The API prompt. Source code in instantiation/agent/prefill_agent.py 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 def __init__ ( self , name : str , process_name : str , is_visual : bool , main_prompt : str , example_prompt : str , api_prompt : str , ): \"\"\" Initialize the PrefillAgent. :param name: The name of the agent. :param process_name: The name of the process. :param is_visual: The flag indicating whether the agent is visual or not. :param main_prompt: The main prompt. :param example_prompt: The example prompt. :param api_prompt: The API prompt. \"\"\" self . _step = 0 self . _complete = False self . _name = name self . _status = None self . prompter : PrefillPrompter = self . get_prompter ( is_visual , main_prompt , example_prompt , api_prompt ) self . _process_name = process_name","title":"PrefillAgent"},{"location":"dataflow/instantiation/#instantiation.agent.prefill_agent.PrefillAgent.get_prompter","text":"Get the prompt for the agent. This is the abstract method from BasicAgent that needs to be implemented. Parameters: is_visual ( bool ) \u2013 The flag indicating whether the agent is visual or not. main_prompt ( str ) \u2013 The main prompt. example_prompt ( str ) \u2013 The example prompt. api_prompt ( str ) \u2013 The API prompt. Returns: str \u2013 The prompt string. Source code in instantiation/agent/prefill_agent.py 44 45 46 47 48 49 50 51 52 53 54 55 def get_prompter ( self , is_visual : bool , main_prompt : str , example_prompt : str , api_prompt : str ) -> str : \"\"\" Get the prompt for the agent. This is the abstract method from BasicAgent that needs to be implemented. 
:param is_visual: The flag indicating whether the agent is visual or not. :param main_prompt: The main prompt. :param example_prompt: The example prompt. :param api_prompt: The API prompt. :return: The prompt string. \"\"\" return PrefillPrompter ( is_visual , main_prompt , example_prompt , api_prompt )","title":"get_prompter"},{"location":"dataflow/instantiation/#instantiation.agent.prefill_agent.PrefillAgent.message_constructor","text":"Construct the prompt message for the PrefillAgent. Parameters: dynamic_examples ( str ) \u2013 The dynamic examples retrieved from the self-demonstration and human demonstration. given_task ( str ) \u2013 The given task. reference_steps ( List [ str ] ) \u2013 The reference steps. doc_control_state ( Dict [ str , str ] ) \u2013 The document control state. log_path ( str ) \u2013 The path of the log. Returns: List [ str ] \u2013 The prompt message. Source code in instantiation/agent/prefill_agent.py 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 def message_constructor ( self , dynamic_examples : str , given_task : str , reference_steps : List [ str ], doc_control_state : Dict [ str , str ], log_path : str , ) -> List [ str ]: \"\"\" Construct the prompt message for the PrefillAgent. :param dynamic_examples: The dynamic examples retrieved from the self-demonstration and human demonstration. :param given_task: The given task. :param reference_steps: The reference steps. :param doc_control_state: The document control state. :param log_path: The path of the log. :return: The prompt message. \"\"\" prefill_agent_prompt_system_message = self . prompter . system_prompt_construction ( dynamic_examples ) prefill_agent_prompt_user_message = self . prompter . user_content_construction ( given_task , reference_steps , doc_control_state , log_path ) appagent_prompt_message = self . prompter . 
prompt_construction ( prefill_agent_prompt_system_message , prefill_agent_prompt_user_message , ) return appagent_prompt_message","title":"message_constructor"},{"location":"dataflow/instantiation/#instantiation.agent.prefill_agent.PrefillAgent.process_comfirmation","text":"Confirm the process. This is the abstract method from BasicAgent that needs to be implemented. Source code in instantiation/agent/prefill_agent.py 88 89 90 91 92 93 94 def process_comfirmation ( self ) -> None : \"\"\" Confirm the process. This is the abstract method from BasicAgent that needs to be implemented. \"\"\" pass","title":"process_comfirmation"},{"location":"dataflow/instantiation/#3-filter-task","text":"The FilterFlow class is designed to process and refine task plans by leveraging a FilterAgent .","title":"3. Filter Task"},{"location":"dataflow/instantiation/#filterflow","text":"Class to refine the plan steps and prefill the file based on filtering criteria. Initialize the filter flow for a task. Parameters: app_name ( str ) \u2013 Name of the application being processed. task_file_name ( str ) \u2013 Name of the task file being processed. Source code in instantiation/workflow/filter_flow.py 21 22 23 24 25 26 27 28 29 30 31 32 def __init__ ( self , app_name : str , task_file_name : str ) -> None : \"\"\" Initialize the filter flow for a task. :param app_name: Name of the application being processed. :param task_file_name: Name of the task file being processed. \"\"\" self . execution_time = None self . _app_name = app_name self . _log_path_configs = _configs [ \"FILTER_LOG_PATH\" ] . format ( task = task_file_name ) self . _filter_agent = self . _get_or_create_filter_agent () self . _initialize_logs ()","title":"FilterFlow"},{"location":"dataflow/instantiation/#instantiation.workflow.filter_flow.FilterFlow.execute","text":"Execute the filter flow: Filter the task and save the result. Parameters: instantiated_request ( str ) \u2013 Request object to be filtered. 
Returns: Dict [ str , Any ] \u2013 Tuple containing task quality flag, comment, and task type. Source code in instantiation/workflow/filter_flow.py 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 def execute ( self , instantiated_request : str ) -> Dict [ str , Any ]: \"\"\" Execute the filter flow: Filter the task and save the result. :param instantiated_request: Request object to be filtered. :return: Tuple containing task quality flag, comment, and task type. \"\"\" start_time = time . time () try : judge , thought , request_type = self . _get_filtered_result ( instantiated_request ) except Exception as e : raise e finally : self . execution_time = round ( time . time () - start_time , 3 ) return { \"judge\" : judge , \"thought\" : thought , \"request_type\" : request_type , }","title":"execute"},{"location":"dataflow/instantiation/#filteragent","text":"Bases: BasicAgent The Agent to evaluate whether the instantiated task is correct or not. Initialize the FilterAgent. Parameters: name ( str ) \u2013 The name of the agent. process_name ( str ) \u2013 The name of the process. is_visual ( bool ) \u2013 The flag indicating whether the agent is visual or not. main_prompt ( str ) \u2013 The main prompt. example_prompt ( str ) \u2013 The example prompt. api_prompt ( str ) \u2013 The API prompt. Source code in instantiation/agent/filter_agent.py 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 def __init__ ( self , name : str , process_name : str , is_visual : bool , main_prompt : str , example_prompt : str , api_prompt : str , ): \"\"\" Initialize the FilterAgent. :param name: The name of the agent. :param process_name: The name of the process. :param is_visual: The flag indicating whether the agent is visual or not. :param main_prompt: The main prompt. :param example_prompt: The example prompt. :param api_prompt: The API prompt. \"\"\" self . _step = 0 self . _complete = False self . _name = name self . _status = None self . 
prompter : FilterPrompter = self . get_prompter ( is_visual , main_prompt , example_prompt , api_prompt ) self . _process_name = process_name","title":"FilterAgent"},{"location":"dataflow/instantiation/#instantiation.agent.filter_agent.FilterAgent.get_prompter","text":"Get the prompt for the agent. Parameters: is_visual ( bool ) \u2013 The flag indicating whether the agent is visual or not. main_prompt ( str ) \u2013 The main prompt. example_prompt ( str ) \u2013 The example prompt. api_prompt ( str ) \u2013 The API prompt. Returns: FilterPrompter \u2013 The prompt string. Source code in instantiation/agent/filter_agent.py 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 def get_prompter ( self , is_visual : bool , main_prompt : str , example_prompt : str , api_prompt : str ) -> FilterPrompter : \"\"\" Get the prompt for the agent. :param is_visual: The flag indicating whether the agent is visual or not. :param main_prompt: The main prompt. :param example_prompt: The example prompt. :param api_prompt: The API prompt. :return: The prompt string. \"\"\" return FilterPrompter ( is_visual , main_prompt , example_prompt , api_prompt )","title":"get_prompter"},{"location":"dataflow/instantiation/#instantiation.agent.filter_agent.FilterAgent.message_constructor","text":"Construct the prompt message for the FilterAgent. Parameters: request ( str ) \u2013 The request sentence. app ( str ) \u2013 The name of the operated app. Returns: List [ str ] \u2013 The prompt message. Source code in instantiation/agent/filter_agent.py 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 def message_constructor ( self , request : str , app : str ) -> List [ str ]: \"\"\" Construct the prompt message for the FilterAgent. :param request: The request sentence. :param app: The name of the operated app. :return: The prompt message. \"\"\" filter_agent_prompt_system_message = self . prompter . system_prompt_construction ( app = app ) filter_agent_prompt_user_message = self . prompter . 
user_content_construction ( request ) filter_agent_prompt_message = self . prompter . prompt_construction ( filter_agent_prompt_system_message , filter_agent_prompt_user_message ) return filter_agent_prompt_message","title":"message_constructor"},{"location":"dataflow/instantiation/#instantiation.agent.filter_agent.FilterAgent.process_comfirmation","text":"Confirm the process. This is the abstract method from BasicAgent that needs to be implemented. Source code in instantiation/agent/filter_agent.py 80 81 82 83 84 85 86 def process_comfirmation ( self ) -> None : \"\"\" Confirm the process. This is the abstract method from BasicAgent that needs to be implemented. \"\"\" pass","title":"process_comfirmation"},{"location":"dataflow/overview/","text":"Dataflow Dataflow uses UFO to implement instantiation , execution , and dataflow for a given task, with options for batch processing and single processing. Instantiation : Instantiation refers to the process of setting up and preparing a task for execution. This step typically involves choosing template , prefill and filter . Execution : Execution is the actual process of running the task. This step involves carrying out the actions or operations specified by the Instantiation . After execution, an evaluation agent will evaluate the quality of the whole execution process. Dataflow : Dataflow is the overarching process that combines instantiation and execution into a single pipeline. It provides an end-to-end solution for processing tasks, ensuring that all necessary steps (from initialization to execution) are seamlessly integrated. You can use instantiation and execution independently if you only need to perform one specific part of the process. When both steps are required for a task, the dataflow process streamlines them, allowing you to execute tasks from start to finish in a single pipeline. The overall processing of dataflow is as below. 
Given task-plan data, the LLM will instantiate the task-action data, including choosing template, prefill, filter. How To Use 1. Install Packages You should install the necessary packages in the UFO root folder: pip install -r requirements.txt 2. Configure the LLMs Before running dataflow, you need to provide your LLM configurations individually for PrefillAgent and FilterAgent . You can create your own config file dataflow/config/config.yaml , by copying the dataflow/config/config.yaml.template and editing config for PREFILL_AGENT and FILTER_AGENT as follows: OpenAI VISUAL_MODE: True, # Whether to use the visual mode API_TYPE: \"openai\" , # The API type, \"openai\" for the OpenAI API. API_BASE: \"https://api.openai.com/v1/chat/completions\", # The OpenAI API endpoint. API_KEY: \"sk-\", # The OpenAI API key, begin with sk- API_VERSION: \"2024-02-15-preview\", # \"2024-02-15-preview\" by default API_MODEL: \"gpt-4-vision-preview\", # The only OpenAI model Azure OpenAI (AOAI) VISUAL_MODE: True, # Whether to use the visual mode API_TYPE: \"aoai\" , # The API type, \"aoai\" for the Azure OpenAI. API_BASE: \"YOUR_ENDPOINT\", # The AOAI API address. Format: https://{your-resource-name}.openai.azure.com API_KEY: \"YOUR_KEY\", # The AOAI API key API_VERSION: \"2024-02-15-preview\", # \"2024-02-15-preview\" by default API_MODEL: \"gpt-4-vision-preview\", # The only OpenAI model API_DEPLOYMENT_ID: \"YOUR_AOAI_DEPLOYMENT\", # The deployment id for the AOAI API You can also use a non-visual model (e.g., GPT-4) for each agent by setting VISUAL_MODE: False and a proper API_MODEL (openai) and API_DEPLOYMENT_ID (aoai). Non-Visual Model Configuration You can utilize non-visual models (e.g., GPT-4) for each agent by configuring the following settings in the config.yaml file: VISUAL_MODE: False # To enable non-visual mode. Specify the appropriate API_MODEL (OpenAI) and API_DEPLOYMENT_ID (AOAI) for each agent. 
Ensure you configure these settings accurately to leverage non-visual models effectively. Other Configurations config_dev.yaml specifies the paths of relevant files and contains default settings. The match strategy for the window match and control filter supports the options 'contains' , 'fuzzy' , and 'regex' , allowing users a flexible matching strategy. MAX_STEPS is the maximum number of steps for the execute_flow and can be set by users. Note For the specific implementation and invocation of the matching strategy, refer to windows_app_env . Note BE CAREFUL! If you are using GitHub or other open-source tools, do not expose your config.yaml online, as it contains your private keys. 3. Prepare Files Certain files need to be prepared before running the task. 3.1. Tasks as JSON The tasks that need to be instantiated should be organized in a folder of JSON files, with the default folder path set to dataflow/tasks . This path can be changed in the dataflow/config/config.yaml file, or you can specify it in the terminal, as mentioned in 4. Start Running . For example, a task stored in dataflow/tasks/prefill/ may look like this: { // The app you want to use \"app\": \"word\", // A unique ID to distinguish different tasks \"unique_id\": \"1\", // The task and steps to be instantiated \"task\": \"Type 'hello' and set the font type to Arial\", \"refined_steps\": [ \"Type 'hello'\", \"Set the font to Arial\" ] } 3.2. Templates and Descriptions You should place an app file as a reference for instantiation in a folder named after the app. For example, if you have template1.docx for Word, it should be located at dataflow/templates/word/template1.docx . Additionally, for each app folder, there should be a description.json file located at dataflow/templates/word/description.json , which describes each template file in detail. 
It may look like this: { \"template1.docx\": \"A document with a rectangle shape\", \"template2.docx\": \"A document with a line of text\" } If a description.json file is not present, one template file will be selected at random. 3.3. Final Structure Ensure the following files are in place: JSON files to be instantiated Templates as references for instantiation Description file in JSON format The structure of the files can be: dataflow/ | \u251c\u2500\u2500 tasks \u2502 \u2514\u2500\u2500 prefill \u2502 \u251c\u2500\u2500 bulleted.json \u2502 \u251c\u2500\u2500 delete.json \u2502 \u251c\u2500\u2500 draw.json \u2502 \u251c\u2500\u2500 macro.json \u2502 \u2514\u2500\u2500 rotate.json \u251c\u2500\u2500 templates \u2502 \u2514\u2500\u2500 word \u2502 \u251c\u2500\u2500 description.json \u2502 \u251c\u2500\u2500 template1.docx \u2502 \u251c\u2500\u2500 template2.docx \u2502 \u251c\u2500\u2500 template3.docx \u2502 \u251c\u2500\u2500 template4.docx \u2502 \u251c\u2500\u2500 template5.docx \u2502 \u251c\u2500\u2500 template6.docx \u2502 \u2514\u2500\u2500 template7.docx \u2514\u2500\u2500 ... 4. Start Running After finishing the previous steps, you can use the following commands in the command line. We provide single and batch processing: you give a single file path or a folder path, and the type of path is detected automatically to decide whether a single task or a batch of tasks is processed. You can also use the instantiation / execution sections individually, or use them together as a whole, which is named dataflow . The default task hub is set by \"TASKS_HUB\" in dataflow/config_dev.yaml . 
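As a toy illustration of the task format above (not part of the UFO codebase; the field set is taken from the sample task shown here), a minimal check can verify that a task definition carries the four expected fields before a run:

```python
REQUIRED_FIELDS = {'app', 'unique_id', 'task', 'refined_steps'}

def missing_task_fields(task_json: dict) -> list:
    # Return the sorted list of required fields absent from a task definition.
    return sorted(REQUIRED_FIELDS - task_json.keys())

task = {
    'app': 'word',
    'unique_id': '1',
    'task': 'Type hello and set the font type to Arial',
    'refined_steps': ['Type hello', 'Set the font to Arial'],
}
print(missing_task_fields(task))             # []
print(missing_task_fields({'app': 'word'}))  # ['refined_steps', 'task', 'unique_id']
```

Running such a check over the task folder before launching a batch surfaces malformed task files early instead of failing mid-pipeline.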
Dataflow Task: python -m dataflow -dataflow --task_path path_to_task_file Instantiation Task: python -m dataflow -instantiation --task_path path_to_task_file Execution Task: python -m dataflow -execution --task_path path_to_task_file Workflow Instantiation There are three key steps in the instantiation process: Choose a template file according to the specified app and instruction. Prefill the task using the current screenshot. Filter the established task. Given the initial task, the dataflow first chooses a template ( Phase 1 ), then prefills the initial task based on the Word environment to obtain task-action data ( Phase 2 ). Finally, it filters the established task to evaluate the quality of the task-action data ( Phase 3 ). 1. Choose Template File Templates for your app must be defined and described in dataflow/templates/app . For instance, if you want to instantiate tasks for the Word application, place the relevant .docx files in dataflow/templates/word , along with a description.json file. The appropriate template will be selected based on how well its description matches the instruction. 2. Prefill the Task After selecting the template file, it will be opened, and a screenshot will be taken. If the template file is currently in use, errors may occur. The screenshot will be sent to the action prefill agent, which will return a modified task. 3. Filter Task The completed task will be evaluated by a filter agent, which will assess it and provide feedback. The more detailed code design documentation for instantiation can be found in instantiation . Execution The instantiated plans will be executed by the execute flow. After execution, an evaluation agent will evaluate the quality of the entire execution process. In this phase, given the task-action data, the execution process will match the real controllers based on the Word environment and execute the plan step by step. The more detailed code design documentation for execution can be found in execution . 
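Template selection ranks candidate templates by how well their descriptions match the instruction. The snippet below is only a toy token-overlap sketch of that idea (the real flow may rely on an LLM or the configurable match strategies), using the description.json format shown earlier:

```python
def choose_template(descriptions: dict, instruction: str) -> str:
    # Pick the template whose description shares the most words with the instruction.
    inst_words = set(instruction.lower().split())

    def overlap(item):
        _name, desc = item
        return len(inst_words & set(desc.lower().split()))

    return max(descriptions.items(), key=overlap)[0]

descriptions = {
    'template1.docx': 'A document with a rectangle shape',
    'template2.docx': 'A document with a line of text',
}
print(choose_template(descriptions, 'Turn a line of text into a bulleted list'))
# template2.docx
```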
Result The structure of the results of the task is as below: UFO/ \u251c\u2500\u2500 dataflow/ # Root folder for dataflow \u2502 \u2514\u2500\u2500 results/ # Directory for storing task processing results \u2502 \u251c\u2500\u2500 saved_document/ # Directory for final document results \u2502 \u251c\u2500\u2500 instantiation/ # Directory for instantiation results \u2502 \u2502 \u251c\u2500\u2500 instantiation_pass/ # Tasks successfully instantiated \u2502 \u2502 \u2514\u2500\u2500 instantiation_fail/ # Tasks that failed instantiation \u2502 \u251c\u2500\u2500 execution/ # Directory for execution results \u2502 \u2502 \u251c\u2500\u2500 execution_pass/ # Tasks successfully executed \u2502 \u2502 \u251c\u2500\u2500 execution_fail/ # Tasks that failed execution \u2502 \u2502 \u2514\u2500\u2500 execution_unsure/ # Tasks with uncertain execution results \u2502 \u251c\u2500\u2500 dataflow/ # Directory for dataflow results \u2502 \u2502 \u251c\u2500\u2500 execution_pass/ # Tasks successfully executed \u2502 \u2502 \u251c\u2500\u2500 execution_fail/ # Tasks that failed execution \u2502 \u2502 \u2514\u2500\u2500 execution_unsure/ # Tasks with uncertain execution results \u2502 \u2514\u2500\u2500 ... \u2514\u2500\u2500 ... General Description: This directory structure organizes the results of task processing into specific categories, including instantiation, execution, and dataflow outcomes. 2. Instantiation: The instantiation directory contains subfolders for tasks that were successfully instantiated ( instantiation_pass ) and those that failed during instantiation ( instantiation_fail ). 3. Execution: Results of task execution are stored under the execution directory, categorized into successful tasks ( execution_pass ), failed tasks ( execution_fail ), and tasks with uncertain outcomes ( execution_unsure ). 4. 
Dataflow Results: The dataflow directory similarly holds results of tasks based on execution success, failure, or uncertainty, providing a comprehensive view of the data processing pipeline. 5. Saved Documents: Instantiated results are separately stored in the saved_document directory for easy access and reference. Description This section illustrates the structure of the result of the task, organized in a hierarchical format to describe the various fields and their purposes. The result data include unique_id , app , original , execution_result , instantiation_result , time_cost . 1. Field Descriptions Hierarchy : The data is presented in a hierarchical manner to allow for a clearer understanding of field relationships. Type Description : The type of each field (e.g., string , array , object ) clearly specifies the format of the data. Field Purpose : Each field has a brief description outlining its function. 2. Execution Results and Errors execution_result : Contains the results of task execution, including subtask performance, completion status, and any encountered errors. instantiation_result : Describes the process of task instantiation, including template selection, prefilled tasks, and instantiation evaluation. error : If an error occurs during task execution, this field will contain the relevant error information. 3. Time Consumption time_cost : The time spent on each phase of the task, from template selection to task execution, is recorded to analyze task efficiency. 
Example Data { \"unique_id\": \"102\", \"app\": \"word\", \"original\": { \"original_task\": \"Find which Compatibility Mode you are in for Word\", \"original_steps\": [ \"1.Click the **File** tab.\", \"2.Click **Info**.\", \"3.Check the **Compatibility Mode** indicator at the bottom of the document preview pane.\" ] }, \"execution_result\": { \"result\": { \"reason\": \"The agent successfully identified the compatibility mode of the Word document.\", \"sub_scores\": { \"correct identification of compatibility mode\": \"yes\" }, \"complete\": \"yes\" }, \"error\": null }, \"instantiation_result\": { \"choose_template\": { \"result\": \"dataflow\\\\results\\\\saved_document\\\\102.docx\", \"error\": null }, \"prefill\": { \"result\": { \"instantiated_request\": \"Identify the Compatibility Mode of the Word document.\", \"instantiated_plan\": [ { \"Step\": 1, \"Subtask\": \"Identify the Compatibility Mode\", \"Function\": \"summary\", \"Args\": { \"text\": \"The document is in '102 - Compatibility Mode'.\" }, \"Success\": true } ] }, \"error\": null }, \"instantiation_evaluation\": { \"result\": { \"judge\": true, \"thought\": \"Identifying the Compatibility Mode of a Word document is a task that can be executed locally within Word.\" }, \"error\": null } }, \"time_cost\": { \"choose_template\": 0.017, \"prefill\": 11.304, \"instantiation_evaluation\": 2.38, \"total\": 34.584, \"execute\": 0.946, \"execute_eval\": 10.381 } } Quick Start We prepare two cases to demonstrate the dataflow, which can be found in dataflow\\tasks\\prefill . After installing the required packages, you can type the following command in the command line: python -m dataflow -dataflow You will then see hints in the terminal indicating that the dataflow is running. 
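The time_cost block in the example above can be aggregated per phase. Note that the recorded total exceeds the sum of the listed phases, since it is measured across the whole run, including overhead. A small illustrative helper (not UFO code) shows the difference:

```python
def phase_time_sum(time_cost: dict) -> float:
    # Sum the recorded per-phase durations, skipping the precomputed total.
    return round(sum(v for k, v in time_cost.items()
                     if k != 'total' and v is not None), 3)

time_cost = {'choose_template': 0.017, 'prefill': 11.304,
             'instantiation_evaluation': 2.38, 'execute': 0.946,
             'execute_eval': 10.381, 'total': 34.584}
print(phase_time_sum(time_cost))  # 25.028, versus a recorded total of 34.584
```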
Structure of related files After the two tasks are finished, the task and output files would appear as follows: UFO/ \u251c\u2500\u2500 dataflow/ \u2502 \u2514\u2500\u2500 results/ \u2502 \u251c\u2500\u2500 saved_document/ # Directory for saved documents \u2502 \u2502 \u251c\u2500\u2500 bulleted.docx # Result of the \"bulleted\" task \u2502 \u2502 \u2514\u2500\u2500 rotate.docx # Result of the \"rotate\" task \u2502 \u251c\u2500\u2500 dataflow/ # Dataflow results directory \u2502 \u2502 \u251c\u2500\u2500 execution_pass/ # Successfully executed tasks \u2502 \u2502 \u2502 \u251c\u2500\u2500 bulleted.json # Execution result for the \"bulleted\" task \u2502 \u2502 \u2502 \u251c\u2500\u2500 rotate.json # Execution result for the \"rotate\" task \u2502 \u2502 \u2502 \u2514\u2500\u2500 ... \u2514\u2500\u2500 ... Result files The result structure of the bulleted task is shown below. This document provides a detailed breakdown of the task execution process for turning lines of text into a bulleted list in Word. It includes the original task description, execution results, and time analysis for each step. unique_id : The identifier for the task, in this case, \"5\" . app : The application being used, which is \"word\" . original : Contains the original task description and the steps. original_task : Describes the task in simple terms (turning text into a bulleted list). original_steps : Lists the steps required to perform the task. execution_result : Provides the result of executing the task. result : Describes the outcome of the execution, including a success message and sub-scores for each part of the task. The complete: \"yes\" means the evaluation agent considers the execution process successful. The sub_score is the evaluation of each subtask, corresponding to the instantiated_plan in the prefill . error : If any error occurred during execution, it would be reported here, but it's null in this case. 
instantiation_result : Details the instantiation of the task (setting up the task for execution). choose_template : Path to the template or document created during the task (in this case, the bulleted list document). prefill : Describes the instantiated_request and instantiated_plan and the steps involved, such as selecting text and clicking buttons, which is the result of the prefill flow. The Success and MatchedControlText fields are added during the execution process. Success indicates whether the subtask was executed successfully. MatchedControlText refers to the control text that was matched during the execution process based on the plan. instantiation_evaluation : Provides feedback on the task's feasibility and the evaluation of the request, which is the result of the filter flow. \"judge\": true : This indicates that the evaluation of the task was positive, meaning the task is considered valid or successfully judged. The thought field gives the detailed reason. time_cost : The time spent on different parts of the task, including template selection, prefill, instantiation evaluation, and execution. The total time is also given. { \"unique_id\": \"5\", \"app\": \"word\", \"original\": { \"original_task\": \"Turning lines of text into a bulleted list in Word\", \"original_steps\": [ \"1. Place the cursor at the beginning of the line of text you want to turn into a bulleted list\", \"2. Click the Bullets button in the Paragraph group on the Home tab and choose a bullet style\" ] }, \"execution_result\": { \"result\": { \"reason\": \"The agent successfully selected the text 'text to edit' and then clicked on the 'Bullets' button in the Word application. 
The final screenshot shows that the text 'text to edit' has been converted into a bulleted list.\", \"sub_scores\": { \"text selection\": \"yes\", \"bulleted list conversion\": \"yes\" }, \"complete\": \"yes\" }, \"error\": null }, \"instantiation_result\": { \"choose_template\": { \"result\": \"dataflow\\\\results\\\\saved_document\\\\bulleted.docx\", \"error\": null }, \"prefill\": { \"result\": { \"instantiated_request\": \"Turn the line of text 'text to edit' into a bulleted list in Word.\", \"instantiated_plan\": [ { \"Step\": 1, \"Subtask\": \"Place the cursor at the beginning of the text 'text to edit'\", \"ControlLabel\": null, \"ControlText\": \"\", \"Function\": \"select_text\", \"Args\": { \"text\": \"text to edit\" }, \"Success\": true, \"MatchedControlText\": null }, { \"Step\": 2, \"Subtask\": \"Click the Bullets button in the Paragraph group on the Home tab\", \"ControlLabel\": \"61\", \"ControlText\": \"Bullets\", \"Function\": \"click_input\", \"Args\": { \"button\": \"left\", \"double\": false }, \"Success\": true, \"MatchedControlText\": \"Bullets\" } ] }, \"error\": null }, \"instantiation_evaluation\": { \"result\": { \"judge\": true, \"thought\": \"The task is specific and involves a basic function in Word that can be executed locally without any external dependencies.\", \"request_type\": \"None\" }, \"error\": null } }, \"time_cost\": { \"choose_template\": 0.012, \"prefill\": 15.649, \"instantiation_evaluation\": 2.469, \"execute\": 5.824, \"execute_eval\": 8.702, \"total\": 43.522 } } Log files The corresponding logs can be found in the directories logs/bulleted and logs/rotate , as shown below. Detailed logs for each workflow are recorded, capturing every step of the execution process. Reference AppEnum Bases: Enum Enum class for applications. Initialize the application enum. Parameters: id ( int ) \u2013 The ID of the application. description ( str ) \u2013 The description of the application. 
file_extension ( str ) \u2013 The file extension of the application. win_app ( str ) \u2013 The Windows application name. Source code in dataflow/data_flow_controller.py 47 48 49 50 51 52 53 54 55 56 57 58 59 60 def __init__ ( self , id : int , description : str , file_extension : str , win_app : str ): \"\"\" Initialize the application enum. :param id: The ID of the application. :param description: The description of the application. :param file_extension: The file extension of the application. :param win_app: The Windows application name. \"\"\" self . id = id self . description = description self . file_extension = file_extension self . win_app = win_app self . app_root_name = win_app . upper () + \".EXE\" TaskObject Initialize the task object. Parameters: task_file_path ( str ) \u2013 The path to the task file. task_type ( str ) \u2013 The task_type of the task object (dataflow, instantiation, or execution). Source code in dataflow/data_flow_controller.py 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 def __init__ ( self , task_file_path : str , task_type : str ) -> None : \"\"\" Initialize the task object. :param task_file_path: The path to the task file. :param task_type: The task_type of the task object (dataflow, instantiation, or execution). \"\"\" self . task_file_path = task_file_path self . task_file_base_name = os . path . basename ( task_file_path ) self . task_file_name = self . task_file_base_name . split ( \".\" )[ 0 ] task_json_file = load_json_file ( task_file_path ) self . app_object = self . _choose_app_from_json ( task_json_file [ \"app\" ]) # Initialize the task attributes based on the task_type self . _init_attr ( task_type , task_json_file ) DataFlowController Flow controller class to manage the instantiation and execution process. Initialize the flow controller. Parameters: task_path ( str ) \u2013 The path to the task file. task_type ( str ) \u2013 The task_type of the flow controller (instantiation, execution, or dataflow). 
Source code in dataflow/data_flow_controller.py 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 def __init__ ( self , task_path : str , task_type : str ) -> None : \"\"\" Initialize the flow controller. :param task_path: The path to the task file. :param task_type: The task_type of the flow controller (instantiation, execution, or dataflow). \"\"\" self . task_object = TaskObject ( task_path , task_type ) self . app_env = None self . app_name = self . task_object . app_object . description . lower () self . task_file_name = self . task_object . task_file_name self . schema = self . _load_schema ( task_type ) self . task_type = task_type self . task_info = self . init_task_info () self . result_hub = _configs [ \"RESULT_HUB\" ] . format ( task_type = task_type ) instantiated_plan : List [ Dict [ str , Any ]] property writable Get the instantiated plan from the task information. Returns: List [ Dict [ str , Any ]] \u2013 The instantiated plan. template_copied_path : str property Get the copied template path from the task information. Returns: str \u2013 The copied template path. execute_execution ( request , plan ) Execute the execution process. Parameters: request ( str ) \u2013 The task request to be executed. plan ( Dict [ str , any ] ) \u2013 The execution plan containing detailed steps. Source code in dataflow/data_flow_controller.py 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 def execute_execution ( self , request : str , plan : Dict [ str , any ]) -> None : \"\"\" Execute the execution process. :param request: The task request to be executed. :param plan: The execution plan containing detailed steps. \"\"\" print_with_color ( \"Executing the execution process...\" , \"blue\" ) execute_flow = None try : self . app_env . start ( self . 
template_copied_path ) # Initialize the execution context and flow context = Context () execute_flow = ExecuteFlow ( self . task_file_name , context , self . app_env ) # Execute the plan executed_plan , execute_result = execute_flow . execute ( request , plan ) # Update the instantiated plan self . instantiated_plan = executed_plan # Record execution results and time metrics self . task_info [ \"execution_result\" ][ \"result\" ] = execute_result self . task_info [ \"time_cost\" ][ \"execute\" ] = execute_flow . execution_time self . task_info [ \"time_cost\" ][ \"execute_eval\" ] = execute_flow . eval_time except Exception as e : # Handle and log any exceptions that occur during execution self . task_info [ \"execution_result\" ][ \"error\" ] = { \"type\" : str ( type ( e ) . __name__ ), \"message\" : str ( e ), \"traceback\" : traceback . format_exc (), } print_with_color ( f \"Error in Execution: { e } \" , \"red\" ) raise e finally : # Record the total time cost of the execution process if execute_flow and hasattr ( execute_flow , \"execution_time\" ): self . task_info [ \"time_cost\" ][ \"execute\" ] = execute_flow . execution_time else : self . task_info [ \"time_cost\" ][ \"execute\" ] = None if execute_flow and hasattr ( execute_flow , \"eval_time\" ): self . task_info [ \"time_cost\" ][ \"execute_eval\" ] = execute_flow . eval_time else : self . task_info [ \"time_cost\" ][ \"execute_eval\" ] = None execute_instantiation () Execute the instantiation process. Returns: Optional [ List [ Dict [ str , Any ]]] \u2013 The instantiation plan if successful. Source code in dataflow/data_flow_controller.py 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 def execute_instantiation ( self ) -> Optional [ List [ Dict [ str , Any ]]]: \"\"\" Execute the instantiation process. :return: The instantiation plan if successful. \"\"\" print_with_color ( f \"Instantiating task { self . task_object . 
task_file_name } ...\" , \"blue\" ) template_copied_path = self . instantiation_single_flow ( ChooseTemplateFlow , \"choose_template\" , init_params = [ self . task_object . app_object . file_extension ], execute_params = [] ) if template_copied_path : self . app_env . start ( template_copied_path ) prefill_result = self . instantiation_single_flow ( PrefillFlow , \"prefill\" , init_params = [ self . app_env ], execute_params = [ template_copied_path , self . task_object . task , self . task_object . refined_steps ] ) self . app_env . close () if prefill_result : self . instantiation_single_flow ( FilterFlow , \"instantiation_evaluation\" , init_params = [], execute_params = [ prefill_result [ \"instantiated_request\" ]] ) return prefill_result [ \"instantiated_plan\" ] init_task_info () Initialize the task information. Returns: Dict [ str , Any ] \u2013 The initialized task information. Source code in dataflow/data_flow_controller.py 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 def init_task_info ( self ) -> Dict [ str , Any ]: \"\"\" Initialize the task information. :return: The initialized task information. \"\"\" init_task_info = None if self . task_type == \"execution\" : # read from the instantiated task file init_task_info = load_json_file ( self . task_object . task_file_path ) else : init_task_info = { \"unique_id\" : self . task_object . unique_id , \"app\" : self . app_name , \"original\" : { \"original_task\" : self . task_object . task , \"original_steps\" : self . task_object . 
refined_steps , }, \"execution_result\" : { \"result\" : None , \"error\" : None }, \"instantiation_result\" : { \"choose_template\" : { \"result\" : None , \"error\" : None }, \"prefill\" : { \"result\" : None , \"error\" : None }, \"instantiation_evaluation\" : { \"result\" : None , \"error\" : None }, }, \"time_cost\" : {}, } return init_task_info instantiation_single_flow ( flow_class , flow_type , init_params = None , execute_params = None ) Execute a single flow process in the instantiation phase. Parameters: flow_class ( AppAgentProcessor ) \u2013 The flow class to instantiate. flow_type ( str ) \u2013 The type of the flow. init_params \u2013 The initialization parameters for the flow. execute_params \u2013 The execution parameters for the flow. Returns: Optional [ Dict [ str , Any ]] \u2013 The result of the flow process. Source code in dataflow/data_flow_controller.py 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 def instantiation_single_flow ( self , flow_class : AppAgentProcessor , flow_type : str , init_params = None , execute_params = None ) -> Optional [ Dict [ str , Any ]]: \"\"\" Execute a single flow process in the instantiation phase. :param flow_class: The flow class to instantiate. :param flow_type: The type of the flow. :param init_params: The initialization parameters for the flow. :param execute_params: The execution parameters for the flow. :return: The result of the flow process. \"\"\" flow_instance = None try : flow_instance = flow_class ( self . app_name , self . task_file_name , * init_params ) result = flow_instance . execute ( * execute_params ) self . task_info [ \"instantiation_result\" ][ flow_type ][ \"result\" ] = result return result except Exception as e : self . task_info [ \"instantiation_result\" ][ flow_type ][ \"error\" ] = { \"type\" : str ( e . __class__ ), \"error_message\" : str ( e ), \"traceback\" : traceback . 
format_exc (), } print_with_color ( f \"Error in { flow_type } : { e } { traceback . format_exc () } \" ) finally : if flow_instance and hasattr ( flow_instance , \"execution_time\" ): self . task_info [ \"time_cost\" ][ flow_type ] = flow_instance . execution_time else : self . task_info [ \"time_cost\" ][ flow_type ] = None run () Run the instantiation and execution process. Source code in dataflow/data_flow_controller.py 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 def run ( self ) -> None : \"\"\" Run the instantiation and execution process. \"\"\" start_time = time . time () try : self . app_env = WindowsAppEnv ( self . task_object . app_object ) if self . task_type == \"dataflow\" : plan = self . execute_instantiation () self . execute_execution ( self . task_object . task , plan ) elif self . task_type == \"instantiation\" : self . execute_instantiation () elif self . task_type == \"execution\" : plan = self . instantiated_plan self . execute_execution ( self . task_object . task , plan ) else : raise ValueError ( f \"Unsupported task_type: { self . task_type } \" ) except Exception as e : raise e finally : # Update or record the total time cost of the process total_time = round ( time . time () - start_time , 3 ) new_total_time = self . task_info . get ( \"time_cost\" , {}) . get ( \"total\" , 0 ) + total_time self . task_info [ \"time_cost\" ][ \"total\" ] = round ( new_total_time , 3 ) self . save_result () save_result () Validate and save the instantiated task result. Source code in dataflow/data_flow_controller.py 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 def save_result ( self ) -> None : \"\"\" Validate and save the instantiated task result. \"\"\" validation_error = None # Validate the result against the schema try : validate ( instance = self . 
task_info , schema = self . schema ) except ValidationError as e : # Record the validation error but allow the process to continue validation_error = str ( e . message ) print_with_color ( f \"Validation Error: { e . message } \" , \"yellow\" ) # Determine the target directory based on task_type and quality/completeness target_file = None if self . task_type == \"instantiation\" : # Determine the quality of the instantiation if not self . task_info [ \"instantiation_result\" ][ \"instantiation_evaluation\" ][ \"result\" ]: target_file = INSTANTIATION_RESULT_MAP [ False ] else : is_quality_good = self . task_info [ \"instantiation_result\" ][ \"instantiation_evaluation\" ][ \"result\" ][ \"judge\" ] target_file = INSTANTIATION_RESULT_MAP . get ( is_quality_good , INSTANTIATION_RESULT_MAP [ False ]) else : # Determine the completion status of the execution if not self . task_info [ \"execution_result\" ][ \"result\" ]: target_file = EXECUTION_RESULT_MAP [ \"no\" ] else : is_completed = self . task_info [ \"execution_result\" ][ \"result\" ][ \"complete\" ] target_file = EXECUTION_RESULT_MAP . get ( is_completed , EXECUTION_RESULT_MAP [ \"no\" ]) # Construct the full path to save the result new_task_path = os . path . join ( self . result_hub , target_file , self . task_object . task_file_base_name ) os . makedirs ( os . path . dirname ( new_task_path ), exist_ok = True ) save_json_file ( new_task_path , self . task_info ) print ( f \"Task saved to { new_task_path } \" ) # If validation failed, indicate that the saved result may need further inspection if validation_error : print ( \"The saved task result does not conform to the expected schema and may require review.\" ) Note Users should be careful to save the original files while using this project; otherwise, the files will be closed when the app is shut down. 
After starting the project, users should not close the app window while the program is taking screenshots.","title":"Overview"},{"location":"dataflow/overview/#dataflow","text":"Dataflow uses UFO to implement instantiation , execution , and dataflow for a given task, with options for batch processing and single processing. Instantiation : Instantiation refers to the process of setting up and preparing a task for execution. This step typically involves choosing a template , prefill , and filter . Execution : Execution is the actual process of running the task. This step involves carrying out the actions or operations specified by the Instantiation . After execution, an evaluation agent will evaluate the quality of the whole execution process. Dataflow : Dataflow is the overarching process that combines instantiation and execution into a single pipeline. It provides an end-to-end solution for processing tasks, ensuring that all necessary steps (from initialization to execution) are seamlessly integrated. You can use instantiation and execution independently if you only need to perform one specific part of the process. When both steps are required for a task, the dataflow process streamlines them, allowing you to execute tasks from start to finish in a single pipeline. The overall processing of dataflow is as below. Given task-plan data, the LLM will instantiate it into task-action data through template choosing, prefill, and filter.","title":"Dataflow"},{"location":"dataflow/overview/#how-to-use","text":"","title":"How To Use"},{"location":"dataflow/overview/#1-install-packages","text":"You should install the necessary packages in the UFO root folder: pip install -r requirements.txt","title":"1. Install Packages"},{"location":"dataflow/overview/#2-configure-the-llms","text":"Before running dataflow, you need to provide your LLM configurations individually for PrefillAgent and FilterAgent .
You can create your own config file dataflow/config/config.yaml , by copying the dataflow/config/config.yaml.template and editing the config for PREFILL_AGENT and FILTER_AGENT as follows:","title":"2. Configure the LLMs"},{"location":"dataflow/overview/#openai","text":"VISUAL_MODE: True, # Whether to use the visual mode API_TYPE: \"openai\" , # The API type, \"openai\" for the OpenAI API. API_BASE: \"https://api.openai.com/v1/chat/completions\", # The OpenAI API endpoint. API_KEY: \"sk-\", # The OpenAI API key, begins with sk- API_VERSION: \"2024-02-15-preview\", # \"2024-02-15-preview\" by default API_MODEL: \"gpt-4-vision-preview\", # The only OpenAI model","title":"OpenAI"},{"location":"dataflow/overview/#azure-openai-aoai","text":"VISUAL_MODE: True, # Whether to use the visual mode API_TYPE: \"aoai\" , # The API type, \"aoai\" for the Azure OpenAI. API_BASE: \"YOUR_ENDPOINT\", # The AOAI API address. Format: https://{your-resource-name}.openai.azure.com API_KEY: \"YOUR_KEY\", # The AOAI API key API_VERSION: \"2024-02-15-preview\", # \"2024-02-15-preview\" by default API_MODEL: \"gpt-4-vision-preview\", # The only OpenAI model API_DEPLOYMENT_ID: \"YOUR_AOAI_DEPLOYMENT\", # The deployment id for the AOAI API You can also use a non-visual model (e.g., GPT-4) for each agent by setting VISUAL_MODE: False and a proper API_MODEL (openai) and API_DEPLOYMENT_ID (aoai).","title":"Azure OpenAI (AOAI)"},{"location":"dataflow/overview/#non-visual-model-configuration","text":"You can utilize non-visual models (e.g., GPT-4) for each agent by configuring the following settings in the config.yaml file: VISUAL_MODE: False # To enable non-visual mode. Specify the appropriate API_MODEL (OpenAI) and API_DEPLOYMENT_ID (AOAI) for each agent.
Ensure you configure these settings accurately to leverage non-visual models effectively.","title":"Non-Visual Model Configuration"},{"location":"dataflow/overview/#other-configurations","text":"config_dev.yaml specifies the paths of relevant files and contains default settings. The match strategy for the window match and control filter supports the options 'contains' , 'fuzzy' , and 'regex' , allowing flexible matching strategies for users. MAX_STEPS is the maximum number of steps for the execute_flow, which can be set by users. Note For the specific implementation and invocation method of the matching strategy, refer to windows_app_env . Note BE CAREFUL! If you are using GitHub or other open-source tools, do not expose your config.yaml online, as it contains your private keys.","title":"Other Configurations"},{"location":"dataflow/overview/#3-prepare-files","text":"Certain files need to be prepared before running the task.","title":"3. Prepare Files"},{"location":"dataflow/overview/#31-tasks-as-json","text":"The tasks that need to be instantiated should be organized in a folder of JSON files, with the default folder path set to dataflow/tasks . This path can be changed in the dataflow/config/config.yaml file, or you can specify it in the terminal, as mentioned in 4. Start Running . For example, a task stored in dataflow/tasks/prefill/ may look like this: { // The app you want to use \"app\": \"word\", // A unique ID to distinguish different tasks \"unique_id\": \"1\", // The task and steps to be instantiated \"task\": \"Type 'hello' and set the font type to Arial\", \"refined_steps\": [ \"Type 'hello'\", \"Set the font to Arial\" ] }
Additionally, for each app folder, there should be a description.json file located at dataflow/templates/word/description.json , which describes each template file in detail. It may look like this: { \"template1.docx\": \"A document with a rectangle shape\", \"template2.docx\": \"A document with a line of text\" } If a description.json file is not present, one template file will be selected at random.","title":"3.2. Templates and Descriptions"},{"location":"dataflow/overview/#33-final-structure","text":"Ensure the following files are in place: JSON files to be instantiated Templates as references for instantiation Description file in JSON format The structure of the files can be: dataflow/ | \u251c\u2500\u2500 tasks \u2502 \u2514\u2500\u2500 prefill \u2502 \u251c\u2500\u2500 bulleted.json \u2502 \u251c\u2500\u2500 delete.json \u2502 \u251c\u2500\u2500 draw.json \u2502 \u251c\u2500\u2500 macro.json \u2502 \u2514\u2500\u2500 rotate.json \u251c\u2500\u2500 templates \u2502 \u2514\u2500\u2500 word \u2502 \u251c\u2500\u2500 description.json \u2502 \u251c\u2500\u2500 template1.docx \u2502 \u251c\u2500\u2500 template2.docx \u2502 \u251c\u2500\u2500 template3.docx \u2502 \u251c\u2500\u2500 template4.docx \u2502 \u251c\u2500\u2500 template5.docx \u2502 \u251c\u2500\u2500 template6.docx \u2502 \u2514\u2500\u2500 template7.docx \u2514\u2500\u2500 ...","title":"3.3. Final Structure"},{"location":"dataflow/overview/#4-start-running","text":"After finishing the previous steps, you can use the following commands in the command line. We provide single / batch process, for which you need to give the single file path / folder path. Determine the type of path provided by the user and automatically decide whether to process a single task or batch tasks. Also, you can choose to use instantiation / execution sections individually, or use them as a whole section, which is named as dataflow . The default task hub is set to be \"TASKS_HUB\" in dataflow/config_dev.yaml . 
Dataflow Task: python -m dataflow -dataflow --task_path path_to_task_file Instantiation Task: python -m dataflow -instantiation --task_path path_to_task_file Execution Task: python -m dataflow -execution --task_path path_to_task_file","title":"4. Start Running"},{"location":"dataflow/overview/#workflow","text":"","title":"Workflow"},{"location":"dataflow/overview/#instantiation","text":"There are three key steps in the instantiation process: Choose a template file according to the specified app and instruction. Prefill the task using the current screenshot. Filter the established task. Given the initial task, the dataflow first chooses a template ( Phase 1 ), then prefills the initial task based on the Word environment to obtain task-action data ( Phase 2 ). Finally, it filters the established task to evaluate the quality of the task-action data.","title":"Instantiation"},{"location":"dataflow/overview/#1-choose-template-file","text":"Templates for your app must be defined and described in dataflow/templates/app . For instance, if you want to instantiate tasks for the Word application, place the relevant .docx files in dataflow/templates/word , along with a description.json file. The appropriate template will be selected based on how well its description matches the instruction.","title":"1. Choose Template File"},{"location":"dataflow/overview/#2-prefill-the-task","text":"After selecting the template file, it will be opened, and a screenshot will be taken. If the template file is currently in use, errors may occur. The screenshot will be sent to the action prefill agent, which will return a modified task.","title":"2. Prefill the Task"},{"location":"dataflow/overview/#3-filter-task","text":"The completed task will be evaluated by a filter agent, which will assess it and provide feedback. The more detailed code design documentation for instantiation can be found in instantiation .","title":"3.
Filter Task"},{"location":"dataflow/overview/#execution","text":"The instantiated plans will be executed by an execute flow. After execution, an evaluation agent will evaluate the quality of the entire execution process. In this phase, given the task-action data, the execution process matches the real controls based on the Word environment and executes the plan step by step.","title":"Execution"},{"location":"dataflow/overview/#result","text":"The structure of the results of the task is as below: UFO/ \u251c\u2500\u2500 dataflow/ # Root folder for dataflow \u2502 \u2514\u2500\u2500 results/ # Directory for storing task processing results \u2502 \u251c\u2500\u2500 saved_document/ # Directory for final document results \u2502 \u251c\u2500\u2500 instantiation/ # Directory for instantiation results \u2502 \u2502 \u251c\u2500\u2500 instantiation_pass/ # Tasks successfully instantiated \u2502 \u2502 \u2514\u2500\u2500 instantiation_fail/ # Tasks that failed instantiation \u2502 \u251c\u2500\u2500 execution/ # Directory for execution results \u2502 \u2502 \u251c\u2500\u2500 execution_pass/ # Tasks successfully executed \u2502 \u2502 \u251c\u2500\u2500 execution_fail/ # Tasks that failed execution \u2502 \u2502 \u2514\u2500\u2500 execution_unsure/ # Tasks with uncertain execution results \u2502 \u251c\u2500\u2500 dataflow/ # Directory for dataflow results \u2502 \u2502 \u251c\u2500\u2500 execution_pass/ # Tasks successfully executed \u2502 \u2502 \u251c\u2500\u2500 execution_fail/ # Tasks that failed execution \u2502 \u2502 \u2514\u2500\u2500 execution_unsure/ # Tasks with uncertain execution results \u2502 \u2514\u2500\u2500 ... \u2514\u2500\u2500 ... General Description: This directory structure organizes the results of task processing into specific categories, including instantiation, execution, and dataflow outcomes. 2.
Instantiation: The instantiation directory contains subfolders for tasks that were successfully instantiated ( instantiation_pass ) and those that failed during instantiation ( instantiation_fail ). 3. Execution: Results of task execution are stored under the execution directory, categorized into successful tasks ( execution_pass ), failed tasks ( execution_fail ), and tasks with uncertain outcomes ( execution_unsure ). 4. Dataflow Results: The dataflow directory similarly holds results of tasks based on execution success, failure, or uncertainty, providing a comprehensive view of the data processing pipeline. 5. Saved Documents: Instantiated results are separately stored in the saved_document directory for easy access and reference.","title":"Result"},{"location":"dataflow/overview/#description","text":"This section illustrates the structure of the result of the task, organized in a hierarchical format to describe the various fields and their purposes. The result data include unique_id , app , original , execution_result , instantiation_result , time_cost .","title":"Description"},{"location":"dataflow/overview/#1-field-descriptions","text":"Hierarchy : The data is presented in a hierarchical manner to allow for a clearer understanding of field relationships. Type Description : The type of each field (e.g., string , array , object ) clearly specifies the format of the data. Field Purpose : Each field has a brief description outlining its function.","title":"1. Field Descriptions"},{"location":"dataflow/overview/#2-execution-results-and-errors","text":"execution_result : Contains the results of task execution, including subtask performance, completion status, and any encountered errors. instantiation_result : Describes the process of task instantiation, including template selection, prefilled tasks, and instantiation evaluation. error : If an error occurs during task execution, this field will contain the relevant error information.","title":"2.
Execution Results and Errors"},{"location":"dataflow/overview/#3-time-consumption","text":"time_cost : The time spent on each phase of the task, from template selection to task execution, is recorded to analyze task efficiency.","title":"3. Time Consumption"},{"location":"dataflow/overview/#example-data","text":"{ \"unique_id\": \"102\", \"app\": \"word\", \"original\": { \"original_task\": \"Find which Compatibility Mode you are in for Word\", \"original_steps\": [ \"1.Click the **File** tab.\", \"2.Click **Info**.\", \"3.Check the **Compatibility Mode** indicator at the bottom of the document preview pane.\" ] }, \"execution_result\": { \"result\": { \"reason\": \"The agent successfully identified the compatibility mode of the Word document.\", \"sub_scores\": { \"correct identification of compatibility mode\": \"yes\" }, \"complete\": \"yes\" }, \"error\": null }, \"instantiation_result\": { \"choose_template\": { \"result\": \"dataflow\\\\results\\\\saved_document\\\\102.docx\", \"error\": null }, \"prefill\": { \"result\": { \"instantiated_request\": \"Identify the Compatibility Mode of the Word document.\", \"instantiated_plan\": [ { \"Step\": 1, \"Subtask\": \"Identify the Compatibility Mode\", \"Function\": \"summary\", \"Args\": { \"text\": \"The document is in '102 - Compatibility Mode'.\" }, \"Success\": true } ] }, \"error\": null }, \"instantiation_evaluation\": { \"result\": { \"judge\": true, \"thought\": \"Identifying the Compatibility Mode of a Word document is a task that can be executed locally within Word.\" }, \"error\": null } }, \"time_cost\": { \"choose_template\": 0.017, \"prefill\": 11.304, \"instantiation_evaluation\": 2.38, \"total\": 34.584, \"execute\": 0.946, \"execute_eval\": 10.381 } }","title":"Example Data"},{"location":"dataflow/overview/#quick-start","text":"We prepare two cases to show the dataflow, which can be found in dataflow\\tasks\\prefill . 
After installing the required packages, you can type the following command in the command line: python -m dataflow -dataflow You can then see hints in the terminal, which indicate that the dataflow is working.","title":"Quick Start"},{"location":"dataflow/overview/#structure-of-related-files","text":"After the two tasks are finished, the task and output files would appear as follows: UFO/ \u251c\u2500\u2500 dataflow/ \u2502 \u2514\u2500\u2500 results/ \u2502 \u251c\u2500\u2500 saved_document/ # Directory for saved documents \u2502 \u2502 \u251c\u2500\u2500 bulleted.docx # Result of the \"bulleted\" task \u2502 \u2502 \u2514\u2500\u2500 rotate.docx # Result of the \"rotate\" task \u2502 \u251c\u2500\u2500 dataflow/ # Dataflow results directory \u2502 \u2502 \u251c\u2500\u2500 execution_pass/ # Successfully executed tasks \u2502 \u2502 \u2502 \u251c\u2500\u2500 bulleted.json # Execution result for the \"bulleted\" task \u2502 \u2502 \u2502 \u251c\u2500\u2500 rotate.json # Execution result for the \"rotate\" task \u2502 \u2502 \u2502 \u2514\u2500\u2500 ... \u2514\u2500\u2500 ...","title":"Structure of related files"},{"location":"dataflow/overview/#result-files","text":"The result structure of the bulleted task is shown below. This document provides a detailed breakdown of the task execution process for turning lines of text into a bulleted list in Word. It includes the original task description, execution results, and time analysis for each step. unique_id : The identifier for the task, in this case, \"5\" . app : The application being used, which is \"word\" . original : Contains the original task description and the steps. original_task : Describes the task in simple terms (turning text into a bulleted list). original_steps : Lists the steps required to perform the task. execution_result : Provides the result of executing the task. result : Describes the outcome of the execution, including a success message and sub-scores for each part of the task.
The complete : \"yes\" means the evaluation agent considers the execution process successful. The sub_scores field gives the evaluation of each subtask, corresponding to the instantiated_plan in the prefill . error : If any error occurred during execution, it would be reported here, but it's null in this case. instantiation_result : Details the instantiation of the task (setting up the task for execution). choose_template : Path to the template or document created during the task (in this case, the bulleted list document). prefill : Describes the instantiated_request and instantiated_plan and the steps involved, such as selecting text and clicking buttons, which is the result of the prefill flow. The Success and MatchedControlText fields are added in the execution process. Success indicates whether the subtask was executed successfully. MatchedControlText refers to the control text that was matched during the execution process based on the plan. instantiation_evaluation : Provides feedback on the task's feasibility and the evaluation of the request, which is the result of the filter flow. \"judge\": true : This indicates that the evaluation of the task was positive, meaning the task is considered valid or successfully judged. The thought field gives the detailed reason. time_cost : The time spent on different parts of the task, including template selection, prefill, instantiation evaluation, and execution. Total time is also given. { \"unique_id\": \"5\", \"app\": \"word\", \"original\": { \"original_task\": \"Turning lines of text into a bulleted list in Word\", \"original_steps\": [ \"1. Place the cursor at the beginning of the line of text you want to turn into a bulleted list\", \"2.
Click the Bullets button in the Paragraph group on the Home tab and choose a bullet style\" ] }, \"execution_result\": { \"result\": { \"reason\": \"The agent successfully selected the text 'text to edit' and then clicked on the 'Bullets' button in the Word application. The final screenshot shows that the text 'text to edit' has been converted into a bulleted list.\", \"sub_scores\": { \"text selection\": \"yes\", \"bulleted list conversion\": \"yes\" }, \"complete\": \"yes\" }, \"error\": null }, \"instantiation_result\": { \"choose_template\": { \"result\": \"dataflow\\\\results\\\\saved_document\\\\bulleted.docx\", \"error\": null }, \"prefill\": { \"result\": { \"instantiated_request\": \"Turn the line of text 'text to edit' into a bulleted list in Word.\", \"instantiated_plan\": [ { \"Step\": 1, \"Subtask\": \"Place the cursor at the beginning of the text 'text to edit'\", \"ControlLabel\": null, \"ControlText\": \"\", \"Function\": \"select_text\", \"Args\": { \"text\": \"text to edit\" }, \"Success\": true, \"MatchedControlText\": null }, { \"Step\": 2, \"Subtask\": \"Click the Bullets button in the Paragraph group on the Home tab\", \"ControlLabel\": \"61\", \"ControlText\": \"Bullets\", \"Function\": \"click_input\", \"Args\": { \"button\": \"left\", \"double\": false }, \"Success\": true, \"MatchedControlText\": \"Bullets\" } ] }, \"error\": null }, \"instantiation_evaluation\": { \"result\": { \"judge\": true, \"thought\": \"The task is specific and involves a basic function in Word that can be executed locally without any external dependencies.\", \"request_type\": \"None\" }, \"error\": null } }, \"time_cost\": { \"choose_template\": 0.012, \"prefill\": 15.649, \"instantiation_evaluation\": 2.469, \"execute\": 5.824, \"execute_eval\": 8.702, \"total\": 43.522 } }","title":"Result files"},{"location":"dataflow/overview/#log-files","text":"The corresponding logs can be found in the directories logs/bulleted and logs/rotate , as shown below. 
Detailed logs for each workflow are recorded, capturing every step of the execution process.","title":"Log files"},{"location":"dataflow/overview/#reference","text":"","title":"Reference"},{"location":"dataflow/overview/#appenum","text":"Bases: Enum Enum class for applications. Initialize the application enum. Parameters: id ( int ) \u2013 The ID of the application. description ( str ) \u2013 The description of the application. file_extension ( str ) \u2013 The file extension of the application. win_app ( str ) \u2013 The Windows application name. Source code in dataflow/data_flow_controller.py 47 48 49 50 51 52 53 54 55 56 57 58 59 60 def __init__ ( self , id : int , description : str , file_extension : str , win_app : str ): \"\"\" Initialize the application enum. :param id: The ID of the application. :param description: The description of the application. :param file_extension: The file extension of the application. :param win_app: The Windows application name. \"\"\" self . id = id self . description = description self . file_extension = file_extension self . win_app = win_app self . app_root_name = win_app . upper () + \".EXE\"","title":"AppEnum"},{"location":"dataflow/overview/#taskobject","text":"Initialize the task object. Parameters: task_file_path ( str ) \u2013 The path to the task file. task_type ( str ) \u2013 The task_type of the task object (dataflow, instantiation, or execution). Source code in dataflow/data_flow_controller.py 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 def __init__ ( self , task_file_path : str , task_type : str ) -> None : \"\"\" Initialize the task object. :param task_file_path: The path to the task file. :param task_type: The task_type of the task object (dataflow, instantiation, or execution). \"\"\" self . task_file_path = task_file_path self . task_file_base_name = os . path . basename ( task_file_path ) self . task_file_name = self . task_file_base_name . 
split ( \".\" )[ 0 ] task_json_file = load_json_file ( task_file_path ) self . app_object = self . _choose_app_from_json ( task_json_file [ \"app\" ]) # Initialize the task attributes based on the task_type self . _init_attr ( task_type , task_json_file )","title":"TaskObject"},{"location":"dataflow/overview/#dataflowcontroller","text":"Flow controller class to manage the instantiation and execution process. Initialize the flow controller. Parameters: task_path ( str ) \u2013 The path to the task file. task_type ( str ) \u2013 The task_type of the flow controller (instantiation, execution, or dataflow). Source code in dataflow/data_flow_controller.py 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 def __init__ ( self , task_path : str , task_type : str ) -> None : \"\"\" Initialize the flow controller. :param task_path: The path to the task file. :param task_type: The task_type of the flow controller (instantiation, execution, or dataflow). \"\"\" self . task_object = TaskObject ( task_path , task_type ) self . app_env = None self . app_name = self . task_object . app_object . description . lower () self . task_file_name = self . task_object . task_file_name self . schema = self . _load_schema ( task_type ) self . task_type = task_type self . task_info = self . init_task_info () self . result_hub = _configs [ \"RESULT_HUB\" ] . format ( task_type = task_type )","title":"DataFlowController"},{"location":"dataflow/overview/#data_flow_controller.DataFlowController.instantiated_plan","text":"Get the instantiated plan from the task information. Returns: List [ Dict [ str , Any ]] \u2013 The instantiated plan.","title":"instantiated_plan"},{"location":"dataflow/overview/#data_flow_controller.DataFlowController.template_copied_path","text":"Get the copied template path from the task information. 
Returns: str \u2013 The copied template path.","title":"template_copied_path"},{"location":"dataflow/overview/#data_flow_controller.DataFlowController.execute_execution","text":"Execute the execution process. Parameters: request ( str ) \u2013 The task request to be executed. plan ( Dict [ str , any ] ) \u2013 The execution plan containing detailed steps. Source code in dataflow/data_flow_controller.py 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 def execute_execution ( self , request : str , plan : Dict [ str , any ]) -> None : \"\"\" Execute the execution process. :param request: The task request to be executed. :param plan: The execution plan containing detailed steps. \"\"\" print_with_color ( \"Executing the execution process...\" , \"blue\" ) execute_flow = None try : self . app_env . start ( self . template_copied_path ) # Initialize the execution context and flow context = Context () execute_flow = ExecuteFlow ( self . task_file_name , context , self . app_env ) # Execute the plan executed_plan , execute_result = execute_flow . execute ( request , plan ) # Update the instantiated plan self . instantiated_plan = executed_plan # Record execution results and time metrics self . task_info [ \"execution_result\" ][ \"result\" ] = execute_result self . task_info [ \"time_cost\" ][ \"execute\" ] = execute_flow . execution_time self . task_info [ \"time_cost\" ][ \"execute_eval\" ] = execute_flow . eval_time except Exception as e : # Handle and log any exceptions that occur during execution self . task_info [ \"execution_result\" ][ \"error\" ] = { \"type\" : str ( type ( e ) . __name__ ), \"message\" : str ( e ), \"traceback\" : traceback . 
format_exc (), } print_with_color ( f \"Error in Execution: { e } \" , \"red\" ) raise e finally : # Record the total time cost of the execution process if execute_flow and hasattr ( execute_flow , \"execution_time\" ): self . task_info [ \"time_cost\" ][ \"execute\" ] = execute_flow . execution_time else : self . task_info [ \"time_cost\" ][ \"execute\" ] = None if execute_flow and hasattr ( execute_flow , \"eval_time\" ): self . task_info [ \"time_cost\" ][ \"execute_eval\" ] = execute_flow . eval_time else : self . task_info [ \"time_cost\" ][ \"execute_eval\" ] = None","title":"execute_execution"},{"location":"dataflow/overview/#data_flow_controller.DataFlowController.execute_instantiation","text":"Execute the instantiation process. Returns: Optional [ List [ Dict [ str , Any ]]] \u2013 The instantiation plan if successful. Source code in dataflow/data_flow_controller.py 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 def execute_instantiation ( self ) -> Optional [ List [ Dict [ str , Any ]]]: \"\"\" Execute the instantiation process. :return: The instantiation plan if successful. \"\"\" print_with_color ( f \"Instantiating task { self . task_object . task_file_name } ...\" , \"blue\" ) template_copied_path = self . instantiation_single_flow ( ChooseTemplateFlow , \"choose_template\" , init_params = [ self . task_object . app_object . file_extension ], execute_params = [] ) if template_copied_path : self . app_env . start ( template_copied_path ) prefill_result = self . instantiation_single_flow ( PrefillFlow , \"prefill\" , init_params = [ self . app_env ], execute_params = [ template_copied_path , self . task_object . task , self . task_object . refined_steps ] ) self . app_env . close () if prefill_result : self . 
instantiation_single_flow ( FilterFlow , \"instantiation_evaluation\" , init_params = [], execute_params = [ prefill_result [ \"instantiated_request\" ]] ) return prefill_result [ \"instantiated_plan\" ]","title":"execute_instantiation"},{"location":"dataflow/overview/#data_flow_controller.DataFlowController.init_task_info","text":"Initialize the task information. Returns: Dict [ str , Any ] \u2013 The initialized task information. Source code in dataflow/data_flow_controller.py 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 def init_task_info ( self ) -> Dict [ str , Any ]: \"\"\" Initialize the task information. :return: The initialized task information. \"\"\" init_task_info = None if self . task_type == \"execution\" : # read from the instantiated task file init_task_info = load_json_file ( self . task_object . task_file_path ) else : init_task_info = { \"unique_id\" : self . task_object . unique_id , \"app\" : self . app_name , \"original\" : { \"original_task\" : self . task_object . task , \"original_steps\" : self . task_object . refined_steps , }, \"execution_result\" : { \"result\" : None , \"error\" : None }, \"instantiation_result\" : { \"choose_template\" : { \"result\" : None , \"error\" : None }, \"prefill\" : { \"result\" : None , \"error\" : None }, \"instantiation_evaluation\" : { \"result\" : None , \"error\" : None }, }, \"time_cost\" : {}, } return init_task_info","title":"init_task_info"},{"location":"dataflow/overview/#data_flow_controller.DataFlowController.instantiation_single_flow","text":"Execute a single flow process in the instantiation phase. Parameters: flow_class ( AppAgentProcessor ) \u2013 The flow class to instantiate. flow_type ( str ) \u2013 The type of the flow. init_params \u2013 The initialization parameters for the flow. execute_params \u2013 The execution parameters for the flow. Returns: Optional [ Dict [ str , Any ]] \u2013 The result of the flow process. 
Source code in dataflow/data_flow_controller.py 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 def instantiation_single_flow ( self , flow_class : AppAgentProcessor , flow_type : str , init_params = None , execute_params = None ) -> Optional [ Dict [ str , Any ]]: \"\"\" Execute a single flow process in the instantiation phase. :param flow_class: The flow class to instantiate. :param flow_type: The type of the flow. :param init_params: The initialization parameters for the flow. :param execute_params: The execution parameters for the flow. :return: The result of the flow process. \"\"\" flow_instance = None try : flow_instance = flow_class ( self . app_name , self . task_file_name , * init_params ) result = flow_instance . execute ( * execute_params ) self . task_info [ \"instantiation_result\" ][ flow_type ][ \"result\" ] = result return result except Exception as e : self . task_info [ \"instantiation_result\" ][ flow_type ][ \"error\" ] = { \"type\" : str ( e . __class__ ), \"error_message\" : str ( e ), \"traceback\" : traceback . format_exc (), } print_with_color ( f \"Error in { flow_type } : { e } { traceback . format_exc () } \" ) finally : if flow_instance and hasattr ( flow_instance , \"execution_time\" ): self . task_info [ \"time_cost\" ][ flow_type ] = flow_instance . execution_time else : self . task_info [ \"time_cost\" ][ flow_type ] = None","title":"instantiation_single_flow"},{"location":"dataflow/overview/#data_flow_controller.DataFlowController.run","text":"Run the instantiation and execution process. Source code in dataflow/data_flow_controller.py 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 def run ( self ) -> None : \"\"\" Run the instantiation and execution process. \"\"\" start_time = time . time () try : self . app_env = WindowsAppEnv ( self . task_object . 
app_object ) if self . task_type == \"dataflow\" : plan = self . execute_instantiation () self . execute_execution ( self . task_object . task , plan ) elif self . task_type == \"instantiation\" : self . execute_instantiation () elif self . task_type == \"execution\" : plan = self . instantiated_plan self . execute_execution ( self . task_object . task , plan ) else : raise ValueError ( f \"Unsupported task_type: { self . task_type } \" ) except Exception as e : raise e finally : # Update or record the total time cost of the process total_time = round ( time . time () - start_time , 3 ) new_total_time = self . task_info . get ( \"time_cost\" , {}) . get ( \"total\" , 0 ) + total_time self . task_info [ \"time_cost\" ][ \"total\" ] = round ( new_total_time , 3 ) self . save_result ()","title":"run"},{"location":"dataflow/overview/#data_flow_controller.DataFlowController.save_result","text":"Validate and save the instantiated task result. Source code in dataflow/data_flow_controller.py 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 def save_result ( self ) -> None : \"\"\" Validate and save the instantiated task result. \"\"\" validation_error = None # Validate the result against the schema try : validate ( instance = self . task_info , schema = self . schema ) except ValidationError as e : # Record the validation error but allow the process to continue validation_error = str ( e . message ) print_with_color ( f \"Validation Error: { e . message } \" , \"yellow\" ) # Determine the target directory based on task_type and quality/completeness target_file = None if self . task_type == \"instantiation\" : # Determine the quality of the instantiation if not self . task_info [ \"instantiation_result\" ][ \"instantiation_evaluation\" ][ \"result\" ]: target_file = INSTANTIATION_RESULT_MAP [ False ] else : is_quality_good = self . 
task_info [ \"instantiation_result\" ][ \"instantiation_evaluation\" ][ \"result\" ][ \"judge\" ] target_file = INSTANTIATION_RESULT_MAP . get ( is_quality_good , INSTANTIATION_RESULT_MAP [ False ]) else : # Determine the completion status of the execution if not self . task_info [ \"execution_result\" ][ \"result\" ]: target_file = EXECUTION_RESULT_MAP [ \"no\" ] else : is_completed = self . task_info [ \"execution_result\" ][ \"result\" ][ \"complete\" ] target_file = EXECUTION_RESULT_MAP . get ( is_completed , EXECUTION_RESULT_MAP [ \"no\" ]) # Construct the full path to save the result new_task_path = os . path . join ( self . result_hub , target_file , self . task_object . task_file_base_name ) os . makedirs ( os . path . dirname ( new_task_path ), exist_ok = True ) save_json_file ( new_task_path , self . task_info ) print ( f \"Task saved to { new_task_path } \" ) # If validation failed, indicate that the saved result may need further inspection if validation_error : print ( \"The saved task result does not conform to the expected schema and may require review.\" ) Note Users should be careful to save the original files while using this project; otherwise, the files will be closed when the app is shut down. After starting the project, users should not close the app window while the program is taking screenshots.","title":"save_result"},{"location":"dataflow/result_schema/","text":"Result schema Instantiation Result Schema This schema defines the structure of a JSON object that might be used to represent the results of task instantiation . Root Structure The schema is an object with the following key fields: unique_id : A string serving as the unique identifier for the task. app : A string representing the application where the task is being executed. original : An object containing details about the original task. Field Descriptions unique_id Type: string Purpose: Provides a globally unique identifier for the task. 
app Type: string Purpose: Specifies the application associated with the task execution. original Type: object Contains the following fields: original_task : Type: string Purpose: Describes the main task in textual form. original_steps : Type: array of string Purpose: Lists the sequential steps required for the task. Required fields: original_task , original_steps execution_result Type: object or null Contains fields describing the results of task execution: result : Always null , indicating no execution results are included. error : Always null , implying execution errors are not tracked in this schema. Purpose: Simplifies the structure by omitting detailed execution results. instantiation_result Type: object Contains fields detailing the results of task instantiation: choose_template : Type: object Fields: result : A string or null , representing the outcome of template selection. error : A string or null , detailing any errors during template selection. Required fields: result , error prefill : Type: object or null Contains results of pre-filling instantiation: result : Type: object or null Fields: instantiated_request : A string, representing the generated request. instantiated_plan : An array or null , listing instantiation steps: Step : An integer representing the sequence of the step. Subtask : A string describing the subtask. ControlLabel : A string or null , representing the control label. ControlText : A string, providing context for the step. Function : A string, specifying the function executed at this step. Args : An object, containing any arguments required by the function. Required fields: Step , Subtask , Function , Args Required fields: instantiated_request , instantiated_plan error : A string or null , describing errors encountered during prefill. Required fields: result , error instantiation_evaluation : Type: object Fields: result : Type: object or null Contains: judge : A boolean, indicating whether the instantiation is valid. 
thought : A string, providing reasoning or observations. request_type : A string, classifying the request type. Required fields: judge , thought , request_type error : A string or null , indicating errors during evaluation. Required fields: result , error time_cost Type: object Tracks time metrics for various stages of task instantiation: choose_template : A number or null , time spent selecting a template. prefill : A number or null , time used for pre-filling. instantiation_evaluation : A number or null , time spent on evaluation. total : A number or null , total time cost for all processes. Required fields: choose_template , prefill , instantiation_evaluation , total Example Data { \"unique_id\": \"5\", \"app\": \"word\", \"original\": { \"original_task\": \"Turning lines of text into a bulleted list in Word\", \"original_steps\": [ \"1. Place the cursor at the beginning of the line of text you want to turn into a bulleted list\", \"2. Click the Bullets button in the Paragraph group on the Home tab and choose a bullet style\" ] }, \"execution_result\": { \"result\": null, \"error\": null }, \"instantiation_result\": { \"choose_template\": { \"result\": \"dataflow\\\\results\\\\saved_document\\\\bulleted.docx\", \"error\": null }, \"prefill\": { \"result\": { \"instantiated_request\": \"Turn the line of text 'text to edit' into a bulleted list in Word.\", \"instantiated_plan\": [ { \"Step\": 1, \"Subtask\": \"Place the cursor at the beginning of the text 'text to edit'\", \"ControlLabel\": null, \"ControlText\": \"\", \"Function\": \"select_text\", \"Args\": { \"text\": \"text to edit\" } }, { \"Step\": 2, \"Subtask\": \"Click the Bullets button in the Paragraph group on the Home tab\", \"ControlLabel\": null, \"ControlText\": \"Bullets\", \"Function\": \"click_input\", \"Args\": { \"button\": \"left\", \"double\": false } } ] }, \"error\": null }, \"instantiation_evaluation\": { \"result\": { \"judge\": true, \"thought\": \"The task is specific and involves a 
basic function in Word that can be executed locally without any external dependencies.\", \"request_type\": \"None\" }, \"error\": null } }, \"time_cost\": { \"choose_template\": 0.012, \"prefill\": 15.649, \"instantiation_evaluation\": 2.469, \"execute\": null, \"execute_eval\": null, \"total\": 18.130 } } Execution Result Schema This schema defines the structure of a JSON object that might be used to represent the results of task execution or dataflow . Below are the main fields and their detailed descriptions. Unlike the instantiation result, the execution result schema provides detailed feedback on execution, including success metrics ( reason , sub_scores ). Additionally, based on the original instantiated_plan, each step has been enhanced with the fields Success and MatchedControlText , which represent whether the step executed successfully (success is indicated by no errors) and the name of the last matched control, respectively. The ControlLabel will also be updated to reflect the final selected ControlLabel. Top-Level Fields unique_id Type : string Description : A unique identifier for the task or record. app Type : string Description : The name of the application associated with the task. original Type : object Description : Contains the original definition of the task. Properties : original_task : Type : string Description : The original description of the task. original_steps : Type : array Description : An array of strings representing the steps of the task. execution_result Type : object or null Description : Represents the results of the task execution. Properties : result : Type : object or null Description : Contains the details of the execution result. Sub-properties : reason : The reason for the execution result, type string . sub_scores : A set of sub-scores, represented as key-value pairs ( .* allows any key pattern). complete : Indicates the completion status, type string . 
error : Type : object or null Description : Represents any error information encountered during execution. Sub-properties : type : The type of error, type string . message : The error message, type string . traceback : The error traceback, type string . instantiation_result Type : object Description : Contains results related to task instantiation. Properties : choose_template : Type : object Description : Results of template selection. Sub-properties : result : The result of template selection, type string or null . error : Error information, type null or string . prefill : Type : object or null Description : Results of the prefill phase. Sub-properties : result : Type : object or null Description : Contains the instantiated request and plan. Sub-properties : instantiated_request : The instantiated task request, type string . instantiated_plan : The instantiated task plan, type array or null . Each item in the array is an object with: Step : Step number, type integer . Subtask : Description of the subtask, type string . ControlLabel : Control label, type string or null . ControlText : Control text, type string . Function : Function name, type string . Args : Arguments to the function, type object . Success : Whether the step succeeded, type boolean or null . MatchedControlText : Matched control text, type string or null . error : Prefill error information, type null or string . instantiation_evaluation : Type : object Description : Results of task instantiation evaluation. Sub-properties : result : Type : object or null Description : Contains evaluation information. Sub-properties : judge : Whether the evaluation succeeded, type boolean . thought : Evaluator's thoughts, type string . request_type : The type of request, type string . error : Evaluation error information, type null or string . time_cost Type : object Description : Represents the time costs for various phases. Properties : choose_template : Time spent selecting the template, type number or null . 
prefill : Time spent in the prefill phase, type number or null . instantiation_evaluation : Time spent in instantiation evaluation, type number or null . total : Total time cost, type number or null . execute : Time spent in execution, type number or null . execute_eval : Time spent in execution evaluation, type number or null . Required Fields The fields unique_id , app , original , execution_result , instantiation_result , and time_cost are required for the JSON object to be valid. Example Data { \"unique_id\": \"5\", \"app\": \"word\", \"original\": { \"original_task\": \"Turning lines of text into a bulleted list in Word\", \"original_steps\": [ \"1. Place the cursor at the beginning of the line of text you want to turn into a bulleted list\", \"2. Click the Bullets button in the Paragraph group on the Home tab and choose a bullet style\" ] }, \"execution_result\": { \"result\": { \"reason\": \"The agent successfully selected the text 'text to edit' and then clicked on the 'Bullets' button in the Word application. 
The final screenshot shows that the text 'text to edit' has been converted into a bulleted list.\", \"sub_scores\": { \"text selection\": \"yes\", \"bulleted list conversion\": \"yes\" }, \"complete\": \"yes\" }, \"error\": null }, \"instantiation_result\": { \"choose_template\": { \"result\": \"dataflow\\\\results\\\\saved_document\\\\bulleted.docx\", \"error\": null }, \"prefill\": { \"result\": { \"instantiated_request\": \"Turn the line of text 'text to edit' into a bulleted list in Word.\", \"instantiated_plan\": [ { \"Step\": 1, \"Subtask\": \"Place the cursor at the beginning of the text 'text to edit'\", \"ControlLabel\": null, \"ControlText\": \"\", \"Function\": \"select_text\", \"Args\": { \"text\": \"text to edit\" }, \"Success\": true, \"MatchedControlText\": null }, { \"Step\": 2, \"Subtask\": \"Click the Bullets button in the Paragraph group on the Home tab\", \"ControlLabel\": \"61\", \"ControlText\": \"Bullets\", \"Function\": \"click_input\", \"Args\": { \"button\": \"left\", \"double\": false }, \"Success\": true, \"MatchedControlText\": \"Bullets\" } ] }, \"error\": null }, \"instantiation_evaluation\": { \"result\": { \"judge\": true, \"thought\": \"The task is specific and involves a basic function in Word that can be executed locally without any external dependencies.\", \"request_type\": \"None\" }, \"error\": null } }, \"time_cost\": { \"choose_template\": 0.012, \"prefill\": 15.649, \"instantiation_evaluation\": 2.469, \"execute\": 5.824, \"execute_eval\": 8.702, \"total\": 43.522 } }","title":"Result Schema"},{"location":"dataflow/result_schema/#result-schema","text":"","title":"Result schema"},{"location":"dataflow/result_schema/#instantiation-result-schema","text":"This schema defines the structure of a JSON object that might be used to represent the results of task instantiation .","title":"Instantiation Result Schema"},{"location":"dataflow/result_schema/#root-structure","text":"The schema is an object with the following key fields: 
unique_id : A string serving as the unique identifier for the task. app : A string representing the application where the task is being executed. original : An object containing details about the original task.","title":"Root Structure"},{"location":"dataflow/result_schema/#field-descriptions","text":"unique_id Type: string Purpose: Provides a globally unique identifier for the task. app Type: string Purpose: Specifies the application associated with the task execution. original Type: object Contains the following fields: original_task : Type: string Purpose: Describes the main task in textual form. original_steps : Type: array of string Purpose: Lists the sequential steps required for the task. Required fields: original_task , original_steps execution_result Type: object or null Contains fields describing the results of task execution: result : Always null , indicating no execution results are included. error : Always null , implying execution errors are not tracked in this schema. Purpose: Simplifies the structure by omitting detailed execution results. instantiation_result Type: object Contains fields detailing the results of task instantiation: choose_template : Type: object Fields: result : A string or null , representing the outcome of template selection. error : A string or null , detailing any errors during template selection. Required fields: result , error prefill : Type: object or null Contains results of pre-filling instantiation: result : Type: object or null Fields: instantiated_request : A string, representing the generated request. instantiated_plan : An array or null , listing instantiation steps: Step : An integer representing the sequence of the step. Subtask : A string describing the subtask. ControlLabel : A string or null , representing the control label. ControlText : A string, providing context for the step. Function : A string, specifying the function executed at this step. 
Args : An object, containing any arguments required by the function. Required fields: Step , Subtask , Function , Args Required fields: instantiated_request , instantiated_plan error : A string or null , describing errors encountered during prefill. Required fields: result , error instantiation_evaluation : Type: object Fields: result : Type: object or null Contains: judge : A boolean, indicating whether the instantiation is valid. thought : A string, providing reasoning or observations. request_type : A string, classifying the request type. Required fields: judge , thought , request_type error : A string or null , indicating errors during evaluation. Required fields: result , error time_cost Type: object Tracks time metrics for various stages of task instantiation: choose_template : A number or null , time spent selecting a template. prefill : A number or null , time used for pre-filling. instantiation_evaluation : A number or null , time spent on evaluation. total : A number or null , total time cost for all processes. Required fields: choose_template , prefill , instantiation_evaluation , total","title":"Field Descriptions"},{"location":"dataflow/result_schema/#example-data","text":"{ \"unique_id\": \"5\", \"app\": \"word\", \"original\": { \"original_task\": \"Turning lines of text into a bulleted list in Word\", \"original_steps\": [ \"1. Place the cursor at the beginning of the line of text you want to turn into a bulleted list\", \"2. 
Click the Bullets button in the Paragraph group on the Home tab and choose a bullet style\" ] }, \"execution_result\": { \"result\": null, \"error\": null }, \"instantiation_result\": { \"choose_template\": { \"result\": \"dataflow\\\\results\\\\saved_document\\\\bulleted.docx\", \"error\": null }, \"prefill\": { \"result\": { \"instantiated_request\": \"Turn the line of text 'text to edit' into a bulleted list in Word.\", \"instantiated_plan\": [ { \"Step\": 1, \"Subtask\": \"Place the cursor at the beginning of the text 'text to edit'\", \"ControlLabel\": null, \"ControlText\": \"\", \"Function\": \"select_text\", \"Args\": { \"text\": \"text to edit\" } }, { \"Step\": 2, \"Subtask\": \"Click the Bullets button in the Paragraph group on the Home tab\", \"ControlLabel\": null, \"ControlText\": \"Bullets\", \"Function\": \"click_input\", \"Args\": { \"button\": \"left\", \"double\": false } } ] }, \"error\": null }, \"instantiation_evaluation\": { \"result\": { \"judge\": true, \"thought\": \"The task is specific and involves a basic function in Word that can be executed locally without any external dependencies.\", \"request_type\": \"None\" }, \"error\": null } }, \"time_cost\": { \"choose_template\": 0.012, \"prefill\": 15.649, \"instantiation_evaluation\": 2.469, \"execute\": null, \"execute_eval\": null, \"total\": 18.130 } }","title":"Example Data"},{"location":"dataflow/result_schema/#execution-result-schema","text":"This schema defines the structure of a JSON object that might be used to represent the results of task execution or dataflow . Below are the main fields and their detailed descriptions. Unlike the instantiation result, the execution result schema provides detailed feedback on execution, including success metrics ( reason , sub_scores ). 
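As a quick aside, the presence of the required top-level fields ( unique_id , app , original , execution_result , instantiation_result , time_cost ) can be sanity-checked in a few lines of plain Python \u2014 a minimal stand-in for full jsonschema validation; the helper below is illustrative and not part of the project:

```python
# Minimal stand-in for full jsonschema validation: verify only that the
# required top-level fields of a result object are present.
REQUIRED_FIELDS = (
    "unique_id", "app", "original",
    "execution_result", "instantiation_result", "time_cost",
)

def missing_required(task_info: dict) -> list:
    """Return the required top-level fields absent from task_info."""
    return [field for field in REQUIRED_FIELDS if field not in task_info]
```

A result object passing this check may still violate the nested schema rules described below; this only covers the top-level "Required Fields" constraint.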
Additionally, based on the original instantiated_plan, each step has been enhanced with the fields Success and MatchedControlText , which represent whether the step executed successfully (success is indicated by no errors) and the name of the last matched control, respectively. The ControlLabel will also be updated to reflect the final selected ControlLabel.","title":"Execution Result Schema"},{"location":"dataflow/result_schema/#top-level-fields","text":"unique_id Type : string Description : A unique identifier for the task or record. app Type : string Description : The name of the application associated with the task. original Type : object Description : Contains the original definition of the task. Properties : original_task : Type : string Description : The original description of the task. original_steps : Type : array Description : An array of strings representing the steps of the task. execution_result Type : object or null Description : Represents the results of the task execution. Properties : result : Type : object or null Description : Contains the details of the execution result. Sub-properties : reason : The reason for the execution result, type string . sub_scores : A set of sub-scores, represented as key-value pairs ( .* allows any key pattern). complete : Indicates the completion status, type string . error : Type : object or null Description : Represents any error information encountered during execution. Sub-properties : type : The type of error, type string . message : The error message, type string . traceback : The error traceback, type string . instantiation_result Type : object Description : Contains results related to task instantiation. Properties : choose_template : Type : object Description : Results of template selection. Sub-properties : result : The result of template selection, type string or null . error : Error information, type null or string . prefill : Type : object or null Description : Results of the prefill phase. 
Sub-properties : result : Type : object or null Description : Contains the instantiated request and plan. Sub-properties : instantiated_request : The instantiated task request, type string . instantiated_plan : The instantiated task plan, type array or null . Each item in the array is an object with: Step : Step number, type integer . Subtask : Description of the subtask, type string . ControlLabel : Control label, type string or null . ControlText : Control text, type string . Function : Function name, type string . Args : Arguments to the function, type object . Success : Whether the step succeeded, type boolean or null . MatchedControlText : Matched control text, type string or null . error : Prefill error information, type null or string . instantiation_evaluation : Type : object Description : Results of task instantiation evaluation. Sub-properties : result : Type : object or null Description : Contains evaluation information. Sub-properties : judge : Whether the evaluation succeeded, type boolean . thought : Evaluator's thoughts, type string . request_type : The type of request, type string . error : Evaluation error information, type null or string . time_cost Type : object Description : Represents the time costs for various phases. Properties : choose_template : Time spent selecting the template, type number or null . prefill : Time spent in the prefill phase, type number or null . instantiation_evaluation : Time spent in instantiation evaluation, type number or null . total : Total time cost, type number or null . execute : Time spent in execution, type number or null . 
execute_eval : Time spent in execution evaluation, type number or null .","title":"Top-Level Fields"},{"location":"dataflow/result_schema/#required-fields","text":"The fields unique_id , app , original , execution_result , instantiation_result , and time_cost are required for the JSON object to be valid.","title":"Required Fields"},{"location":"dataflow/result_schema/#example-data_1","text":"{ \"unique_id\": \"5\", \"app\": \"word\", \"original\": { \"original_task\": \"Turning lines of text into a bulleted list in Word\", \"original_steps\": [ \"1. Place the cursor at the beginning of the line of text you want to turn into a bulleted list\", \"2. Click the Bullets button in the Paragraph group on the Home tab and choose a bullet style\" ] }, \"execution_result\": { \"result\": { \"reason\": \"The agent successfully selected the text 'text to edit' and then clicked on the 'Bullets' button in the Word application. The final screenshot shows that the text 'text to edit' has been converted into a bulleted list.\", \"sub_scores\": { \"text selection\": \"yes\", \"bulleted list conversion\": \"yes\" }, \"complete\": \"yes\" }, \"error\": null }, \"instantiation_result\": { \"choose_template\": { \"result\": \"dataflow\\\\results\\\\saved_document\\\\bulleted.docx\", \"error\": null }, \"prefill\": { \"result\": { \"instantiated_request\": \"Turn the line of text 'text to edit' into a bulleted list in Word.\", \"instantiated_plan\": [ { \"Step\": 1, \"Subtask\": \"Place the cursor at the beginning of the text 'text to edit'\", \"ControlLabel\": null, \"ControlText\": \"\", \"Function\": \"select_text\", \"Args\": { \"text\": \"text to edit\" }, \"Success\": true, \"MatchedControlText\": null }, { \"Step\": 2, \"Subtask\": \"Click the Bullets button in the Paragraph group on the Home tab\", \"ControlLabel\": \"61\", \"ControlText\": \"Bullets\", \"Function\": \"click_input\", \"Args\": { \"button\": \"left\", \"double\": false }, \"Success\": true, \"MatchedControlText\": 
\"Bullets\" } ] }, \"error\": null }, \"instantiation_evaluation\": { \"result\": { \"judge\": true, \"thought\": \"The task is specific and involves a basic function in Word that can be executed locally without any external dependencies.\", \"request_type\": \"None\" }, \"error\": null } }, \"time_cost\": { \"choose_template\": 0.012, \"prefill\": 15.649, \"instantiation_evaluation\": 2.469, \"execute\": 5.824, \"execute_eval\": 8.702, \"total\": 43.522 } }","title":"Example Data"},{"location":"dataflow/windows_app_env/","text":"WindowsAppEnv WindowsAppEnv class represents the environment for controlling a Windows application. It provides methods for starting, stopping, and interacting with Windows applications, including window matching based on configurable strategies. Matching Strategies In the WindowsAppEnv class, matching strategies are rules that determine how to match window or control names with a given document name or target text. Based on the configuration file, three different matching strategies can be selected: contains , fuzzy , and regex . Contains Matching is the simplest strategy, suitable when the window and document names match exactly. Fuzzy Matching is more flexible and can match even when there are spelling errors or partial matches between the window title and document name. s Matching offers the most flexibility, ideal for complex matching patterns in window titles. 1. Window Matching Example The method find_matching_window is responsible for matching windows based on the configured matching strategy. 
Here's how you can use it to find a window by providing a document name: Example: # Initialize your application object (assuming app_object is already defined) app_env = WindowsAppEnv(app_object) # Define the document name you're looking for doc_name = \"example_document_name\" # Call find_matching_window to find the window that matches the document name matching_window = app_env.find_matching_window(doc_name) if matching_window: print(f\"Found matching window: {matching_window.element_info.name}\") else: print(\"No matching window found.\") Explanation: app_env.find_matching_window(doc_name) will search through all open windows and match the window title using the strategy defined in the configuration (contains, fuzzy, or regex). If a match is found, the matching_window object will contain the matched window, and you can print the window's name. If no match is found, it will return None . 2. Control Matching Example To find a matching control within a window, you can use the find_matching_controller method. This method requires a dictionary of filtered controls and a control text to match against. Example: # Initialize your application object (assuming app_object is already defined) app_env = WindowsAppEnv(app_object) # Define a filtered annotation dictionary of controls (control_key, control_object) # Here, we assume you have a dictionary of UIAWrapper controls from a window. 
filtered_annotation_dict = { 1: some_control_1, # Example control objects 2: some_control_2, # Example control objects } # Define the control text you're searching for control_text = \"submit_button\" # Call find_matching_controller to find the best match controller_key, control_selected = app_env.find_matching_controller(filtered_annotation_dict, control_text) if control_selected: print(f\"Found matching control with key {controller_key}: {control_selected.window_text()}\") else: print(\"No matching control found.\") Explanation: filtered_annotation_dict is a dictionary where the key represents the control's ID and the value is the control object ( UIAWrapper ). control_text is the text you're searching for within those controls. app_env.find_matching_controller(filtered_annotation_dict, control_text) will calculate the matching score for each control based on the defined strategy and return the control with the highest match score. If a match is found, it will return the control object ( control_selected ) and its key ( controller_key ), which can be used for further interaction. Reference Represents the Windows Application Environment. Initializes the Windows Application Environment. Parameters: app_object ( object ) \u2013 The app object containing information about the application. Source code in env/env_manager.py 29 30 31 32 33 34 35 36 37 38 def __init__ ( self , app_object : object ) -> None : \"\"\" Initializes the Windows Application Environment. :param app_object: The app object containing information about the application. \"\"\" self . app_window = None self . app_root_name = app_object . app_root_name self . app_name = app_object . description . lower () self . win_app = app_object . win_app close () Tries to gracefully close the application; if it fails or is not closed, forcefully terminates the process. 
Source code in env/env_manager.py 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 def close ( self ) -> None : \"\"\" Tries to gracefully close the application; if it fails or is not closed, forcefully terminates the process. \"\"\" try : # Attempt to close gracefully if self . app_window : self . app_window . close () self . _check_and_kill_process () sleep ( 1 ) except Exception as e : logging . warning ( f \"Graceful close failed: { e } . Attempting to forcefully terminate the process.\" ) self . _check_and_kill_process () raise e find_matching_controller ( filtered_annotation_dict , control_text ) Select the best matched controller. Parameters: filtered_annotation_dict ( Dict [ int , UIAWrapper ] ) \u2013 The filtered annotation dictionary. control_text ( str ) \u2013 The text content of the control for additional context. Returns: Tuple [ str , UIAWrapper ] \u2013 Tuple containing the key of the selected controller and the control object. Source code in env/env_manager.py 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 def find_matching_controller ( self , filtered_annotation_dict : Dict [ int , UIAWrapper ], control_text : str ) -> Tuple [ str , UIAWrapper ]: \"\"\" Select the best matched controller. :param filtered_annotation_dict: The filtered annotation dictionary. :param control_text: The text content of the control for additional context. :return: Tuple containing the key of the selected controller and the control object. \"\"\" control_selected = None controller_key = None highest_score = 0 # Iterate through the filtered annotation dictionary to find the best match for key , control in filtered_annotation_dict . items (): # Calculate the matching score using the match function score = self . 
_calculate_match_score ( control , control_text ) # Update the selected control if the score is higher if score > highest_score : highest_score = score controller_key = key control_selected = control return controller_key , control_selected find_matching_window ( doc_name ) Finds a matching window based on the process name and the configured matching strategy. Parameters: doc_name ( str ) \u2013 The document name associated with the application. Returns: Optional [ UIAWrapper ] \u2013 The matched window or None if no match is found. Source code in env/env_manager.py 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 def find_matching_window ( self , doc_name : str ) -> Optional [ UIAWrapper ]: \"\"\" Finds a matching window based on the process name and the configured matching strategy. :param doc_name: The document name associated with the application. :return: The matched window or None if no match is found. \"\"\" desktop = Desktop ( backend = _BACKEND ) windows_list = desktop . windows () for window in windows_list : window_title = window . element_info . name . lower () if self . _match_window_name ( window_title , doc_name ): self . app_window = window return window return None start ( copied_template_path ) Starts the Windows environment. Parameters: copied_template_path ( str ) \u2013 The file path to the copied template to start the environment. Source code in env/env_manager.py 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 def start ( self , copied_template_path : str ) -> None : \"\"\" Starts the Windows environment. :param copied_template_path: The file path to the copied template to start the environment. \"\"\" from ufo.automator.ui_control import openfile file_controller = openfile . FileController ( _BACKEND ) try : file_controller . execute_code ( { \"APP\" : self . win_app , \"file_path\" : copied_template_path } ) except Exception as e : logging . 
exception ( f \"Failed to start the application: { e } \" ) raise","title":"Windows App Environment"},{"location":"dataflow/windows_app_env/#windowsappenv","text":"WindowsAppEnv class represents the environment for controlling a Windows application. It provides methods for starting, stopping, and interacting with Windows applications, including window matching based on configurable strategies.","title":"WindowsAppEnv"},{"location":"dataflow/windows_app_env/#matching-strategies","text":"In the WindowsAppEnv class, matching strategies are rules that determine how to match window or control names with a given document name or target text. Based on the configuration file, three different matching strategies can be selected: contains , fuzzy , and regex . Contains Matching is the simplest strategy, suitable when the window and document names match exactly. Fuzzy Matching is more flexible and can match even when there are spelling errors or partial matches between the window title and document name. Regex Matching offers the most flexibility, ideal for complex matching patterns in window titles.","title":"Matching Strategies"},{"location":"dataflow/windows_app_env/#1-window-matching-example","text":"The method find_matching_window is responsible for matching windows based on the configured matching strategy. Here's how you can use it to find a window by providing a document name:","title":"1. 
Window Matching Example"},{"location":"dataflow/windows_app_env/#example","text":"# Initialize your application object (assuming app_object is already defined) app_env = WindowsAppEnv(app_object) # Define the document name you're looking for doc_name = \"example_document_name\" # Call find_matching_window to find the window that matches the document name matching_window = app_env.find_matching_window(doc_name) if matching_window: print(f\"Found matching window: {matching_window.element_info.name}\") else: print(\"No matching window found.\")","title":"Example:"},{"location":"dataflow/windows_app_env/#explanation","text":"app_env.find_matching_window(doc_name) will search through all open windows and match the window title using the strategy defined in the configuration (contains, fuzzy, or regex). If a match is found, the matching_window object will contain the matched window, and you can print the window's name. If no match is found, it will return None .","title":"Explanation:"},{"location":"dataflow/windows_app_env/#2-control-matching-example","text":"To find a matching control within a window, you can use the find_matching_controller method. This method requires a dictionary of filtered controls and a control text to match against.","title":"2. Control Matching Example"},{"location":"dataflow/windows_app_env/#example_1","text":"# Initialize your application object (assuming app_object is already defined) app_env = WindowsAppEnv(app_object) # Define a filtered annotation dictionary of controls (control_key, control_object) # Here, we assume you have a dictionary of UIAWrapper controls from a window. 
filtered_annotation_dict = { 1: some_control_1, # Example control objects 2: some_control_2, # Example control objects } # Define the control text you're searching for control_text = \"submit_button\" # Call find_matching_controller to find the best match controller_key, control_selected = app_env.find_matching_controller(filtered_annotation_dict, control_text) if control_selected: print(f\"Found matching control with key {controller_key}: {control_selected.window_text()}\") else: print(\"No matching control found.\")","title":"Example:"},{"location":"dataflow/windows_app_env/#explanation_1","text":"filtered_annotation_dict is a dictionary where the key represents the control's ID and the value is the control object ( UIAWrapper ). control_text is the text you're searching for within those controls. app_env.find_matching_controller(filtered_annotation_dict, control_text) will calculate the matching score for each control based on the defined strategy and return the control with the highest match score. If a match is found, it will return the control object ( control_selected ) and its key ( controller_key ), which can be used for further interaction.","title":"Explanation:"},{"location":"dataflow/windows_app_env/#reference","text":"Represents the Windows Application Environment. Initializes the Windows Application Environment. Parameters: app_object ( object ) \u2013 The app object containing information about the application. Source code in env/env_manager.py 29 30 31 32 33 34 35 36 37 38 def __init__ ( self , app_object : object ) -> None : \"\"\" Initializes the Windows Application Environment. :param app_object: The app object containing information about the application. \"\"\" self . app_window = None self . app_root_name = app_object . app_root_name self . app_name = app_object . description . lower () self . win_app = app_object . 
win_app","title":"Reference"},{"location":"dataflow/windows_app_env/#env.env_manager.WindowsAppEnv.close","text":"Tries to gracefully close the application; if it fails or is not closed, forcefully terminates the process. Source code in env/env_manager.py 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 def close ( self ) -> None : \"\"\" Tries to gracefully close the application; if it fails or is not closed, forcefully terminates the process. \"\"\" try : # Attempt to close gracefully if self . app_window : self . app_window . close () self . _check_and_kill_process () sleep ( 1 ) except Exception as e : logging . warning ( f \"Graceful close failed: { e } . Attempting to forcefully terminate the process.\" ) self . _check_and_kill_process () raise e","title":"close"},{"location":"dataflow/windows_app_env/#env.env_manager.WindowsAppEnv.find_matching_controller","text":"Select the best matched controller. Parameters: filtered_annotation_dict ( Dict [ int , UIAWrapper ] ) \u2013 The filtered annotation dictionary. control_text ( str ) \u2013 The text content of the control for additional context. Returns: Tuple [ str , UIAWrapper ] \u2013 Tuple containing the key of the selected controller and the control object. Source code in env/env_manager.py 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 def find_matching_controller ( self , filtered_annotation_dict : Dict [ int , UIAWrapper ], control_text : str ) -> Tuple [ str , UIAWrapper ]: \"\"\" Select the best matched controller. :param filtered_annotation_dict: The filtered annotation dictionary. :param control_text: The text content of the control for additional context. :return: Tuple containing the key of the selected controller and the control object. \"\"\" control_selected = None controller_key = None highest_score = 0 # Iterate through the filtered annotation dictionary to find the best match for key , control in filtered_annotation_dict . 
items (): # Calculate the matching score using the match function score = self . _calculate_match_score ( control , control_text ) # Update the selected control if the score is higher if score > highest_score : highest_score = score controller_key = key control_selected = control return controller_key , control_selected","title":"find_matching_controller"},{"location":"dataflow/windows_app_env/#env.env_manager.WindowsAppEnv.find_matching_window","text":"Finds a matching window based on the process name and the configured matching strategy. Parameters: doc_name ( str ) \u2013 The document name associated with the application. Returns: Optional [ UIAWrapper ] \u2013 The matched window or None if no match is found. Source code in env/env_manager.py 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 def find_matching_window ( self , doc_name : str ) -> Optional [ UIAWrapper ]: \"\"\" Finds a matching window based on the process name and the configured matching strategy. :param doc_name: The document name associated with the application. :return: The matched window or None if no match is found. \"\"\" desktop = Desktop ( backend = _BACKEND ) windows_list = desktop . windows () for window in windows_list : window_title = window . element_info . name . lower () if self . _match_window_name ( window_title , doc_name ): self . app_window = window return window return None","title":"find_matching_window"},{"location":"dataflow/windows_app_env/#env.env_manager.WindowsAppEnv.start","text":"Starts the Windows environment. Parameters: copied_template_path ( str ) \u2013 The file path to the copied template to start the environment. Source code in env/env_manager.py 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 def start ( self , copied_template_path : str ) -> None : \"\"\" Starts the Windows environment. :param copied_template_path: The file path to the copied template to start the environment. \"\"\" from ufo.automator.ui_control import openfile file_controller = openfile . 
FileController ( _BACKEND ) try : file_controller . execute_code ( { \"APP\" : self . win_app , \"file_path\" : copied_template_path } ) except Exception as e : logging . exception ( f \"Failed to start the application: { e } \" ) raise","title":"start"},{"location":"getting_started/more_guidance/","text":"More Guidance For Users If you are a user of UFO and want to use it to automate your tasks on Windows, you can refer to User Configuration to set up your environment and start using UFO. For instance, in addition to configuring the HOST_AGENT and APP_AGENT , you can also configure the LLM parameters and RAG parameters in the config.yaml file to enhance the UFO agent with additional knowledge sources. For Developers If you are a developer who wants to contribute to UFO, you can take a look at the Developer Configuration to explore the development environment setup and the development workflow. You can also refer to the Project Structure to understand the project structure and the role of each component in UFO, and use the rest of the documentation to understand the architecture and design of UFO. Taking a look at the Session and Round can help you understand the core logic of UFO. For debugging and testing, it is recommended to check the log files in the ufo/logs directory to track the execution of UFO and identify any issues that may arise.","title":"More Guidance"},{"location":"getting_started/more_guidance/#more-guidance","text":"","title":"More Guidance"},{"location":"getting_started/more_guidance/#for-users","text":"If you are a user of UFO and want to use it to automate your tasks on Windows, you can refer to User Configuration to set up your environment and start using UFO. 
For instance, in addition to configuring the HOST_AGENT and APP_AGENT , you can also configure the LLM parameters and RAG parameters in the config.yaml file to enhance the UFO agent with additional knowledge sources.","title":"For Users"},{"location":"getting_started/more_guidance/#for-developers","text":"If you are a developer who wants to contribute to UFO, you can take a look at the Developer Configuration to explore the development environment setup and the development workflow. You can also refer to the Project Structure to understand the project structure and the role of each component in UFO, and use the rest of the documentation to understand the architecture and design of UFO. Taking a look at the Session and Round can help you understand the core logic of UFO. For debugging and testing, it is recommended to check the log files in the ufo/logs directory to track the execution of UFO and identify any issues that may arise.","title":"For Developers"},{"location":"getting_started/quick_start/","text":"Quick Start \ud83d\udee0\ufe0f Step 1: Installation UFO requires Python >= 3.10 running on Windows OS >= 10 . It can be installed by running the following command: # [optional to create conda environment] # conda create -n ufo python=3.10 # conda activate ufo # clone the repository git clone https://github.com/microsoft/UFO.git cd UFO # install the requirements pip install -r requirements.txt # If you want to use Qwen as your LLM, uncomment the related libs. \u2699\ufe0f Step 2: Configure the LLMs Before running UFO, you need to provide your LLM configurations individually for HostAgent and AppAgent . You can create your own config file ufo/config/config.yaml by copying the ufo/config/config.yaml.template and editing the config for APP_AGENT and ACTION_AGENT as follows: OpenAI VISUAL_MODE: True, # Whether to use the visual mode API_TYPE: \"openai\" , # The API type, \"openai\" for the OpenAI API. 
API_BASE: \"https://api.openai.com/v1/chat/completions\", # The OpenAI API endpoint. API_KEY: \"sk-\", # The OpenAI API key, beginning with sk- API_VERSION: \"2024-02-15-preview\", # \"2024-02-15-preview\" by default API_MODEL: \"gpt-4-vision-preview\", # The OpenAI model Azure OpenAI (AOAI) VISUAL_MODE: True, # Whether to use the visual mode API_TYPE: \"aoai\" , # The API type, \"aoai\" for the Azure OpenAI. API_BASE: \"YOUR_ENDPOINT\", # The AOAI API address. Format: https://{your-resource-name}.openai.azure.com API_KEY: \"YOUR_KEY\", # The aoai API key API_VERSION: \"2024-02-15-preview\", # \"2024-02-15-preview\" by default API_MODEL: \"gpt-4-vision-preview\", # The OpenAI model API_DEPLOYMENT_ID: \"YOUR_AOAI_DEPLOYMENT\", # The deployment id for the AOAI API You can also use a non-visual model (e.g., GPT-4) for each agent, by setting VISUAL_MODE: False and a proper API_MODEL (openai) and API_DEPLOYMENT_ID (aoai). You can also optionally set a backup LLM engine in the BACKUP_AGENT field if the above engines fail during inference. The API_MODEL can be any GPT model that accepts images as input. Non-Visual Model Configuration You can utilize non-visual models (e.g., GPT-4) for each agent by configuring the following settings in the config.yaml file: Info VISUAL_MODE: False Specify the appropriate API_MODEL (OpenAI) and API_DEPLOYMENT_ID (AOAI) for each agent. Optionally, you can set a backup language model (LLM) engine in the BACKUP_AGENT field to handle cases where the primary engines fail during inference. Ensure you configure these settings accurately to leverage non-visual models effectively. Note UFO also supports other LLMs and advanced configurations, such as customizing your own model; please check the documentation for more details. Because of the limitations of model input, a lite version of the prompt is provided to allow users to experience it, which is configured in config_dev.yaml . \ud83d\udcd4 Step 3: Additional Setting for RAG (optional). 
If you want to enhance UFO's ability with external knowledge, you can optionally configure it with an external database for retrieval augmented generation (RAG) in the ufo/config/config.yaml file. We provide the following options for RAG to enhance UFO's capabilities: Offline Help Document : Enable UFO to retrieve information from offline help documents. Online Bing Search Engine : Enhance UFO's capabilities by utilizing the most up-to-date online search results. Self-Experience : Save task completion trajectories into UFO's memory for future reference. User-Demonstration : Boost UFO's capabilities through user demonstration. Tip Consult their respective documentation for more information on how to configure these settings. \ud83c\udf89 Step 4: Start UFO \u2328\ufe0f You can execute the following on your Windows command line (CLI): # assume you are in the cloned UFO folder python -m ufo --task This will start the UFO process and you can interact with it through the command line interface. If everything goes well, you will see the following message: Welcome to use UFO\ud83d\udef8, A UI-focused Agent for Windows OS Interaction. _ _ _____ ___ | | | || ___| / _ \\ | | | || |_ | | | | | |_| || _| | |_| | \\___/ |_| \\___/ Please enter your request to be completed\ud83d\udef8: Step 5 \ud83c\udfa5: Execution Logs You can find the screenshots taken and request & response logs in the following folder: ./ufo/logs// You may use them to debug, replay, or analyze the agent output. Note Before UFO executes your request, please make sure the targeted applications are active on the system. Note The GPT-V accepts screenshots of your desktop and application GUI as input. Please ensure that no sensitive or confidential information is visible or captured during the execution process. 
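The log folder above can also be inspected programmatically. Below is a minimal, hypothetical sketch (read_step_log is not part of UFO); it assumes the response.log in the task folder stores one JSON object per line, mirroring the request.log layout described in the Request Logs documentation:

```python
# Hypothetical helper for inspecting a UFO step log after a run.
# Assumption: response.log is JSON-lines, like request.log.
import json
from pathlib import Path

def read_step_log(task_name: str, log_root: str = 'ufo/logs') -> list:
    log_file = Path(log_root) / task_name / 'response.log'
    steps = []
    for line in log_file.read_text(encoding='utf-8').splitlines():
        if line.strip():
            steps.append(json.loads(line))
    return steps
```

For example, iterating over the returned list and printing each record's Thought field gives a quick replay of the agent's reasoning for the task.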
For further information, refer to DISCLAIMER.md .","title":"Quick Start"},{"location":"getting_started/quick_start/#quick-start","text":"","title":"Quick Start"},{"location":"getting_started/quick_start/#step-1-installation","text":"UFO requires Python >= 3.10 running on Windows OS >= 10 . It can be installed by running the following command: # [optional to create conda environment] # conda create -n ufo python=3.10 # conda activate ufo # clone the repository git clone https://github.com/microsoft/UFO.git cd UFO # install the requirements pip install -r requirements.txt # If you want to use Qwen as your LLM, uncomment the related libs.","title":"\ud83d\udee0\ufe0f Step 1: Installation"},{"location":"getting_started/quick_start/#step-2-configure-the-llms","text":"Before running UFO, you need to provide your LLM configurations individually for HostAgent and AppAgent . You can create your own config file ufo/config/config.yaml by copying the ufo/config/config.yaml.template and editing the config for APP_AGENT and ACTION_AGENT as follows:","title":"\u2699\ufe0f Step 2: Configure the LLMs"},{"location":"getting_started/quick_start/#openai","text":"VISUAL_MODE: True, # Whether to use the visual mode API_TYPE: \"openai\" , # The API type, \"openai\" for the OpenAI API. API_BASE: \"https://api.openai.com/v1/chat/completions\", # The OpenAI API endpoint. API_KEY: \"sk-\", # The OpenAI API key, beginning with sk- API_VERSION: \"2024-02-15-preview\", # \"2024-02-15-preview\" by default API_MODEL: \"gpt-4-vision-preview\", # The OpenAI model","title":"OpenAI"},{"location":"getting_started/quick_start/#azure-openai-aoai","text":"VISUAL_MODE: True, # Whether to use the visual mode API_TYPE: \"aoai\" , # The API type, \"aoai\" for the Azure OpenAI. API_BASE: \"YOUR_ENDPOINT\", # The AOAI API address. 
Format: https://{your-resource-name}.openai.azure.com API_KEY: \"YOUR_KEY\", # The aoai API key API_VERSION: \"2024-02-15-preview\", # \"2024-02-15-preview\" by default API_MODEL: \"gpt-4-vision-preview\", # The OpenAI model API_DEPLOYMENT_ID: \"YOUR_AOAI_DEPLOYMENT\", # The deployment id for the AOAI API You can also use a non-visual model (e.g., GPT-4) for each agent, by setting VISUAL_MODE: False and a proper API_MODEL (openai) and API_DEPLOYMENT_ID (aoai). You can also optionally set a backup LLM engine in the BACKUP_AGENT field if the above engines fail during inference. The API_MODEL can be any GPT model that accepts images as input.","title":"Azure OpenAI (AOAI)"},{"location":"getting_started/quick_start/#non-visual-model-configuration","text":"You can utilize non-visual models (e.g., GPT-4) for each agent by configuring the following settings in the config.yaml file: Info VISUAL_MODE: False Specify the appropriate API_MODEL (OpenAI) and API_DEPLOYMENT_ID (AOAI) for each agent. Optionally, you can set a backup language model (LLM) engine in the BACKUP_AGENT field to handle cases where the primary engines fail during inference. Ensure you configure these settings accurately to leverage non-visual models effectively. Note UFO also supports other LLMs and advanced configurations, such as customizing your own model; please check the documentation for more details. Because of the limitations of model input, a lite version of the prompt is provided to allow users to experience it, which is configured in config_dev.yaml .","title":"Non-Visual Model Configuration"},{"location":"getting_started/quick_start/#step-3-additional-setting-for-rag-optional","text":"If you want to enhance UFO's ability with external knowledge, you can optionally configure it with an external database for retrieval augmented generation (RAG) in the ufo/config/config.yaml file. 
We provide the following options for RAG to enhance UFO's capabilities: Offline Help Document : Enable UFO to retrieve information from offline help documents. Online Bing Search Engine : Enhance UFO's capabilities by utilizing the most up-to-date online search results. Self-Experience : Save task completion trajectories into UFO's memory for future reference. User-Demonstration : Boost UFO's capabilities through user demonstration. Tip Consult their respective documentation for more information on how to configure these settings.","title":"\ud83d\udcd4 Step 3: Additional Setting for RAG (optional)."},{"location":"getting_started/quick_start/#step-4-start-ufo","text":"","title":"\ud83c\udf89 Step 4: Start UFO"},{"location":"getting_started/quick_start/#you-can-execute-the-following-on-your-windows-command-line-cli","text":"# assume you are in the cloned UFO folder python -m ufo --task This will start the UFO process and you can interact with it through the command line interface. If everything goes well, you will see the following message: Welcome to use UFO\ud83d\udef8, A UI-focused Agent for Windows OS Interaction. _ _ _____ ___ | | | || ___| / _ \\ | | | || |_ | | | | | |_| || _| | |_| | \\___/ |_| \\___/ Please enter your request to be completed\ud83d\udef8:","title":"\u2328\ufe0f You can execute the following on your Windows command line (CLI):"},{"location":"getting_started/quick_start/#step-5-execution-logs","text":"You can find the screenshots taken and request & response logs in the following folder: ./ufo/logs// You may use them to debug, replay, or analyze the agent output. Note Before UFO executes your request, please make sure the targeted applications are active on the system. Note The GPT-V accepts screenshots of your desktop and application GUI as input. Please ensure that no sensitive or confidential information is visible or captured during the execution process. 
For further information, refer to DISCLAIMER.md .","title":"Step 5 \ud83c\udfa5: Execution Logs"},{"location":"logs/evaluation_logs/","text":"Evaluation Logs The evaluation logs store the evaluation results from the EvaluationAgent . The evaluation log contains the following information: Field Description Type Reason The detailed reason for the judgment, based on the observed screenshot differences. String Sub-score The sub-scores from decomposing the evaluation into multiple sub-goals. List of Dictionaries Complete The completion status of the evaluation, which can be yes , no , or unsure . String level The level of the evaluation. String request The request sent to the EvaluationAgent . Dictionary id The ID of the evaluation. Integer","title":"Evaluation Logs"},{"location":"logs/evaluation_logs/#evaluation-logs","text":"The evaluation logs store the evaluation results from the EvaluationAgent . The evaluation log contains the following information: Field Description Type Reason The detailed reason for the judgment, based on the observed screenshot differences. String Sub-score The sub-scores from decomposing the evaluation into multiple sub-goals. List of Dictionaries Complete The completion status of the evaluation, which can be yes , no , or unsure . String level The level of the evaluation. String request The request sent to the EvaluationAgent . Dictionary id The ID of the evaluation. Integer","title":"Evaluation Logs"},{"location":"logs/overview/","text":"UFO Logs Logs are essential for debugging and understanding the behavior of the UFO framework. There are four types of logs generated by UFO: Log Type Description Location Level Request Log Contains the prompt requests to LLMs. logs/{task_name}/request.log Info Step Log Contains the agent's response to the user's request and additional information at every step. logs/{task_name}/response.log Info Evaluation Log Contains the evaluation results from the EvaluationAgent . 
logs/{task_name}/evaluation.log Info Screenshots Contains the screenshots of the application UI. logs/{task_name}/ - All logs are stored in the logs/{task_name} directory.","title":"Overview"},{"location":"logs/overview/#ufo-logs","text":"Logs are essential for debugging and understanding the behavior of the UFO framework. There are four types of logs generated by UFO: Log Type Description Location Level Request Log Contains the prompt requests to LLMs. logs/{task_name}/request.log Info Step Log Contains the agent's response to the user's request and additional information at every step. logs/{task_name}/response.log Info Evaluation Log Contains the evaluation results from the EvaluationAgent . logs/{task_name}/evaluation.log Info Screenshots Contains the screenshots of the application UI. logs/{task_name}/ - All logs are stored in the logs/{task_name} directory.","title":"UFO Logs"},{"location":"logs/request_logs/","text":"Request Logs The request log records the prompt requests sent to the LLMs. The request log is stored in the request.log file. The request log contains the following information for each step: Field Description step The step number of the session. prompt The prompt message sent to the LLMs. The request log is stored at the debug level. You can configure the logging level in the LOG_LEVEL field in the config_dev.yaml file. Tip You can use the following Python code to read the request log: import json with open('logs/{task_name}/request.log', 'r') as f: for line in f: log = json.loads(line)","title":"Request Logs"},{"location":"logs/request_logs/#request-logs","text":"The request log records the prompt requests sent to the LLMs. The request log is stored in the request.log file. The request log contains the following information for each step: Field Description step The step number of the session. prompt The prompt message sent to the LLMs. The request log is stored at the debug level. You can configure the logging level in the LOG_LEVEL field in the config_dev.yaml file. 
Tip You can use the following Python code to read the request log: import json with open('logs/{task_name}/request.log', 'r') as f: for line in f: log = json.loads(line)","title":"Request Logs"},{"location":"logs/screenshots_logs/","text":"Screenshot Logs UFO also saves desktop or application screenshots for debugging and evaluation purposes. The screenshot logs are stored in the logs/{task_name}/ . There are 4 types of screenshot logs generated by UFO, as detailed below. Clean Screenshots At each step, UFO saves a clean screenshot of the desktop or application. The clean screenshot is saved in the action_step{step_number}.png file. In addition, the clean screenshots are also saved when a sub-task, round or session is completed. The clean screenshots are saved in the action_round_{round_id}_sub_round_{sub_task_id}_final.png , action_round_{round_id}_final.png and action_step_final.png files, respectively. Below is an example of a clean screenshot. Annotation Screenshots UFO also saves annotated screenshots of the application, with each control item annotated with a number, following the Set-of-Mark paradigm. The annotated screenshots are saved in the action_step{step_number}_annotated.png file. Below is an example of an annotated screenshot. Info Only selected types of controls are annotated in the screenshots. They are configured in the config_dev.yaml file under the CONTROL_LIST field. Tip Different types of controls are annotated with different colors. You can configure the colors in the config_dev.yaml file under the ANNOTATION_COLORS field. Concatenated Screenshots UFO also saves concatenated screenshots of the application, with clean and annotated screenshots concatenated side by side. The concatenated screenshots are saved in the action_step{step_number}_concat.png file. Below is an example of a concatenated screenshot. 
Info You can configure whether to feed the concatenated screenshots to the LLMs, or separate clean and annotated screenshots, in the config_dev.yaml file under the CONCAT_SCREENSHOT field. Selected Control Screenshots UFO saves screenshots of the selected control item for operation. The selected control screenshots are saved in the action_step{step_number}_selected_controls.png file. Below is an example of a selected control screenshot. Info You can configure whether to feed the LLM with the selected control screenshots at the previous step to enhance the context, in the config_dev.yaml file under the INCLUDE_LAST_SCREENSHOT field.","title":"Screenshots"},{"location":"logs/screenshots_logs/#screenshot-logs","text":"UFO also saves desktop or application screenshots for debugging and evaluation purposes. The screenshot logs are stored in the logs/{task_name}/ . There are 4 types of screenshot logs generated by UFO, as detailed below.","title":"Screenshot Logs"},{"location":"logs/screenshots_logs/#clean-screenshots","text":"At each step, UFO saves a clean screenshot of the desktop or application. The clean screenshot is saved in the action_step{step_number}.png file. In addition, the clean screenshots are also saved when a sub-task, round or session is completed. The clean screenshots are saved in the action_round_{round_id}_sub_round_{sub_task_id}_final.png , action_round_{round_id}_final.png and action_step_final.png files, respectively. Below is an example of a clean screenshot.","title":"Clean Screenshots"},{"location":"logs/screenshots_logs/#annotation-screenshots","text":"UFO also saves annotated screenshots of the application, with each control item annotated with a number, following the Set-of-Mark paradigm. The annotated screenshots are saved in the action_step{step_number}_annotated.png file. 
Below is an example of an annotated screenshot.","title":"Annotation Screenshots"},{"location":"logs/screenshots_logs/#concatenated-screenshots","text":"UFO also saves concatenated screenshots of the application, with clean and annotated screenshots concatenated side by side. The concatenated screenshots are saved in the action_step{step_number}_concat.png file. Below is an example of a concatenated screenshot.","title":"Concatenated Screenshots"},{"location":"logs/screenshots_logs/#selected-control-screenshots","text":"UFO saves screenshots of the selected control item for operation. The selected control screenshots are saved in the action_step{step_number}_selected_controls.png file. Below is an example of a selected control screenshot.","title":"Selected Control Screenshots"},{"location":"logs/step_logs/","text":"Step Logs The step log contains the agent's response to the user's request and additional information at every step. The step log is stored in the response.log file. The log fields are different for HostAgent and AppAgent . The step log is at the info level. HostAgent Logs The HostAgent logs contain the following fields: LLM Output Field Description Type Observation The observation of current desktop screenshots. String Thought The logical reasoning process of the HostAgent . String Current Sub-Task The current sub-task to be executed by the AppAgent . String Message The message to be sent to the AppAgent for the completion of the sub-task. String ControlLabel The index of the selected application to execute the sub-task. String ControlText The name of the selected application to execute the sub-task. String Plan The plan for the following sub-tasks after the current sub-task. List of Strings Status The status of the agent, mapped to the AgentState . String Comment Additional comments or information provided to the user. String Questions The questions to be asked to the user for additional information. 
List of Strings Bash The bash command to be executed by the HostAgent . It can be used to open applications or execute system commands. String Additional Information Field Description Type Step The step number of the session. Integer RoundStep The step number of the current round. Integer AgentStep The step number of the HostAgent . Integer Round The round number of the session. Integer ControlLabel The index of the selected application to execute the sub-task. Integer ControlText The name of the selected application to execute the sub-task. String Request The user request. String Agent The agent that executed the step, set to HostAgent . String AgentName The name of the agent. String Application The application process name. String Cost The cost of the step. Float Results The results of the step, set to an empty string. String CleanScreenshot The image path of the desktop screenshot. String AnnotatedScreenshot The image path of the annotated application screenshot. String ConcatScreenshot The image path of the concatenated application screenshot. String SelectedControlScreenshot The image path of the selected control screenshot. String time_cost The time cost of each step in the process. Dictionary AppAgent Logs The AppAgent logs contain the following fields: LLM Output Field Description Type Observation The observation of the current application screenshots. String Thought The logical reasoning process of the AppAgent . String ControlLabel The index of the selected control to interact with. String ControlText The name of the selected control to interact with. String Function The function to be executed on the selected control. String Args The arguments required for the function execution. List of Strings Status The status of the agent, mapped to the AgentState . String Plan The plan for the following steps after the current action. List of Strings Comment Additional comments or information provided to the user. 
String SaveScreenshot The flag to save the screenshot of the application to the blackboard for future reference. Boolean Additional Information Field Description Type Step The step number of the session. Integer RoundStep The step number of the current round. Integer AgentStep The step number of the AppAgent . Integer Round The round number of the session. Integer Subtask The sub-task to be executed by the AppAgent . String SubtaskIndex The index of the sub-task in the current round. Integer Action The action to be executed by the AppAgent . String ActionType The type of the action to be executed. String Request The user request. String Agent The agent that executed the step, set to AppAgent . String AgentName The name of the agent. String Application The application process name. String Cost The cost of the step. Float Results The results of the step. String CleanScreenshot The image path of the desktop screenshot. String AnnotatedScreenshot The image path of the annotated application screenshot. String ConcatScreenshot The image path of the concatenated application screenshot. String time_cost The time cost of each step in the process. Dictionary Tip You can use the following python code to read the request log: import json with open('logs/{task_name}/request.log', 'r') as f: for line in f: log = json.loads(line) Info The FollowerAgent logs share the same fields as the AppAgent logs.","title":"Step Logs"},{"location":"logs/step_logs/#step-logs","text":"The step log contains the agent's response to the user's request and additional information at every step. The step log is stored in the response.log file. The log fields are different for HostAgent and AppAgent . 
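Since the request-log tip on this page reads the log one JSON object per line, the step log in response.log can plausibly be read the same way; the sketch below assumes that JSON-Lines layout and the field names from the tables here ("Step", "Agent", "Status"), so treat both as assumptions rather than a guaranteed format.

```python
import json

# Hedged sketch: read a UFO step log (e.g. logs/{task_name}/response.log),
# assuming one JSON object per line, and pull out a few documented fields.
def read_step_log(path):
    steps = []
    with open(path, "r") as f:
        for line in f:
            line = line.strip()
            if line:
                record = json.loads(line)
                # "Step", "Agent", and "Status" are field names from this page.
                steps.append((record.get("Step"), record.get("Agent"), record.get("Status")))
    return steps
```

For example, `read_step_log("logs/{task_name}/response.log")` would return one `(step, agent, status)` tuple per step.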
The step log is at the info level.","title":"Step Logs"},{"location":"logs/step_logs/#hostagent-logs","text":"The HostAgent logs contain the following fields:","title":"HostAgent Logs"},{"location":"logs/step_logs/#llm-output","text":"Field Description Type Observation The observation of current desktop screenshots. String Thought The logical reasoning process of the HostAgent . String Current Sub-Task The current sub-task to be executed by the AppAgent . String Message The message to be sent to the AppAgent for the completion of the sub-task. String ControlLabel The index of the selected application to execute the sub-task. String ControlText The name of the selected application to execute the sub-task. String Plan The plan for the following sub-tasks after the current sub-task. List of Strings Status The status of the agent, mapped to the AgentState . String Comment Additional comments or information provided to the user. String Questions The questions to be asked to the user for additional information. List of Strings Bash The bash command to be executed by the HostAgent . It can be used to open applications or execute system commands. String","title":"LLM Output"},{"location":"logs/step_logs/#additional-information","text":"Field Description Type Step The step number of the session. Integer RoundStep The step number of the current round. Integer AgentStep The step number of the HostAgent . Integer Round The round number of the session. Integer ControlLabel The index of the selected application to execute the sub-task. Integer ControlText The name of the selected application to execute the sub-task. String Request The user request. String Agent The agent that executed the step, set to HostAgent . String AgentName The name of the agent. String Application The application process name. String Cost The cost of the step. Float Results The results of the step, set to an empty string. String CleanScreenshot The image path of the desktop screenshot. 
String AnnotatedScreenshot The image path of the annotated application screenshot. String ConcatScreenshot The image path of the concatenated application screenshot. String SelectedControlScreenshot The image path of the selected control screenshot. String time_cost The time cost of each step in the process. Dictionary","title":"Additional Information"},{"location":"logs/step_logs/#appagent-logs","text":"The AppAgent logs contain the following fields:","title":"AppAgent Logs"},{"location":"logs/step_logs/#llm-output_1","text":"Field Description Type Observation The observation of the current application screenshots. String Thought The logical reasoning process of the AppAgent . String ControlLabel The index of the selected control to interact with. String ControlText The name of the selected control to interact with. String Function The function to be executed on the selected control. String Args The arguments required for the function execution. List of Strings Status The status of the agent, mapped to the AgentState . String Plan The plan for the following steps after the current action. List of Strings Comment Additional comments or information provided to the user. String SaveScreenshot The flag to save the screenshot of the application to the blackboard for future reference. Boolean","title":"LLM Output"},{"location":"logs/step_logs/#additional-information_1","text":"Field Description Type Step The step number of the session. Integer RoundStep The step number of the current round. Integer AgentStep The step number of the AppAgent . Integer Round The round number of the session. Integer Subtask The sub-task to be executed by the AppAgent . String SubtaskIndex The index of the sub-task in the current round. Integer Action The action to be executed by the AppAgent . String ActionType The type of the action to be executed. String Request The user request. String Agent The agent that executed the step, set to AppAgent . String AgentName The name of the agent. 
String Application The application process name. String Cost The cost of the step. Float Results The results of the step. String CleanScreenshot The image path of the desktop screenshot. String AnnotatedScreenshot The image path of the annotated application screenshot. String ConcatScreenshot The image path of the concatenated application screenshot. String time_cost The time cost of each step in the process. Dictionary Tip You can use the following python code to read the request log: import json with open('logs/{task_name}/request.log', 'r') as f: for line in f: log = json.loads(line) Info The FollowerAgent logs share the same fields as the AppAgent logs.","title":"Additional Information"},{"location":"logs/ui_tree_logs/","text":"UI Tree Logs UFO can save the entire UI tree of the application window at every step for data collection purposes. The UI tree can represent the application's UI structure, including the window, controls, and their properties. The UI tree logs are saved in the logs/{task_name}/ui_tree folder. You have to set the SAVE_UI_TREE flag to True in the config_dev.yaml file to enable the UI tree logs. 
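Once saved, a UI tree file can be inspected with a short traversal. The sketch below walks the JSON structure shown in the example on this page depth-first; the per-step file name inside logs/{task_name}/ui_tree is not specified here, so loading from disk (e.g. with json.load) is left to the reader.

```python
# Hedged sketch: traverse a saved UI tree (the JSON structure shown in the
# example on this page) depth-first, collecting (level, control_type, name).
def walk_ui_tree(node):
    nodes = [(node["level"], node["control_type"], node["name"])]
    for child in node.get("children", []):
        nodes.extend(walk_ui_tree(child))
    return nodes

# A trimmed version of the example tree on this page:
tree = {
    "id": "node_0", "name": "Mail - Chaoyun Zhang - Outlook",
    "control_type": "Window", "level": 0,
    "children": [
        {"id": "node_1", "name": "", "control_type": "Pane",
         "level": 1, "children": []}
    ],
}
print(walk_ui_tree(tree))
# [(0, 'Window', 'Mail - Chaoyun Zhang - Outlook'), (1, 'Pane', '')]
```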
Below is an example of the UI tree logs for application: { \"id\": \"node_0\", \"name\": \"Mail - Chaoyun Zhang - Outlook\", \"control_type\": \"Window\", \"rectangle\": { \"left\": 628, \"top\": 258, \"right\": 3508, \"bottom\": 1795 }, \"adjusted_rectangle\": { \"left\": 0, \"top\": 0, \"right\": 2880, \"bottom\": 1537 }, \"relative_rectangle\": { \"left\": 0.0, \"top\": 0.0, \"right\": 1.0, \"bottom\": 1.0 }, \"level\": 0, \"children\": [ { \"id\": \"node_1\", \"name\": \"\", \"control_type\": \"Pane\", \"rectangle\": { \"left\": 3282, \"top\": 258, \"right\": 3498, \"bottom\": 330 }, \"adjusted_rectangle\": { \"left\": 2654, \"top\": 0, \"right\": 2870, \"bottom\": 72 }, \"relative_rectangle\": { \"left\": 0.9215277777777777, \"top\": 0.0, \"right\": 0.9965277777777778, \"bottom\": 0.0468445022771633 }, \"level\": 1, \"children\": [] } ] } Fields in the UI tree logs Below is a table of the fields in the UI tree logs: Field Description Type id The unique identifier of the UI tree node. String name The name of the UI tree node. String control_type The type of the UI tree node. String rectangle The absolute position of the UI tree node. Dictionary adjusted_rectangle The adjusted position of the UI tree node. Dictionary relative_rectangle The relative position of the UI tree node. Dictionary level The level of the UI tree node. Integer children The children of the UI tree node. List of UI tree nodes Reference A class to represent the UI tree. Initialize the UI tree with the root element. Parameters: root ( UIAWrapper ) \u2013 The root element of the UI tree. Source code in automator/ui_control/ui_tree.py 20 21 22 23 24 25 26 27 28 29 30 31 32 33 def __init__ ( self , root : UIAWrapper ): \"\"\" Initialize the UI tree with the root element. :param root: The root element of the UI tree. \"\"\" self . root = root # The node counter to count the number of nodes in the UI tree. self . node_counter = 0 try : self . _ui_tree = self . _get_ui_tree ( self . 
root ) except Exception as e : self . _ui_tree = { \"error\" : traceback . format_exc ()} ui_tree : Dict [ str , Any ] property The UI tree. apply_ui_tree_diff ( ui_tree_1 , diff ) staticmethod Apply a UI tree diff to ui_tree_1 to get ui_tree_2. Parameters: ui_tree_1 ( Dict [ str , Any ] ) \u2013 The original UI tree. diff ( Dict [ str , Any ] ) \u2013 The diff to apply. Returns: Dict [ str , Any ] \u2013 The new UI tree after applying the diff. Source code in automator/ui_control/ui_tree.py 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 @staticmethod def apply_ui_tree_diff ( ui_tree_1 : Dict [ str , Any ], diff : Dict [ str , Any ] ) -> Dict [ str , Any ]: \"\"\" Apply a UI tree diff to ui_tree_1 to get ui_tree_2. :param ui_tree_1: The original UI tree. :param diff: The diff to apply. :return: The new UI tree after applying the diff. \"\"\" ui_tree_2 = copy . deepcopy ( ui_tree_1 ) # Build an ID map for quick node lookups def build_id_map ( node , id_map ): id_map [ node [ \"id\" ]] = node for child in node . get ( \"children\" , []): build_id_map ( child , id_map ) id_map = {} if \"id\" in ui_tree_2 : build_id_map ( ui_tree_2 , id_map ) def remove_node_by_path ( path ): # The path is a list of IDs from root to target node. # The target node is the last element. Its parent is the second to last element. if len ( path ) == 1 : # Removing the root for k in list ( ui_tree_2 . keys ()): del ui_tree_2 [ k ] id_map . clear () return target_id = path [ - 1 ] parent_id = path [ - 2 ] parent_node = id_map [ parent_id ] # Find and remove the child with target_id for i , c in enumerate ( parent_node . 
get ( \"children\" , [])): if c [ \"id\" ] == target_id : parent_node [ \"children\" ] . pop ( i ) break # Remove target_id from id_map if target_id in id_map : del id_map [ target_id ] def add_node_by_path ( path , node ): # Add the node at the specified path. The parent is path[-2], the node is path[-1]. # The path[-1] should be node[\"id\"]. if len ( path ) == 1 : # Replacing the root node entirely for k in list ( ui_tree_2 . keys ()): del ui_tree_2 [ k ] for k , v in node . items (): ui_tree_2 [ k ] = v # Rebuild id_map id_map . clear () if \"id\" in ui_tree_2 : build_id_map ( ui_tree_2 , id_map ) return target_id = path [ - 1 ] parent_id = path [ - 2 ] parent_node = id_map [ parent_id ] # Ensure children list exists if \"children\" not in parent_node : parent_node [ \"children\" ] = [] # Insert or append the node # We don't have a numeric index anymore, we just append, assuming order doesn't matter. # If order matters, we must store ordering info or do some heuristic. parent_node [ \"children\" ] . append ( node ) # Update the id_map with the newly added subtree build_id_map ( node , id_map ) def modify_node_by_path ( path , changes ): # Modify fields of the node at the given ID target_id = path [ - 1 ] node = id_map [ target_id ] for field , ( old_val , new_val ) in changes . items (): node [ field ] = new_val # Apply removals first # Sort removals by length of path descending so we remove deeper nodes first. # This ensures we don't remove parents before children. for removal in sorted ( diff [ \"removed\" ], key = lambda x : len ( x [ \"path\" ]), reverse = True ): remove_node_by_path ( removal [ \"path\" ]) # Apply additions # Additions can be applied directly. 
for addition in diff [ \"added\" ]: add_node_by_path ( addition [ \"path\" ], addition [ \"node\" ]) # Apply modifications for modification in diff [ \"modified\" ]: modify_node_by_path ( modification [ \"path\" ], modification [ \"changes\" ]) return ui_tree_2 flatten_ui_tree () Flatten the UI tree into a list in width-first order. Source code in automator/ui_control/ui_tree.py 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 def flatten_ui_tree ( self ) -> List [ Dict [ str , Any ]]: \"\"\" Flatten the UI tree into a list in width-first order. \"\"\" def flatten_tree ( tree : Dict [ str , Any ], result : List [ Dict [ str , Any ]]): \"\"\" Flatten the tree. :param tree: The tree to flatten. :param result: The result list. \"\"\" tree_info = { \"name\" : tree [ \"name\" ], \"control_type\" : tree [ \"control_type\" ], \"rectangle\" : tree [ \"rectangle\" ], \"adjusted_rectangle\" : tree [ \"adjusted_rectangle\" ], \"relative_rectangle\" : tree [ \"relative_rectangle\" ], \"level\" : tree [ \"level\" ], } result . append ( tree_info ) for child in tree . get ( \"children\" , []): flatten_tree ( child , result ) result = [] flatten_tree ( self . ui_tree , result ) return result save_ui_tree_to_json ( file_path ) Save the UI tree to a JSON file. Parameters: file_path ( str ) \u2013 The file path to save the UI tree. Source code in automator/ui_control/ui_tree.py 103 104 105 106 107 108 109 110 111 112 113 114 115 def save_ui_tree_to_json ( self , file_path : str ) -> None : \"\"\" Save the UI tree to a JSON file. :param file_path: The file path to save the UI tree. \"\"\" # Check if the file directory exists. If not, create it. save_dir = os . path . dirname ( file_path ) if not os . path . exists ( save_dir ): os . makedirs ( save_dir ) with open ( file_path , \"w\" ) as file : json . dump ( self . 
ui_tree , file , indent = 4 ) ui_tree_diff ( ui_tree_1 , ui_tree_2 ) staticmethod Compute the difference between two UI trees. Parameters: ui_tree_1 ( Dict [ str , Any ] ) \u2013 The first UI tree. ui_tree_2 ( Dict [ str , Any ] ) \u2013 The second UI tree. Returns: \u2013 The difference between the two UI trees. Source code in automator/ui_control/ui_tree.py 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 @staticmethod def ui_tree_diff ( ui_tree_1 : Dict [ str , Any ], ui_tree_2 : Dict [ str , Any ]): \"\"\" Compute the difference between two UI trees. :param ui_tree_1: The first UI tree. :param ui_tree_2: The second UI tree. :return: The difference between the two UI trees. \"\"\" diff = { \"added\" : [], \"removed\" : [], \"modified\" : []} def compare_nodes ( node1 , node2 , path ): # Note: `path` is a list of IDs. The last element corresponds to the current node. # If node1 doesn't exist and node2 does, it's an addition. if node1 is None and node2 is not None : diff [ \"added\" ] . append ({ \"path\" : path , \"node\" : copy . deepcopy ( node2 )}) return # If node1 exists and node2 doesn't, it's a removal. if node1 is not None and node2 is None : diff [ \"removed\" ] . append ({ \"path\" : path , \"node\" : copy . deepcopy ( node1 )}) return # If both don't exist, nothing to do. if node1 is None and node2 is None : return # Both nodes exist, check for modifications at this node fields_to_compare = [ \"name\" , \"control_type\" , \"rectangle\" , \"adjusted_rectangle\" , \"relative_rectangle\" , \"level\" , ] changes = {} for field in fields_to_compare : if node1 [ field ] != node2 [ field ]: changes [ field ] = ( node1 [ field ], node2 [ field ]) if changes : diff [ \"modified\" ] . 
append ({ \"path\" : path , \"changes\" : changes }) # Compare children children1 = node1 . get ( \"children\" , []) children2 = node2 . get ( \"children\" , []) # We'll assume children order is stable. If not, differences will appear as adds/removes. max_len = max ( len ( children1 ), len ( children2 )) for i in range ( max_len ): c1 = children1 [ i ] if i < len ( children1 ) else None c2 = children2 [ i ] if i < len ( children2 ) else None # Use the child's id if available from c2 (prefer new tree), else from c1 if c2 is not None : child_id = c2 [ \"id\" ] elif c1 is not None : child_id = c1 [ \"id\" ] else : # Both None shouldn't happen since max_len ensures one must exist child_id = \"unknown_child_id\" compare_nodes ( c1 , c2 , path + [ child_id ]) # Initialize the path with the root node id if it exists if ui_tree_2 and \"id\" in ui_tree_2 : root_id = ui_tree_2 [ \"id\" ] elif ui_tree_1 and \"id\" in ui_tree_1 : root_id = ui_tree_1 [ \"id\" ] else : # If no root id is present, assume a placeholder root_id = \"root\" compare_nodes ( ui_tree_1 , ui_tree_2 , [ root_id ]) return diff Note Save the UI tree logs may increase the latency of the system. It is recommended to set the SAVE_UI_TREE flag to False when you do not need the UI tree logs.","title":"UI Tree"},{"location":"logs/ui_tree_logs/#ui-tree-logs","text":"UFO can save the entire UI tree of the application window at every step for data collection purposes. The UI tree can represent the application's UI structure, including the window, controls, and their properties. The UI tree logs are saved in the logs/{task_name}/ui_tree folder. You have to set the SAVE_UI_TREE flag to True in the config_dev.yaml file to enable the UI tree logs. 
Below is an example of the UI tree logs for application: { \"id\": \"node_0\", \"name\": \"Mail - Chaoyun Zhang - Outlook\", \"control_type\": \"Window\", \"rectangle\": { \"left\": 628, \"top\": 258, \"right\": 3508, \"bottom\": 1795 }, \"adjusted_rectangle\": { \"left\": 0, \"top\": 0, \"right\": 2880, \"bottom\": 1537 }, \"relative_rectangle\": { \"left\": 0.0, \"top\": 0.0, \"right\": 1.0, \"bottom\": 1.0 }, \"level\": 0, \"children\": [ { \"id\": \"node_1\", \"name\": \"\", \"control_type\": \"Pane\", \"rectangle\": { \"left\": 3282, \"top\": 258, \"right\": 3498, \"bottom\": 330 }, \"adjusted_rectangle\": { \"left\": 2654, \"top\": 0, \"right\": 2870, \"bottom\": 72 }, \"relative_rectangle\": { \"left\": 0.9215277777777777, \"top\": 0.0, \"right\": 0.9965277777777778, \"bottom\": 0.0468445022771633 }, \"level\": 1, \"children\": [] } ] }","title":"UI Tree Logs"},{"location":"logs/ui_tree_logs/#fields-in-the-ui-tree-logs","text":"Below is a table of the fields in the UI tree logs: Field Description Type id The unique identifier of the UI tree node. String name The name of the UI tree node. String control_type The type of the UI tree node. String rectangle The absolute position of the UI tree node. Dictionary adjusted_rectangle The adjusted position of the UI tree node. Dictionary relative_rectangle The relative position of the UI tree node. Dictionary level The level of the UI tree node. Integer children The children of the UI tree node. List of UI tree nodes","title":"Fields in the UI tree logs"},{"location":"logs/ui_tree_logs/#reference","text":"A class to represent the UI tree. Initialize the UI tree with the root element. Parameters: root ( UIAWrapper ) \u2013 The root element of the UI tree. Source code in automator/ui_control/ui_tree.py 20 21 22 23 24 25 26 27 28 29 30 31 32 33 def __init__ ( self , root : UIAWrapper ): \"\"\" Initialize the UI tree with the root element. :param root: The root element of the UI tree. \"\"\" self . 
root = root # The node counter to count the number of nodes in the UI tree. self . node_counter = 0 try : self . _ui_tree = self . _get_ui_tree ( self . root ) except Exception as e : self . _ui_tree = { \"error\" : traceback . format_exc ()}","title":"Reference"},{"location":"logs/ui_tree_logs/#automator.ui_control.ui_tree.UITree.ui_tree","text":"The UI tree.","title":"ui_tree"},{"location":"logs/ui_tree_logs/#automator.ui_control.ui_tree.UITree.apply_ui_tree_diff","text":"Apply a UI tree diff to ui_tree_1 to get ui_tree_2. Parameters: ui_tree_1 ( Dict [ str , Any ] ) \u2013 The original UI tree. diff ( Dict [ str , Any ] ) \u2013 The diff to apply. Returns: Dict [ str , Any ] \u2013 The new UI tree after applying the diff. Source code in automator/ui_control/ui_tree.py 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 @staticmethod def apply_ui_tree_diff ( ui_tree_1 : Dict [ str , Any ], diff : Dict [ str , Any ] ) -> Dict [ str , Any ]: \"\"\" Apply a UI tree diff to ui_tree_1 to get ui_tree_2. :param ui_tree_1: The original UI tree. :param diff: The diff to apply. :return: The new UI tree after applying the diff. \"\"\" ui_tree_2 = copy . deepcopy ( ui_tree_1 ) # Build an ID map for quick node lookups def build_id_map ( node , id_map ): id_map [ node [ \"id\" ]] = node for child in node . get ( \"children\" , []): build_id_map ( child , id_map ) id_map = {} if \"id\" in ui_tree_2 : build_id_map ( ui_tree_2 , id_map ) def remove_node_by_path ( path ): # The path is a list of IDs from root to target node. # The target node is the last element. Its parent is the second to last element. 
if len ( path ) == 1 : # Removing the root for k in list ( ui_tree_2 . keys ()): del ui_tree_2 [ k ] id_map . clear () return target_id = path [ - 1 ] parent_id = path [ - 2 ] parent_node = id_map [ parent_id ] # Find and remove the child with target_id for i , c in enumerate ( parent_node . get ( \"children\" , [])): if c [ \"id\" ] == target_id : parent_node [ \"children\" ] . pop ( i ) break # Remove target_id from id_map if target_id in id_map : del id_map [ target_id ] def add_node_by_path ( path , node ): # Add the node at the specified path. The parent is path[-2], the node is path[-1]. # The path[-1] should be node[\"id\"]. if len ( path ) == 1 : # Replacing the root node entirely for k in list ( ui_tree_2 . keys ()): del ui_tree_2 [ k ] for k , v in node . items (): ui_tree_2 [ k ] = v # Rebuild id_map id_map . clear () if \"id\" in ui_tree_2 : build_id_map ( ui_tree_2 , id_map ) return target_id = path [ - 1 ] parent_id = path [ - 2 ] parent_node = id_map [ parent_id ] # Ensure children list exists if \"children\" not in parent_node : parent_node [ \"children\" ] = [] # Insert or append the node # We don't have a numeric index anymore, we just append, assuming order doesn't matter. # If order matters, we must store ordering info or do some heuristic. parent_node [ \"children\" ] . append ( node ) # Update the id_map with the newly added subtree build_id_map ( node , id_map ) def modify_node_by_path ( path , changes ): # Modify fields of the node at the given ID target_id = path [ - 1 ] node = id_map [ target_id ] for field , ( old_val , new_val ) in changes . items (): node [ field ] = new_val # Apply removals first # Sort removals by length of path descending so we remove deeper nodes first. # This ensures we don't remove parents before children. for removal in sorted ( diff [ \"removed\" ], key = lambda x : len ( x [ \"path\" ]), reverse = True ): remove_node_by_path ( removal [ \"path\" ]) # Apply additions # Additions can be applied directly. 
for addition in diff [ \"added\" ]: add_node_by_path ( addition [ \"path\" ], addition [ \"node\" ]) # Apply modifications for modification in diff [ \"modified\" ]: modify_node_by_path ( modification [ \"path\" ], modification [ \"changes\" ]) return ui_tree_2","title":"apply_ui_tree_diff"},{"location":"logs/ui_tree_logs/#automator.ui_control.ui_tree.UITree.flatten_ui_tree","text":"Flatten the UI tree into a list in width-first order. Source code in automator/ui_control/ui_tree.py 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 def flatten_ui_tree ( self ) -> List [ Dict [ str , Any ]]: \"\"\" Flatten the UI tree into a list in width-first order. \"\"\" def flatten_tree ( tree : Dict [ str , Any ], result : List [ Dict [ str , Any ]]): \"\"\" Flatten the tree. :param tree: The tree to flatten. :param result: The result list. \"\"\" tree_info = { \"name\" : tree [ \"name\" ], \"control_type\" : tree [ \"control_type\" ], \"rectangle\" : tree [ \"rectangle\" ], \"adjusted_rectangle\" : tree [ \"adjusted_rectangle\" ], \"relative_rectangle\" : tree [ \"relative_rectangle\" ], \"level\" : tree [ \"level\" ], } result . append ( tree_info ) for child in tree . get ( \"children\" , []): flatten_tree ( child , result ) result = [] flatten_tree ( self . ui_tree , result ) return result","title":"flatten_ui_tree"},{"location":"logs/ui_tree_logs/#automator.ui_control.ui_tree.UITree.save_ui_tree_to_json","text":"Save the UI tree to a JSON file. Parameters: file_path ( str ) \u2013 The file path to save the UI tree. Source code in automator/ui_control/ui_tree.py 103 104 105 106 107 108 109 110 111 112 113 114 115 def save_ui_tree_to_json ( self , file_path : str ) -> None : \"\"\" Save the UI tree to a JSON file. :param file_path: The file path to save the UI tree. \"\"\" # Check if the file directory exists. If not, create it. save_dir = os . path . dirname ( file_path ) if not os . path . 
exists ( save_dir ): os . makedirs ( save_dir ) with open ( file_path , \"w\" ) as file : json . dump ( self . ui_tree , file , indent = 4 )","title":"save_ui_tree_to_json"},{"location":"logs/ui_tree_logs/#automator.ui_control.ui_tree.UITree.ui_tree_diff","text":"Compute the difference between two UI trees. Parameters: ui_tree_1 ( Dict [ str , Any ] ) \u2013 The first UI tree. ui_tree_2 ( Dict [ str , Any ] ) \u2013 The second UI tree. Returns: \u2013 The difference between the two UI trees. Source code in automator/ui_control/ui_tree.py 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 @staticmethod def ui_tree_diff ( ui_tree_1 : Dict [ str , Any ], ui_tree_2 : Dict [ str , Any ]): \"\"\" Compute the difference between two UI trees. :param ui_tree_1: The first UI tree. :param ui_tree_2: The second UI tree. :return: The difference between the two UI trees. \"\"\" diff = { \"added\" : [], \"removed\" : [], \"modified\" : []} def compare_nodes ( node1 , node2 , path ): # Note: `path` is a list of IDs. The last element corresponds to the current node. # If node1 doesn't exist and node2 does, it's an addition. if node1 is None and node2 is not None : diff [ \"added\" ] . append ({ \"path\" : path , \"node\" : copy . deepcopy ( node2 )}) return # If node1 exists and node2 doesn't, it's a removal. if node1 is not None and node2 is None : diff [ \"removed\" ] . append ({ \"path\" : path , \"node\" : copy . deepcopy ( node1 )}) return # If both don't exist, nothing to do. 
if node1 is None and node2 is None : return # Both nodes exist, check for modifications at this node fields_to_compare = [ \"name\" , \"control_type\" , \"rectangle\" , \"adjusted_rectangle\" , \"relative_rectangle\" , \"level\" , ] changes = {} for field in fields_to_compare : if node1 [ field ] != node2 [ field ]: changes [ field ] = ( node1 [ field ], node2 [ field ]) if changes : diff [ \"modified\" ] . append ({ \"path\" : path , \"changes\" : changes }) # Compare children children1 = node1 . get ( \"children\" , []) children2 = node2 . get ( \"children\" , []) # We'll assume children order is stable. If not, differences will appear as adds/removes. max_len = max ( len ( children1 ), len ( children2 )) for i in range ( max_len ): c1 = children1 [ i ] if i < len ( children1 ) else None c2 = children2 [ i ] if i < len ( children2 ) else None # Use the child's id if available from c2 (prefer new tree), else from c1 if c2 is not None : child_id = c2 [ \"id\" ] elif c1 is not None : child_id = c1 [ \"id\" ] else : # Both None shouldn't happen since max_len ensures one must exist child_id = \"unknown_child_id\" compare_nodes ( c1 , c2 , path + [ child_id ]) # Initialize the path with the root node id if it exists if ui_tree_2 and \"id\" in ui_tree_2 : root_id = ui_tree_2 [ \"id\" ] elif ui_tree_1 and \"id\" in ui_tree_1 : root_id = ui_tree_1 [ \"id\" ] else : # If no root id is present, assume a placeholder root_id = \"root\" compare_nodes ( ui_tree_1 , ui_tree_2 , [ root_id ]) return diff Note Save the UI tree logs may increase the latency of the system. It is recommended to set the SAVE_UI_TREE flag to False when you do not need the UI tree logs.","title":"ui_tree_diff"},{"location":"modules/context/","text":"Context The Context object is a shared state object that stores the state of the conversation across all Rounds within a Session . It is used to maintain the context of the conversation, as well as the overall status of the conversation. 
Context Attributes The attributes of the Context object are defined in the ContextNames class, which is an Enum . The ContextNames class specifies various context attributes used throughout the session. Below is the definition: class ContextNames(Enum): \"\"\" The context names. \"\"\" ID = \"ID\" # The ID of the session MODE = \"MODE\" # The mode of the session LOG_PATH = \"LOG_PATH\" # The folder path to store the logs REQUEST = \"REQUEST\" # The current request SUBTASK = \"SUBTASK\" # The current subtask processed by the AppAgent PREVIOUS_SUBTASKS = \"PREVIOUS_SUBTASKS\" # The previous subtasks processed by the AppAgent HOST_MESSAGE = \"HOST_MESSAGE\" # The message from the HostAgent sent to the AppAgent REQUEST_LOGGER = \"REQUEST_LOGGER\" # The logger for the LLM request LOGGER = \"LOGGER\" # The logger for the session EVALUATION_LOGGER = \"EVALUATION_LOGGER\" # The logger for the evaluation ROUND_STEP = \"ROUND_STEP\" # The step of all rounds SESSION_STEP = \"SESSION_STEP\" # The step of the current session CURRENT_ROUND_ID = \"CURRENT_ROUND_ID\" # The ID of the current round APPLICATION_WINDOW = \"APPLICATION_WINDOW\" # The window of the application APPLICATION_PROCESS_NAME = \"APPLICATION_PROCESS_NAME\" # The process name of the application APPLICATION_ROOT_NAME = \"APPLICATION_ROOT_NAME\" # The root name of the application CONTROL_REANNOTATION = \"CONTROL_REANNOTATION\" # The re-annotation of the control provided by the AppAgent SESSION_COST = \"SESSION_COST\" # The cost of the session ROUND_COST = \"ROUND_COST\" # The cost of all rounds ROUND_SUBTASK_AMOUNT = \"ROUND_SUBTASK_AMOUNT\" # The amount of subtasks in all rounds CURRENT_ROUND_STEP = \"CURRENT_ROUND_STEP\" # The step of the current round CURRENT_ROUND_COST = \"CURRENT_ROUND_COST\" # The cost of the current round CURRENT_ROUND_SUBTASK_AMOUNT = \"CURRENT_ROUND_SUBTASK_AMOUNT\" # The amount of subtasks in the current round STRUCTURAL_LOGS = \"STRUCTURAL_LOGS\" # The structural logs of the session 
Each attribute is a string that represents a specific aspect of the session context, ensuring that all necessary information is accessible and manageable within the application. Attributes Description Attribute Description ID The ID of the session. MODE The mode of the session. LOG_PATH The folder path to store the logs. REQUEST The current request. SUBTASK The current subtask processed by the AppAgent. PREVIOUS_SUBTASKS The previous subtasks processed by the AppAgent. HOST_MESSAGE The message from the HostAgent sent to the AppAgent. REQUEST_LOGGER The logger for the LLM request. LOGGER The logger for the session. EVALUATION_LOGGER The logger for the evaluation. ROUND_STEP The step of all rounds. SESSION_STEP The step of the current session. CURRENT_ROUND_ID The ID of the current round. APPLICATION_WINDOW The window of the application. APPLICATION_PROCESS_NAME The process name of the application. APPLICATION_ROOT_NAME The root name of the application. CONTROL_REANNOTATION The re-annotation of the control provided by the AppAgent. SESSION_COST The cost of the session. ROUND_COST The cost of all rounds. ROUND_SUBTASK_AMOUNT The amount of subtasks in all rounds. CURRENT_ROUND_STEP The step of the current round. CURRENT_ROUND_COST The cost of the current round. CURRENT_ROUND_SUBTASK_AMOUNT The amount of subtasks in the current round. STRUCTURAL_LOGS The structural logs of the session. Reference for the Context object The context class that maintains the context for the session and agent. current_round_cost : Optional [ float ] property writable Get the current round cost. current_round_step : int property writable Get the current round step. current_round_subtask_amount : int property writable Get the current round subtask index. add_to_structural_logs ( data ) Add data to the structural logs. Parameters: data ( Dict [ str , Any ] ) \u2013 The data to add to the structural logs. 
Source code in module/context.py 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 def add_to_structural_logs ( self , data : Dict [ str , Any ]) -> None : \"\"\" Add data to the structural logs. :param data: The data to add to the structural logs. \"\"\" round_key = data . get ( \"Round\" , None ) subtask_key = data . get ( \"SubtaskIndex\" , None ) if round_key is None or subtask_key is None : return remaining_items = { key : data [ key ] for key in data if key not in [ \"a\" , \"b\" ]} self . _context [ ContextNames . STRUCTURAL_LOGS . name ][ round_key ][ subtask_key ] . append ( remaining_items ) filter_structural_logs ( round_key , subtask_key , keys ) Filter the structural logs. Parameters: round_key ( int ) \u2013 The round key. subtask_key ( int ) \u2013 The subtask key. keys ( Union [ str , List [ str ]] ) \u2013 The keys to filter. Returns: Union [ List [ Any ], List [ Dict [ str , Any ]]] \u2013 The filtered structural logs. Source code in module/context.py 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 def filter_structural_logs ( self , round_key : int , subtask_key : int , keys : Union [ str , List [ str ]] ) -> Union [ List [ Any ], List [ Dict [ str , Any ]]]: \"\"\" Filter the structural logs. :param round_key: The round key. :param subtask_key: The subtask key. :param keys: The keys to filter. :return: The filtered structural logs. \"\"\" structural_logs = self . _context [ ContextNames . STRUCTURAL_LOGS . name ][ round_key ][ subtask_key ] if isinstance ( keys , str ): return [ log [ keys ] for log in structural_logs ] elif isinstance ( keys , list ): return [{ key : log [ key ] for key in keys } for log in structural_logs ] else : raise TypeError ( f \"Keys should be a string or a list of strings.\" ) get ( key ) Get the value from the context. Parameters: key ( ContextNames ) \u2013 The context name. Returns: Any \u2013 The value from the context. 
Source code in module/context.py 165 166 167 168 169 170 171 172 173 def get ( self , key : ContextNames ) -> Any : \"\"\" Get the value from the context. :param key: The context name. :return: The value from the context. \"\"\" # Sync the current round step and cost self . _sync_round_values () return self . _context . get ( key . name ) set ( key , value ) Set the value in the context. Parameters: key ( ContextNames ) \u2013 The context name. value ( Any ) \u2013 The value to set in the context. Source code in module/context.py 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 def set ( self , key : ContextNames , value : Any ) -> None : \"\"\" Set the value in the context. :param key: The context name. :param value: The value to set in the context. \"\"\" if key . name in self . _context : self . _context [ key . name ] = value # Sync the current round step and cost if key == ContextNames . CURRENT_ROUND_STEP : self . current_round_step = value if key == ContextNames . CURRENT_ROUND_COST : self . current_round_cost = value if key == ContextNames . CURRENT_ROUND_SUBTASK_AMOUNT : self . current_round_subtask_amount = value else : raise KeyError ( f \"Key ' { key } ' is not a valid context name.\" ) to_dict () Convert the context to a dictionary. Returns: Dict [ str , Any ] \u2013 The dictionary of the context. Source code in module/context.py 313 314 315 316 317 318 def to_dict ( self ) -> Dict [ str , Any ]: \"\"\" Convert the context to a dictionary. :return: The dictionary of the context. \"\"\" return self . _context update_dict ( key , value ) Add a dictionary to a context key. The value and the context key should be dictionaries. Parameters: key ( ContextNames ) \u2013 The context key to update. value ( Dict [ str , Any ] ) \u2013 The dictionary to add to the context key. 
Source code in module/context.py 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 def update_dict ( self , key : ContextNames , value : Dict [ str , Any ]) -> None : \"\"\" Add a dictionary to a context key. The value and the context key should be dictionaries. :param key: The context key to update. :param value: The dictionary to add to the context key. \"\"\" if key . name in self . _context : context_value = self . _context [ key . name ] if isinstance ( value , dict ) and isinstance ( context_value , dict ): self . _context [ key . name ] . update ( value ) else : raise TypeError ( f \"Value for key ' { key . name } ' is { key . value } , requires a dictionary.\" ) else : raise KeyError ( f \"Key ' { key . name } ' is not a valid context name.\" )","title":"Context"},{"location":"modules/context/#context","text":"The Context object is a shared state object that stores the state of the conversation across all Rounds within a Session . It is used to maintain the context of the conversation, as well as the overall status of the conversation.","title":"Context"},{"location":"modules/context/#context-attributes","text":"The attributes of the Context object are defined in the ContextNames class, which is an Enum . The ContextNames class specifies various context attributes used throughout the session. Below is the definition: class ContextNames(Enum): \"\"\" The context names. 
\"\"\" ID = \"ID\" # The ID of the session MODE = \"MODE\" # The mode of the session LOG_PATH = \"LOG_PATH\" # The folder path to store the logs REQUEST = \"REQUEST\" # The current request SUBTASK = \"SUBTASK\" # The current subtask processed by the AppAgent PREVIOUS_SUBTASKS = \"PREVIOUS_SUBTASKS\" # The previous subtasks processed by the AppAgent HOST_MESSAGE = \"HOST_MESSAGE\" # The message from the HostAgent sent to the AppAgent REQUEST_LOGGER = \"REQUEST_LOGGER\" # The logger for the LLM request LOGGER = \"LOGGER\" # The logger for the session EVALUATION_LOGGER = \"EVALUATION_LOGGER\" # The logger for the evaluation ROUND_STEP = \"ROUND_STEP\" # The step of all rounds SESSION_STEP = \"SESSION_STEP\" # The step of the current session CURRENT_ROUND_ID = \"CURRENT_ROUND_ID\" # The ID of the current round APPLICATION_WINDOW = \"APPLICATION_WINDOW\" # The window of the application APPLICATION_PROCESS_NAME = \"APPLICATION_PROCESS_NAME\" # The process name of the application APPLICATION_ROOT_NAME = \"APPLICATION_ROOT_NAME\" # The root name of the application CONTROL_REANNOTATION = \"CONTROL_REANNOTATION\" # The re-annotation of the control provided by the AppAgent SESSION_COST = \"SESSION_COST\" # The cost of the session ROUND_COST = \"ROUND_COST\" # The cost of all rounds ROUND_SUBTASK_AMOUNT = \"ROUND_SUBTASK_AMOUNT\" # The amount of subtasks in all rounds CURRENT_ROUND_STEP = \"CURRENT_ROUND_STEP\" # The step of the current round CURRENT_ROUND_COST = \"CURRENT_ROUND_COST\" # The cost of the current round CURRENT_ROUND_SUBTASK_AMOUNT = \"CURRENT_ROUND_SUBTASK_AMOUNT\" # The amount of subtasks in the current round STRUCTURAL_LOGS = \"STRUCTURAL_LOGS\" # The structural logs of the session Each attribute is a string that represents a specific aspect of the session context, ensuring that all necessary information is accessible and manageable within the application.","title":"Context Attributes"},{"location":"modules/context/#attributes-description","text":"Attribute 
Description ID The ID of the session. MODE The mode of the session. LOG_PATH The folder path to store the logs. REQUEST The current request. SUBTASK The current subtask processed by the AppAgent. PREVIOUS_SUBTASKS The previous subtasks processed by the AppAgent. HOST_MESSAGE The message from the HostAgent sent to the AppAgent. REQUEST_LOGGER The logger for the LLM request. LOGGER The logger for the session. EVALUATION_LOGGER The logger for the evaluation. ROUND_STEP The step of all rounds. SESSION_STEP The step of the current session. CURRENT_ROUND_ID The ID of the current round. APPLICATION_WINDOW The window of the application. APPLICATION_PROCESS_NAME The process name of the application. APPLICATION_ROOT_NAME The root name of the application. CONTROL_REANNOTATION The re-annotation of the control provided by the AppAgent. SESSION_COST The cost of the session. ROUND_COST The cost of all rounds. ROUND_SUBTASK_AMOUNT The amount of subtasks in all rounds. CURRENT_ROUND_STEP The step of the current round. CURRENT_ROUND_COST The cost of the current round. CURRENT_ROUND_SUBTASK_AMOUNT The amount of subtasks in the current round. STRUCTURAL_LOGS The structural logs of the session.","title":"Attributes Description"},{"location":"modules/context/#reference-for-the-context-object","text":"The context class that maintains the context for the session and agent.","title":"Reference for the Context object"},{"location":"modules/context/#module.context.Context.current_round_cost","text":"Get the current round cost.","title":"current_round_cost"},{"location":"modules/context/#module.context.Context.current_round_step","text":"Get the current round step.","title":"current_round_step"},{"location":"modules/context/#module.context.Context.current_round_subtask_amount","text":"Get the current round subtask index.","title":"current_round_subtask_amount"},{"location":"modules/context/#module.context.Context.add_to_structural_logs","text":"Add data to the structural logs. 
Parameters: data ( Dict [ str , Any ] ) \u2013 The data to add to the structural logs. Source code in module/context.py 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 def add_to_structural_logs ( self , data : Dict [ str , Any ]) -> None : \"\"\" Add data to the structural logs. :param data: The data to add to the structural logs. \"\"\" round_key = data . get ( \"Round\" , None ) subtask_key = data . get ( \"SubtaskIndex\" , None ) if round_key is None or subtask_key is None : return remaining_items = { key : data [ key ] for key in data if key not in [ \"a\" , \"b\" ]} self . _context [ ContextNames . STRUCTURAL_LOGS . name ][ round_key ][ subtask_key ] . append ( remaining_items )","title":"add_to_structural_logs"},{"location":"modules/context/#module.context.Context.filter_structural_logs","text":"Filter the structural logs. Parameters: round_key ( int ) \u2013 The round key. subtask_key ( int ) \u2013 The subtask key. keys ( Union [ str , List [ str ]] ) \u2013 The keys to filter. Returns: Union [ List [ Any ], List [ Dict [ str , Any ]]] \u2013 The filtered structural logs. Source code in module/context.py 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 def filter_structural_logs ( self , round_key : int , subtask_key : int , keys : Union [ str , List [ str ]] ) -> Union [ List [ Any ], List [ Dict [ str , Any ]]]: \"\"\" Filter the structural logs. :param round_key: The round key. :param subtask_key: The subtask key. :param keys: The keys to filter. :return: The filtered structural logs. \"\"\" structural_logs = self . _context [ ContextNames . STRUCTURAL_LOGS . 
name ][ round_key ][ subtask_key ] if isinstance ( keys , str ): return [ log [ keys ] for log in structural_logs ] elif isinstance ( keys , list ): return [{ key : log [ key ] for key in keys } for log in structural_logs ] else : raise TypeError ( f \"Keys should be a string or a list of strings.\" )","title":"filter_structural_logs"},{"location":"modules/context/#module.context.Context.get","text":"Get the value from the context. Parameters: key ( ContextNames ) \u2013 The context name. Returns: Any \u2013 The value from the context. Source code in module/context.py 165 166 167 168 169 170 171 172 173 def get ( self , key : ContextNames ) -> Any : \"\"\" Get the value from the context. :param key: The context name. :return: The value from the context. \"\"\" # Sync the current round step and cost self . _sync_round_values () return self . _context . get ( key . name )","title":"get"},{"location":"modules/context/#module.context.Context.set","text":"Set the value in the context. Parameters: key ( ContextNames ) \u2013 The context name. value ( Any ) \u2013 The value to set in the context. Source code in module/context.py 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 def set ( self , key : ContextNames , value : Any ) -> None : \"\"\" Set the value in the context. :param key: The context name. :param value: The value to set in the context. \"\"\" if key . name in self . _context : self . _context [ key . name ] = value # Sync the current round step and cost if key == ContextNames . CURRENT_ROUND_STEP : self . current_round_step = value if key == ContextNames . CURRENT_ROUND_COST : self . current_round_cost = value if key == ContextNames . CURRENT_ROUND_SUBTASK_AMOUNT : self . current_round_subtask_amount = value else : raise KeyError ( f \"Key ' { key } ' is not a valid context name.\" )","title":"set"},{"location":"modules/context/#module.context.Context.to_dict","text":"Convert the context to a dictionary. 
Returns: Dict [ str , Any ] \u2013 The dictionary of the context. Source code in module/context.py 313 314 315 316 317 318 def to_dict ( self ) -> Dict [ str , Any ]: \"\"\" Convert the context to a dictionary. :return: The dictionary of the context. \"\"\" return self . _context","title":"to_dict"},{"location":"modules/context/#module.context.Context.update_dict","text":"Add a dictionary to a context key. The value and the context key should be dictionaries. Parameters: key ( ContextNames ) \u2013 The context key to update. value ( Dict [ str , Any ] ) \u2013 The dictionary to add to the context key. Source code in module/context.py 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 def update_dict ( self , key : ContextNames , value : Dict [ str , Any ]) -> None : \"\"\" Add a dictionary to a context key. The value and the context key should be dictionaries. :param key: The context key to update. :param value: The dictionary to add to the context key. \"\"\" if key . name in self . _context : context_value = self . _context [ key . name ] if isinstance ( value , dict ) and isinstance ( context_value , dict ): self . _context [ key . name ] . update ( value ) else : raise TypeError ( f \"Value for key ' { key . name } ' is { key . value } , requires a dictionary.\" ) else : raise KeyError ( f \"Key ' { key . name } ' is not a valid context name.\" )","title":"update_dict"},{"location":"modules/round/","text":"Round A Round is a single interaction between the user and UFO that processes a single user request. A Round is responsible for orchestrating the HostAgent and AppAgent to fulfill the user's request. Round Lifecycle In a Round , the following steps are executed: 1. Round Initialization At the beginning of a Round , the Round object is created, and the user's request is processed by the HostAgent to determine the appropriate application to fulfill the request. 2. 
Action Execution Once created, the Round orchestrates the HostAgent and AppAgent to execute the necessary actions to fulfill the user's request. The core logic of a Round is shown below: def run(self) -> None: \"\"\" Run the round. \"\"\" while not self.is_finished(): self.agent.handle(self.context) self.state = self.agent.state.next_state(self.agent) self.agent = self.agent.state.next_agent(self.agent) self.agent.set_state(self.state) # If the subtask ends, capture the last snapshot of the application. if self.state.is_subtask_end(): time.sleep(configs[\"SLEEP_TIME\"]) self.capture_last_snapshot(sub_round_id=self.subtask_amount) self.subtask_amount += 1 self.agent.blackboard.add_requests( {\"request_{i}\".format(i=self.id), self.request} ) if self.application_window is not None: self.capture_last_snapshot() if self._should_evaluate: self.evaluation() At each step, the Round processes the user's request by invoking the handle method of the AppAgent or HostAgent based on the current state. The state determines the next agent to handle the request and the next state to transition to. 3. Request Completion The AppAgent completes the actions within the application. If the request spans multiple applications, the HostAgent may switch to a different application to continue the task. 4. Round Termination Once the user's request is fulfilled, the Round is terminated, and the results are returned to the user. If configured, the EvaluationAgent evaluates the completeness of the Round . Reference Bases: ABC A round of a session in UFO. A round manages a single user request and consists of multiple steps. A session may consist of multiple rounds of interactions. Initialize a round. Parameters: request ( str ) \u2013 The request of the round. agent ( BasicAgent ) \u2013 The initial agent of the round. context ( Context ) \u2013 The shared context of the round. should_evaluate ( bool ) \u2013 Whether to evaluate the round. id ( int ) \u2013 The id of the round. 
Source code in module/basic.py 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 def __init__ ( self , request : str , agent : BasicAgent , context : Context , should_evaluate : bool , id : int , ) -> None : \"\"\" Initialize a round. :param request: The request of the round. :param agent: The initial agent of the round. :param context: The shared context of the round. :param should_evaluate: Whether to evaluate the round. :param id: The id of the round. \"\"\" self . _request = request self . _context = context self . _agent = agent self . _state = agent . state self . _id = id self . _should_evaluate = should_evaluate self . _init_context () agent : BasicAgent property writable Get the agent of the round. return: The agent of the round. application_window : UIAWrapper property writable Get the application of the session. return: The application of the session. context : Context property Get the context of the round. return: The context of the round. cost : float property Get the cost of the round. return: The cost of the round. id : int property Get the id of the round. return: The id of the round. log_path : str property Get the log path of the round. return: The log path of the round. request : str property Get the request of the round. return: The request of the round. state : AgentState property writable Get the status of the round. return: The status of the round. step : int property Get the local step of the round. return: The step of the round. subtask_amount : int property writable Get the subtask amount of the round. return: The subtask amount of the round. capture_last_snapshot ( sub_round_id = None ) Capture the last snapshot of the application, including the screenshot and the XML file if configured. Parameters: sub_round_id ( Optional [ int ] , default: None ) \u2013 The id of the sub-round, default is None. 
Source code in module/basic.py 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 def capture_last_snapshot ( self , sub_round_id : Optional [ int ] = None ) -> None : \"\"\" Capture the last snapshot of the application, including the screenshot and the XML file if configured. :param sub_round_id: The id of the sub-round, default is None. \"\"\" # Capture the final screenshot if sub_round_id is None : screenshot_save_path = self . log_path + f \"action_round_ { self . id } _final.png\" else : screenshot_save_path = ( self . log_path + f \"action_round_ { self . id } _sub_round_ { sub_round_id } _final.png\" ) if self . application_window is not None : try : PhotographerFacade () . capture_app_window_screenshot ( self . application_window , save_path = screenshot_save_path ) except Exception as e : utils . print_with_color ( f \"Warning: The last snapshot capture failed, due to the error: { e } \" , \"yellow\" , ) if configs . get ( \"SAVE_UI_TREE\" , False ): step_ui_tree = ui_tree . UITree ( self . application_window ) ui_tree_path = os . path . join ( self . log_path , \"ui_trees\" ) ui_tree_file_name = ( f \"ui_tree_round_ { self . id } _final.json\" if sub_round_id is None else f \"ui_tree_round_ { self . id } _sub_round_ { sub_round_id } _final.json\" ) step_ui_tree . save_ui_tree_to_json ( os . path . join ( ui_tree_path , ui_tree_file_name , ) ) # Save the final XML file if configs [ \"LOG_XML\" ]: log_abs_path = os . path . abspath ( self . log_path ) xml_save_path = os . path . join ( log_abs_path , ( f \"xml/action_round_ { self . id } _final.xml\" if sub_round_id is None else f \"xml/action_round_ { self . id } _sub_round_ { sub_round_id } _final.xml\" ), ) if issubclass ( type ( self . agent ), HostAgent ): app_agent : AppAgent = self . agent . 
get_active_appagent () app_agent . Puppeteer . save_to_xml ( xml_save_path ) elif issubclass ( type ( self . agent ), AppAgent ): app_agent : AppAgent = self . agent app_agent . Puppeteer . save_to_xml ( xml_save_path ) evaluation () TODO: Evaluate the round. Source code in module/basic.py 312 313 314 315 316 def evaluation ( self ) -> None : \"\"\" TODO: Evaluate the round. \"\"\" pass is_finished () Check if the round is finished. return: True if the round is finished, otherwise False. Source code in module/basic.py 127 128 129 130 131 132 133 134 135 def is_finished ( self ) -> bool : \"\"\" Check if the round is finished. return: True if the round is finished, otherwise False. \"\"\" return ( self . state . is_round_end () or self . context . get ( ContextNames . SESSION_STEP ) >= configs [ \"MAX_STEP\" ] ) print_cost () Print the total cost of the round. Source code in module/basic.py 225 226 227 228 229 230 231 232 233 234 235 def print_cost ( self ) -> None : \"\"\" Print the total cost of the round. \"\"\" total_cost = self . cost if isinstance ( total_cost , float ): formatted_cost = \"$ {:.2f} \" . format ( total_cost ) utils . print_with_color ( f \"Request total cost for current round is { formatted_cost } \" , \"yellow\" ) run () Run the round. Source code in module/basic.py 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 def run ( self ) -> None : \"\"\" Run the round. \"\"\" while not self . is_finished (): self . agent . handle ( self . context ) self . state = self . agent . state . next_state ( self . agent ) self . agent = self . agent . state . next_agent ( self . agent ) self . agent . set_state ( self . state ) # If the subtask ends, capture the last snapshot of the application. if self . state . is_subtask_end (): time . sleep ( configs [ \"SLEEP_TIME\" ]) self . capture_last_snapshot ( sub_round_id = self . subtask_amount ) self . subtask_amount += 1 self . agent . blackboard . 
add_requests ( { \"request_ {i} \" . format ( i = self . id ), self . request } ) if self . application_window is not None : self . capture_last_snapshot () if self . _should_evaluate : self . evaluation ()","title":"Round"},{"location":"modules/round/#round","text":"A Round is a single interaction between the user and UFO that processes a single user request. A Round is responsible for orchestrating the HostAgent and AppAgent to fulfill the user's request.","title":"Round"},{"location":"modules/round/#round-lifecycle","text":"In a Round , the following steps are executed:","title":"Round Lifecycle"},{"location":"modules/round/#1-round-initialization","text":"At the beginning of a Round , the Round object is created, and the user's request is processed by the HostAgent to determine the appropriate application to fulfill the request.","title":"1. Round Initialization"},{"location":"modules/round/#2-action-execution","text":"Once created, the Round orchestrates the HostAgent and AppAgent to execute the necessary actions to fulfill the user's request. The core logic of a Round is shown below: def run(self) -> None: \"\"\" Run the round. \"\"\" while not self.is_finished(): self.agent.handle(self.context) self.state = self.agent.state.next_state(self.agent) self.agent = self.agent.state.next_agent(self.agent) self.agent.set_state(self.state) # If the subtask ends, capture the last snapshot of the application. if self.state.is_subtask_end(): time.sleep(configs[\"SLEEP_TIME\"]) self.capture_last_snapshot(sub_round_id=self.subtask_amount) self.subtask_amount += 1 self.agent.blackboard.add_requests( {\"request_{i}\".format(i=self.id), self.request} ) if self.application_window is not None: self.capture_last_snapshot() if self._should_evaluate: self.evaluation() At each step, the Round processes the user's request by invoking the handle method of the AppAgent or HostAgent based on the current state. 
The state determines the next agent to handle the request and the next state to transition to.","title":"2. Action Execution"},{"location":"modules/round/#3-request-completion","text":"The AppAgent completes the actions within the application. If the request spans multiple applications, the HostAgent may switch to a different application to continue the task.","title":"3. Request Completion"},{"location":"modules/round/#4-round-termination","text":"Once the user's request is fulfilled, the Round is terminated, and the results are returned to the user. If configured, the EvaluationAgent evaluates the completeness of the Round .","title":"4. Round Termination"},{"location":"modules/round/#reference","text":"Bases: ABC A round of a session in UFO. A round manages a single user request and consists of multiple steps. A session may consist of multiple rounds of interactions. Initialize a round. Parameters: request ( str ) \u2013 The request of the round. agent ( BasicAgent ) \u2013 The initial agent of the round. context ( Context ) \u2013 The shared context of the round. should_evaluate ( bool ) \u2013 Whether to evaluate the round. id ( int ) \u2013 The id of the round. Source code in module/basic.py 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 def __init__ ( self , request : str , agent : BasicAgent , context : Context , should_evaluate : bool , id : int , ) -> None : \"\"\" Initialize a round. :param request: The request of the round. :param agent: The initial agent of the round. :param context: The shared context of the round. :param should_evaluate: Whether to evaluate the round. :param id: The id of the round. \"\"\" self . _request = request self . _context = context self . _agent = agent self . _state = agent . state self . _id = id self . _should_evaluate = should_evaluate self . _init_context ()","title":"Reference"},{"location":"modules/round/#module.basic.BaseRound.agent","text":"Get the agent of the round. 
return: The agent of the round.","title":"agent"},{"location":"modules/round/#module.basic.BaseRound.application_window","text":"Get the application of the session. return: The application of the session.","title":"application_window"},{"location":"modules/round/#module.basic.BaseRound.context","text":"Get the context of the round. return: The context of the round.","title":"context"},{"location":"modules/round/#module.basic.BaseRound.cost","text":"Get the cost of the round. return: The cost of the round.","title":"cost"},{"location":"modules/round/#module.basic.BaseRound.id","text":"Get the id of the round. return: The id of the round.","title":"id"},{"location":"modules/round/#module.basic.BaseRound.log_path","text":"Get the log path of the round. return: The log path of the round.","title":"log_path"},{"location":"modules/round/#module.basic.BaseRound.request","text":"Get the request of the round. return: The request of the round.","title":"request"},{"location":"modules/round/#module.basic.BaseRound.state","text":"Get the status of the round. return: The status of the round.","title":"state"},{"location":"modules/round/#module.basic.BaseRound.step","text":"Get the local step of the round. return: The step of the round.","title":"step"},{"location":"modules/round/#module.basic.BaseRound.subtask_amount","text":"Get the subtask amount of the round. return: The subtask amount of the round.","title":"subtask_amount"},{"location":"modules/round/#module.basic.BaseRound.capture_last_snapshot","text":"Capture the last snapshot of the application, including the screenshot and the XML file if configured. Parameters: sub_round_id ( Optional [ int ] , default: None ) \u2013 The id of the sub-round, default is None. 
Source code in module/basic.py 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 def capture_last_snapshot ( self , sub_round_id : Optional [ int ] = None ) -> None : \"\"\" Capture the last snapshot of the application, including the screenshot and the XML file if configured. :param sub_round_id: The id of the sub-round, default is None. \"\"\" # Capture the final screenshot if sub_round_id is None : screenshot_save_path = self . log_path + f \"action_round_ { self . id } _final.png\" else : screenshot_save_path = ( self . log_path + f \"action_round_ { self . id } _sub_round_ { sub_round_id } _final.png\" ) if self . application_window is not None : try : PhotographerFacade () . capture_app_window_screenshot ( self . application_window , save_path = screenshot_save_path ) except Exception as e : utils . print_with_color ( f \"Warning: The last snapshot capture failed, due to the error: { e } \" , \"yellow\" , ) if configs . get ( \"SAVE_UI_TREE\" , False ): step_ui_tree = ui_tree . UITree ( self . application_window ) ui_tree_path = os . path . join ( self . log_path , \"ui_trees\" ) ui_tree_file_name = ( f \"ui_tree_round_ { self . id } _final.json\" if sub_round_id is None else f \"ui_tree_round_ { self . id } _sub_round_ { sub_round_id } _final.json\" ) step_ui_tree . save_ui_tree_to_json ( os . path . join ( ui_tree_path , ui_tree_file_name , ) ) # Save the final XML file if configs [ \"LOG_XML\" ]: log_abs_path = os . path . abspath ( self . log_path ) xml_save_path = os . path . join ( log_abs_path , ( f \"xml/action_round_ { self . id } _final.xml\" if sub_round_id is None else f \"xml/action_round_ { self . id } _sub_round_ { sub_round_id } _final.xml\" ), ) if issubclass ( type ( self . agent ), HostAgent ): app_agent : AppAgent = self . agent . 
get_active_appagent () app_agent . Puppeteer . save_to_xml ( xml_save_path ) elif issubclass ( type ( self . agent ), AppAgent ): app_agent : AppAgent = self . agent app_agent . Puppeteer . save_to_xml ( xml_save_path )","title":"capture_last_snapshot"},{"location":"modules/round/#module.basic.BaseRound.evaluation","text":"TODO: Evaluate the round. Source code in module/basic.py 312 313 314 315 316 def evaluation ( self ) -> None : \"\"\" TODO: Evaluate the round. \"\"\" pass","title":"evaluation"},{"location":"modules/round/#module.basic.BaseRound.is_finished","text":"Check if the round is finished. return: True if the round is finished, otherwise False. Source code in module/basic.py 127 128 129 130 131 132 133 134 135 def is_finished ( self ) -> bool : \"\"\" Check if the round is finished. return: True if the round is finished, otherwise False. \"\"\" return ( self . state . is_round_end () or self . context . get ( ContextNames . SESSION_STEP ) >= configs [ \"MAX_STEP\" ] )","title":"is_finished"},{"location":"modules/round/#module.basic.BaseRound.print_cost","text":"Print the total cost of the round. Source code in module/basic.py 225 226 227 228 229 230 231 232 233 234 235 def print_cost ( self ) -> None : \"\"\" Print the total cost of the round. \"\"\" total_cost = self . cost if isinstance ( total_cost , float ): formatted_cost = \"$ {:.2f} \" . format ( total_cost ) utils . print_with_color ( f \"Request total cost for current round is { formatted_cost } \" , \"yellow\" )","title":"print_cost"},{"location":"modules/round/#module.basic.BaseRound.run","text":"Run the round. Source code in module/basic.py 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 def run ( self ) -> None : \"\"\" Run the round. \"\"\" while not self . is_finished (): self . agent . handle ( self . context ) self . state = self . agent . state . next_state ( self . agent ) self . agent = self . agent . state . 
next_agent ( self . agent ) self . agent . set_state ( self . state ) # If the subtask ends, capture the last snapshot of the application. if self . state . is_subtask_end (): time . sleep ( configs [ \"SLEEP_TIME\" ]) self . capture_last_snapshot ( sub_round_id = self . subtask_amount ) self . subtask_amount += 1 self . agent . blackboard . add_requests ( { \"request_ {i} \" . format ( i = self . id ), self . request } ) if self . application_window is not None : self . capture_last_snapshot () if self . _should_evaluate : self . evaluation ()","title":"run"},{"location":"modules/session/","text":"Session A Session is a conversation instance between the user and UFO. It is a continuous interaction that starts when the user initiates a request and ends when the request is completed. UFO supports multiple requests within the same session. Each request is processed sequentially, by a Round of interaction, until the user's request is fulfilled. We show the relationship between Session and Round in the following figure: Session Lifecycle The lifecycle of a Session is as follows: 1. Session Initialization A Session is initialized when the user starts a conversation with UFO. The Session object is created, and the first Round of interaction is initiated. At this stage, the user's request is processed by the HostAgent to determine the appropriate application to fulfill the request. The Context object is created to store the state of the conversation shared across all Rounds within the Session . 2. Session Processing Once the Session is initialized, the Round of interaction begins, which completes a single user request by orchestrating the HostAgent and AppAgent . 3. Next Round After the completion of the first Round , the Session requests the next request from the user to start the next Round of interaction. This process continues until there are no more requests from the user. The core logic of a Session is shown below: def run(self) -> None: \"\"\" Run the session. 
\"\"\" while not self.is_finished(): round = self.create_new_round() if round is None: break round.run() if self.application_window is not None: self.capture_last_snapshot() if self._should_evaluate and not self.is_error(): self.evaluation() self.print_cost() 4. Session Termination If the user has no more requests or decides to end the conversation, the Session is terminated, and the conversation ends. The EvaluationAgent evaluates the completeness of the Session if it is configured to do so. Reference Bases: ABC A basic session in UFO. A session consists of multiple rounds of interactions and conversations. Initialize a session. Parameters: task ( str ) \u2013 The name of current task. should_evaluate ( bool ) \u2013 Whether to evaluate the session. id ( int ) \u2013 The id of the session. Source code in module/basic.py 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 def __init__ ( self , task : str , should_evaluate : bool , id : int ) -> None : \"\"\" Initialize a session. :param task: The name of current task. :param should_evaluate: Whether to evaluate the session. :param id: The id of the session. \"\"\" self . _should_evaluate = should_evaluate self . _id = id # Logging-related properties self . log_path = f \"logs/ { task } /\" utils . create_folder ( self . log_path ) self . _rounds : Dict [ int , BaseRound ] = {} self . _context = Context () self . _init_context () self . _finish = False self . _host_agent : HostAgent = AgentFactory . create_agent ( \"host\" , \"HostAgent\" , configs [ \"HOST_AGENT\" ][ \"VISUAL_MODE\" ], configs [ \"HOSTAGENT_PROMPT\" ], configs [ \"HOSTAGENT_EXAMPLE_PROMPT\" ], configs [ \"API_PROMPT\" ], ) application_window : UIAWrapper property writable Get the application of the session. return: The application of the session. context : Context property Get the context of the session. return: The context of the session. 
cost : float property writable Get the cost of the session. return: The cost of the session. current_round : BaseRound property Get the current round of the session. return: The current round of the session. evaluation_logger : logging . Logger property Get the logger for evaluation. return: The logger for evaluation. id : int property Get the id of the session. return: The id of the session. rounds : Dict [ int , BaseRound ] property Get the rounds of the session. return: The rounds of the session. session_type : str property Get the class name of the session. return: The class name of the session. step : int property Get the step of the session. return: The step of the session. total_rounds : int property Get the total number of rounds in the session. return: The total number of rounds in the session. add_round ( id , round ) Add a round to the session. Parameters: id ( int ) \u2013 The id of the round. round ( BaseRound ) \u2013 The round to be added. Source code in module/basic.py 412 413 414 415 416 417 418 def add_round ( self , id : int , round : BaseRound ) -> None : \"\"\" Add a round to the session. :param id: The id of the round. :param round: The round to be added. \"\"\" self . _rounds [ id ] = round capture_last_snapshot () Capture the last snapshot of the application, including the screenshot and the XML file if configured. Source code in module/basic.py 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 def capture_last_snapshot ( self ) -> None : \"\"\" Capture the last snapshot of the application, including the screenshot and the XML file if configured. \"\"\" # Capture the final screenshot screenshot_save_path = self . log_path + f \"action_step_final.png\" if self . application_window is not None : try : PhotographerFacade () . capture_app_window_screenshot ( self . 
application_window , save_path = screenshot_save_path ) except Exception as e : utils . print_with_color ( f \"Warning: The last snapshot capture failed, due to the error: { e } \" , \"yellow\" , ) if configs . get ( \"SAVE_UI_TREE\" , False ): step_ui_tree = ui_tree . UITree ( self . application_window ) ui_tree_path = os . path . join ( self . log_path , \"ui_trees\" ) ui_tree_file_name = \"ui_tree_final.json\" step_ui_tree . save_ui_tree_to_json ( os . path . join ( ui_tree_path , ui_tree_file_name , ) ) # Save the final XML file if configs [ \"LOG_XML\" ]: log_abs_path = os . path . abspath ( self . log_path ) xml_save_path = os . path . join ( log_abs_path , f \"xml/action_step_final.xml\" ) app_agent = self . _host_agent . get_active_appagent () if app_agent is not None : app_agent . Puppeteer . save_to_xml ( xml_save_path ) create_following_round () Create a following round. return: The following round. Source code in module/basic.py 405 406 407 408 409 410 def create_following_round ( self ) -> BaseRound : \"\"\" Create a following round. return: The following round. \"\"\" pass create_new_round () abstractmethod Create a new round. Source code in module/basic.py 390 391 392 393 394 395 @abstractmethod def create_new_round ( self ) -> Optional [ BaseRound ]: \"\"\" Create a new round. \"\"\" pass evaluation () Evaluate the session. Source code in module/basic.py 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 def evaluation ( self ) -> None : \"\"\" Evaluate the session. \"\"\" utils . print_with_color ( \"Evaluating the session...\" , \"yellow\" ) evaluator = EvaluationAgent ( name = \"eva_agent\" , app_root_name = self . context . get ( ContextNames . 
APPLICATION_ROOT_NAME ), is_visual = configs [ \"APP_AGENT\" ][ \"VISUAL_MODE\" ], main_prompt = configs [ \"EVALUATION_PROMPT\" ], example_prompt = \"\" , api_prompt = configs [ \"API_PROMPT\" ], ) requests = self . request_to_evaluate () # Evaluate the session, first use the default setting, if failed, then disable the screenshot evaluation. try : result , cost = evaluator . evaluate ( request = requests , log_path = self . log_path , eva_all_screenshots = configs . get ( \"EVA_ALL_SCREENSHOTS\" , True ), ) except Exception as e : result , cost = evaluator . evaluate ( request = requests , log_path = self . log_path , eva_all_screenshots = False , ) # Add additional information to the evaluation result. additional_info = { \"level\" : \"session\" , \"request\" : requests , \"id\" : 0 } result . update ( additional_info ) self . cost += cost evaluator . print_response ( result ) self . evaluation_logger . info ( json . dumps ( result )) experience_saver () Save the current trajectory as agent experience. Source code in module/basic.py 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 def experience_saver ( self ) -> None : \"\"\" Save the current trajectory as agent experience. \"\"\" utils . print_with_color ( \"Summarizing and saving the execution flow as experience...\" , \"yellow\" ) summarizer = ExperienceSummarizer ( configs [ \"APP_AGENT\" ][ \"VISUAL_MODE\" ], configs [ \"EXPERIENCE_PROMPT\" ], configs [ \"APPAGENT_EXAMPLE_PROMPT\" ], configs [ \"API_PROMPT\" ], ) experience = summarizer . read_logs ( self . log_path ) summaries , cost = summarizer . get_summary_list ( experience ) experience_path = configs [ \"EXPERIENCE_SAVED_PATH\" ] utils . create_folder ( experience_path ) summarizer . create_or_update_yaml ( summaries , os . path . join ( experience_path , \"experience.yaml\" ) ) summarizer . create_or_update_vector_db ( summaries , os . path . 
join ( experience_path , \"experience_db\" ) ) self . cost += cost utils . print_with_color ( \"The experience has been saved.\" , \"magenta\" ) initialize_logger ( log_path , log_filename , mode = 'a' , configs = configs ) staticmethod Initialize logging. log_path: The path of the log file. log_filename: The name of the log file. return: The logger. Source code in module/basic.py 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 @staticmethod def initialize_logger ( log_path : str , log_filename : str , mode = 'a' , configs = configs ) -> logging . Logger : \"\"\" Initialize logging. log_path: The path of the log file. log_filename: The name of the log file. return: The logger. \"\"\" # Code for initializing logging logger = logging . Logger ( log_filename ) if not configs [ \"PRINT_LOG\" ]: # Remove existing handlers if PRINT_LOG is False logger . handlers = [] log_file_path = os . path . join ( log_path , log_filename ) file_handler = logging . FileHandler ( log_file_path , mode = mode , encoding = \"utf-8\" ) formatter = logging . Formatter ( \" %(message)s \" ) file_handler . setFormatter ( formatter ) logger . addHandler ( file_handler ) logger . setLevel ( configs [ \"LOG_LEVEL\" ]) return logger is_error () Check if the session is in error state. return: True if the session is in error state, otherwise False. Source code in module/basic.py 582 583 584 585 586 587 588 589 def is_error ( self ): \"\"\" Check if the session is in error state. return: True if the session is in error state, otherwise False. \"\"\" if self . current_round is not None : return self . current_round . state . name () == AgentStatus . ERROR . value return False is_finished () Check if the session is ended. return: True if the session is ended, otherwise False. Source code in module/basic.py 591 592 593 594 595 596 597 598 599 600 601 602 def is_finished ( self ) -> bool : \"\"\" Check if the session is ended. 
return: True if the session is ended, otherwise False. \"\"\" if self . _finish or self . step >= configs [ \"MAX_STEP\" ]: return True if self . is_error (): return True return False next_request () abstractmethod Get the next request of the session. return: The request of the session. Source code in module/basic.py 397 398 399 400 401 402 403 @abstractmethod def next_request ( self ) -> str : \"\"\" Get the next request of the session. return: The request of the session. \"\"\" pass print_cost () Print the total cost of the session. Source code in module/basic.py 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 def print_cost ( self ) -> None : \"\"\" Print the total cost of the session. \"\"\" if isinstance ( self . cost , float ) and self . cost > 0 : formatted_cost = \"$ {:.2f} \" . format ( self . cost ) utils . print_with_color ( f \"Total request cost of the session: { formatted_cost } $\" , \"yellow\" ) else : utils . print_with_color ( \"Cost is not available for the model {host_model} or {app_model} .\" . format ( host_model = configs [ \"HOST_AGENT\" ][ \"API_MODEL\" ], app_model = configs [ \"APP_AGENT\" ][ \"API_MODEL\" ], ), \"yellow\" , ) request_to_evaluate () abstractmethod Get the request to evaluate. return: The request(s) to evaluate. Source code in module/basic.py 604 605 606 607 608 609 610 @abstractmethod def request_to_evaluate ( self ) -> str : \"\"\" Get the request to evaluate. return: The request(s) to evaluate. \"\"\" pass run () Run the session. Source code in module/basic.py 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 def run ( self ) -> None : \"\"\" Run the session. \"\"\" while not self . is_finished (): round = self . create_new_round () if round is None : break round . run () if self . application_window is not None : self . capture_last_snapshot () if self . _should_evaluate and not self . is_error (): self . evaluation () self . 
print_cost ()","title":"Session"},{"location":"modules/session/#session","text":"A Session is a conversation instance between the user and UFO. It is a continuous interaction that starts when the user initiates a request and ends when the request is completed. UFO supports multiple requests within the same session. Each request is processed sequentially, by a Round of interaction, until the user's request is fulfilled. We show the relationship between Session and Round in the following figure:","title":"Session"},{"location":"modules/session/#session-lifecycle","text":"The lifecycle of a Session is as follows:","title":"Session Lifecycle"},{"location":"modules/session/#1-session-initialization","text":"A Session is initialized when the user starts a conversation with UFO. The Session object is created, and the first Round of interaction is initiated. At this stage, the user's request is processed by the HostAgent to determine the appropriate application to fulfill the request. The Context object is created to store the state of the conversation shared across all Rounds within the Session .","title":"1. Session Initialization"},{"location":"modules/session/#2-session-processing","text":"Once the Session is initialized, the Round of interaction begins, which completes a single user request by orchestrating the HostAgent and AppAgent .","title":"2. Session Processing"},{"location":"modules/session/#3-next-round","text":"After the completion of the first Round , the Session requests the next request from the user to start the next Round of interaction. This process continues until there are no more requests from the user. The core logic of a Session is shown below: def run(self) -> None: \"\"\" Run the session. 
\"\"\" while not self.is_finished(): round = self.create_new_round() if round is None: break round.run() if self.application_window is not None: self.capture_last_snapshot() if self._should_evaluate and not self.is_error(): self.evaluation() self.print_cost()","title":"3. Next Round"},{"location":"modules/session/#4-session-termination","text":"If the user has no more requests or decides to end the conversation, the Session is terminated, and the conversation ends. The EvaluationAgent evaluates the completeness of the Session if it is configured to do so.","title":"4. Session Termination"},{"location":"modules/session/#reference","text":"Bases: ABC A basic session in UFO. A session consists of multiple rounds of interactions and conversations. Initialize a session. Parameters: task ( str ) \u2013 The name of current task. should_evaluate ( bool ) \u2013 Whether to evaluate the session. id ( int ) \u2013 The id of the session. Source code in module/basic.py 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 def __init__ ( self , task : str , should_evaluate : bool , id : int ) -> None : \"\"\" Initialize a session. :param task: The name of current task. :param should_evaluate: Whether to evaluate the session. :param id: The id of the session. \"\"\" self . _should_evaluate = should_evaluate self . _id = id # Logging-related properties self . log_path = f \"logs/ { task } /\" utils . create_folder ( self . log_path ) self . _rounds : Dict [ int , BaseRound ] = {} self . _context = Context () self . _init_context () self . _finish = False self . _host_agent : HostAgent = AgentFactory . 
create_agent ( \"host\" , \"HostAgent\" , configs [ \"HOST_AGENT\" ][ \"VISUAL_MODE\" ], configs [ \"HOSTAGENT_PROMPT\" ], configs [ \"HOSTAGENT_EXAMPLE_PROMPT\" ], configs [ \"API_PROMPT\" ], )","title":"Reference"},{"location":"modules/session/#module.basic.BaseSession.application_window","text":"Get the application of the session. return: The application of the session.","title":"application_window"},{"location":"modules/session/#module.basic.BaseSession.context","text":"Get the context of the session. return: The context of the session.","title":"context"},{"location":"modules/session/#module.basic.BaseSession.cost","text":"Get the cost of the session. return: The cost of the session.","title":"cost"},{"location":"modules/session/#module.basic.BaseSession.current_round","text":"Get the current round of the session. return: The current round of the session.","title":"current_round"},{"location":"modules/session/#module.basic.BaseSession.evaluation_logger","text":"Get the logger for evaluation. return: The logger for evaluation.","title":"evaluation_logger"},{"location":"modules/session/#module.basic.BaseSession.id","text":"Get the id of the session. return: The id of the session.","title":"id"},{"location":"modules/session/#module.basic.BaseSession.rounds","text":"Get the rounds of the session. return: The rounds of the session.","title":"rounds"},{"location":"modules/session/#module.basic.BaseSession.session_type","text":"Get the class name of the session. return: The class name of the session.","title":"session_type"},{"location":"modules/session/#module.basic.BaseSession.step","text":"Get the step of the session. return: The step of the session.","title":"step"},{"location":"modules/session/#module.basic.BaseSession.total_rounds","text":"Get the total number of rounds in the session. return: The total number of rounds in the session.","title":"total_rounds"},{"location":"modules/session/#module.basic.BaseSession.add_round","text":"Add a round to the session. 
Parameters: id ( int ) \u2013 The id of the round. round ( BaseRound ) \u2013 The round to be added. Source code in module/basic.py 412 413 414 415 416 417 418 def add_round ( self , id : int , round : BaseRound ) -> None : \"\"\" Add a round to the session. :param id: The id of the round. :param round: The round to be added. \"\"\" self . _rounds [ id ] = round","title":"add_round"},{"location":"modules/session/#module.basic.BaseSession.capture_last_snapshot","text":"Capture the last snapshot of the application, including the screenshot and the XML file if configured. Source code in module/basic.py 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 def capture_last_snapshot ( self ) -> None : \"\"\" Capture the last snapshot of the application, including the screenshot and the XML file if configured. \"\"\" # Capture the final screenshot screenshot_save_path = self . log_path + f \"action_step_final.png\" if self . application_window is not None : try : PhotographerFacade () . capture_app_window_screenshot ( self . application_window , save_path = screenshot_save_path ) except Exception as e : utils . print_with_color ( f \"Warning: The last snapshot capture failed, due to the error: { e } \" , \"yellow\" , ) if configs . get ( \"SAVE_UI_TREE\" , False ): step_ui_tree = ui_tree . UITree ( self . application_window ) ui_tree_path = os . path . join ( self . log_path , \"ui_trees\" ) ui_tree_file_name = \"ui_tree_final.json\" step_ui_tree . save_ui_tree_to_json ( os . path . join ( ui_tree_path , ui_tree_file_name , ) ) # Save the final XML file if configs [ \"LOG_XML\" ]: log_abs_path = os . path . abspath ( self . log_path ) xml_save_path = os . path . join ( log_abs_path , f \"xml/action_step_final.xml\" ) app_agent = self . _host_agent . get_active_appagent () if app_agent is not None : app_agent . Puppeteer . 
save_to_xml ( xml_save_path )","title":"capture_last_snapshot"},{"location":"modules/session/#module.basic.BaseSession.create_following_round","text":"Create a following round. return: The following round. Source code in module/basic.py 405 406 407 408 409 410 def create_following_round ( self ) -> BaseRound : \"\"\" Create a following round. return: The following round. \"\"\" pass","title":"create_following_round"},{"location":"modules/session/#module.basic.BaseSession.create_new_round","text":"Create a new round. Source code in module/basic.py 390 391 392 393 394 395 @abstractmethod def create_new_round ( self ) -> Optional [ BaseRound ]: \"\"\" Create a new round. \"\"\" pass","title":"create_new_round"},{"location":"modules/session/#module.basic.BaseSession.evaluation","text":"Evaluate the session. Source code in module/basic.py 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 def evaluation ( self ) -> None : \"\"\" Evaluate the session. \"\"\" utils . print_with_color ( \"Evaluating the session...\" , \"yellow\" ) evaluator = EvaluationAgent ( name = \"eva_agent\" , app_root_name = self . context . get ( ContextNames . APPLICATION_ROOT_NAME ), is_visual = configs [ \"APP_AGENT\" ][ \"VISUAL_MODE\" ], main_prompt = configs [ \"EVALUATION_PROMPT\" ], example_prompt = \"\" , api_prompt = configs [ \"API_PROMPT\" ], ) requests = self . request_to_evaluate () # Evaluate the session, first use the default setting, if failed, then disable the screenshot evaluation. try : result , cost = evaluator . evaluate ( request = requests , log_path = self . log_path , eva_all_screenshots = configs . get ( \"EVA_ALL_SCREENSHOTS\" , True ), ) except Exception as e : result , cost = evaluator . evaluate ( request = requests , log_path = self . log_path , eva_all_screenshots = False , ) # Add additional information to the evaluation result. 
additional_info = { \"level\" : \"session\" , \"request\" : requests , \"id\" : 0 } result . update ( additional_info ) self . cost += cost evaluator . print_response ( result ) self . evaluation_logger . info ( json . dumps ( result ))","title":"evaluation"},{"location":"modules/session/#module.basic.BaseSession.experience_saver","text":"Save the current trajectory as agent experience. Source code in module/basic.py 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 def experience_saver ( self ) -> None : \"\"\" Save the current trajectory as agent experience. \"\"\" utils . print_with_color ( \"Summarizing and saving the execution flow as experience...\" , \"yellow\" ) summarizer = ExperienceSummarizer ( configs [ \"APP_AGENT\" ][ \"VISUAL_MODE\" ], configs [ \"EXPERIENCE_PROMPT\" ], configs [ \"APPAGENT_EXAMPLE_PROMPT\" ], configs [ \"API_PROMPT\" ], ) experience = summarizer . read_logs ( self . log_path ) summaries , cost = summarizer . get_summary_list ( experience ) experience_path = configs [ \"EXPERIENCE_SAVED_PATH\" ] utils . create_folder ( experience_path ) summarizer . create_or_update_yaml ( summaries , os . path . join ( experience_path , \"experience.yaml\" ) ) summarizer . create_or_update_vector_db ( summaries , os . path . join ( experience_path , \"experience_db\" ) ) self . cost += cost utils . print_with_color ( \"The experience has been saved.\" , \"magenta\" )","title":"experience_saver"},{"location":"modules/session/#module.basic.BaseSession.initialize_logger","text":"Initialize logging. log_path: The path of the log file. log_filename: The name of the log file. return: The logger. Source code in module/basic.py 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 @staticmethod def initialize_logger ( log_path : str , log_filename : str , mode = 'a' , configs = configs ) -> logging . Logger : \"\"\" Initialize logging. 
log_path: The path of the log file. log_filename: The name of the log file. return: The logger. \"\"\" # Code for initializing logging logger = logging . Logger ( log_filename ) if not configs [ \"PRINT_LOG\" ]: # Remove existing handlers if PRINT_LOG is False logger . handlers = [] log_file_path = os . path . join ( log_path , log_filename ) file_handler = logging . FileHandler ( log_file_path , mode = mode , encoding = \"utf-8\" ) formatter = logging . Formatter ( \" %(message)s \" ) file_handler . setFormatter ( formatter ) logger . addHandler ( file_handler ) logger . setLevel ( configs [ \"LOG_LEVEL\" ]) return logger","title":"initialize_logger"},{"location":"modules/session/#module.basic.BaseSession.is_error","text":"Check if the session is in error state. return: True if the session is in error state, otherwise False. Source code in module/basic.py 582 583 584 585 586 587 588 589 def is_error ( self ): \"\"\" Check if the session is in error state. return: True if the session is in error state, otherwise False. \"\"\" if self . current_round is not None : return self . current_round . state . name () == AgentStatus . ERROR . value return False","title":"is_error"},{"location":"modules/session/#module.basic.BaseSession.is_finished","text":"Check if the session is ended. return: True if the session is ended, otherwise False. Source code in module/basic.py 591 592 593 594 595 596 597 598 599 600 601 602 def is_finished ( self ) -> bool : \"\"\" Check if the session is ended. return: True if the session is ended, otherwise False. \"\"\" if self . _finish or self . step >= configs [ \"MAX_STEP\" ]: return True if self . is_error (): return True return False","title":"is_finished"},{"location":"modules/session/#module.basic.BaseSession.next_request","text":"Get the next request of the session. return: The request of the session. 
Source code in module/basic.py 397 398 399 400 401 402 403 @abstractmethod def next_request ( self ) -> str : \"\"\" Get the next request of the session. return: The request of the session. \"\"\" pass","title":"next_request"},{"location":"modules/session/#module.basic.BaseSession.print_cost","text":"Print the total cost of the session. Source code in module/basic.py 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 def print_cost ( self ) -> None : \"\"\" Print the total cost of the session. \"\"\" if isinstance ( self . cost , float ) and self . cost > 0 : formatted_cost = \"$ {:.2f} \" . format ( self . cost ) utils . print_with_color ( f \"Total request cost of the session: { formatted_cost } $\" , \"yellow\" ) else : utils . print_with_color ( \"Cost is not available for the model {host_model} or {app_model} .\" . format ( host_model = configs [ \"HOST_AGENT\" ][ \"API_MODEL\" ], app_model = configs [ \"APP_AGENT\" ][ \"API_MODEL\" ], ), \"yellow\" , )","title":"print_cost"},{"location":"modules/session/#module.basic.BaseSession.request_to_evaluate","text":"Get the request to evaluate. return: The request(s) to evaluate. Source code in module/basic.py 604 605 606 607 608 609 610 @abstractmethod def request_to_evaluate ( self ) -> str : \"\"\" Get the request to evaluate. return: The request(s) to evaluate. \"\"\" pass","title":"request_to_evaluate"},{"location":"modules/session/#module.basic.BaseSession.run","text":"Run the session. Source code in module/basic.py 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 def run ( self ) -> None : \"\"\" Run the session. \"\"\" while not self . is_finished (): round = self . create_new_round () if round is None : break round . run () if self . application_window is not None : self . capture_last_snapshot () if self . _should_evaluate and not self . is_error (): self . evaluation () self . 
print_cost ()","title":"run"},{"location":"prompts/api_prompts/","text":"API Prompts The API prompts provide the description and usage of the APIs used in UFO. Shared APIs and app-specific APIs are stored in different directories: Directory Description ufo/prompts/share/base/api.yaml Shared APIs used by multiple applications ufo/prompts/{app_name} APIs specific to an application Info You can configure the API prompt used in the config.yaml file. You can find more information about the configuration file here . Tip You may customize the API prompt for a specific application by adding the API prompt in the application's directory. Example API Prompt Below is an example of an API prompt: click_input: summary: |- \"click_input\" is to click the control item with mouse. class_name: |- ClickInputCommand usage: |- [1] API call: click_input(button: str, double: bool) [2] Args: - button: 'The mouse button to click. One of ''left'', ''right'', ''middle'' or ''x'' (Default: ''left'')' - double: 'Whether to perform a double click or not (Default: False)' [3] Example: click_input(button=\"left\", double=False) [4] Available control item: All control items. [5] Return: None To create a new API prompt, follow the template above and add it to the appropriate directory.","title":"API Prompts"},{"location":"prompts/api_prompts/#api-prompts","text":"The API prompts provide the description and usage of the APIs used in UFO. Shared APIs and app-specific APIs are stored in different directories: Directory Description ufo/prompts/share/base/api.yaml Shared APIs used by multiple applications ufo/prompts/{app_name} APIs specific to an application Info You can configure the API prompt used in the config.yaml file. You can find more information about the configuration file here . 
Tip You may customize the API prompt for a specific application by adding the API prompt in the application's directory.","title":"API Prompts"},{"location":"prompts/api_prompts/#example-api-prompt","text":"Below is an example of an API prompt: click_input: summary: |- \"click_input\" is to click the control item with mouse. class_name: |- ClickInputCommand usage: |- [1] API call: click_input(button: str, double: bool) [2] Args: - button: 'The mouse button to click. One of ''left'', ''right'', ''middle'' or ''x'' (Default: ''left'')' - double: 'Whether to perform a double click or not (Default: False)' [3] Example: click_input(button=\"left\", double=False) [4] Available control item: All control items. [5] Return: None To create a new API prompt, follow the template above and add it to the appropriate directory.","title":"Example API Prompt"},{"location":"prompts/basic_template/","text":"Basic Prompt Template The basic prompt template is a fixed format that is used to generate prompts for the HostAgent , AppAgent , FollowerAgent , and EvaluationAgent . It includes the template for the system and user roles to construct the agent's prompt. Below is the default file path for the basic prompt template: Agent File Path Version HostAgent ufo/prompts/share/base/host_agent.yaml base HostAgent ufo/prompts/share/lite/host_agent.yaml lite AppAgent ufo/prompts/share/base/app_agent.yaml base AppAgent ufo/prompts/share/lite/app_agent.yaml lite FollowerAgent ufo/prompts/share/base/app_agent.yaml base FollowerAgent ufo/prompts/share/lite/app_agent.yaml lite EvaluationAgent ufo/prompts/evaluation/evaluation_agent.yaml - Info You can configure the prompt template used in the config.yaml file. 
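For instance, the template paths listed in the table above might be wired up in the configuration like the following excerpt. The key names here are illustrative assumptions; check config_dev.yaml for the exact ones:

```yaml
# Hypothetical configuration excerpt; the actual key names may differ.
HOSTAGENT_PROMPT: 'ufo/prompts/share/base/host_agent.yaml'         # HostAgent template (base)
APPAGENT_PROMPT: 'ufo/prompts/share/base/app_agent.yaml'           # AppAgent template (base)
EVALUATION_PROMPT: 'ufo/prompts/evaluation/evaluation_agent.yaml'  # EvaluationAgent template
```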
You can find more information about the configuration file here .","title":"Basic Prompts"},{"location":"prompts/basic_template/#basic-prompt-template","text":"The basic prompt template is a fixed format that is used to generate prompts for the HostAgent , AppAgent , FollowerAgent , and EvaluationAgent . It includes the template for the system and user roles to construct the agent's prompt. Below is the default file path for the basic prompt template: Agent File Path Version HostAgent ufo/prompts/share/base/host_agent.yaml base HostAgent ufo/prompts/share/lite/host_agent.yaml lite AppAgent ufo/prompts/share/base/app_agent.yaml base AppAgent ufo/prompts/share/lite/app_agent.yaml lite FollowerAgent ufo/prompts/share/base/app_agent.yaml base FollowerAgent ufo/prompts/share/lite/app_agent.yaml lite EvaluationAgent ufo/prompts/evaluation/evaluation_agent.yaml - Info You can configure the prompt template used in the config.yaml file. You can find more information about the configuration file here .","title":"Basic Prompt Template"},{"location":"prompts/examples_prompts/","text":"Example Prompts The example prompts are used to generate textual demonstration examples for in-context learning. The examples are stored in the ufo/prompts/examples directory, with the following subdirectories: Directory Description lite Lite version of demonstration examples non-visual Examples for non-visual LLMs visual Examples for visual LLMs Info You can configure the example prompt used in the config.yaml file. You can find more information about the configuration file here . Example Prompts Below are examples for the HostAgent and AppAgent : HostAgent : Request: |- Summarize and add all to do items on Microsoft To Do from the meeting notes email, and write a summary on the meeting_notes.docx. Response: Observation: |- The current screenshot shows the Microsoft To Do application is visible, and outlook application and the meeting_notes.docx are available in the list of applications. 
Thought: |- The user request can be decomposed into three sub-tasks: (1) Summarize all to do items on Microsoft To Do from the meeting_notes email, (2) Add all to do items to Microsoft To Do, and (3) Write a summary on the meeting_notes.docx. I need to open the Microsoft To Do application to complete the first two sub-tasks. Each sub-task will be completed in individual applications sequentially. CurrentSubtask: |- Summarized all to do items from the meeting notes email in Outlook. Message: - (1) You need to first search for the meeting notes email in Outlook to summarize. - (2) Only summarize the to do items from the meeting notes email, without any redundant information. ControlLabel: |- 16 ControlText: |- Mail - Outlook - Jim Status: |- CONTINUE Plan: - Add all to do items previously summarized from the meeting notes email to one-by-one Microsoft To Do. - Write a summary about the meeting notes email on the meeting_notes.docx. Comment: |- I plan to first summarize all to do items from the meeting notes email in Outlook. Questions: [] AppAgent : Request: |- How many stars does the Imdiffusion repo have? Sub-task: |- Google search for the Imdiffusion repo on github and summarize the number of stars the Imdiffusion repo page visually. Response: Observation: |- I observe that the Edge browser is visible in the screenshot, with the Google search page opened. Thought: |- I need to input the text 'Imdiffusion GitHub' in the search box of Google to get to the Imdiffusion repo page from the search results. The search box is usually in a type of ComboBox. ControlLabel: |- 36 ControlText: |- \u641c\u7d22 Function: |- set_edit_text Args: {\"text\": \"Imdiffusion GitHub\"} Status: |- CONTINUE Plan: - (1) After input 'Imdiffusion GitHub', click Google Search to search for the Imdiffusion repo on github. - (2) Once the searched results are visible, click the Imdiffusion repo Hyperlink in the searched results to open the repo page. 
- (3) Observing and summarize the number of stars the Imdiffusion repo page, and reply to the user request. Comment: |- I plan to use Google search for the Imdiffusion repo on github and summarize the number of stars the Imdiffusion repo page visually. SaveScreenshot: {\"save\": false, \"reason\": \"\"} Tips: |- - The search box is usually in a type of ComboBox. - The number of stars of a Github repo page can be found in the repo page visually. These examples regulate the output format of the agent's response and provide a structured way to generate demonstration examples for in-context learning.","title":"Examples Prompts"},{"location":"prompts/examples_prompts/#example-prompts","text":"The example prompts are used to generate textual demonstration examples for in-context learning. The examples are stored in the ufo/prompts/examples directory, with the following subdirectories: Directory Description lite Lite version of demonstration examples non-visual Examples for non-visual LLMs visual Examples for visual LLMs Info You can configure the example prompt used in the config.yaml file. You can find more information about the configuration file here .","title":"Example Prompts"},{"location":"prompts/examples_prompts/#example-prompts_1","text":"Below are examples for the HostAgent and AppAgent : HostAgent : Request: |- Summarize and add all to do items on Microsoft To Do from the meeting notes email, and write a summary on the meeting_notes.docx. Response: Observation: |- The current screenshot shows the Microsoft To Do application is visible, and outlook application and the meeting_notes.docx are available in the list of applications. Thought: |- The user request can be decomposed into three sub-tasks: (1) Summarize all to do items on Microsoft To Do from the meeting_notes email, (2) Add all to do items to Microsoft To Do, and (3) Write a summary on the meeting_notes.docx. I need to open the Microsoft To Do application to complete the first two sub-tasks. 
Each sub-task will be completed in individual applications sequentially. CurrentSubtask: |- Summarized all to do items from the meeting notes email in Outlook. Message: - (1) You need to first search for the meeting notes email in Outlook to summarize. - (2) Only summarize the to do items from the meeting notes email, without any redundant information. ControlLabel: |- 16 ControlText: |- Mail - Outlook - Jim Status: |- CONTINUE Plan: - Add all to do items previously summarized from the meeting notes email to one-by-one Microsoft To Do. - Write a summary about the meeting notes email on the meeting_notes.docx. Comment: |- I plan to first summarize all to do items from the meeting notes email in Outlook. Questions: [] AppAgent : Request: |- How many stars does the Imdiffusion repo have? Sub-task: |- Google search for the Imdiffusion repo on github and summarize the number of stars the Imdiffusion repo page visually. Response: Observation: |- I observe that the Edge browser is visible in the screenshot, with the Google search page opened. Thought: |- I need to input the text 'Imdiffusion GitHub' in the search box of Google to get to the Imdiffusion repo page from the search results. The search box is usually in a type of ComboBox. ControlLabel: |- 36 ControlText: |- \u641c\u7d22 Function: |- set_edit_text Args: {\"text\": \"Imdiffusion GitHub\"} Status: |- CONTINUE Plan: - (1) After input 'Imdiffusion GitHub', click Google Search to search for the Imdiffusion repo on github. - (2) Once the searched results are visible, click the Imdiffusion repo Hyperlink in the searched results to open the repo page. - (3) Observing and summarize the number of stars the Imdiffusion repo page, and reply to the user request. Comment: |- I plan to use Google search for the Imdiffusion repo on github and summarize the number of stars the Imdiffusion repo page visually. SaveScreenshot: {\"save\": false, \"reason\": \"\"} Tips: |- - The search box is usually in a type of ComboBox. 
- The number of stars of a Github repo page can be found in the repo page visually. These examples regulate the output format of the agent's response and provide a structured way to generate demonstration examples for in-context learning.","title":"Example Prompts"},{"location":"prompts/overview/","text":"Prompts All prompts used in UFO are stored in the ufo/prompts directory. The folder structure is as follows: \ud83d\udce6prompts \u2523 \ud83d\udcc2apps # Stores API prompts for specific applications \u2523 \ud83d\udcc2excel # Stores API prompts for Excel \u2523 \ud83d\udcc2word # Stores API prompts for Word \u2517 ... \u2523 \ud83d\udcc2demonstration # Stores prompts for summarizing demonstrations from humans using Step Recorder \u2523 \ud83d\udcc2experience # Stores prompts for summarizing the agent's self-experience \u2523 \ud83d\udcc2evaluation # Stores prompts for the EvaluationAgent \u2523 \ud83d\udcc2examples # Stores demonstration examples for in-context learning \u2523 \ud83d\udcc2lite # Lite version of demonstration examples \u2523 \ud83d\udcc2non-visual # Examples for non-visual LLMs \u2517 \ud83d\udcc2visual # Examples for visual LLMs \u2517 \ud83d\udcc2share # Stores shared prompts \u2523 \ud83d\udcc2lite # Lite version of shared prompts \u2517 \ud83d\udcc2base # Basic version of shared prompts \u2523 \ud83d\udcdcapi.yaml # Basic API prompt \u2523 \ud83d\udcdcapp_agent.yaml # Basic AppAgent prompt template \u2517 \ud83d\udcdchost_agent.yaml # Basic HostAgent prompt template Note The lite version of prompts is a simplified version of the full prompts, which is used for LLMs that have a limited token budget. However, the lite version is not fully optimized and may lead to suboptimal performance. Note The non-visual and visual folders contain examples for non-visual and visual LLMs, respectively. Agent Prompts Prompts used by an agent usually contain the following information: Prompt Description Basic template A basic template for the agent prompt. 
API A prompt for all skills and APIs used by the agent. Examples Demonstration examples for the agent for in-context learning. You can find these prompts in the share directory. The prompts for specific applications are stored in the apps directory. Tip All information is constructed using the agent's Prompter class. You can find more details about the Prompter class in the documentation here .","title":"Overview"},{"location":"prompts/overview/#prompts","text":"All prompts used in UFO are stored in the ufo/prompts directory. The folder structure is as follows: \ud83d\udce6prompts \u2523 \ud83d\udcc2apps # Stores API prompts for specific applications \u2523 \ud83d\udcc2excel # Stores API prompts for Excel \u2523 \ud83d\udcc2word # Stores API prompts for Word \u2517 ... \u2523 \ud83d\udcc2demonstration # Stores prompts for summarizing demonstrations from humans using Step Recorder \u2523 \ud83d\udcc2experience # Stores prompts for summarizing the agent's self-experience \u2523 \ud83d\udcc2evaluation # Stores prompts for the EvaluationAgent \u2523 \ud83d\udcc2examples # Stores demonstration examples for in-context learning \u2523 \ud83d\udcc2lite # Lite version of demonstration examples \u2523 \ud83d\udcc2non-visual # Examples for non-visual LLMs \u2517 \ud83d\udcc2visual # Examples for visual LLMs \u2517 \ud83d\udcc2share # Stores shared prompts \u2523 \ud83d\udcc2lite # Lite version of shared prompts \u2517 \ud83d\udcc2base # Basic version of shared prompts \u2523 \ud83d\udcdcapi.yaml # Basic API prompt \u2523 \ud83d\udcdcapp_agent.yaml # Basic AppAgent prompt template \u2517 \ud83d\udcdchost_agent.yaml # Basic HostAgent prompt template Note The lite version of prompts is a simplified version of the full prompts, which is used for LLMs that have a limited token budget. However, the lite version is not fully optimized and may lead to suboptimal performance. 
Note The non-visual and visual folders contain examples for non-visual and visual LLMs, respectively.","title":"Prompts"},{"location":"prompts/overview/#agent-prompts","text":"Prompts used by an agent usually contain the following information: Prompt Description Basic template A basic template for the agent prompt. API A prompt for all skills and APIs used by the agent. Examples Demonstration examples for the agent for in-context learning. You can find these prompts in the share directory. The prompts for specific applications are stored in the apps directory. Tip All information is constructed using the agent's Prompter class. You can find more details about the Prompter class in the documentation here .","title":"Agent Prompts"},{"location":"supported_models/azure_openai/","text":"Azure OpenAI (AOAI) Step 1 To use the Azure OpenAI API, you need to create an account on the Azure OpenAI website . After creating an account, you can deploy the AOAI API and access the API key. Step 2 After obtaining the API key, you can configure the HOST_AGENT and APP_AGENT in the config.yaml file (rename the config_template.yaml file to config.yaml ) to use the Azure OpenAI API. The following is an example configuration for the Azure OpenAI API: VISUAL_MODE: True, # Whether to use visual mode to understand screenshots and take actions API_TYPE: \"aoai\" , # The API type, \"openai\" for the OpenAI API, \"aoai\" for the AOAI API, 'azure_ad' for the ad authority of the AOAI API. API_BASE: \"YOUR_ENDPOINT\", # The AOAI API address. Format: https://{your-resource-name}.openai.azure.com API_KEY: \"YOUR_KEY\", # The AOAI API key API_VERSION: \"2024-02-15-preview\", # The version of the API, \"2024-02-15-preview\" by default API_MODEL: \"gpt-4-vision-preview\", # The OpenAI model name, \"gpt-4-vision-preview\" by default. You may also use \"gpt-4o\" for using the GPT-4O model. 
API_DEPLOYMENT_ID: \"YOUR_AOAI_DEPLOYMENT\", # The deployment id for the AOAI API If you want to use AAD for authentication, you should also set the following configuration: AAD_TENANT_ID: \"YOUR_TENANT_ID\", # Set the value to your tenant id for the llm model AAD_API_SCOPE: \"YOUR_SCOPE\", # Set the value to your scope for the llm model AAD_API_SCOPE_BASE: \"YOUR_SCOPE_BASE\" # Set the value to your scope base for the llm model, whose format is API://YOUR_SCOPE_BASE, and only the YOUR_SCOPE_BASE part is needed Tip If you set VISUAL_MODE to True , make sure the API_DEPLOYMENT_ID supports visual inputs. Step 3 After configuring the HOST_AGENT and APP_AGENT with the Azure OpenAI API, you can start using UFO to interact with the AOAI API for various tasks on Windows OS. Please refer to the Quick Start Guide for more details on how to get started with UFO.","title":"Azure OpenAI"},{"location":"supported_models/azure_openai/#azure-openai-aoai","text":"","title":"Azure OpenAI (AOAI)"},{"location":"supported_models/azure_openai/#step-1","text":"To use the Azure OpenAI API, you need to create an account on the Azure OpenAI website . After creating an account, you can deploy the AOAI API and access the API key.","title":"Step 1"},{"location":"supported_models/azure_openai/#step-2","text":"After obtaining the API key, you can configure the HOST_AGENT and APP_AGENT in the config.yaml file (rename the config_template.yaml file to config.yaml ) to use the Azure OpenAI API. The following is an example configuration for the Azure OpenAI API: VISUAL_MODE: True, # Whether to use visual mode to understand screenshots and take actions API_TYPE: \"aoai\" , # The API type, \"openai\" for the OpenAI API, \"aoai\" for the AOAI API, 'azure_ad' for the ad authority of the AOAI API. API_BASE: \"YOUR_ENDPOINT\", # The AOAI API address. 
Format: https://{your-resource-name}.openai.azure.com API_KEY: \"YOUR_KEY\", # The AOAI API key API_VERSION: \"2024-02-15-preview\", # The version of the API, \"2024-02-15-preview\" by default API_MODEL: \"gpt-4-vision-preview\", # The OpenAI model name, \"gpt-4-vision-preview\" by default. You may also use \"gpt-4o\" for using the GPT-4O model. API_DEPLOYMENT_ID: \"YOUR_AOAI_DEPLOYMENT\", # The deployment id for the AOAI API If you want to use AAD for authentication, you should also set the following configuration: AAD_TENANT_ID: \"YOUR_TENANT_ID\", # Set the value to your tenant id for the llm model AAD_API_SCOPE: \"YOUR_SCOPE\", # Set the value to your scope for the llm model AAD_API_SCOPE_BASE: \"YOUR_SCOPE_BASE\" # Set the value to your scope base for the llm model, whose format is API://YOUR_SCOPE_BASE, and only the YOUR_SCOPE_BASE part is needed Tip If you set VISUAL_MODE to True , make sure the API_DEPLOYMENT_ID supports visual inputs.","title":"Step 2"},{"location":"supported_models/azure_openai/#step-3","text":"After configuring the HOST_AGENT and APP_AGENT with the Azure OpenAI API, you can start using UFO to interact with the AOAI API for various tasks on Windows OS. Please refer to the Quick Start Guide for more details on how to get started with UFO.","title":"Step 3"},{"location":"supported_models/claude/","text":"Anthropic Claude Step 1 To use the Claude API, you need to create an account on the Claude website and access the API key. Step 2 You may need to install additional dependencies to use the Claude API. You can install the dependencies using the following command: pip install -U anthropic==0.37.1 Step 3 Configure the HOST_AGENT and APP_AGENT in the config.yaml file (rename the config_template.yaml file to config.yaml ) to use the Claude API. 
The following is an example configuration for the Claude API: VISUAL_MODE: True, # Whether to use visual mode to understand screenshots and take actions API_TYPE: \"Claude\" , API_KEY: \"YOUR_KEY\", API_MODEL: \"YOUR_MODEL\" Tip If you set VISUAL_MODE to True , make sure the API_MODEL supports visual inputs. Tip API_MODEL is the model name of the Claude LLM API. You can find the model name in the Claude LLM model list. Step 4 After configuring the HOST_AGENT and APP_AGENT with the Claude API, you can start using UFO to interact with the Claude API for various tasks on Windows OS. Please refer to the Quick Start Guide for more details on how to get started with UFO.","title":"Claude"},{"location":"supported_models/claude/#anthropic-claude","text":"","title":"Anthropic Claude"},{"location":"supported_models/claude/#step-1","text":"To use the Claude API, you need to create an account on the Claude website and access the API key.","title":"Step 1"},{"location":"supported_models/claude/#step-2","text":"You may need to install additional dependencies to use the Claude API. You can install the dependencies using the following command: pip install -U anthropic==0.37.1","title":"Step 2"},{"location":"supported_models/claude/#step-3","text":"Configure the HOST_AGENT and APP_AGENT in the config.yaml file (rename the config_template.yaml file to config.yaml ) to use the Claude API. The following is an example configuration for the Claude API: VISUAL_MODE: True, # Whether to use visual mode to understand screenshots and take actions API_TYPE: \"Claude\" , API_KEY: \"YOUR_KEY\", API_MODEL: \"YOUR_MODEL\" Tip If you set VISUAL_MODE to True , make sure the API_MODEL supports visual inputs. Tip API_MODEL is the model name of the Claude LLM API. 
You can find the model name in the Claude LLM model list.","title":"Step 3"},{"location":"supported_models/claude/#step-4","text":"After configuring the HOST_AGENT and APP_AGENT with the Claude API, you can start using UFO to interact with the Claude API for various tasks on Windows OS. Please refer to the Quick Start Guide for more details on how to get started with UFO.","title":"Step 4"},{"location":"supported_models/custom_model/","text":"Customized LLM Models We support and welcome the integration of custom LLM models in UFO. If you have a custom LLM model that you would like to use with UFO, you can follow the steps below to configure the model in UFO. Step 1 Create a custom LLM model and serve it on your local environment. Step 2 Create a python script under the ufo/llm directory, and implement your own LLM model class by inheriting the BaseService class in the ufo/llm/base.py file. We leave a PlaceHolderService class in the ufo/llm/placeholder.py file as an example. You must implement the chat_completion method in your LLM model class to accept a list of messages and return a list of completions for each message. def chat_completion( self, messages, n, temperature: Optional[float] = None, max_tokens: Optional[int] = None, top_p: Optional[float] = None, **kwargs: Any, ): \"\"\" Generates completions for a given list of messages. Args: messages (List[str]): The list of messages to generate completions for. n (int): The number of completions to generate for each message. temperature (float, optional): Controls the randomness of the generated completions. Higher values (e.g., 0.8) make the completions more random, while lower values (e.g., 0.2) make the completions more focused and deterministic. If not provided, the default value from the model configuration will be used. max_tokens (int, optional): The maximum number of tokens in the generated completions. If not provided, the default value from the model configuration will be used. 
top_p (float, optional): Controls the diversity of the generated completions. Higher values (e.g., 0.8) make the completions more diverse, while lower values (e.g., 0.2) make the completions more focused. If not provided, the default value from the model configuration will be used. **kwargs: Additional keyword arguments to be passed to the underlying completion method. Returns: List[str], None:A list of generated completions for each message and the cost set to be None. Raises: Exception: If an error occurs while making the API request. \"\"\" pass Step 3 After implementing the LLM model class, you can configure the HOST_AGENT and APP_AGENT in the config.yaml file (rename the config_template.yaml file to config.yaml ) to use the custom LLM model. The following is an example configuration for the custom LLM model: VISUAL_MODE: True, # Whether to use visual mode to understand screenshots and take actions API_TYPE: \"custom_model\" , # The API type, \"openai\" for the OpenAI API, \"aoai\" for the AOAI API, 'azure_ad' for the ad authority of the AOAI API. API_BASE: \"YOUR_ENDPOINT\", # The custom LLM API address. API_MODEL: \"YOUR_MODEL\", # The custom LLM model name. Step 4 After configuring the HOST_AGENT and APP_AGENT with the custom LLM model, you can start using UFO to interact with the custom LLM model for various tasks on Windows OS. Please refer to the Quick Start Guide for more details on how to get started with UFO.","title":"Custom Model"},{"location":"supported_models/custom_model/#customized-llm-models","text":"We support and welcome the integration of custom LLM models in UFO. 
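As a rough preview of what the steps below ask for, here is a minimal standalone sketch of a service exposing the chat_completion contract described on this page. The EchoService class and its canned replies are purely illustrative and do not import UFO; a real implementation would inherit BaseService and call the model's API:

```python
# Minimal sketch of a custom LLM service following the chat_completion
# contract on this page. Does not import UFO; names are illustrative.
from typing import Any, List, Optional, Tuple

class EchoService:
    def chat_completion(
        self,
        messages: List[dict],
        n: int,
        temperature: Optional[float] = None,
        max_tokens: Optional[int] = None,
        top_p: Optional[float] = None,
        **kwargs: Any,
    ) -> Tuple[List[str], None]:
        # A real service would call the model API here; we echo the last
        # user message instead, returning n completions.
        last = messages[-1]['content'] if messages else ''
        completions = ['echo: ' + last for _ in range(n)]
        # Return the completions plus None for the cost, matching the
        # contract described above.
        return completions, None

svc = EchoService()
outs, cost = svc.chat_completion([{'role': 'user', 'content': 'hi'}], n=2)
print(outs)  # ['echo: hi', 'echo: hi']
```

The key point is the return shape: a list of n completion strings together with the cost (None when cost accounting is unavailable).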
If you have a custom LLM model that you would like to use with UFO, you can follow the steps below to configure the model in UFO.","title":"Customized LLM Models"},{"location":"supported_models/custom_model/#step-1","text":"Create a custom LLM model and serve it on your local environment.","title":"Step 1"},{"location":"supported_models/custom_model/#step-2","text":"Create a python script under the ufo/llm directory, and implement your own LLM model class by inheriting the BaseService class in the ufo/llm/base.py file. We leave a PlaceHolderService class in the ufo/llm/placeholder.py file as an example. You must implement the chat_completion method in your LLM model class to accept a list of messages and return a list of completions for each message. def chat_completion( self, messages, n, temperature: Optional[float] = None, max_tokens: Optional[int] = None, top_p: Optional[float] = None, **kwargs: Any, ): \"\"\" Generates completions for a given list of messages. Args: messages (List[str]): The list of messages to generate completions for. n (int): The number of completions to generate for each message. temperature (float, optional): Controls the randomness of the generated completions. Higher values (e.g., 0.8) make the completions more random, while lower values (e.g., 0.2) make the completions more focused and deterministic. If not provided, the default value from the model configuration will be used. max_tokens (int, optional): The maximum number of tokens in the generated completions. If not provided, the default value from the model configuration will be used. top_p (float, optional): Controls the diversity of the generated completions. Higher values (e.g., 0.8) make the completions more diverse, while lower values (e.g., 0.2) make the completions more focused. If not provided, the default value from the model configuration will be used. **kwargs: Additional keyword arguments to be passed to the underlying completion method. 
Returns: List[str], None:A list of generated completions for each message and the cost set to be None. Raises: Exception: If an error occurs while making the API request. \"\"\" pass","title":"Step 2"},{"location":"supported_models/custom_model/#step-3","text":"After implementing the LLM model class, you can configure the HOST_AGENT and APP_AGENT in the config.yaml file (rename the config_template.yaml file to config.yaml ) to use the custom LLM model. The following is an example configuration for the custom LLM model: VISUAL_MODE: True, # Whether to use visual mode to understand screenshots and take actions API_TYPE: \"custom_model\" , # The API type, \"openai\" for the OpenAI API, \"aoai\" for the AOAI API, 'azure_ad' for the ad authority of the AOAI API. API_BASE: \"YOUR_ENDPOINT\", # The custom LLM API address. API_MODEL: \"YOUR_MODEL\", # The custom LLM model name.","title":"Step 3"},{"location":"supported_models/custom_model/#step-4","text":"After configuring the HOST_AGENT and APP_AGENT with the custom LLM model, you can start using UFO to interact with the custom LLM model for various tasks on Windows OS. Please refer to the Quick Start Guide for more details on how to get started with UFO.","title":"Step 4"},{"location":"supported_models/gemini/","text":"Google Gemini Step 1 To use the Google Gemini API, you need to create an account on the Google Gemini website and access the API key. Step 2 You may need to install additional dependencies to use the Google Gemini API. You can install the dependencies using the following command: pip install -U google-generativeai==0.7.0 Step 3 Configure the HOST_AGENT and APP_AGENT in the config.yaml file (rename the config_template.yaml file to config.yaml ) to use the Google Gemini API. 
The following is an example configuration for the Google Gemini API: VISUAL_MODE: True, # Whether to use visual mode to understand screenshots and take actions API_TYPE: \"Gemini\" , API_KEY: \"YOUR_KEY\", API_MODEL: \"YOUR_MODEL\" Tip If you set VISUAL_MODE to True , make sure the API_MODEL supports visual inputs. Tip API_MODEL is the model name of the Gemini LLM API. You can find the model name in the Gemini LLM model list. If you encounter the error 429 Resource has been exhausted (e.g. check quota) , it may be due to the rate limit of your Gemini API. Step 4 After configuring the HOST_AGENT and APP_AGENT with the Gemini API, you can start using UFO to interact with the Gemini API for various tasks on Windows OS. Please refer to the Quick Start Guide for more details on how to get started with UFO.","title":"Gemini"},{"location":"supported_models/gemini/#google-gemini","text":"","title":"Google Gemini"},{"location":"supported_models/gemini/#step-1","text":"To use the Google Gemini API, you need to create an account on the Google Gemini website and access the API key.","title":"Step 1"},{"location":"supported_models/gemini/#step-2","text":"You may need to install additional dependencies to use the Google Gemini API. You can install the dependencies using the following command: pip install -U google-generativeai==0.7.0","title":"Step 2"},{"location":"supported_models/gemini/#step-3","text":"Configure the HOST_AGENT and APP_AGENT in the config.yaml file (rename the config_template.yaml file to config.yaml ) to use the Google Gemini API. The following is an example configuration for the Google Gemini API: VISUAL_MODE: True, # Whether to use visual mode to understand screenshots and take actions API_TYPE: \"Gemini\" , API_KEY: \"YOUR_KEY\", API_MODEL: \"YOUR_MODEL\" Tip If you set VISUAL_MODE to True , make sure the API_MODEL supports visual inputs. Tip API_MODEL is the model name of the Gemini LLM API. You can find the model name in the Gemini LLM model list. 
If you encounter the error 429 Resource has been exhausted (e.g. check quota) , it may be due to the rate limit of your Gemini API.","title":"Step 3"},{"location":"supported_models/gemini/#step-4","text":"After configuring the HOST_AGENT and APP_AGENT with the Gemini API, you can start using UFO to interact with the Gemini API for various tasks on Windows OS. Please refer to the Quick Start Guide for more details on how to get started with UFO.","title":"Step 4"},{"location":"supported_models/ollama/","text":"Ollama Step 1 If you want to use the Ollama model, go to Ollama and follow the instructions to serve an LLM model on your local environment. We provide a short example below to show how to configure Ollama, though the steps might change if Ollama makes updates. ## Install ollama on Linux & WSL2 curl https://ollama.ai/install.sh | sh ## Run the serving ollama serve Step 2 Open another terminal and run the following command to test the ollama model: ollama run YOUR_MODEL Info When serving LLMs via Ollama, it will by default start a server at http://localhost:11434 , which will later be used as the API base in config.yaml . Step 3 After serving the model, you can configure the HOST_AGENT and APP_AGENT in the config.yaml file (rename the config_template.yaml file to config.yaml ) to use the Ollama API. The following is an example configuration for the Ollama API: VISUAL_MODE: True, # Whether to use visual mode to understand screenshots and take actions API_TYPE: \"Ollama\" , API_BASE: \"YOUR_ENDPOINT\", API_MODEL: \"YOUR_MODEL\" Tip API_BASE is the URL of the Ollama LLM server and API_MODEL is the model name of the Ollama LLM; it should be the same as the model you served earlier. In addition, due to model token limitations, you can use the lite version of the prompts to try out UFO, which can be configured in config_dev.yaml . Tip If you set VISUAL_MODE to True , make sure the API_MODEL supports visual inputs. 
Step 4 After configuring the HOST_AGENT and APP_AGENT with the Ollama API, you can start using UFO to interact with the Ollama API for various tasks on Windows OS. Please refer to the Quick Start Guide for more details on how to get started with UFO.","title":"Ollama"},{"location":"supported_models/ollama/#ollama","text":"","title":"Ollama"},{"location":"supported_models/ollama/#step-1","text":"If you want to use the Ollama model, Go to Ollama and follow the instructions to serve a LLM model on your local environment. We provide a short example to show how to configure the ollama in the following, which might change if ollama makes updates. ## Install ollama on Linux & WSL2 curl https://ollama.ai/install.sh | sh ## Run the serving ollama serve","title":"Step 1"},{"location":"supported_models/ollama/#step-2","text":"Open another terminal and run the following command to test the ollama model: ollama run YOUR_MODEL Info When serving LLMs via Ollama, it will by default start a server at http://localhost:11434 , which will later be used as the API base in config.yaml .","title":"Step 2"},{"location":"supported_models/ollama/#step-3","text":"After obtaining the API key, you can configure the HOST_AGENT and APP_AGENT in the config.yaml file (rename the config_template.yaml file to config.yaml ) to use the Ollama API. The following is an example configuration for the Ollama API: VISUAL_MODE: True, # Whether to use visual mode to understand screenshots and take actions API_TYPE: \"Ollama\" , API_BASE: \"YOUR_ENDPOINT\", API_MODEL: \"YOUR_MODEL\" Tip API_BASE is the URL started in the Ollama LLM server and API_MODEL is the model name of Ollama LLM, it should be same as the one you served before. In addition, due to model token limitations, you can use lite version of prompt to have a taste on UFO which can be configured in config_dev.yaml . 
Tip If you set VISUAL_MODE to True , make sure the API_MODEL supports visual inputs.","title":"Step 3"},{"location":"supported_models/ollama/#step-4","text":"After configuring the HOST_AGENT and APP_AGENT with the Ollama API, you can start using UFO to interact with the Ollama API for various tasks on Windows OS. Please refer to the Quick Start Guide for more details on how to get started with UFO.","title":"Step 4"},{"location":"supported_models/openai/","text":"OpenAI Step 1 To use the OpenAI API, you need to create an account on the OpenAI website . After creating an account, you can access the API key from the API keys page . Step 2 After obtaining the API key, you can configure the HOST_AGENT and APP_AGENT in the config.yaml file (rename the config_template.yaml file to config.yaml ) to use the OpenAI API. The following is an example configuration for the OpenAI API: VISUAL_MODE: True, # Whether to use visual mode to understand screenshots and take actions API_TYPE: \"openai\" , # The API type, \"openai\" for the OpenAI API, \"aoai\" for the AOAI API, 'azure_ad' for the ad authority of the AOAI API. API_BASE: \"https://api.openai.com/v1/chat/completions\", # The the OpenAI API endpoint, \"https://api.openai.com/v1/chat/completions\" for the OpenAI API. API_KEY: \"sk-\", # The OpenAI API key, begin with sk- API_VERSION: \"2024-02-15-preview\", # The version of the API, \"2024-02-15-preview\" by default API_MODEL: \"gpt-4-vision-preview\", # The OpenAI model name, \"gpt-4-vision-preview\" by default. You may also use \"gpt-4o\" for using the GPT-4O model. Tip If you set VISUAL_MODE to True , make sure the API_MODEL supports visual inputs. You can find the list of models here . Step 3 After configuring the HOST_AGENT and APP_AGENT with the OpenAI API, you can start using UFO to interact with the OpenAI API for various tasks on Windows OS. 
Please refer to the Quick Start Guide for more details on how to get started with UFO.","title":"OpenAI"},{"location":"supported_models/openai/#openai","text":"","title":"OpenAI"},{"location":"supported_models/openai/#step-1","text":"To use the OpenAI API, you need to create an account on the OpenAI website . After creating an account, you can access the API key from the API keys page .","title":"Step 1"},{"location":"supported_models/openai/#step-2","text":"After obtaining the API key, you can configure the HOST_AGENT and APP_AGENT in the config.yaml file (rename the config_template.yaml file to config.yaml ) to use the OpenAI API. The following is an example configuration for the OpenAI API: VISUAL_MODE: True, # Whether to use visual mode to understand screenshots and take actions API_TYPE: \"openai\" , # The API type, \"openai\" for the OpenAI API, \"aoai\" for the AOAI API, 'azure_ad' for the ad authority of the AOAI API. API_BASE: \"https://api.openai.com/v1/chat/completions\", # The the OpenAI API endpoint, \"https://api.openai.com/v1/chat/completions\" for the OpenAI API. API_KEY: \"sk-\", # The OpenAI API key, begin with sk- API_VERSION: \"2024-02-15-preview\", # The version of the API, \"2024-02-15-preview\" by default API_MODEL: \"gpt-4-vision-preview\", # The OpenAI model name, \"gpt-4-vision-preview\" by default. You may also use \"gpt-4o\" for using the GPT-4O model. Tip If you set VISUAL_MODE to True , make sure the API_MODEL supports visual inputs. You can find the list of models here .","title":"Step 2"},{"location":"supported_models/openai/#step-3","text":"After configuring the HOST_AGENT and APP_AGENT with the OpenAI API, you can start using UFO to interact with the OpenAI API for various tasks on Windows OS. Please refer to the Quick Start Guide for more details on how to get started with UFO.","title":"Step 3"},{"location":"supported_models/overview/","text":"Supported Models UFO supports a variety of LLM models and APIs. 
You can customize the model and API used by the HOST_AGENT and APP_AGENT in the config.yaml file. Additionally, you can configure a BACKUP_AGENT to handle requests when the primary agent fails to respond. Please refer to the following sections for more information on the supported models and APIs: LLMs Documentation OPENAI OpenAI API Azure OpenAI (AOAI) Azure OpenAI API Gemini Gemini API Claude Claude API QWEN QWEN API Ollama Ollama API Custom Custom API Info Each model is implemented as a separate class in the ufo/llm directory, and uses the functions chat_completion defined in the BaseService class of the ufo/llm/base.py file to obtain responses from the model.","title":"Overview"},{"location":"supported_models/overview/#supported-models","text":"UFO supports a variety of LLM models and APIs. You can customize the model and API used by the HOST_AGENT and APP_AGENT in the config.yaml file. Additionally, you can configure a BACKUP_AGENT to handle requests when the primary agent fails to respond. Please refer to the following sections for more information on the supported models and APIs: LLMs Documentation OPENAI OpenAI API Azure OpenAI (AOAI) Azure OpenAI API Gemini Gemini API Claude Claude API QWEN QWEN API Ollama Ollama API Custom Custom API Info Each model is implemented as a separate class in the ufo/llm directory, and uses the functions chat_completion defined in the BaseService class of the ufo/llm/base.py file to obtain responses from the model.","title":"Supported Models"},{"location":"supported_models/qwen/","text":"Qwen Model Step 1 Qwen (Tongyi Qianwen) is developed by Alibaba DAMO Academy. To use the Qwen model, Go to QWen and register an account and get the API key. More details can be found here (in Chinese). Step 2 You may need to install additional dependencies to use the Qwen model. 
You can install the dependencies using the following command: pip install dashscope Step 3 Configure the HOST_AGENT and APP_AGENT in the config.yaml file (rename the config_template.yaml file to config.yaml ) to use the Qwen model. The following is an example configuration for the Qwen model: VISUAL_MODE: True, # Whether to use visual mode to understand screenshots and take actions API_TYPE: \"qwen\" , # The API type, \"qwen\" for the Qwen model. API_KEY: \"YOUR_KEY\", # The Qwen API key API_MODEL: \"YOUR_MODEL\" # The Qwen model name Tip If you set VISUAL_MODE to True , make sure the API_MODEL supports visual inputs. Tip API_MODEL is the model name of Qwen LLM API. You can find the model name in the Qwen LLM model list. Step 4 After configuring the HOST_AGENT and APP_AGENT with the Qwen model, you can start using UFO to interact with the Qwen model for various tasks on Windows OS. Please refer to the Quick Start Guide for more details on how to get started with UFO.","title":"Qwen"},{"location":"supported_models/qwen/#qwen-model","text":"","title":"Qwen Model"},{"location":"supported_models/qwen/#step-1","text":"Qwen (Tongyi Qianwen) is developed by Alibaba DAMO Academy. To use the Qwen model, Go to QWen and register an account and get the API key. More details can be found here (in Chinese).","title":"Step 1"},{"location":"supported_models/qwen/#step-2","text":"You may need to install additional dependencies to use the Qwen model. You can install the dependencies using the following command: pip install dashscope","title":"Step 2"},{"location":"supported_models/qwen/#step-3","text":"Configure the HOST_AGENT and APP_AGENT in the config.yaml file (rename the config_template.yaml file to config.yaml ) to use the Qwen model. The following is an example configuration for the Qwen model: VISUAL_MODE: True, # Whether to use visual mode to understand screenshots and take actions API_TYPE: \"qwen\" , # The API type, \"qwen\" for the Qwen model. 
API_KEY: \"YOUR_KEY\", # The Qwen API key API_MODEL: \"YOUR_MODEL\" # The Qwen model name Tip If you set VISUAL_MODE to True , make sure the API_MODEL supports visual inputs. Tip API_MODEL is the model name of Qwen LLM API. You can find the model name in the Qwen LLM model list.","title":"Step 3"},{"location":"supported_models/qwen/#step-4","text":"After configuring the HOST_AGENT and APP_AGENT with the Qwen model, you can start using UFO to interact with the Qwen model for various tasks on Windows OS. Please refer to the Quick Start Guide for more details on how to get started with UFO.","title":"Step 4"}]} \ No newline at end of file diff --git a/search/worker.js b/search/worker.js new file mode 100644 index 00000000..8628dbce --- /dev/null +++ b/search/worker.js @@ -0,0 +1,133 @@ +var base_path = 'function' === typeof importScripts ? '.' : '/search/'; +var allowSearch = false; +var index; +var documents = {}; +var lang = ['en']; +var data; + +function getScript(script, callback) { + console.log('Loading script: ' + script); + $.getScript(base_path + script).done(function () { + callback(); + }).fail(function (jqxhr, settings, exception) { + console.log('Error: ' + exception); + }); +} + +function getScriptsInOrder(scripts, callback) { + if (scripts.length === 0) { + callback(); + return; + } + getScript(scripts[0], function() { + getScriptsInOrder(scripts.slice(1), callback); + }); +} + +function loadScripts(urls, callback) { + if( 'function' === typeof importScripts ) { + importScripts.apply(null, urls); + callback(); + } else { + getScriptsInOrder(urls, callback); + } +} + +function onJSONLoaded () { + data = JSON.parse(this.responseText); + var scriptsToLoad = ['lunr.js']; + if (data.config && data.config.lang && data.config.lang.length) { + lang = data.config.lang; + } + if (lang.length > 1 || lang[0] !== "en") { + scriptsToLoad.push('lunr.stemmer.support.js'); + if (lang.length > 1) { + scriptsToLoad.push('lunr.multi.js'); + } + if (lang.includes("ja") || 
lang.includes("jp")) { + scriptsToLoad.push('tinyseg.js'); + } + for (var i=0; i < lang.length; i++) { + if (lang[i] != 'en') { + scriptsToLoad.push(['lunr', lang[i], 'js'].join('.')); + } + } + } + loadScripts(scriptsToLoad, onScriptsLoaded); +} + +function onScriptsLoaded () { + console.log('All search scripts loaded, building Lunr index...'); + if (data.config && data.config.separator && data.config.separator.length) { + lunr.tokenizer.separator = new RegExp(data.config.separator); + } + + if (data.index) { + index = lunr.Index.load(data.index); + data.docs.forEach(function (doc) { + documents[doc.location] = doc; + }); + console.log('Lunr pre-built index loaded, search ready'); + } else { + index = lunr(function () { + if (lang.length === 1 && lang[0] !== "en" && lunr[lang[0]]) { + this.use(lunr[lang[0]]); + } else if (lang.length > 1) { + this.use(lunr.multiLanguage.apply(null, lang)); // spread operator not supported in all browsers: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Spread_operator#Browser_compatibility + } + this.field('title'); + this.field('text'); + this.ref('location'); + + for (var i=0; i < data.docs.length; i++) { + var doc = data.docs[i]; + this.add(doc); + documents[doc.location] = doc; + } + }); + console.log('Lunr index built, search ready'); + } + allowSearch = true; + postMessage({config: data.config}); + postMessage({allowSearch: allowSearch}); +} + +function init () { + var oReq = new XMLHttpRequest(); + oReq.addEventListener("load", onJSONLoaded); + var index_path = base_path + '/search_index.json'; + if( 'function' === typeof importScripts ){ + index_path = 'search_index.json'; + } + oReq.open("GET", index_path); + oReq.send(); +} + +function search (query) { + if (!allowSearch) { + console.error('Assets for search still loading'); + return; + } + + var resultDocuments = []; + var results = index.search(query); + for (var i=0; i < results.length; i++){ + var result = results[i]; + doc = 
documents[result.ref]; + doc.summary = doc.text.substring(0, 200); + resultDocuments.push(doc); + } + return resultDocuments; +} + +if( 'function' === typeof importScripts ) { + onmessage = function (e) { + if (e.data.init) { + init(); + } else if (e.data.query) { + postMessage({ results: search(e.data.query) }); + } else { + console.error("Worker - Unrecognized message: " + e); + } + }; +} diff --git a/sitemap.xml b/sitemap.xml new file mode 100644 index 00000000..0f8724ef --- /dev/null +++ b/sitemap.xml @@ -0,0 +1,3 @@ + + + \ No newline at end of file diff --git a/sitemap.xml.gz b/sitemap.xml.gz new file mode 100644 index 00000000..614cde08 Binary files /dev/null and b/sitemap.xml.gz differ diff --git a/supported_models/azure_openai/index.html b/supported_models/azure_openai/index.html new file mode 100644 index 00000000..a10575f1 --- /dev/null +++ b/supported_models/azure_openai/index.html @@ -0,0 +1,340 @@ + + + + + + + + Azure OpenAI - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

Azure OpenAI (AOAI)

+

Step 1

+

To use the Azure OpenAI API, you need to create an account on the Azure OpenAI website. After creating an account, you can deploy the AOAI API and access the API key.

+

Step 2

+

After obtaining the API key, you can configure the HOST_AGENT and APP_AGENT in the config.yaml file (rename the config_template.yaml file to config.yaml) to use the Azure OpenAI API. The following is an example configuration for the Azure OpenAI API:

+
VISUAL_MODE: True, # Whether to use visual mode to understand screenshots and take actions
+API_TYPE: "aoai" , # The API type, "openai" for the OpenAI API, "aoai" for the AOAI API, 'azure_ad' for the ad authority of the AOAI API.  
+API_BASE: "YOUR_ENDPOINT", #  The AOAI API address. Format: https://{your-resource-name}.openai.azure.com
+API_KEY: "YOUR_KEY",  # The aoai API key
+API_VERSION: "2024-02-15-preview", # The version of the API, "2024-02-15-preview" by default
+API_MODEL: "gpt-4-vision-preview",  # The OpenAI model name, "gpt-4-vision-preview" by default. You may also use "gpt-4o" for using the GPT-4O model.
+API_DEPLOYMENT_ID: "YOUR_AOAI_DEPLOYMENT", # The deployment id for the AOAI API
+
+
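Before launching the agents, it can help to sanity-check that every field in the example above is present. The sketch below is illustrative only: the key names follow the example configuration, but the validation helper itself is hypothetical and not part of UFO.

```python
# Hypothetical helper: check that an AOAI config dict contains the fields
# shown in the example configuration above before starting the agents.
REQUIRED_AOAI_KEYS = {
    "VISUAL_MODE", "API_TYPE", "API_BASE",
    "API_KEY", "API_VERSION", "API_MODEL", "API_DEPLOYMENT_ID",
}

def missing_aoai_keys(cfg: dict) -> list:
    """Return the sorted list of required keys absent from cfg."""
    return sorted(REQUIRED_AOAI_KEYS - cfg.keys())

cfg = {
    "VISUAL_MODE": True,
    "API_TYPE": "aoai",
    "API_BASE": "https://my-resource.openai.azure.com",
    "API_KEY": "YOUR_KEY",
    "API_VERSION": "2024-02-15-preview",
    "API_MODEL": "gpt-4-vision-preview",
    "API_DEPLOYMENT_ID": "YOUR_AOAI_DEPLOYMENT",
}
print(missing_aoai_keys(cfg))  # an empty list means the config is complete
```

If any key is missing, fix config.yaml before moving on to Step 3.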

If you want to use AAD for authentication, you should also set the following configuration:

+
    AAD_TENANT_ID: "YOUR_TENANT_ID", # Set the value to your tenant id for the llm model
+    AAD_API_SCOPE: "YOUR_SCOPE", # Set the value to your scope for the llm model
+    AAD_API_SCOPE_BASE: "YOUR_SCOPE_BASE" # Set the value to your scope base for the llm model. Its full format is API://YOUR_SCOPE_BASE, but only the YOUR_SCOPE_BASE part is needed here.
+
+
+

Tip

+

If you set VISUAL_MODE to True, make sure the API_DEPLOYMENT_ID supports visual inputs.

+
+

Step 3

+

After configuring the HOST_AGENT and APP_AGENT with the Azure OpenAI API, you can start using UFO to interact with it for various tasks on Windows OS. Please refer to the Quick Start Guide for more details on how to get started with UFO.

+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/supported_models/claude/index.html b/supported_models/claude/index.html new file mode 100644 index 00000000..ecd3c44e --- /dev/null +++ b/supported_models/claude/index.html @@ -0,0 +1,342 @@ + + + + + + + + Claude - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

Anthropic Claude

+

Step 1

+

To use the Claude API, you need to create an account on the Claude website and access the API key.

+

Step 2

+

You may need to install additional dependencies to use the Claude API. You can install the dependencies using the following command:

+
pip install -U anthropic==0.37.1
+
+

Step 3

+

Configure the HOST_AGENT and APP_AGENT in the config.yaml file (rename the config_template.yaml file to config.yaml) to use the Claude API. The following is an example configuration for the Claude API:

+
VISUAL_MODE: True, # Whether to use visual mode to understand screenshots and take actions
+API_TYPE: "Claude" ,
+API_KEY: "YOUR_KEY",  
+API_MODEL: "YOUR_MODEL"
+
+
+

Tip

+

If you set VISUAL_MODE to True, make sure the API_MODEL supports visual inputs.

+
+
+

Tip

+

API_MODEL is the model name of Claude LLM API. You can find the model name in the Claude LLM model list.

+
+

Step 4

+

After configuring the HOST_AGENT and APP_AGENT with the Claude API, you can start using UFO to interact with the Claude API for various tasks on Windows OS. Please refer to the Quick Start Guide for more details on how to get started with UFO.

+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/supported_models/custom_model/index.html b/supported_models/custom_model/index.html new file mode 100644 index 00000000..ee1d1d67 --- /dev/null +++ b/supported_models/custom_model/index.html @@ -0,0 +1,358 @@ + + + + + + + + Custom Model - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

Customized LLM Models

+

We support and welcome the integration of custom LLM models in UFO. If you have a custom LLM model that you would like to use with UFO, you can follow the steps below to configure the model in UFO.

+

Step 1

+

Create a custom LLM model and serve it on your local environment.

+

Step 2

+

Create a Python script under the ufo/llm directory, and implement your own LLM model class by inheriting the BaseService class in the ufo/llm/base.py file. We provide a PlaceHolderService class in the ufo/llm/placeholder.py file as an example. You must implement the chat_completion method in your LLM model class to accept a list of messages and return a list of completions for each message.

+
def chat_completion(
+    self,
+    messages,
+    n,
+    temperature: Optional[float] = None,
+    max_tokens: Optional[int] = None,
+    top_p: Optional[float] = None,
+    **kwargs: Any,
+):
+    """
+    Generates completions for a given list of messages.
+    Args:
+        messages (List[str]): The list of messages to generate completions for.
+        n (int): The number of completions to generate for each message.
+        temperature (float, optional): Controls the randomness of the generated completions. Higher values (e.g., 0.8) make the completions more random, while lower values (e.g., 0.2) make the completions more focused and deterministic. If not provided, the default value from the model configuration will be used.
+        max_tokens (int, optional): The maximum number of tokens in the generated completions. If not provided, the default value from the model configuration will be used.
+        top_p (float, optional): Controls the diversity of the generated completions. Higher values (e.g., 0.8) make the completions more diverse, while lower values (e.g., 0.2) make the completions more focused. If not provided, the default value from the model configuration will be used.
+        **kwargs: Additional keyword arguments to be passed to the underlying completion method.
+    Returns:
+        List[str], None: A list of generated completions for each message, and the cost, which is set to None.
+    Raises:
+        Exception: If an error occurs while making the API request.
+    """
+    pass
+
+
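A minimal sketch of such a subclass is shown below. The BaseService stub stands in for the real class in ufo/llm/base.py, and the canned "echo" responses are placeholders; a real implementation would send the messages to your model's endpoint.

```python
from typing import Any, List, Optional, Tuple

class BaseService:  # stub standing in for ufo.llm.base.BaseService
    pass

class EchoService(BaseService):
    """Toy custom service: returns n canned completions per request."""

    def chat_completion(
        self,
        messages: List[dict],
        n: int,
        temperature: Optional[float] = None,
        max_tokens: Optional[int] = None,
        top_p: Optional[float] = None,
        **kwargs: Any,
    ) -> Tuple[List[str], None]:
        last = messages[-1]["content"] if messages else ""
        # A real service would call the model API here, honoring
        # temperature, max_tokens, and top_p when they are provided.
        return [f"echo: {last}" for _ in range(n)], None

completions, cost = EchoService().chat_completion(
    [{"role": "user", "content": "hi"}], n=2
)
```

The important part is the contract: return a tuple of the completion list and the cost (None if your model does not report one).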

Step 3

+

After implementing the LLM model class, you can configure the HOST_AGENT and APP_AGENT in the config.yaml file (rename the config_template.yaml file to config.yaml) to use the custom LLM model. The following is an example configuration for the custom LLM model:

+
VISUAL_MODE: True, # Whether to use visual mode to understand screenshots and take actions
+API_TYPE: "custom_model" , # The API type, "openai" for the OpenAI API, "aoai" for the AOAI API, 'azure_ad' for the ad authority of the AOAI API.  
+API_BASE: "YOUR_ENDPOINT", #  The custom LLM API address.
+API_MODEL: "YOUR_MODEL",  # The custom LLM model name.
+
+

Step 4

+

After configuring the HOST_AGENT and APP_AGENT with the custom LLM model, you can start using UFO to interact with the custom LLM model for various tasks on Windows OS. Please refer to the Quick Start Guide for more details on how to get started with UFO.

+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/supported_models/gemini/index.html b/supported_models/gemini/index.html new file mode 100644 index 00000000..80f73ee1 --- /dev/null +++ b/supported_models/gemini/index.html @@ -0,0 +1,342 @@ + + + + + + + + Gemini - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

Google Gemini

+

Step 1

+

To use the Google Gemini API, you need to create an account on the Google Gemini website and access the API key.

+

Step 2

+

You may need to install additional dependencies to use the Google Gemini API. You can install the dependencies using the following command:

+
pip install -U google-generativeai==0.7.0
+
+

Step 3

+

Configure the HOST_AGENT and APP_AGENT in the config.yaml file (rename the config_template.yaml file to config.yaml) to use the Google Gemini API. The following is an example configuration for the Google Gemini API:

+
VISUAL_MODE: True, # Whether to use visual mode to understand screenshots and take actions
+API_TYPE: "Gemini" ,
+API_KEY: "YOUR_KEY",  
+API_MODEL: "YOUR_MODEL"
+
+
+

Tip

+

If you set VISUAL_MODE to True, make sure the API_MODEL supports visual inputs.

+
+
+

Tip

+

API_MODEL is the model name of the Gemini LLM API. You can find the model name in the Gemini LLM model list. If you encounter the error 429 Resource has been exhausted (e.g. check quota), it may be caused by the rate limit of your Gemini API.

+
+
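If 429 errors persist, a common mitigation is to retry the request with exponential backoff. The sketch below is library-agnostic and illustrative: the wrapped callable stands in for whatever request function your Gemini client uses, and the string check on the exception is an assumption you should adapt to the exception type your client actually raises.

```python
import time

def with_backoff(call, max_retries: int = 5, base_delay: float = 1.0):
    """Retry `call` on rate-limit errors, doubling the wait each time."""
    for attempt in range(max_retries):
        try:
            return call()
        except RuntimeError as err:  # substitute your client's 429 exception
            if "429" not in str(err) or attempt == max_retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

Wrapping each Gemini request in `with_backoff` smooths over transient quota exhaustion without masking persistent failures.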

Step 4

+

After configuring the HOST_AGENT and APP_AGENT with the Gemini API, you can start using UFO to interact with the Gemini API for various tasks on Windows OS. Please refer to the Quick Start Guide for more details on how to get started with UFO.

+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/supported_models/ollama/index.html b/supported_models/ollama/index.html new file mode 100644 index 00000000..dcfc0e06 --- /dev/null +++ b/supported_models/ollama/index.html @@ -0,0 +1,351 @@ + + + + + + + + Ollama - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

Ollama

+

Step 1

+

To use an Ollama model, go to Ollama and follow the instructions to serve an LLM model in your local environment. The following short example shows how to configure Ollama; it may change if Ollama makes updates.

+
## Install ollama on Linux & WSL2
+curl https://ollama.ai/install.sh | sh
+## Run the serving
+ollama serve
+
+

Step 2

+

Open another terminal and run the following command to test the ollama model:

+
ollama run YOUR_MODEL
+
+
+

Info

+

When serving LLMs, Ollama starts a server at http://localhost:11434 by default; this address is later used as the API base in config.yaml.

+
+
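The Ollama server exposes a JSON chat endpoint at the address above. The sketch below assembles a request body for it; the /api/chat path and field names follow Ollama's public REST API, but check the Ollama documentation for your installed version.

```python
import json

def build_ollama_chat_request(model: str, prompt: str) -> str:
    """Build the JSON body for a POST to http://localhost:11434/api/chat."""
    body = {
        "model": model,  # same name you used with `ollama run`
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for a single response object, not a stream
    }
    return json.dumps(body)

payload = build_ollama_chat_request("YOUR_MODEL", "Hello!")
```

Posting this payload to the running server is a quick way to confirm the model responds before wiring it into config.yaml.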

Step 3

+

After setting up the Ollama server, you can configure the HOST_AGENT and APP_AGENT in the config.yaml file (rename the config_template.yaml file to config.yaml) to use the Ollama API. The following is an example configuration for the Ollama API:

+
VISUAL_MODE: True, # Whether to use visual mode to understand screenshots and take actions
+API_TYPE: "Ollama" ,
+API_BASE: "YOUR_ENDPOINT",   
+API_MODEL: "YOUR_MODEL"
+
+
+

Tip

+

API_BASE is the URL of the started Ollama LLM server, and API_MODEL is the model name of the Ollama LLM; it should be the same as the one you served before. In addition, due to model token limitations, you can use the lite version of the prompt to get a taste of UFO, which can be configured in config_dev.yaml.

+
+
+

Tip

+

If you set VISUAL_MODE to True, make sure the API_MODEL supports visual inputs.

+
+

Step 4

+

After configuring the HOST_AGENT and APP_AGENT with the Ollama API, you can start using UFO to interact with the Ollama API for various tasks on Windows OS. Please refer to the Quick Start Guide for more details on how to get started with UFO.

+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/supported_models/openai/index.html b/supported_models/openai/index.html new file mode 100644 index 00000000..81065b60 --- /dev/null +++ b/supported_models/openai/index.html @@ -0,0 +1,334 @@ + + + + + + + + OpenAI - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

OpenAI

+

Step 1

+

To use the OpenAI API, you need to create an account on the OpenAI website. After creating an account, you can access the API key from the API keys page.

+

Step 2

+

After obtaining the API key, you can configure the HOST_AGENT and APP_AGENT in the config.yaml file (rename the config_template.yaml file to config.yaml) to use the OpenAI API. The following is an example configuration for the OpenAI API:

+
VISUAL_MODE: True, # Whether to use visual mode to understand screenshots and take actions
+API_TYPE: "openai" , # The API type, "openai" for the OpenAI API, "aoai" for the AOAI API, 'azure_ad' for the ad authority of the AOAI API.  
+API_BASE: "https://api.openai.com/v1/chat/completions", # The OpenAI API endpoint, "https://api.openai.com/v1/chat/completions" for the OpenAI API.
+API_KEY: "sk-",  # The OpenAI API key, begin with sk-
+API_VERSION: "2024-02-15-preview", # The version of the API, "2024-02-15-preview" by default
+API_MODEL: "gpt-4-vision-preview",  # The OpenAI model name, "gpt-4-vision-preview" by default. You may also use "gpt-4o" for using the GPT-4O model.
+
+
+
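With VISUAL_MODE enabled, each screenshot travels as an image part inside a chat message. The content-part layout below follows the OpenAI chat API for vision models; the helper function itself is illustrative, not UFO's own code.

```python
import base64

def build_visual_message(text: str, screenshot_png: bytes) -> dict:
    """Build one user message carrying both text and a screenshot."""
    b64 = base64.b64encode(screenshot_png).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            # Vision-capable models accept images as base64 data URLs.
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }

msg = build_visual_message("Describe this window.", b"\x89PNG...")
```

This is why API_MODEL must be a vision-capable model when VISUAL_MODE is True: a text-only model rejects the image content part.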

Tip

+

If you set VISUAL_MODE to True, make sure the API_MODEL supports visual inputs. You can find the list of models here.

+
+

Step 3

+

After configuring the HOST_AGENT and APP_AGENT with the OpenAI API, you can start using UFO to interact with the OpenAI API for various tasks on Windows OS. Please refer to the Quick Start Guide for more details on how to get started with UFO.

+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/supported_models/overview/index.html b/supported_models/overview/index.html new file mode 100644 index 00000000..3b1c03da --- /dev/null +++ b/supported_models/overview/index.html @@ -0,0 +1,355 @@ + + + + + + + + Overview - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

Supported Models

+

UFO supports a variety of LLM models and APIs. You can customize the model and API used by the HOST_AGENT and APP_AGENT in the config.yaml file. Additionally, you can configure a BACKUP_AGENT to handle requests when the primary agent fails to respond.

+

Please refer to the following sections for more information on the supported models and APIs:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
LLMsDocumentation
OPENAIOpenAI API
Azure OpenAI (AOAI)Azure OpenAI API
GeminiGemini API
ClaudeClaude API
QWENQWEN API
OllamaOllama API
CustomCustom API
+
+

Info

+

Each model is implemented as a separate class in the ufo/llm directory, and implements the chat_completion function defined in the BaseService class of the ufo/llm/base.py file to obtain responses from the model.

+
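Conceptually, the API_TYPE value in config.yaml selects which service class handles requests. The dispatch sketch below is illustrative: the class names and registry are stand-ins, not UFO's actual implementation.

```python
# Stand-in service classes; in UFO each lives in its own ufo/llm module.
class OpenAIService: ...
class OllamaService: ...

SERVICE_REGISTRY = {
    "openai": OpenAIService,
    "aoai": OpenAIService,   # same client shape, different endpoint/auth
    "ollama": OllamaService,
}

def get_service(api_type: str):
    """Map the config's API_TYPE onto a service class (case-insensitive)."""
    try:
        return SERVICE_REGISTRY[api_type.lower()]
    except KeyError:
        raise ValueError(f"Unsupported API_TYPE: {api_type}")
```

This is also how a BACKUP_AGENT fits in: it is simply a second configuration resolved through the same lookup when the primary agent fails.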
+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + + diff --git a/supported_models/qwen/index.html b/supported_models/qwen/index.html new file mode 100644 index 00000000..ba91501b --- /dev/null +++ b/supported_models/qwen/index.html @@ -0,0 +1,342 @@ + + + + + + + + Qwen - UFO Documentation + + + + + + + + + + + + + + +
+ + +
+ +
+
+
    +
  • + + +
  • +
  • +
+
+
+
+
+ +

Qwen Model

+

Step 1

+

Qwen (Tongyi Qianwen) is developed by Alibaba DAMO Academy. To use the Qwen model, go to QWen, register an account, and get the API key. More details can be found here (in Chinese).

+

Step 2

+

You may need to install additional dependencies to use the Qwen model. You can install the dependencies using the following command:

+
pip install dashscope
+
+

Step 3

+

Configure the HOST_AGENT and APP_AGENT in the config.yaml file (rename the config_template.yaml file to config.yaml) to use the Qwen model. The following is an example configuration for the Qwen model:

+
    VISUAL_MODE: True, # Whether to use visual mode to understand screenshots and take actions
+    API_TYPE: "qwen" , # The API type, "qwen" for the Qwen model.
+    API_KEY: "YOUR_KEY",  # The Qwen API key
+    API_MODEL: "YOUR_MODEL"  # The Qwen model name
+
+
+

Tip

+

If you set VISUAL_MODE to True, make sure the API_MODEL supports visual inputs.

+
+
+

Tip

+

API_MODEL is the model name of Qwen LLM API. You can find the model name in the Qwen LLM model list.

+
+

Step 4

+

After configuring the HOST_AGENT and APP_AGENT with the Qwen model, you can start using UFO to interact with the Qwen model for various tasks on Windows OS. Please refer to the Quick Start Guide for more details on how to get started with UFO.

+ +
+
+ +
+
+ +
+ +
+ +
+ + + + « Previous + + + Next » + + +
+ + + + + + + + +