[BFCL] some anomalies about the dataset #820

Open
zhangch-ss opened this issue Dec 9, 2024 · 6 comments · Fixed by #826
Labels
BFCL-Dataset BFCL Dataset-Related Issue

Comments

@zhangch-ss
Contributor

Describe the issue
I found some inconsistencies between enumeration values and parameter types in the tool parameter descriptions.
For example, here the type is integer but the enum values are strings:
{"id": "live_simple_178-103-1", "question": [[{"role": "user", "content": "Help find a housekeeper who provides ironing services."}]], "function": [{"name": "get_service_id", "description": "Retrieve the unique identifier for a specific service type, such as cleaning or ironing services.", "parameters": {"type": "dict", "required": ["service_id"], "properties": {"service_id": {"type": "integer", "description": "The unique identifier of the service. For example, '1' represents cleaning service, and '2' represents ironing service.", "enum": ["1", "2"]}, "unit": {"type": "integer", "description": "The unit of measurement for the quantity of service, expressed as the number of service sessions or instances.", "default": 1}}}}]}
This might cause function call errors, such as the one in this screenshot: [image]

@CharlieJCJ
Collaborator

CharlieJCJ commented Dec 10, 2024

Thanks for bringing up this issue. This is indeed an oversight.

For others' reference: we follow the OpenAPI format. By definition, if the type is integer, the enum values should be integers as well. Ref: https://www.speakeasy.com/openapi/schemas/enums

The corrected data point is
{"id": "live_simple_178-103-1", "question": [[{"role": "user", "content": "Help find a housekeeper who provides ironing services."}]], "function": [{"name": "get_service_id", "description": "Retrieve the unique identifier for a specific service type, such as cleaning or ironing services.", "parameters": {"type": "dict", "required": ["service_id"], "properties": {"service_id": {"type": "integer", "description": "The unique identifier of the service. For example, '1' represents cleaning service, and '2' represents ironing service.", "enum": [1, 2]}, "unit": {"type": "integer", "description": "The unit of measurement for the quantity of service, expressed as the number of service sessions or instances.", "default": 1}}}}]}
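The rule can be illustrated with a minimal sketch. The helper `mismatched_enum_values` and the `TYPE_MAP` table are hypothetical names made up for this example, not part of BFCL, and the sketch only covers a few type names. It reports the enum values whose Python type disagrees with the declared type, and also shows that an integer argument never matches a string enum under a plain membership test.

```python
# Hypothetical helper, not part of BFCL: checks enum values against the declared type.
TYPE_MAP = {"integer": int, "string": str, "boolean": bool, "number": (int, float)}


def mismatched_enum_values(schema):
    """Return the enum values whose Python type does not match schema['type']."""
    expected = TYPE_MAP.get(schema.get("type"))
    if expected is None:
        # Missing or unrecognized type name: nothing to check.
        return []
    bad = []
    for value in schema.get("enum", []):
        # bool is a subclass of int in Python, so reject it explicitly
        # unless the declared type is boolean.
        is_bool_leak = isinstance(value, bool) and expected is not bool
        if is_bool_leak or not isinstance(value, expected):
            bad.append(value)
    return bad


original = {"type": "integer", "enum": ["1", "2"]}
corrected = {"type": "integer", "enum": [1, 2]}

print(mismatched_enum_values(original))   # ['1', '2']
print(mismatched_enum_values(corrected))  # []

# An integer argument never equals a string enum value.
print(2 in original["enum"])   # False
print(2 in corrected["enum"])  # True
```

The explicit `bool` check matters because `isinstance(True, int)` is true in Python, so a naive `isinstance` test would accept boolean values under an integer type.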

@zhangch-ss
Contributor Author

Hi, I want to point out that there is more than one error of this kind.
I wrote a script to identify them. If possible, I would like to submit a PR to fix all of them.

@CharlieJCJ
Collaborator

CharlieJCJ commented Dec 10, 2024

Hey @zhangch-ss, that would be awesome. I also noticed this and converted the recent PR to a draft.

We welcome your PR to fix all of them. Just a note: if possible, please also share the script that identifies them so that others can reproduce the results.

Thanks again!

@HuanzhiMao HuanzhiMao added the BFCL-Dataset BFCL Dataset-Related Issue label Dec 10, 2024
@CharlieJCJ
Collaborator

You can raise a PR either under #824 (recommended), or from your fork.

@HuanzhiMao
Collaborator

HuanzhiMao commented Dec 10, 2024

> You can raise a PR either under #824 (recommended), or from your fork

Let's just have her raise a separate PR so that she could be the author, and close #824.

@zhangch-ss
Contributor Author

OK, I will submit it after a careful review.

HuanzhiMao pushed a commit that referenced this issue Dec 11, 2024
fixed some enum type errors in issue #820

This script is used to check for enum type errors:
```

from glob import glob
import os
import json
import logging

# Map the type names used in the dataset to Python types.
v_type_map = {"int": int, "float": float, "bool": bool, "str": str, "list": list, "dict": dict,
              "tuple": tuple, "boolean": bool, "Boolean": bool, "string": str, "integer": int,
              "number": float, "array": list, "object": dict, "String": str,
}
data_dir = "gorilla/berkeley-function-call-leaderboard/data"
data_path_list = glob(os.path.join(data_dir, "*.json"))
for data_path in data_path_list:
    with open(data_path, "r", encoding="utf-8") as f:
        # Each data file holds one JSON object per line.
        for line in f:
            if not line.strip():
                continue
            data = json.loads(line)
            entry_id = data.get("id")
            functions = data.get("function", [])
            if not functions:
                continue
            for function in functions:
                name = function.get("name", "")
                parameters = function["parameters"]
                properties = parameters["properties"]
                for k, v in properties.items():
                    enum = v.get("enum", [])
                    items = v.get("items", {})
                    # For array parameters, compare against the item type.
                    if items:
                        v_type = items.get("type")
                    else:
                        v_type = v.get("type")
                    expected_type = v_type_map.get(v_type)
                    if expected_type is None:
                        # Missing or unrecognized type name: nothing to check.
                        continue
                    # Collect every enum value whose Python type differs from the declared type.
                    error_enums = []
                    enum_type = []
                    for e in enum:
                        if not isinstance(e, expected_type):
                            error_enums.append(e)
                            enum_type.append(type(e))
                    if error_enums:
                        logging.error(f"ID: {entry_id} FUN_NAME: {name} \nLABEL_TYPE: {v_type} \nERROR_ENUM: {error_enums} \nENUM_TYPE: {enum_type} \nDETAIL: {v}\n")
```
Some errors, such as the following, are not fixed in this PR and may need
to be fixed in the future:
```
ERROR:root:ID: live_irrelevance_626-198-0 FUN_NAME: Travel_1_FindAttractions
LABEL_TYPE: boolean
ERROR_ENUM: ['True', 'False', 'dontcare']
ENUM_TYPE: [<class 'str'>, <class 'str'>, <class 'str'>]
DETAIL: {'type': 'boolean', 'description': "Flag indicating whether the attraction is suitable for children. 'True' if the attraction is kid-friendly, 'False' if it is not, and 'dontcare' for no preference.", 'enum': ['True', 'False', 'dontcare'], 'default': 'dontcare'}
```
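One plausible reason a case like this resists a mechanical fix is that 'dontcare' has no boolean counterpart, whereas a string such as '1' converts cleanly to an integer. The sketch below illustrates that difference. The functions `coerce_integer_enum` and `coerce_boolean_enum` are hypothetical names made up for this example; they are not taken from the PR and do not describe how the dataset was actually corrected.

```python
def coerce_integer_enum(values):
    """Convert values such as '1' to int; return None if any value cannot be converted."""
    coerced = []
    for value in values:
        if isinstance(value, bool):
            # bool is a subclass of int, but it is not a valid integer enum value here.
            return None
        if isinstance(value, int):
            coerced.append(value)
        elif isinstance(value, str):
            try:
                coerced.append(int(value))
            except ValueError:
                return None
        else:
            return None
    return coerced


def coerce_boolean_enum(values):
    """Convert 'True'/'False' strings to bool; return None if any value has no boolean form."""
    mapping = {"True": True, "False": False}
    coerced = []
    for value in values:
        if isinstance(value, bool):
            coerced.append(value)
        elif isinstance(value, str) and value in mapping:
            coerced.append(mapping[value])
        else:
            # For example 'dontcare': there is no boolean equivalent.
            return None
    return coerced


print(coerce_integer_enum(["1", "2"]))                     # [1, 2]
print(coerce_boolean_enum(["True", "False", "dontcare"]))  # None
```

When coercion returns `None`, the entry needs a manual decision, for example changing the declared type or changing the enum values.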

---------

Co-authored-by: zhangchuanhui <[email protected]>
@HuanzhiMao HuanzhiMao linked a pull request Dec 11, 2024 that will close this issue