fix some enum type errors in datasets #826

Merged on Dec 11, 2024 (6 commits)

zhangch-ss
Contributor

Fixed some of the enum type errors reported in issue #820.
The following script was used to check for enum type errors:


from glob import glob
import os
import json
import logging

# Map the type names used in the dataset schemas to Python types.
v_type_map = {
    "int": int, "float": float, "bool": bool, "str": str, "list": list, "dict": dict,
    "tuple": tuple, "boolean": bool, "Boolean": bool, "string": str, "integer": int,
    "number": float, "array": list, "object": dict, "String": str,
}

data_dir = "gorilla/berkeley-function-call-leaderboard/data"
data_path_list = glob(os.path.join(data_dir, "*.json"))
for data_path in data_path_list:
    with open(data_path, "r") as f:
        # Each data file holds one JSON object per line.
        for line in f:
            data = json.loads(line)
            entry_id = data.get("id")  # avoid shadowing the built-in id()
            functions = data.get("function", "")
            if not functions:
                continue
            for function in functions:
                name = function.get("name", "")
                parameters = function["parameters"]
                properties = parameters["properties"]
                for k, v in properties.items():
                    enum = v.get("enum", [])
                    # For array parameters, compare enum values against the item type.
                    items = v.get("items", {})
                    if items:
                        v_type = items.get("type")
                    else:
                        v_type = v.get("type")
                    # Skip missing or unmapped type names instead of raising an error.
                    expected_type = v_type_map.get(v_type) if isinstance(v_type, str) else None
                    if expected_type is None:
                        continue
                    error_enums = []
                    enum_type = []
                    for e in enum:
                        if not isinstance(e, expected_type):
                            error_enums.append(e)
                            enum_type.append(type(e))
                    if error_enums:
                        logging.error(f"ID: {entry_id} FUN_NAME: {name} \nLABEL_TYPE: {v_type} \nERROR_ENUM: {error_enums} \nENUM_TYPE: {enum_type} \nDETAIL: {v}\n")
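
A caveat with `isinstance`-based checks like the one above: in Python, `bool` is a subclass of `int`, so `True` passes an `integer` check, while an integer such as `1` fails a `float` check for `number`. The sketch below shows a stricter comparison. It is not part of this PR; the helper name `matches_declared_type` is hypothetical, and treating integers as valid `number`/`float` values is an assumption about the intended semantics.

```python
def matches_declared_type(value, type_name):
    """Return True if value is acceptable for the declared type name."""
    if type_name in ("boolean", "Boolean", "bool"):
        return isinstance(value, bool)
    if type_name in ("integer", "int"):
        # bool is a subclass of int, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if type_name in ("number", "float"):
        # Assumption: integers count as numbers; bools still do not.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if type_name in ("string", "String", "str"):
        return isinstance(value, str)
    # Other type names are not judged by this sketch.
    return True


# The enum values from a boolean-typed parameter that are all strings:
mismatches = [e for e in ["True", "False", "dontcare"]
              if not matches_declared_type(e, "boolean")]
```

Here `mismatches` contains all three strings, since none of them is a Python `bool`.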

Some errors, such as the following, are not fixed in this PR and may need to be fixed in a follow-up:

ERROR:root:ID: live_irrelevance_626-198-0 FUN_NAME: Travel_1_FindAttractions
LABEL_TYPE: boolean
ERROR_ENUM: ['True', 'False', 'dontcare']
ENUM_TYPE: [<class 'str'>, <class 'str'>, <class 'str'>]
DETAIL: {'type': 'boolean', 'description': "Flag indicating whether the attraction is suitable for children. 'True' if the attraction is kid-friendly, 'False' if it is not, and 'dontcare' for no preference.", 'enum': ['True', 'False', 'dontcare'], 'default': 'dontcare'}
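
For a case like this one, where every enum value is a string and one of them (`'dontcare'`) has no boolean equivalent, one possible repair is to declare the parameter as `string` rather than convert the enum values. The sketch below illustrates that option only; it is an assumption, not the fix the maintainers applied, and `suggest_string_enum_fix` is a hypothetical helper name. The description is shortened from the log entry above.

```python
import copy


def suggest_string_enum_fix(prop):
    """Return a copy of prop typed as "string" when all enum values are strings."""
    enum = prop.get("enum", [])
    all_strings = bool(enum) and all(isinstance(e, str) for e in enum)
    if all_strings and prop.get("type") != "string":
        fixed = copy.deepcopy(prop)  # leave the original entry untouched
        fixed["type"] = "string"
        return fixed
    return prop


detail = {
    "type": "boolean",
    "description": "Flag indicating whether the attraction is suitable for children.",
    "enum": ["True", "False", "dontcare"],
    "default": "dontcare",
}
fixed_detail = suggest_string_enum_fix(detail)
```

After this call, `fixed_detail["type"]` is `"string"` while `detail` itself is unchanged, so the suggestion can be reviewed before anything is written back to the dataset.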

HuanzhiMao added the BFCL-Dataset (BFCL Dataset-Related Issue) label Dec 11, 2024
Collaborator

HuanzhiMao left a comment

Thanks for the PR @zhangch-ss!
I will build upon your script and fix the remaining enum issues that have not been addressed in this PR.

HuanzhiMao merged commit 8141b18 into ShishirPatil:main Dec 11, 2024
HuanzhiMao linked an issue Dec 11, 2024 that may be closed by this pull request
CharlieJCJ
Collaborator

Thanks for the PR, looks great! @zhangch-ss

HuanzhiMao added a commit that referenced this pull request Dec 11, 2024
This PR fixed a merge conflict introduced in #826, namely the changes
that were previously introduced to file
`berkeley-function-call-leaderboard/data/BFCL_v3_live_relevance.json`
were not properly incorporated.
HuanzhiMao added a commit that referenced this pull request Dec 31, 2024
This PR updates the leaderboard to reflect the change in score due to
the following PR merge:

1. #822 
2. #826 
3. #829 
4. #832 
5. #837 
6. #840 
7. #835 
8. #842 
9. #843
10. #846 
11. #838 
12. #847 
13. #855 
14. #857 

Models were evaluated using checkpoint commit 0cea216.
Labels
BFCL-Dataset BFCL Dataset-Related Issue
Development

Successfully merging this pull request may close these issues.

[BFCL] some anomalies about the dataset
3 participants