[BFCL] Use N/A in Score Report for Unevaluated Categories #849

Open · wants to merge 8 commits into base branch `main`
berkeley-function-call-leaderboard/CHANGELOG.md (1 change: 1 addition & 0 deletions)
@@ -2,6 +2,7 @@

All notable changes to the Berkeley Function Calling Leaderboard will be documented in this file.

- [Dec 21, 2024] [#849](https://github.com/ShishirPatil/gorilla/pull/849): Use `N/A` in the score report for unevaluated categories, to distinguish them from categories where the model actually scored a 0.
- [Dec 21, 2024] [#848](https://github.com/ShishirPatil/gorilla/pull/848): Improves behavior of the generation and evaluation pipeline. When executable categories are involved and API keys are not provided in the `.env` file, instead of throwing an error, the affected categories will now be skipped. This enhancement provides a smoother experience for first-time users.
- [Dec 21, 2024] [#847](https://github.com/ShishirPatil/gorilla/pull/847): Add new models `watt-ai/watt-tool-8B` and `watt-ai/watt-tool-70B` to the leaderboard.
- [Dec 20, 2024] [#842](https://github.com/ShishirPatil/gorilla/pull/842): Add the following new models to the leaderboard:
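
The intent of the new changelog entry, in a minimal sketch (hypothetical helper name, not the actual BFCL reporting code): a category that was never evaluated should render as `N/A` in the score report, so it cannot be mistaken for a genuine score of 0.

```python
# Minimal sketch of the intended reporting behavior.
# `format_score` is a hypothetical helper, not a function from the BFCL codebase.

def format_score(scores: dict, category: str) -> str:
    """Return the display value for one category in the score report."""
    if category not in scores:
        # Category was skipped (e.g. missing API keys), so it was never evaluated.
        return "N/A"
    # Category was evaluated; a 0.00 here is a real score, not a missing run.
    return f"{scores[category] * 100:.2f}"


if __name__ == "__main__":
    scores = {"simple": 0.87, "parallel": 0.0}  # "exec" was never run
    for cat in ["simple", "parallel", "exec"]:
        print(f"{cat}: {format_score(scores, cat)}")
    # simple: 87.00
    # parallel: 0.00   <- the model really scored 0
    # exec: N/A        <- the category was not evaluated
```
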
@@ -602,7 +602,7 @@ def runner(model_names, test_categories, api_sanity_check, result_dir, score_dir

# This function reads all the score files from local folder and updates the leaderboard table.
# This is helpful when you only want to run the evaluation for a subset of models and test categories.
- update_leaderboard_table_with_score_file(LEADERBOARD_TABLE, score_dir)
+ update_leaderboard_table_with_local_score_file(LEADERBOARD_TABLE, score_dir)
# Write the leaderboard table to a file
generate_leaderboard_csv(LEADERBOARD_TABLE, score_dir, model_names, test_categories)

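For context, the renamed `update_leaderboard_table_with_local_score_file` reads score files from a local score directory and fills the in-memory leaderboard table. The sketch below shows one plausible shape of such a function; the file layout and the `"accuracy"` field are assumptions for illustration, not the repository's actual implementation.

```python
import json
from pathlib import Path


def update_leaderboard_table_with_local_score_file(leaderboard_table: dict, score_dir: str) -> None:
    """Fill `leaderboard_table` from per-model, per-category score files under `score_dir`.

    Sketch only: the <model>/<category>_score.json layout and the "accuracy" field
    are assumptions. Categories with no score file are simply left unset, which is
    what later lets the report show "N/A" instead of 0.
    """
    for score_file in Path(score_dir).glob("*/*_score.json"):
        model_name = score_file.parent.name
        category = score_file.stem[: -len("_score")]
        with open(score_file) as f:
            summary = json.loads(f.readline())  # assume the first line is a summary record
        leaderboard_table.setdefault(model_name, {})[category] = summary.get("accuracy")
```
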
@@ -645,7 +645,7 @@ def main(model, test_categories, api_sanity_check, result_dir, score_dir):
skipped_categories.append(test_category)

model_names = None
- if model is not None:
+ if model:
model_names = []
for model_name in model:
# Runner takes in the model name that contains "_", instead of "/", for the sake of file path issues.
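
Two things happen in this hunk. First, `if model is not None:` becomes `if model:`, so an empty model list is now treated the same as no model argument at all. Second, the comment notes that model names are stored with `_` in place of `/`, since `/` is not valid in file names. A small illustration of that conversion follows; the loop body itself is collapsed in the diff above, so the `.replace("/", "_")` call is an assumption about what it does.

```python
# Illustration only: the exact conversion line is collapsed in the diff above,
# so the `.replace("/", "_")` call is an assumption about what it does.
model = ["watt-ai/watt-tool-8B", "gpt-4o-2024-08-06"]

model_names = None
if model:  # an empty list now behaves like None, unlike the old `is not None` check
    model_names = []
    for model_name in model:
        # File systems reject "/" in file names, so the runner keys result and
        # score files by the underscored form of the model name.
        model_names.append(model_name.replace("/", "_"))

print(model_names)  # ['watt-ai_watt-tool-8B', 'gpt-4o-2024-08-06']
```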