Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BFCL] Which test subset corresponds to the multi-step function calling? #836

Open
LeeSureman opened this issue Dec 17, 2024 · 3 comments
Open
Labels
BFCL-General General BFCL Issue

Comments

@LeeSureman
Copy link

LeeSureman commented Dec 17, 2024

Describe the issue
Congratulations for this great work! I want to conduct the multi-step function-calling on your benchmark. But I do not find the multi-step part, where the model generates one function-call and then gets the call's result and then generates the second function-call.

Specifically, I find this file which introduces all test subset: https://github.com/ShishirPatil/gorilla/blob/main/berkeley-function-call-leaderboard/TEST_CATEGORIES.md

the subset "multiple" seems like multi-step subset. But I see its test examples, and find it seems like only requiring single function-call.

ID datapoint
ID: multiple_174
Query: What is the ranking of Manchester United in Premier League?

ID: multiple_153
Query: When was the signing of the Treaty of Lisbon?

What is the issue
These examples seem like only requiring single function calling

@Fanjia-Yan
Copy link
Collaborator

Fanjia-Yan commented Dec 18, 2024

We don't have a category dedicated specifically to multi-step function calling setting.

The multiplecategory was released back in February for which all questions have 2 or more function candidates. This is in contrast to simple categories where 1 question corresponds to 1 function. All user requests are achieved with one and only one function call.

multi_turn_ categories were released in October and have multiple questions in line for the LLM to answer sequentially. For each question, it sometimes takes more than 1 function to fulfill the user request, which is what you refer to as multi-step function calling.

In short:

  • If you want multi-turn + multi-step benchmarks, we have them under categories starting with multi_turn.
  • If you want single-turn + multi-step benchmarks, we don't have it off the shelf but you can possibly slice some of the multi_turn categories dataset to achieve that.

Thank you!

@HuanzhiMao HuanzhiMao added the BFCL-General General BFCL Issue label Dec 19, 2024
@HuanzhiMao
Copy link
Collaborator

Adding on @Fanjia-Yan's comment,

Our current multi-turn categories consist of multiple turns, and each turn may require multiple steps to fulfill a user request. However, this isn't always the case—some turns might just be a single step. If you’re looking for a purely single-turn multi-step dataset, for now, you can extract individual turns from the existing multi-turn entries to create one.

We are also planning to release a dedicated single-turn multi-step category soon. Stay tuned!

@LeeSureman
Copy link
Author

Many thanks for your quick and detailed response! @Fanjia-Yan @HuanzhiMao

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
BFCL-General General BFCL Issue
Projects
None yet
Development

No branches or pull requests

3 participants