[BFCL] Which test subset corresponds to the multi-step function calling? #836

LeeSureman · 2024-12-17T07:52:52Z

Describe the issue
Congratulations for this great work! I want to conduct the multi-step function-calling on your benchmark. But I do not find the multi-step part, where the model generates one function-call and then gets the call's result and then generates the second function-call.

Specifically, I find this file which introduces all test subset: https://github.com/ShishirPatil/gorilla/blob/main/berkeley-function-call-leaderboard/TEST_CATEGORIES.md

the subset "multiple" seems like multi-step subset. But I see its test examples, and find it seems like only requiring single function-call.

ID datapoint
ID: multiple_174
Query: What is the ranking of Manchester United in Premier League?

ID: multiple_153
Query: When was the signing of the Treaty of Lisbon?

What is the issue
These examples seem like only requiring single function calling

Fanjia-Yan · 2024-12-18T01:35:30Z

We don't have a category dedicated specifically to multi-step function calling setting.

The multiplecategory was released back in February for which all questions have 2 or more function candidates. This is in contrast to simple categories where 1 question corresponds to 1 function. All user requests are achieved with one and only one function call.

multi_turn_ categories were released in October and have multiple questions in line for the LLM to answer sequentially. For each question, it sometimes takes more than 1 function to fulfill the user request, which is what you refer to as multi-step function calling.

In short:

If you want multi-turn + multi-step benchmarks, we have them under categories starting with multi_turn.
If you want single-turn + multi-step benchmarks, we don't have it off the shelf but you can possibly slice some of the multi_turn categories dataset to achieve that.

Thank you!

HuanzhiMao · 2024-12-19T19:26:39Z

Adding on @Fanjia-Yan's comment,

Our current multi-turn categories consist of multiple turns, and each turn may require multiple steps to fulfill a user request. However, this isn't always the case—some turns might just be a single step. If you’re looking for a purely single-turn multi-step dataset, for now, you can extract individual turns from the existing multi-turn entries to create one.

We are also planning to release a dedicated single-turn multi-step category soon. Stay tuned!

LeeSureman · 2024-12-20T13:38:42Z

Many thanks for your quick and detailed response! @Fanjia-Yan @HuanzhiMao

HuanzhiMao added the BFCL-General General BFCL Issue label Dec 19, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BFCL] Which test subset corresponds to the multi-step function calling? #836

[BFCL] Which test subset corresponds to the multi-step function calling? #836

LeeSureman commented Dec 17, 2024 •

edited

Loading

Fanjia-Yan commented Dec 18, 2024 •

edited

Loading

HuanzhiMao commented Dec 19, 2024

LeeSureman commented Dec 20, 2024

[BFCL] Which test subset corresponds to the multi-step function calling? #836

[BFCL] Which test subset corresponds to the multi-step function calling? #836

Comments

LeeSureman commented Dec 17, 2024 • edited Loading

Fanjia-Yan commented Dec 18, 2024 • edited Loading

HuanzhiMao commented Dec 19, 2024

LeeSureman commented Dec 20, 2024

LeeSureman commented Dec 17, 2024 •

edited

Loading

Fanjia-Yan commented Dec 18, 2024 •

edited

Loading