You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Describe the issue
Congratulations for this great work! I want to conduct the multi-step function-calling on your benchmark. But I do not find the multi-step part, where the model generates one function-call and then gets the call's result and then generates the second function-call.
We don't have a category dedicated specifically to multi-step function calling setting.
The multiplecategory was released back in February for which all questions have 2 or more function candidates. This is in contrast to simple categories where 1 question corresponds to 1 function. All user requests are achieved with one and only one function call.
multi_turn_ categories were released in October and have multiple questions in line for the LLM to answer sequentially. For each question, it sometimes takes more than 1 function to fulfill the user request, which is what you refer to as multi-step function calling.
In short:
If you want multi-turn + multi-step benchmarks, we have them under categories starting with multi_turn.
If you want single-turn + multi-step benchmarks, we don't have it off the shelf but you can possibly slice some of the multi_turn categories dataset to achieve that.
Our current multi-turn categories consist of multiple turns, and each turn may require multiple steps to fulfill a user request. However, this isn't always the case—some turns might just be a single step. If you’re looking for a purely single-turn multi-step dataset, for now, you can extract individual turns from the existing multi-turn entries to create one.
We are also planning to release a dedicated single-turn multi-step category soon. Stay tuned!
Describe the issue
Congratulations for this great work! I want to conduct the multi-step function-calling on your benchmark. But I do not find the multi-step part, where the model generates one function-call and then gets the call's result and then generates the second function-call.
Specifically, I find this file which introduces all test subset: https://github.com/ShishirPatil/gorilla/blob/main/berkeley-function-call-leaderboard/TEST_CATEGORIES.md
the subset "multiple" seems like multi-step subset. But I see its test examples, and find it seems like only requiring single function-call.
ID datapoint
ID: multiple_174
Query: What is the ranking of Manchester United in Premier League?
ID: multiple_153
Query: When was the signing of the Treaty of Lisbon?
What is the issue
These examples seem like only requiring single function calling
The text was updated successfully, but these errors were encountered: