[BFCL] Leaderboard Update - 2024/12/29 (Checkpoint 0cea216) #845

Merged 6 commits on Dec 31, 2024
169 changes: 89 additions & 80 deletions data_live.csv
@@ -1,81 +1,90 @@
Rank,Model,Live Overall Acc,AST Summary,Python Simple AST,Python Multiple AST,Python Parallel AST,Python Parallel Multiple AST,Irrelevance Detection,Relevance Detection
Removed rows (previous leaderboard, 80 entries):
1,GPT-4o-2024-08-06 (Prompt),80.84%,74.02%,78.68%,72.46%,100.00%,75.00%,91.84%,52.94%
2,GPT-4-turbo-2024-04-09 (FC),79.56%,78.09%,81.01%,77.59%,81.25%,66.67%,81.97%,70.59%
3,GPT-4o-2024-08-06 (FC),79.29%,76.02%,76.36%,76.07%,81.25%,66.67%,84.47%,70.59%
4,Claude-3-Opus-20240229 (FC),79.24%,76.31%,79.84%,77.40%,18.75%,29.17%,83.79%,76.47%
5,ToolACE-8B (FC),78.37%,75.72%,72.48%,76.54%,81.25%,70.83%,82.43%,77.78%
6,Gemini-1.5-Flash-002 (FC),77.96%,70.91%,71.71%,70.47%,81.25%,75.00%,89.12%,58.82%
7,Claude-3.5-Sonnet-20241022 (FC),77.96%,79.50%,82.17%,81.10%,31.25%,12.50%,75.74%,70.59%
8,o1-mini-2024-09-12 (Prompt),77.73%,71.87%,72.87%,71.70%,75.00%,66.67%,87.07%,58.82%
9,Mistral-Medium-2312 (Prompt),77.20%,73.50%,74.03%,73.69%,81.25%,54.17%,83.11%,64.71%
10,GPT-4o-mini-2024-07-18 (Prompt),77.20%,77.42%,79.84%,76.73%,93.75%,70.83%,76.76%,82.35%
11,palmyra-x-004 (FC),77.16%,74.69%,75.19%,75.21%,50.00%,62.50%,81.07%,70.59%
12,Gemini-1.5-Pro-002 (Prompt),76.76%,78.61%,81.01%,77.97%,93.75%,70.83%,73.92%,76.47%
13,Gemini-1.5-Pro-001 (Prompt),76.49%,72.98%,75.58%,71.98%,93.75%,75.00%,82.31%,52.94%
14,Functionary-Medium-v3.1 (FC),76.45%,82.16%,81.78%,82.62%,68.75%,75.00%,67.80%,72.22%
15,Gemini-1.5-Pro-002 (FC),76.44%,76.31%,79.07%,75.50%,87.50%,75.00%,76.64%,76.47%
16,Gemini-1.5-Flash-001 (FC),75.51%,72.69%,72.09%,73.31%,62.50%,58.33%,80.16%,58.82%
17,Gemini-1.5-Pro-001 (FC),75.47%,70.69%,73.26%,70.18%,81.25%,58.33%,83.11%,58.82%
18,o1-preview-2024-09-12 (Prompt),75.29%,77.57%,82.17%,76.35%,81.25%,79.17%,71.54%,88.24%
19,Gemini-1.5-Flash-002 (Prompt),75.20%,74.76%,77.13%,74.26%,93.75%,58.33%,75.62%,88.24%
20,Qwen2.5-72B-Instruct (Prompt),75.03%,81.79%,84.11%,81.67%,62.50%,75.00%,64.29%,94.44%
21,GoGoAgent,74.84%,72.61%,74.81%,72.08%,81.25%,66.67%,77.89%,94.12%
22,DeepSeek-Coder-V2 (FC),73.43%,77.50%,80.62%,77.30%,50.00%,70.83%,67.01%,83.33%
23,xLAM-8x22b-r (FC),73.39%,80.46%,83.33%,80.15%,62.50%,75.00%,62.24%,88.89%
24,GPT-4o-mini-2024-07-18 (FC),73.24%,75.20%,75.19%,75.12%,87.50%,70.83%,70.07%,82.35%
25,Functionary-Small-v3.1 (FC),72.99%,77.35%,78.68%,77.49%,75.00%,58.33%,66.10%,83.33%
26,Mistral-small-2402 (FC),72.49%,68.91%,64.34%,72.17%,12.50%,12.50%,77.78%,82.35%
27,Claude-3.5-Sonnet-20241022 (Prompt),71.96%,80.90%,86.05%,80.44%,81.25%,45.83%,58.16%,76.47%
28,Hammer2.0-7b (FC),71.75%,77.28%,75.97%,77.59%,81.25%,75.00%,62.81%,94.44%
29,xLAM-8x7b-r (FC),71.08%,76.68%,72.48%,78.16%,62.50%,66.67%,62.02%,94.44%
30,claude-3.5-haiku-20241022 (Prompt),70.24%,75.20%,81.01%,73.98%,87.50%,58.33%,62.47%,77.78%
31,mistral-large-2407 (FC),69.73%,79.42%,85.66%,78.16%,68.75%,75.00%,54.76%,76.47%
32,MiniCPM3-4B-FC (FC),69.66%,65.06%,72.87%,63.63%,37.50%,62.50%,76.53%,77.78%
33,FireFunction-v1 (FC),69.56%,69.13%,68.99%,71.79%,0.00%,0.00%,69.73%,94.12%
34,xLAM-7b-r (FC),69.35%,73.35%,71.32%,74.45%,50.00%,62.50%,62.70%,94.44%
35,Open-Mixtral-8x22b (FC),68.71%,72.46%,75.19%,73.41%,6.25%,45.83%,62.70%,82.35%
36,GPT-3.5-Turbo-0125 (Prompt),68.62%,77.94%,78.29%,78.25%,75.00%,62.50%,53.85%,94.12%
37,Gemini-1.5-Flash-001 (Prompt),68.53%,75.87%,74.81%,75.78%,93.75%,79.17%,57.03%,82.35%
38,Command-R-Plus (Prompt) (Original),68.27%,76.09%,75.58%,76.26%,81.25%,70.83%,56.01%,82.35%
39,Gemini-1.0-Pro-002 (FC),68.00%,66.17%,73.26%,65.53%,37.50%,37.50%,70.63%,76.47%
40,Gemma-2-9b-it (Prompt),67.61%,73.72%,73.26%,74.26%,56.25%,66.67%,58.05%,77.78%
41,Qwen2.5-7B-Instruct (Prompt),66.95%,74.24%,74.81%,74.45%,62.50%,66.67%,55.44%,83.33%
42,Claude-3-Opus-20240229 (Prompt),66.80%,79.27%,84.11%,78.73%,75.00%,54.17%,47.39%,82.35%
43,GLM-4-9b-Chat (FC),66.50%,63.58%,71.32%,64.10%,0.00%,0.00%,70.98%,66.67%
44,FireFunction-v2 (FC),66.44%,75.20%,76.74%,75.50%,56.25%,58.33%,52.61%,88.24%
45,Gemma-2-27b-it (Prompt),66.15%,78.61%,83.33%,78.06%,68.75%,58.33%,46.60%,88.89%
46,Open-Mixtral-8x22b (Prompt),65.82%,74.46%,80.62%,72.84%,81.25%,75.00%,52.27%,82.35%
47,Open-Mistral-Nemo-2407 (FC),65.16%,69.73%,75.19%,68.28%,75.00%,70.83%,58.16%,64.71%
48,Meta-Llama-3-70B-Instruct (Prompt),65.04%,78.83%,81.01%,78.54%,75.00%,70.83%,43.31%,94.44%
49,Hammer2.0-1.5b (FC),64.86%,69.43%,74.03%,68.47%,56.25%,70.83%,57.48%,83.33%
50,Hermes-2-Pro-Llama-3-8B (FC),64.59%,65.95%,69.77%,65.53%,56.25%,50.00%,62.93%,44.44%
51,Claude-3-Haiku-20240307 (Prompt),64.22%,74.17%,77.13%,74.17%,56.25%,54.17%,48.87%,70.59%
52,GPT-4-turbo-2024-04-09 (Prompt),63.56%,84.68%,86.05%,84.24%,100.00%,79.17%,30.50%,100.00%
53,GPT-3.5-Turbo-0125 (FC),62.98%,77.50%,77.91%,78.35%,50.00%,54.17%,40.14%,94.12%
54,Llama-3.1-70B-Instruct (Prompt),62.02%,76.17%,77.52%,75.97%,87.50%,62.50%,39.68%,94.44%
55,Llama-3.1-8B-Instruct (Prompt),60.68%,71.95%,73.26%,72.36%,56.25%,50.00%,43.20%,72.22%
56,DBRX-Instruct (Prompt),60.58%,73.65%,77.52%,73.31%,75.00%,45.83%,39.91%,94.12%
57,Open-Mixtral-8x7b (Prompt),60.53%,64.03%,60.85%,65.05%,68.75%,50.00%,54.65%,88.24%
58,Qwen2.5-1.5B-Instruct (Prompt),60.46%,60.25%,68.60%,58.50%,56.25%,50.00%,60.43%,77.78%
59,Claude-3-Haiku-20240307 (FC),59.51%,75.80%,79.07%,77.87%,0.00%,0.00%,33.79%,100.00%
60,Granite-20b-FunctionCalling (FC),59.22%,57.66%,67.44%,55.56%,43.75%,54.17%,61.00%,88.89%
61,Command-R-Plus (FC) (Original),58.89%,62.69%,68.60%,61.82%,50.00%,45.83%,52.27%,100.00%
62,Mistral-Small-2402 (Prompt),58.18%,56.70%,34.50%,64.20%,0.00%,4.17%,60.43%,58.82%
63,Hermes-2-Pro-Mistral-7B (FC),57.49%,61.07%,67.44%,60.11%,50.00%,41.67%,51.81%,66.67%
64,Llama-3.2-3B-Instruct (Prompt),55.53%,63.29%,63.18%,64.39%,18.75%,45.83%,43.08%,83.33%
65,Nexusflow-Raven-v2 (FC),54.22%,39.45%,41.47%,38.75%,56.25%,37.50%,76.76%,58.82%
66,MiniCPM3-4B (Prompt),54.20%,36.64%,45.35%,34.19%,43.75%,45.83%,81.07%,55.56%
67,xLAM-7b-fc-r (FC),54.02%,60.47%,78.29%,57.36%,31.25%,25.00%,43.65%,77.78%
68,Hammer2.0-0.5b (FC),53.22%,45.82%,51.94%,44.25%,56.25%,41.67%,64.17%,72.22%
69,mistral-large-2407 (Prompt),52.62%,82.83%,86.05%,81.96%,93.75%,79.17%,5.44%,100.00%
70,Qwen2-7B-Instruct (Prompt),50.56%,60.47%,56.20%,61.73%,37.50%,66.67%,34.69%,83.33%
71,Gemini-1.0-Pro-002 (Prompt),48.80%,46.85%,48.06%,46.53%,62.50%,37.50%,51.13%,82.35%
72,Open-Mistral-Nemo-2407 (Prompt),48.67%,74.17%,77.13%,73.31%,87.50%,70.83%,8.73%,94.12%
73,Meta-Llama-3-8B-Instruct (Prompt),47.76%,60.47%,59.30%,61.73%,37.50%,33.33%,27.66%,77.78%
74,Llama-3.1-70B-Instruct (FC),45.27%,51.96%,51.94%,52.90%,31.25%,25.00%,33.90%,100.00%
75,Gemma-2-2b-it (Prompt),43.40%,19.47%,26.74%,18.42%,0.00%,0.00%,80.16%,38.89%
76,DeepSeek-Coder-V2-Lite-Instruct (FC),39.63%,3.55%,1.94%,3.70%,6.25%,12.50%,95.58%,5.56%
77,Qwen2-1.5B-Instruct (Prompt),38.34%,40.49%,47.67%,39.41%,18.75%,25.00%,34.13%,83.33%
78,xLAM-1b-fc-r (FC),37.54%,54.33%,65.89%,53.56%,0.00%,0.00%,10.54%,100.00%
79,Llama-3.1-8B-Instruct (FC),33.19%,48.56%,50.00%,48.62%,37.50%,37.50%,8.39%,94.44%
80,Llama-3.2-1B-Instruct (Prompt),31.36%,11.92%,30.62%,7.50%,12.50%,4.17%,61.11%,33.33%
Added rows (updated leaderboard, 89 entries):
1,GPT-4-turbo-2024-04-09 (FC),80.45%,79.42%,83.33%,78.63%,81.25%,70.83%,82.20%,72.22%
2,o1-2024-12-17 (Prompt),80.45%,77.50%,81.78%,76.54%,81.25%,70.83%,85.15%,72.22%
3,gpt-4o-2024-11-20 (Prompt),79.65%,80.46%,83.72%,79.77%,87.50%,70.83%,78.34%,83.33%
4,gpt-4o-2024-11-20 (FC),79.61%,79.27%,81.01%,78.82%,87.50%,75.00%,80.05%,83.33%
5,Claude-3.5-Sonnet-20241022 (FC),78.85%,80.46%,83.33%,81.96%,25.00%,20.83%,76.42%,77.78%
6,ToolACE-8B (FC),78.50%,75.87%,72.48%,76.73%,81.25%,70.83%,82.43%,83.33%
7,o1-mini-2024-09-12 (Prompt),78.05%,71.80%,71.71%,71.60%,75.00%,79.17%,87.98%,61.11%
8,Gemini-1.5-Flash-002 (FC),77.97%,70.84%,72.09%,70.18%,81.25%,79.17%,89.34%,55.56%
9,Claude-3-Opus-20240229 (FC),77.92%,74.98%,77.91%,75.78%,31.25%,37.50%,82.77%,61.11%
10,o1-2024-12-17 (FC),77.92%,77.05%,81.01%,79.01%,0.00%,0.00%,79.37%,72.22%
11,watt-tool-70B (FC),77.65%,83.42%,84.88%,83.48%,81.25%,66.67%,68.48%,94.44%
12,Mistral-Medium-2312 (Prompt),77.52%,74.02%,75.19%,74.07%,81.25%,54.17%,83.11%,66.67%
13,Gemini-1.5-Pro-001 (Prompt),76.63%,73.06%,75.97%,71.98%,93.75%,75.00%,82.54%,55.56%
14,Functionary-Medium-v3.1 (FC),76.59%,82.53%,81.01%,83.29%,68.75%,75.00%,67.57%,72.22%
15,Gemini-1.5-Flash-002 (Prompt),76.54%,76.98%,80.62%,76.16%,93.75%,62.50%,75.74%,83.33%
16,Gemini-1.5-Pro-002 (Prompt),76.54%,78.39%,81.78%,77.40%,87.50%,79.17%,73.81%,72.22%
17,watt-tool-8B (FC),76.37%,77.13%,75.97%,77.49%,87.50%,66.67%,75.06%,83.33%
18,GPT-4o-mini-2024-07-18 (Prompt),76.32%,77.57%,80.23%,76.73%,93.75%,75.00%,74.26%,83.33%
19,Gemini-1.5-Flash-001 (FC),76.28%,74.02%,75.19%,74.26%,62.50%,58.33%,80.27%,50.00%
20,Gemini-1.5-Pro-001 (FC),76.23%,71.65%,75.58%,70.75%,81.25%,62.50%,83.79%,50.00%
21,Gemini-1.5-Pro-002 (FC),76.19%,76.17%,79.46%,75.21%,87.50%,75.00%,76.30%,72.22%
22,Qwen2.5-72B-Instruct (Prompt),75.21%,82.24%,84.50%,82.15%,62.50%,75.00%,63.95%,100.00%
23,xLAM-7b-r (FC),75.08%,73.72%,71.32%,74.93%,50.00%,62.50%,86.72%,94.44%
24,Hammer2.1-7b (FC),75.02%,77.05%,76.36%,77.40%,81.25%,66.67%,71.77%,82.35%
25,GPT-4o-mini-2024-07-18 (FC),74.37%,76.61%,78.29%,76.16%,87.50%,70.83%,70.75%,83.33%
26,Qwen2.5-32B-Instruct (Prompt),74.14%,78.68%,82.17%,78.54%,62.50%,58.33%,66.67%,100.00%
27,Qwen2.5-14B-Instruct (Prompt),74.10%,75.13%,74.03%,75.78%,62.50%,66.67%,72.45%,77.78%
28,GoGoAgent,73.92%,74.54%,72.09%,75.40%,68.75%,66.67%,72.90%,77.78%
29,Hammer2.1-3b (FC),73.91%,72.83%,72.48%,73.31%,62.50%,62.50%,75.40%,82.35%
30,Functionary-Small-v3.1 (FC),73.66%,78.09%,79.07%,78.16%,81.25%,62.50%,66.78%,77.78%
31,DeepSeek-Coder-V2 (FC),73.43%,77.13%,80.23%,77.02%,43.75%,70.83%,67.46%,88.89%
32,xLAM-8x22b-r (FC),72.55%,79.57%,79.46%,79.68%,81.25%,75.00%,61.45%,88.89%
33,claude-3.5-haiku-20241022 (FC),72.28%,76.98%,82.17%,78.35%,18.75%,0.00%,64.85%,83.33%
34,Mistral-small-2402 (FC),72.10%,68.47%,64.73%,71.51%,12.50%,12.50%,77.55%,77.78%
35,Claude-3.5-Sonnet-20241022 (Prompt),71.88%,80.61%,86.05%,80.06%,81.25%,45.83%,58.39%,77.78%
36,xLAM-8x7b-r (FC),70.99%,77.50%,74.03%,79.30%,43.75%,58.33%,60.54%,94.44%
37,claude-3.5-haiku-20241022 (Prompt),70.64%,76.46%,83.72%,75.02%,87.50%,54.17%,61.56%,77.78%
38,Hammer2.1-1.5b (FC),70.59%,69.65%,70.93%,69.80%,50.00%,62.50%,71.88%,77.78%
39,FireFunction-v1 (FC),70.41%,70.47%,71.32%,72.93%,0.00%,0.00%,69.84%,94.44%
40,MiniCPM3-4B-FC (FC),69.97%,65.66%,74.42%,63.91%,43.75%,62.50%,76.53%,72.22%
41,mistral-large-2407 (FC),69.84%,79.57%,84.88%,78.54%,62.50%,79.17%,54.88%,72.22%
42,Gemini-1.0-Pro-002 (FC),69.57%,68.69%,77.13%,67.62%,43.75%,41.67%,70.98%,66.67%
43,Command R7B (FC),69.21%,59.66%,63.18%,58.69%,56.25%,66.67%,84.13%,55.56%
44,Gemini-1.5-Flash-001 (Prompt),68.86%,76.54%,76.74%,76.16%,93.75%,79.17%,56.80%,83.33%
45,Open-Mixtral-8x22b (FC),68.55%,72.46%,76.36%,73.12%,6.25%,45.83%,62.24%,83.33%
46,GPT-3.5-Turbo-0125 (Prompt),68.46%,78.46%,79.84%,78.63%,75.00%,58.33%,52.61%,94.44%
47,DeepSeek-V3 (FC),68.33%,81.94%,82.95%,82.15%,81.25%,62.50%,47.05%,88.89%
48,Gemma-2-9b-it (Prompt),67.84%,74.32%,76.36%,74.26%,62.50%,62.50%,57.60%,83.33%
49,Qwen2.5-7B-Instruct (Prompt),67.35%,74.91%,75.97%,74.93%,62.50%,70.83%,55.33%,88.89%
50,Gemma-2-27b-it (Prompt),67.04%,79.94%,84.50%,79.39%,68.75%,62.50%,46.71%,94.44%
51,Claude-3-Opus-20240229 (Prompt),66.86%,79.50%,84.11%,79.11%,68.75%,54.17%,47.17%,83.33%
52,GLM-4-9b-Chat (FC),66.77%,63.95%,72.09%,64.39%,0.00%,0.00%,71.09%,66.67%
53,Open-Mixtral-8x22b (Prompt),65.93%,74.61%,82.17%,72.65%,81.25%,75.00%,52.27%,83.33%
54,Open-Mistral-Nemo-2407 (FC),65.93%,71.06%,77.13%,69.61%,75.00%,66.67%,58.05%,66.67%
55,FireFunction-v2 (FC),65.57%,77.94%,78.29%,78.35%,56.25%,70.83%,46.03%,94.44%
56,Ministral-8B-Instruct-2410 (FC),64.93%,72.61%,75.19%,72.27%,62.50%,66.67%,53.06%,70.59%
57,Hermes-2-Pro-Llama-3-8B (FC),64.90%,66.54%,71.71%,65.81%,56.25%,50.00%,62.81%,44.44%
58,Meta-Llama-3-70B-Instruct (Prompt),64.90%,78.46%,80.62%,78.25%,75.00%,66.67%,43.42%,100.00%
59,GPT-3.5-Turbo-0125 (FC),63.93%,79.05%,80.62%,79.68%,43.75%,58.33%,40.14%,94.44%
60,GPT-4-turbo-2024-04-09 (Prompt),63.71%,84.75%,87.21%,84.14%,100.00%,75.00%,30.73%,100.00%
61,Hammer2.1-0.5b (FC),62.86%,58.03%,59.69%,58.02%,50.00%,45.83%,69.95%,77.78%
62,Llama-3.3-70B-Instruct (Prompt),62.59%,77.72%,80.62%,77.11%,93.75%,62.50%,38.66%,100.00%
63,Llama-3.1-70B-Instruct (Prompt),62.06%,76.24%,77.13%,76.16%,87.50%,62.50%,39.57%,100.00%
64,Open-Mixtral-8x7b (Prompt),61.39%,65.28%,63.18%,66.10%,68.75%,50.00%,54.88%,88.89%
65,Qwen2.5-1.5B-Instruct (Prompt),61.04%,60.99%,70.16%,59.26%,56.25%,41.67%,60.66%,83.33%
66,Llama-3.1-8B-Instruct (Prompt),60.95%,72.69%,73.26%,73.31%,56.25%,50.00%,42.63%,77.78%
67,DBRX-Instruct (Prompt),60.15%,73.28%,77.13%,73.03%,75.00%,41.67%,39.34%,94.44%
68,Granite-20b-FunctionCalling (FC),59.57%,58.33%,67.83%,56.32%,43.75%,54.17%,60.88%,88.89%
69,Command-R-Plus (FC),58.91%,60.70%,69.77%,58.78%,62.50%,45.83%,55.90%,72.22%
70,Mistral-Small-2402 (Prompt),58.73%,57.88%,36.05%,65.24%,0.00%,8.33%,60.32%,44.44%
71,Qwen2.5-3B-Instruct (Prompt),58.60%,66.77%,68.99%,66.48%,56.25%,62.50%,45.46%,88.89%
72,Hermes-2-Pro-Mistral-7B (FC),57.62%,61.21%,68.99%,60.02%,43.75%,41.67%,51.93%,66.67%
73,Llama-3.2-3B-Instruct (Prompt),55.75%,63.66%,63.57%,64.86%,12.50%,45.83%,42.97%,88.89%
74,MiniCPM3-4B (Prompt),54.46%,37.23%,46.51%,34.76%,43.75%,41.67%,80.95%,50.00%
75,Nexusflow-Raven-v2 (FC),54.15%,39.38%,41.47%,38.65%,56.25%,37.50%,76.64%,61.11%
76,xLAM-7b-fc-r (FC),53.35%,60.99%,78.29%,58.02%,31.25%,25.00%,41.16%,77.78%
77,mistral-large-2407 (Prompt),52.69%,82.68%,85.27%,81.96%,93.75%,79.17%,5.78%,100.00%
78,Qwen2-7B-Instruct (Prompt),50.60%,60.77%,56.59%,62.01%,37.50%,66.67%,34.24%,88.89%
79,Gemini-1.0-Pro-002 (Prompt),49.09%,47.52%,50.39%,47.01%,62.50%,29.17%,50.91%,77.78%
80,Open-Mistral-Nemo-2407 (Prompt),48.96%,74.98%,77.13%,74.45%,87.50%,66.67%,8.28%,88.89%
81,Meta-Llama-3-8B-Instruct (Prompt),47.93%,60.55%,60.85%,61.44%,37.50%,33.33%,28.00%,77.78%
82,Llama-3.1-70B-Instruct (FC),44.96%,51.74%,51.94%,52.61%,31.25%,25.00%,33.45%,100.00%
83,Gemma-2-2b-it (Prompt),43.76%,19.47%,26.36%,18.52%,0.00%,0.00%,81.07%,38.89%
84,DeepSeek-Coder-V2-Lite-Instruct (FC),39.40%,3.55%,2.33%,3.80%,0.00%,8.33%,95.12%,0.00%
85,Qwen2-1.5B-Instruct (Prompt),39.00%,41.23%,48.45%,40.27%,12.50%,25.00%,34.47%,94.44%
86,xLAM-1b-fc-r (FC),36.92%,53.89%,63.95%,53.37%,6.25%,0.00%,9.64%,100.00%
87,Llama-3.1-8B-Instruct (FC),33.45%,49.22%,51.55%,49.00%,37.50%,41.67%,8.05%,94.44%
88,Qwen2.5-0.5B-Instruct (Prompt),31.59%,38.34%,53.88%,34.76%,56.25%,16.67%,19.95%,94.44%
89,Llama-3.2-1B-Instruct (Prompt),31.36%,12.14%,31.40%,7.60%,12.50%,4.17%,60.66%,38.89%