From 1f652c3b2b1af39834f1532d5a358ca955085b81 Mon Sep 17 00:00:00 2001
From: Minyang Tian <69544994+mtian8@users.noreply.github.com>
Date: Mon, 4 Nov 2024 10:11:16 -0600
Subject: [PATCH] Update leaderboard.md

---
 docs/leaderboard.md | 35 +++++++++++++++++++----------------
 1 file changed, 19 insertions(+), 16 deletions(-)

diff --git a/docs/leaderboard.md b/docs/leaderboard.md
index 1e30a04..b133034 100644
--- a/docs/leaderboard.md
+++ b/docs/leaderboard.md
@@ -4,22 +4,25 @@
 # SciCode Leaderboard
-| Model | Main Problem Resolve Rate |
-|------------------------|---------------------------|
-| 🥇OpenAI o1-preview | 7.7% |
-| 🥈Claude3.5-Sonnet | 4.6% |
-| 🥉Deepseek-Coder-v2 | 3.1% |
-| GPT-4o | 1.5% |
-| GPT-4-Turbo | 1.5% |
-| OpenAI o1-mini | 1.5% |
-| Gemini 1.5 Pro | 1.5% |
-| Claude3-Opus | 1.5% |
-| Claude3-Sonnet | 1.5% |
-| Qwen2-72B-Instruct | 1.5% |
-| Llama-3.1-405B-Instruct| 0% |
-| Llama-3.1-70B-Instruct | 0% |
-| Mixtral-8x22B-Instruct | 0% |
-| Llama-3-70B-Chat | 0% |
+| Models                   | Main Problem Resolve Rate (%)       | Subproblem Resolve Rate (%)         |
+|--------------------------|-------------------------------------|-------------------------------------|
+| 🥇 OpenAI o1-preview      | 7.7                                 | 28.5                                |
+| 🥈 Claude3.5-Sonnet       | 4.6                                 | 26.0                                |
+| 🥉 Claude3.5-Sonnet (new) | 4.6                                 | 25.3                                |
+| Deepseek-Coder-v2        | 3.1                                 | 21.2                                |
+| GPT-4o                   | 1.5                                 | 25.0                                |
+| GPT-4-Turbo              | 1.5                                 | 22.9                                |
+| OpenAI o1-mini           | 1.5                                 | 22.2                                |
+| Gemini 1.5 Pro           | 1.5                                 | 21.9                                |
+| Claude3-Opus             | 1.5                                 | 21.5                                |
+| Llama-3.1-405B-Chat      | 1.5                                 | 19.8                                |
+| Claude3-Sonnet           | 1.5                                 | 17.0                                |
+| Qwen2-72B-Instruct       | 1.5                                 | 17.0                                |
+| Llama-3.1-70B-Chat       | 0.0                                 | 17.0                                |
+| Mixtral-8x22B-Instruct   | 0.0                                 | 16.3                                |
+| Llama-3-70B-Chat         | 0.0                                 | 14.6                                |
+
+Note: If models tie on the Main Problem Resolve Rate, we rank them by their Subproblem Resolve Rate.