Update leaderboard.md
mtian8 authored Nov 4, 2024
1 parent 6b3222f commit 1f652c3
Showing 1 changed file with 19 additions and 16 deletions.
35 changes: 19 additions & 16 deletions docs/leaderboard.md
@@ -4,22 +4,25 @@

# SciCode Leaderboard

-| Model | Main Problem Resolve Rate |
-|------------------------|---------------------------|
-| 🥇OpenAI o1-preview | 7.7% |
-| 🥈Claude3.5-Sonnet | 4.6% |
-| 🥉Deepseek-Coder-v2 | 3.1% |
-| GPT-4o | 1.5% |
-| GPT-4-Turbo | 1.5% |
-| OpenAI o1-mini | 1.5% |
-| Gemini 1.5 Pro | 1.5% |
-| Claude3-Opus | 1.5% |
-| Claude3-Sonnet | 1.5% |
-| Qwen2-72B-Instruct | 1.5% |
-| Llama-3.1-405B-Instruct| 0% |
-| Llama-3.1-70B-Instruct | 0% |
-| Mixtral-8x22B-Instruct | 0% |
-| Llama-3-70B-Chat | 0% |
+| Models | Main Problem Resolve Rate | <span style="background-color:lightgrey">Subproblem</span> |
+|--------------------------|-------------------------------------|-------------------------------------|
+| 🥇 OpenAI o1-preview | <div align="center">7.7</div> | <div align="center" style="background-color:lightgrey">28.5</div> |
+| 🥈 Claude3.5-Sonnet | <div align="center">4.6</div> | <div align="center" style="background-color:lightgrey">26.0</div> |
+| 🥉 Claude3.5-Sonnet (new) | <div align="center">4.6</div> | <div align="center" style="background-color:lightgrey">25.3</div> |
+| Deepseek-Coder-v2 | <div align="center">3.1</div> | <div align="center" style="background-color:lightgrey">21.2</div> |
+| GPT-4o | <div align="center">1.5</div> | <div align="center" style="background-color:lightgrey">25.0</div> |
+| GPT-4-Turbo | <div align="center">1.5</div> | <div align="center" style="background-color:lightgrey">22.9</div> |
+| OpenAI o1-mini | <div align="center">1.5</div> | <div align="center" style="background-color:lightgrey">22.2</div> |
+| Gemini 1.5 Pro | <div align="center">1.5</div> | <div align="center" style="background-color:lightgrey">21.9</div> |
+| Claude3-Opus | <div align="center">1.5</div> | <div align="center" style="background-color:lightgrey">21.5</div> |
+| Llama-3.1-405B-Chat | <div align="center">1.5</div> | <div align="center" style="background-color:lightgrey">19.8</div> |
+| Claude3-Sonnet | <div align="center">1.5</div> | <div align="center" style="background-color:lightgrey">17.0</div> |
+| Qwen2-72B-Instruct | <div align="center">1.5</div> | <div align="center" style="background-color:lightgrey">17.0</div> |
+| Llama-3.1-70B-Chat | <div align="center">0.0</div> | <div align="center" style="background-color:lightgrey">17.0</div> |
+| Mixtral-8x22B-Instruct | <div align="center">0.0</div> | <div align="center" style="background-color:lightgrey">16.3</div> |
+| Llama-3-70B-Chat | <div align="center">0.0</div> | <div align="center" style="background-color:lightgrey">14.6</div> |
+
+Note: If the models tie in the Main Problem resolve rate, we will then compare the Subproblems.

<!-- Once you've added the results to the submission repository,
bring back the table here -->
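
The tie-breaking rule stated in the note (rank by Main Problem resolve rate, then by Subproblem rate when models tie) can be sketched as a two-key sort. This is a minimal illustration and not the repository's actual ranking code: the `Entry` type and `rank` function are hypothetical names introduced here, and the sample rows are copied from the updated table.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """One leaderboard row (hypothetical type, for illustration only)."""
    model: str
    main_rate: float  # Main Problem resolve rate, in percent
    sub_rate: float   # Subproblem resolve rate, in percent


def rank(entries):
    """Order rows by main rate, breaking ties by subproblem rate.

    Both keys are negated so that higher rates come first.
    """
    return sorted(entries, key=lambda e: (-e.main_rate, -e.sub_rate))


# A few rows from the updated table, deliberately out of order.
entries = [
    Entry("GPT-4o", 1.5, 25.0),
    Entry("Claude3.5-Sonnet (new)", 4.6, 25.3),
    Entry("OpenAI o1-preview", 7.7, 28.5),
    Entry("Claude3.5-Sonnet", 4.6, 26.0),
    Entry("Deepseek-Coder-v2", 3.1, 21.2),
]

ranked = rank(entries)
for position, entry in enumerate(ranked, start=1):
    print(position, entry.model, entry.main_rate, entry.sub_rate)
```

Because Python's `sorted` is stable, rows that tie on both rates (for example Claude3-Sonnet and Qwen2-72B-Instruct in the table) keep their input order; the note does not say how such ties are resolved.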