Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

George #174

Merged
merged 3 commits into from
Jul 24, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@ PutnamBench aims to support research in automated mathematical reasoning by prov

PutnamBench includes factored solutions for problems which require exhibiting a numerical answer in addition to its proof of correctness. For these problems, one can attempt two tasks: proving the problem with the numerical answer written into the theorem statement, or additionally producing the answer along with the proof.

We are hosting a [**leaderboard**](https://trishullab.github.io/PutnamBench/leaderboard.html) and will readily receive evaluation results which are accompanied by a preprint or publication. Do **not** include proofs as confirmation in any public setting. Please reach out to us privately for confirmation.
We are hosting a [**leaderboard**](https://trishullab.github.io/PutnamBench/leaderboard.html) and will readily receive evaluation results which are accompanied by a preprint or publication. **Do not** include proofs as confirmation in any public setting. Please reach out privately at `[email protected]` with any requests for additions to the leaderboard.

**We strongly encourage community feedback!** Please let us know if you have any comments for improving PutnamBench. If you notice any mistakes, please raise an issue on the repository and we will address it. We kindly ask that you do not write formal proofs for any of the problems in an effort to reduce contamination.

Expand All @@ -23,7 +23,7 @@ We are hosting a [**leaderboard**](https://trishullab.github.io/PutnamBench/lead
| Isabelle | 640 |
| Coq | 417 |

We also report the number of problems in a certain category. Note that some problems fall under multiple categories. While the categories are intended to capture general features of the problems, we indicate that there is a high variance of problems inside an individual category.
We also report the number of problems in a certain category. Note that some problems fall under multiple categories. While the categories are intended to capture general features of the problems, we also note that there is a high variance of problems inside an individual category.

| Category | Total Quantity |
| ---------------- | -------------- |
Expand Down
13 changes: 8 additions & 5 deletions docs/leaderboard.html
Original file line number Diff line number Diff line change
Expand Up @@ -349,7 +349,7 @@ <h3>📝 Notes</h3>
// ],
// };

const theaders = ["Model", "num-solved"];
const theaders = ["Model", "num-solved", "compute"];

// score: 'average', 'humaneval', 'mbpp', 'humaneval+', 'mbpp+'
const displayTable = (table, score) => {
Expand Down Expand Up @@ -425,18 +425,21 @@ <h3>📝 Notes</h3>
modelCell.appendChild(promptedSymbol);
}
dataRow.appendChild(modelCell);
var passCell = document.createElement("td");
var solvedCell = document.createElement("td");
// if (table == originTable) {
// passCell.classList.add("text-danger");
// } else if (table == plusedTable) {
// passCell.textContent = "⚡";
// passCell.classList.add("text-success");
// }
if (row["num-solved"][score] != "NONE") {
passCell.classList.add("text-success");
passCell.textContent = ""+row["num-solved"][score];
solvedCell.classList.add("text-success");
solvedCell.textContent = ""+row["num-solved"][score];
}
dataRow.appendChild(passCell);
var computeCell = document.createElement("td");
computeCell.textContent = row["condensed-compute-budget"];
dataRow.appendChild(solvedCell);
dataRow.appendChild(computeCell);
tbody.appendChild(dataRow);
});
table.appendChild(tbody);
Expand Down
18 changes: 18 additions & 0 deletions docs/results.json
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
"isabelle-wsolution": 4
},
"size": 0,
"condensed-compute-budget": "pass@10",
"note": "pass@10 w/ T=0.1"
},
"GPT-4o": {
Expand All @@ -17,6 +18,7 @@
"coq-wsolution": 1
},
"size": 0,
"condensed-compute-budget": "pass@10",
"note": "pass@10 w/ T=0.7"
},
"COPRA (GPT-4o)": {
Expand All @@ -27,6 +29,7 @@
"coq-wsolution": 1
},
"size": 0,
"condensed-compute-budget": "pass@1",
"note": "pass@1 w/ T=0.0"
},
"CoqHammer": {
Expand All @@ -36,6 +39,7 @@
"coq-wsolution": 0
},
"size": 0,
"condensed-compute-budget": "pass@1",
"note": "pass@1 w/ t=120s"
},
"Sledgehammer": {
Expand All @@ -45,6 +49,7 @@
"isabelle-wsolution": 3
},
"size": 0,
"condensed-compute-budget": "pass@1",
"note": "pass@1 w/ t=120s"
},
"ReProver w/ retrieval": {
Expand All @@ -54,6 +59,7 @@
"lean-wsolution": 0
},
"size": 0,
"condensed-compute-budget": "pass@1",
"note": "pass@1 w/ t=600s"
},
"ReProver w/o retrieval": {
Expand All @@ -63,6 +69,7 @@
"lean-wsolution": 0
},
"size": 0,
"condensed-compute-budget": "pass@1",
"note": "pass@1 w/ t=600s"
},
"Tactician (LSH)": {
Expand All @@ -72,6 +79,17 @@
"coq-wsolution": 0
},
"size": 0,
"condensed-compute-budget": "pass@1",
"note": "pass@1 w/ t=600s"
},
"InternLM 7B": {
"link": "https://github.com/InternLM/InternLM-Math",
"open-data": "NONE",
"num-solved": {
"lean-wsolution": 4
},
"size": 7,
"condensed-compute-budget": "pass@4096",
"note": "pass@2048 w/ T = 1.0 + pass@2048 w/ T = 0.7"
}
}
4 changes: 2 additions & 2 deletions lean4/src/putnam_1965_a1.lean
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@ theorem putnam_1965_a1
(A B C X Y : EuclideanSpace ℝ (Fin 2))
(hABC : ¬Collinear ℝ {A, B, C})
(hangles : ∠ C A B < ∠ B C A ∧ ∠ B C A < π/2 ∧ π/2 < ∠ A B C)
(hX : Collinear ℝ {X, B, C} ∧ ∠ X A B = (π - ∠ C A B)/2 ∧ Euclidean.dist A X = Euclidean.dist A B)
(hY : Collinear ℝ {Y, C, A} ∧ ∠ Y B C = (π - ∠ A B C)/2 ∧ Euclidean.dist B Y = Euclidean.dist A B)
(hX : Collinear ℝ {X, B, C} ∧ ∠ X A B = (π - ∠ C A B)/2 ∧ (A 0 - X 0)^2 + (A 1 - X 1)^2 = (A 0 - B 0)^2 + (A 1 - B 1)^2)
(hY : Collinear ℝ {Y, C, A} ∧ ∠ Y B C = (π - ∠ A B C)/2 ∧ (B 0 - Y 0)^2 + (B 1 - Y 1)^2 = (A 0 - B 0)^2 + (A 1 - B 1)^2)
: ∠ C A B = putnam_1965_a1_solution :=
sorry
6 changes: 3 additions & 3 deletions lean4/src/putnam_2013_a5.lean
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
import Mathlib
open BigOperators

open Function Set
open Function Set MeasureTheory

-- Note: uses (Fin m → Fin m → Fin m → ℝ) instead of ensuring inputs are strictly increasing
theorem putnam_2013_a5
Expand All @@ -11,8 +11,8 @@ theorem putnam_2013_a5
(areadef2 : (Fin m → Fin m → Fin m → ℝ) → Prop)
(areadef3 : (Fin m → Fin m → Fin m → ℝ) → Prop)
(mge3 : m ≥ 3)
(harea2 : ∀ a b c : Fin 2 → ℝ, area2 a b c = (MeasureTheory.volume (convexHull ℝ {a, b, c})).toReal)
(harea3 : ∀ a b c : Fin 3 → ℝ, area3 a b c = (MeasureTheory.volume (convexHull ℝ {a, b, c})).toReal)
(harea2 : ∀ a b c : Fin 2 → ℝ, area2 a b c = (volume (convexHull ℝ {a, b, c})).toReal)
(harea3 : ∀ a b c : Fin 3 → ℝ, area3 a b c = (μH[2] (convexHull ℝ {a, b, c})).toReal)
(hareadef2 : ∀ a : Fin m → Fin m → Fin m → ℝ, areadef2 a = ∀ A : Fin m → (Fin 2 → ℝ), (∑ i : Fin m, ∑ j : Fin m, ∑ k : Fin m, if (i < j ∧ j < k) then (a i j k * area2 (A i) (A j) (A k)) else 0) ≥ 0)
(hareadef3 : ∀ a : Fin m → Fin m → Fin m → ℝ, areadef3 a = ∀ A : Fin m → (Fin 3 → ℝ), (∑ i : Fin m, ∑ j : Fin m, ∑ k : Fin m, if (i < j ∧ j < k) then (a i j k * area3 (A i) (A j) (A k)) else 0) ≥ 0)
: ∀ a : Fin m → Fin m → Fin m → ℝ, areadef2 a → areadef3 a :=
Expand Down
Loading