---
layout: post
title: "paper reading: Verifier"
date: 2024-10-10
categories: Research
mathjax: true
---

## Paper List

Cobbe:
[Training Verifiers to Solve Math Word Problems](https://arxiv.org/abs/2110.14168)

Lightman:
[Let's Verify Step by Step](https://arxiv.org/abs/2305.20050)

## Main contributions

### Cobbe:

1. We introduce [GSM8K](https://huggingface.co/datasets/openai/gsm8k), a dataset of 8.5K high-quality, linguistically diverse grade school math word problems.
    - GitHub: https://github.com/openai/grade-school-math
    - Benchmark accuracy has risen from 75% in April to 96.7% with recent Qwen2 models: https://paperswithcode.com/sota/arithmetic-reasoning-on-gsm8k

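GSM8K reference solutions end with a final line of the form `#### <answer>`, which is what evaluation scripts compare against. A minimal extraction sketch (the function name is my own):

```python
import re


def extract_final_answer(solution: str):
    """Pull the final numeric answer from a GSM8K-style solution string.

    GSM8K reference solutions end with a line of the form '#### <answer>'.
    Returns the answer as a string, or None if the marker is missing.
    """
    match = re.search(r"####\s*(-?[\d,]+(?:\.\d+)?)", solution)
    if match is None:
        return None
    # Drop thousands separators so '1,200' and '1200' compare equal.
    return match.group(1).replace(",", "")
```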
2. We propose training verifiers to judge the correctness of model completions.
    - **Best-of-N**: At test time, we generate many candidate solutions and select the one ranked highest by the verifier.
    - We show that, compared to a finetuning baseline, the use of verifiers results in approximately the same performance boost as a **30x model size increase**, and that verifiers scale significantly better with increased data.
    - On the full dataset, a 6B model with verification slightly outperforms a finetuned 175B model, thereby offering a boost approximately equivalent to a 30x model size increase.

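The Best-of-N procedure itself is tiny once a sampler and a verifier exist; a sketch with hypothetical `generate` and `verifier_score` callables standing in for the finetuned generator and the trained verifier:

```python
def best_of_n(problem, generate, verifier_score, n=100):
    """Best-of-N selection: draw n candidate solutions for a problem and
    return the one the verifier scores highest (higher = more likely correct)."""
    candidates = [generate(problem) for _ in range(n)]
    return max(candidates, key=lambda sol: verifier_score(problem, sol))
```

Note that the verifier only reranks: it never edits a solution, so Best-of-N can at best surface a correct completion the generator already produced.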
3. We show that dropout acts as a strong regularizer, significantly improving performance for both finetuning and verification.

4. Dataset design principles:
    1. High Quality
    2. High Diversity
    3. Moderate Difficulty: We choose a problem distribution that is challenging for large state-of-the-art language models, without being completely intractable. GSM8K will help us better understand the data scaling trends of different models and methods in this difficulty sweet spot. Problems require no concepts beyond the level of early Algebra, and the vast majority of problems can be solved without explicitly defining a variable.
    4. Natural Language Solutions

5. We use test@N to denote the percentage of problems solved correctly at least once when allowing the model to make N separate guesses for each problem.
    - We sample at T=0 for test@1 and at T=0.7 for test@100; both temperature values were chosen empirically to produce the best results.

6. Dataset:
    1. Each problem is solved twice in labeling.

### Lightman:

1. We also release PRM800K, the complete dataset of 800,000 step-level human feedback labels used to train our best reward model.

2. We show that process supervision can train much more reliable reward models than outcome supervision.

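At test time, Lightman et al. score a full solution as the product of the process reward model's per-step correctness probabilities, then rank candidates by that score; a sketch (the probabilities in the usage comment are invented):

```python
from math import prod


def prm_score(step_probs):
    """Score a solution as the product of its per-step correctness
    probabilities, so a single bad step sinks the whole solution."""
    return prod(step_probs)


def rank_by_prm(solutions):
    """Rank candidate solutions (each a list of step probabilities)
    from best to worst by their PRM score."""
    return sorted(solutions, key=prm_score, reverse=True)
```

The product aggregation is what makes process supervision strict: a solution with one clearly wrong step (e.g. probabilities `[0.99, 0.99, 0.1]`) ranks below a uniformly solid one (`[0.8, 0.8, 0.8]`), whereas an outcome-supervised model would only see the final answer.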