new paper reading
radarFudan committed Oct 11, 2024
1 parent eccff52 commit 5d1337b
Showing 3 changed files with 65 additions and 8 deletions.
5 changes: 3 additions & 2 deletions _posts/2022-10-29-paper-reading-DDPM-DDIM-stable-diffusion.md
@@ -2,7 +2,7 @@
layout: post
title: "paper reading: What is diffusion model?"
date: 2022-10-29
-categories:
+categories: Research
mathjax: true
---

@@ -88,4 +88,5 @@ Based on score approach, there is a text-guided generation which generate the da

3. https://github.com/acids-ircam/diffusion_models

-4.
+4.

14 changes: 8 additions & 6 deletions _posts/2023-06-01-state-space-model.md
@@ -2,20 +2,22 @@
layout: post
title: "paper reading: Can state-space model make RNN great again?"
date: 2023-06-01
-categories:
+categories: Research
mathjax: true
---

## Paper List

-[Hippo]()
+[HiPPO](https://arxiv.org/abs/2008.07669)

-[S4]()
+[S4](https://arxiv.org/abs/2111.00396)

-[Hungry Hungry Hippo]()
+[Hungry Hungry Hippos](https://arxiv.org/abs/2212.14052)

-[Hyena]()
+[Hyena](https://arxiv.org/abs/2302.10866)

+[Mamba](https://arxiv.org/abs/2312.00752)

## Reference

-Annotated S4
+[Annotated S4](https://iclr-blog-track.github.io/2022/03/25/annotated-s4/)
54 changes: 54 additions & 0 deletions _posts/2024-10-10-verifier.md
@@ -0,0 +1,54 @@
---
layout: post
title: "paper reading: Verifier"
date: 2024-10-10
categories: Research
mathjax: true
---

## Paper List

Cobbe et al.:
[Training Verifiers to Solve Math Word Problems](https://arxiv.org/abs/2110.14168)

Lightman et al.:
[Let's Verify Step by Step](https://arxiv.org/abs/2305.20050)

## Main Contributions

### Cobbe et al.

1. We introduce [GSM8K](https://huggingface.co/datasets/openai/gsm8k), a dataset of 8.5K high-quality, linguistically diverse grade-school math word problems.
- GitHub: https://github.com/openai/grade-school-math
- Benchmark accuracy has risen from 75% in April to 96.7% with the recent Qwen2: https://paperswithcode.com/sota/arithmetic-reasoning-on-gsm8k

2. We propose training verifiers to judge the correctness of model completions.
- **Best-of-N**: At test time, we generate many candidate solutions and select the one ranked highest by the verifier (a minimal sketch follows this list).
- We show that, compared to a finetuning baseline, the use of verifiers results in approximately the same performance boost as a **30x model size increase**, and that verifiers scale significantly better with increased data.
- On the full dataset, a 6B verification model slightly outperforms a finetuned 175B model, offering a boost roughly equivalent to a 30x increase in model size.

3. We show that dropout acts as a strong regularizer, significantly improving performance for both finetuning and verification.

4. Dataset design principles:
1. High Quality
2. High Diversity
3. Moderate Difficulty: We choose a problem distribution that is challenging for large state-of-the-art language models, without being completely intractable. GSM8K will help us better understand the data scaling trends of different models and methods in this difficulty sweet spot. Problems require no concepts beyond the level of early Algebra, and the vast majority of problems can be solved without explicitly defining a variable.
4. Natural Language Solutions

5. We use test@N to denote the percentage of problems solved correctly at least once when the model is allowed N separate guesses per problem (a small computation sketch follows this list).
- Sampling temperature T=0 is used for test@1 and T=0.7 for test@100; both values were chosen empirically to produce the best results.

6. Dataset:
1. Each problem is solved twice during labeling.
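
A minimal sketch of the Best-of-N reranking from item 2, assuming hypothetical `generate_candidates` and `verifier_score` callables standing in for the finetuned generator and the trained verifier (neither name comes from the paper):

```python
from typing import Callable, List


def best_of_n(
    problem: str,
    generate_candidates: Callable[[str, int], List[str]],  # samples n candidate solutions
    verifier_score: Callable[[str, str], float],  # verifier's estimate of P(correct | problem, solution)
    n: int = 100,
) -> str:
    """Sample n candidate solutions and return the one the verifier ranks highest."""
    candidates = generate_candidates(problem, n)
    # Score every (problem, solution) pair with the verifier and keep the top-ranked one.
    return max(candidates, key=lambda solution: verifier_score(problem, solution))
```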

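For concreteness, here is one way to compute the test@N metric from item 5, assuming we already have, per problem, booleans marking which of the N sampled completions reached the correct final answer (the answer checking itself is not shown):

```python
from typing import List


def test_at_n(correct: List[List[bool]]) -> float:
    """test@N: fraction of problems solved correctly at least once among the N guesses."""
    return sum(any(guesses) for guesses in correct) / len(correct)


# Toy example: 3 problems with N=2 guesses each; 2 of the 3 are solved at least once.
print(test_at_n([[False, True], [False, False], [True, True]]))  # ~0.667
```
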
### Lightman et al.

1. We also release PRM800K, the complete dataset of 800,000 step-level human feedback labels used to train our best reward model.

2. We show that process supervision can train much more reliable reward models than outcome supervision (a minimal scoring sketch follows this list).

3.
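
To make the outcome- vs. process-supervision contrast in item 2 concrete, below is a minimal sketch of how a process reward model (PRM) can turn step-level judgments into a single solution score. `step_correct_prob` is a hypothetical stand-in for the trained PRM, and multiplying per-step probabilities is one simple aggregation in the spirit of the paper, not a faithful reimplementation:

```python
from typing import Callable, List


def prm_solution_score(
    problem: str,
    steps: List[str],
    step_correct_prob: Callable[[str, List[str]], float],  # PRM's P(latest step correct | problem, steps so far)
) -> float:
    """Score a solution as the probability that every step is correct,
    aggregated here as a product of per-step probabilities."""
    score = 1.0
    for i in range(len(steps)):
        # Judge each step in the context of the problem and all preceding steps.
        score *= step_correct_prob(problem, steps[: i + 1])
    return score
```

An outcome-supervised reward model would instead assign a single score to the whole solution based only on the final answer, which is exactly the contrast the paper studies.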

## Reference

