---
layout: post
title: "paper reading: Verifier"
date: 2024-10-10
categories: Research
mathjax: true
---

## Paper List

Cobbe:
[Training Verifiers to Solve Math Word Problems](https://arxiv.org/abs/2110.14168)

Lightman:
[Let's Verify Step by Step](https://arxiv.org/abs/2305.20050)

## Main contributions

### Cobbe:

1. We introduce [GSM8K](https://huggingface.co/datasets/openai/gsm8k), a dataset of 8.5K high-quality, linguistically diverse grade school math word problems.
    - GitHub: https://github.com/openai/grade-school-math
    - Benchmark accuracy has risen from 75% in April to 96.7% with recent Qwen2 models: https://paperswithcode.com/sota/arithmetic-reasoning-on-gsm8k

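GSM8K reference solutions end with a final line of the form `#### <answer>`, which is what evaluation scripts compare against. A minimal extraction sketch (the function name is my own):

```python
import re


def extract_final_answer(solution: str):
    """Pull the final numeric answer from a GSM8K-style solution string.

    GSM8K reference solutions end with a line of the form '#### <answer>'.
    Returns the answer as a string, or None if the marker is missing.
    """
    match = re.search(r"####\s*(-?[\d,]+(?:\.\d+)?)", solution)
    if match is None:
        return None
    # Drop thousands separators so '1,200' and '1200' compare equal.
    return match.group(1).replace(",", "")
```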
2. We propose training verifiers to judge the correctness of model completions.
    - **Best-of-N**: At test time, we generate many candidate solutions and select the one ranked highest by the verifier.
    - We show that, compared to a finetuning baseline, the use of verifiers results in approximately the same performance boost as a **30x model size increase**, and that verifiers scale significantly better with increased data.
    - On the full dataset, a 6B model with verification slightly outperforms a finetuned 175B model, thereby offering a boost approximately equivalent to a 30x model size increase.

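The Best-of-N procedure itself is tiny once a sampler and a verifier exist; a sketch with hypothetical `generate` and `verifier_score` callables standing in for the finetuned generator and the trained verifier:

```python
def best_of_n(problem, generate, verifier_score, n=100):
    """Best-of-N selection: draw n candidate solutions for a problem and
    return the one the verifier scores highest (higher = more likely correct)."""
    candidates = [generate(problem) for _ in range(n)]
    return max(candidates, key=lambda sol: verifier_score(problem, sol))
```

Note that the verifier only reranks: it never edits a solution, so Best-of-N can at best surface a correct completion the generator already produced.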
3. We show that dropout acts as a strong regularizer, significantly improving performance for both finetuning and verification.

4. Dataset design principles:
    1. High Quality
    2. High Diversity
    3. Moderate Difficulty: We choose a problem distribution that is challenging for large state-of-the-art language models, without being completely intractable. GSM8K will help us better understand the data scaling trends of different models and methods in this difficulty sweet spot. Problems require no concepts beyond the level of early Algebra, and the vast majority of problems can be solved without explicitly defining a variable.
    4. Natural Language Solutions

5. We use test@N to denote the percentage of problems solved correctly at least once when allowing the model to make N separate guesses for each problem.
    - We sample at T=0 for test@1 and at T=0.7 for test@100; both temperature values were chosen empirically to produce the best results.

6. Dataset:
    1. Each problem is solved twice in labeling.

### Lightman:

1. We also release PRM800K, the complete dataset of 800,000 step-level human feedback labels used to train our best reward model.

2. We show that process supervision can train much more reliable reward models than outcome supervision.

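At test time, Lightman et al. score a full solution as the product of the process reward model's per-step correctness probabilities, then rank candidates by that score; a sketch (the probabilities in the usage comment are invented):

```python
from math import prod


def prm_score(step_probs):
    """Score a solution as the product of its per-step correctness
    probabilities, so a single bad step sinks the whole solution."""
    return prod(step_probs)


def rank_by_prm(solutions):
    """Rank candidate solutions (each a list of step probabilities)
    from best to worst by their PRM score."""
    return sorted(solutions, key=prm_score, reverse=True)
```

The product aggregation is what makes process supervision strict: a solution with one clearly wrong step (e.g. probabilities `[0.99, 0.99, 0.1]`) ranks below a uniformly solid one (`[0.8, 0.8, 0.8]`), whereas an outcome-supervised model would only see the final answer.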