Eurus

This is an example of using Eurus-RM (Yuan et al., 2024) to perform best-of-N sampling with Llama-3 8B as the base model.

Introduction

Eurus-RM-7B is trained on a mixture of UltraInteract, UltraFeedback, and UltraSafety, with a reward modeling objective specifically designed for reasoning that directly increases the reward of chosen actions and decreases that of rejected ones.

Eurus-RM-7B stands out as the best 7B reward model overall and achieves similar or better performance than much larger baselines; in particular, it outperforms GPT-4 on certain tasks.
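For intuition, here is a minimal sketch of the best-of-N procedure, not the repo's actual implementation (see examples/Eurus/inference.py for that). It assumes the Hugging Face checkpoints meta-llama/Meta-Llama-3-8B and openbmb/Eurus-RM-7b, and that the reward model returns a scalar reward when called through AutoModel with trust_remote_code=True, as shown on its model card:

```python
# Illustrative best-of-N sketch. Assumptions (not taken from this repo):
# standard Hugging Face checkpoints for both models, and a reward model
# that returns a scalar when called via AutoModel with trust_remote_code=True.
import torch
from transformers import AutoModel, AutoModelForCausalLM, AutoTokenizer

base_tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B", torch_dtype=torch.bfloat16
).to("cuda:0")

rm_tok = AutoTokenizer.from_pretrained("openbmb/Eurus-RM-7b")
rm = AutoModel.from_pretrained(
    "openbmb/Eurus-RM-7b", trust_remote_code=True, torch_dtype=torch.bfloat16
).to("cuda:1")

def best_of_n(prompt: str, n: int = 10) -> str:
    # Sample n candidate reasoning chains from the base model.
    inputs = base_tok(prompt, return_tensors="pt").to("cuda:0")
    out = base.generate(
        **inputs,
        do_sample=True,
        temperature=0.8,
        top_p=0.95,
        max_new_tokens=512,
        num_return_sequences=n,
    )
    prompt_len = inputs["input_ids"].shape[1]
    candidates = [
        base_tok.decode(seq[prompt_len:], skip_special_tokens=True) for seq in out
    ]

    # Score each (prompt, candidate) pair with the reward model; keep the best.
    scores = []
    with torch.no_grad():
        for cand in candidates:
            rm_inputs = rm_tok(prompt + cand, return_tensors="pt").to("cuda:1")
            scores.append(rm(**rm_inputs).item())
    return candidates[scores.index(max(scores))]
```

Keeping the base model and the reward model on separate GPUs is one way to fit both in memory, which is presumably why two 24 GB GPUs are listed as a prerequisite below.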

Running the example

Prerequisites:

  • Download the Llama-3 8B model.
  • Two GPUs with 24 GB of memory each.

Script:

```sh
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.run --nproc_per_node 1 examples/Eurus/inference.py --model_dir $LLAMA3_CKPTS --best_of_n 10
```

Results

We tested the performance of using Eurus-RM-7B to select the best of 10 reasoning chains generated by Llama-3 8B on GSM8k.

| Method                                       | Accuracy |
| -------------------------------------------- | -------- |
| CoT (Llama-3 8B)                             | 0.487    |
| CoT (Llama-3 8B) + Best-of-10 (Eurus-RM-7B)  | 0.726    |
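Accuracy here means exact match on the final numeric answer, the standard GSM8k metric. A minimal checker could look like the sketch below; the `####` delimiter comes from the GSM8k dataset format, while the exact extraction regex is an illustrative assumption rather than this repo's logic:

```python
# Minimal GSM8k answer check. Assumption: a prediction counts as correct
# iff its final number matches the gold answer, which in the GSM8k dataset
# appears after a "####" delimiter.
import re

_NUM = re.compile(r"-?\d[\d,]*(?:\.\d+)?")

def extract_answer(text: str) -> str | None:
    # Gold answers end with "#### <number>"; for model output, fall back
    # to the last number in the generated reasoning chain.
    m = re.search(r"####\s*(-?\d[\d,]*(?:\.\d+)?)", text)
    if m:
        return m.group(1).replace(",", "")
    nums = _NUM.findall(text)
    return nums[-1].replace(",", "") if nums else None

def is_correct(prediction: str, gold: str) -> bool:
    pred, ans = extract_answer(prediction), extract_answer(gold)
    return pred is not None and pred == ans
```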

Reference

@article{yuan2024advancing,
  title={Advancing LLM Reasoning Generalists with Preference Trees},
  author={Yuan, Lifan and Cui, Ganqu and Wang, Hanbin and Ding, Ning and Wang, Xingyao and Deng, Jia and Shan, Boji and Chen, Huimin and Xie, Ruobing and Lin, Yankai and others},
  journal={arXiv preprint arXiv:2404.02078},
  year={2024}
}