This is an example of using Eurus-RM-7B (Yuan et al., 2024) to perform best-of-N sampling with Llama-3 8B as the base model.
Eurus-RM-7B is trained on a mixture of UltraInteract, UltraFeedback, and UltraSafety, with a reward modeling objective specifically designed for reasoning that directly increases the reward of chosen actions and decreases the reward of rejected ones.
Eurus-RM-7B stands out as the best 7B reward model overall and achieves performance similar to or better than much larger baselines, even outperforming GPT-4 on certain tasks.
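For context, the reward model follows the standard Hugging Face remote-code interface: calling it on a tokenized prompt–response string returns a scalar reward, so candidate responses can be ranked directly. Below is a minimal sketch of scoring a single pair; the hub id `openbmb/Eurus-RM-7b` and the Mistral-style `[INST] ... [/INST]` template follow the public model card, but treat both as assumptions to verify against it.

```python
import torch
from transformers import AutoModel, AutoTokenizer

rm_path = "openbmb/Eurus-RM-7b"  # assumed Hugging Face hub id; check the model card
tokenizer = AutoTokenizer.from_pretrained(rm_path)
model = AutoModel.from_pretrained(
    rm_path, trust_remote_code=True, torch_dtype=torch.bfloat16
).to("cuda")

# Assumed input format per the model card: "[INST] prompt [/INST] response".
# The forward pass returns a single scalar reward for the pair.
text = "[INST] What is 7 * 8? [/INST] 7 * 8 = 56. The answer is 56."
inputs = tokenizer(text, return_tensors="pt").to("cuda")
with torch.no_grad():
    reward = model(**inputs).item()
print(f"reward = {reward:.3f}")  # higher means the RM prefers this response
```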
Prerequisites:
- Download the Llama-3 8B model.
- Two GPUs with at least 24 GB of memory each.
Script:

```bash
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.run --nproc_per_node 1 \
    examples/Eurus/inference.py --model_dir $LLAMA3_CKPTS --best_of_n 10
```
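Under the hood, best-of-N amounts to sampling N reasoning chains from the base model and keeping the one the reward model scores highest. The sketch below shows that loop end to end; it is not a copy of `examples/Eurus/inference.py`, and the hub ids, sampling parameters, and `[INST]` template are assumptions based on the public model cards.

```python
import torch
from transformers import AutoModel, AutoModelForCausalLM, AutoTokenizer

BASE = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed hub id for Llama-3 8B
RM = "openbmb/Eurus-RM-7b"                    # assumed hub id for Eurus-RM-7B

# One model per GPU, matching the 2 x 24 GB prerequisite above.
base_tok = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16).to("cuda:0")
rm_tok = AutoTokenizer.from_pretrained(RM)
rm = AutoModel.from_pretrained(RM, trust_remote_code=True, torch_dtype=torch.bfloat16).to("cuda:1")

def best_of_n(question: str, n: int = 10, max_new_tokens: int = 512) -> str:
    # 1) Sample n reasoning chains from the base model.
    prompt_ids = base_tok.apply_chat_template(
        [{"role": "user", "content": question}],
        add_generation_prompt=True, return_tensors="pt",
    ).to("cuda:0")
    outs = base.generate(
        prompt_ids, do_sample=True, temperature=0.8, top_p=0.95,
        num_return_sequences=n, max_new_tokens=max_new_tokens,
        pad_token_id=base_tok.eos_token_id,
    )
    chains = [base_tok.decode(o[prompt_ids.shape[1]:], skip_special_tokens=True) for o in outs]

    # 2) Score each chain with the reward model (one scalar per candidate).
    rewards = []
    for chain in chains:
        enc = rm_tok(f"[INST] {question} [/INST] {chain}", return_tensors="pt").to("cuda:1")
        with torch.no_grad():
            rewards.append(rm(**enc).item())

    # 3) Return the chain the reward model scores highest.
    return chains[max(range(n), key=lambda i: rewards[i])]
```

With `--best_of_n 10`, this corresponds to calling `best_of_n(question, n=10)` for each GSM8K question.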
We measured the accuracy of using Eurus-RM-7B to select the best of 10 reasoning chains generated by Llama-3 8B on GSM8K:
| Method | Accuracy |
|---|---|
| CoT (Llama-3 8B) | 0.487 |
| CoT (Llama-3 8B) + Best-of-10 (Eurus-RM-7B) | 0.726 |
Citation:

```bibtex
@article{yuan2024advancing,
  title={Advancing LLM Reasoning Generalists with Preference Trees},
  author={Yuan, Lifan and Cui, Ganqu and Wang, Hanbin and Ding, Ning and Wang, Xingyao and Deng, Jia and Shan, Boji and Chen, Huimin and Xie, Ruobing and Lin, Yankai and others},
  journal={arXiv preprint arXiv:2404.02078},
  year={2024}
}
```