This is the official repository for Dissecting Human and LLM Preferences.
Interactive Demo | Dataset | Paper | Visualization | Resources | Citation
- [2024/05/15] Our paper has been accepted by ACL 2024! 🎉
- [2024/02/20] We released the paper, code, dataset, and an interactive demo for this project.
In this project, we conduct a thorough analysis of human and LLM preferences. Our analysis is based on user-model conversations collected from various real-world scenarios, and we dissect both human preferences and the preferences of 32 different LLMs.
Here are some key findings:
- Humans are less sensitive to errors, clearly dislike a model when it admits its limits, and prefer a response that supports their stances.
- Advanced LLMs like GPT-4-Turbo prefer correctness, clarity, and harmlessness more.
- LLMs of similar sizes exhibit similar preferences irrespective of training methods, and the preference of a pretrained-only LLM is largely unchanged after alignment.
- Benchmarks with LLM-as-a-judge are easy to manipulate: experiments on AlpacaEval 2.0 and MT-Bench show that aligning models with the judges' preferences increases scores, whereas diverging from these preferences lowers them.
You can run the visualization code locally by following the steps below:
First enter the visualization directory:
cd ./visualization
Install the required packages:
pip install -r requirements.txt
Then launch the app:
streamlit run app.py
The following visualizations are provided:
- Complete Preference Dissection in Paper: shows how differences in the properties of a pair of responses influence the preferences of different LLMs (humans included).
- Preference Similarity Matrix: shows the preference similarity among different judges.
- Sample-level SHAP Analysis: applies Shapley values to show how the property differences in a pair of responses affect the final preference (see the sketch after this list).
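For intuition, here is a minimal, self-contained sketch of this kind of per-sample SHAP attribution; the feature matrix, labels, and model below are placeholders, not the project's actual pipeline:

```python
# Minimal SHAP sketch with placeholder data (not the project's pipeline).
import numpy as np
import shap
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 29))        # per-sample property differences (A - B)
y = (X.sum(axis=1) > 0).astype(int)   # placeholder preference labels (1 = A preferred)

model = LogisticRegression().fit(X, y)
explainer = shap.Explainer(model, X)  # X doubles as background data for the explainer
shap_values = explainer(X)
shap.plots.waterfall(shap_values[0])  # per-property attribution for one comparison
```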
You can also add a new model to obtain its preference labels and the corresponding dissection. Please follow the steps below:
Step 1: Inference
python add_new_model_p1_inference.py \
--model_name "your_model_name" \
--model_path "/path/to/model" \
--model_size 6.5 # the size of your model, in billion parameters (B)
Then run it again with the order of the two responses interchanged (by setting --change_AB):
python add_new_model_p1_inference.py \
--model_name "your_model_name" \
--model_path "/path/to/model" \
--change_AB \
--model_size 6.5 # the size of your model, in billion parameters (B)
Step 2: Collect Preference Labels
python add_new_model_p2_collection.py \
--model_name "your_model_name" \
--model_size 6.5 # same as above
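Conceptually, this step reconciles the two inference runs (original order and --change_AB) into a single preference label. The helper below is a hypothetical illustration of such position-debiasing logic, not the script's actual code:

```python
# Hypothetical illustration only; add_new_model_p2_collection.py holds the real logic.
def combine_preference(pred_original: str, pred_swapped: str) -> str:
    """pred_original: 'A' or 'B' under the original order;
    pred_swapped: 'A' or 'B' after the two responses were interchanged."""
    swap = {"A": "B", "B": "A"}
    if pred_original == swap.get(pred_swapped):
        return pred_original  # judgment survives the swap -> position-consistent label
    return "tie"              # order-dependent judgment -> no reliable preference
```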
Step 3: Bayesian Logistic Regression
Fitting the Bayesian logistic regression models may take a little while.
python add_new_model_p3_bayeslr.py \
--model_name "your_model_name" \
--model_size 6.5 # same as above
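As a rough illustration of what this step fits (placeholder data; the script's actual model, priors, and sampler settings may differ), the regression learns one posterior weight per property:

```python
# Illustrative Bayesian logistic regression over property differences (placeholder data).
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 29))  # per-sample property differences (A - B)
y = (X @ rng.normal(size=29) + rng.normal(size=200) > 0).astype(int)  # 1 = A preferred

with pm.Model():
    w = pm.Normal("w", mu=0.0, sigma=1.0, shape=X.shape[1])  # one weight per property
    p = pm.math.sigmoid(pm.math.dot(X, w))
    pm.Bernoulli("obs", p=p, observed=y)
    trace = pm.sample(1000, tune=1000)  # posterior over the property weights
```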
Once the above steps are done, you can rerun the visualization code to see the new model's preference labels and dissection.
We also provide an online interactive demo on Hugging Face Spaces, which you can play with directly.
We release a number of collected resources for this project:
We provide the annotated dataset used in this project. The dataset is based on lmsys/chatbot_arena_conversations and records how each response satisfies the 29 pre-defined properties. Please see the dataset page for more details.
Following the original dataset, the annotated dataset is licensed under CC-BY-NC-4.0.
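If you want to load the dataset programmatically from the Hugging Face Hub, it could look like the sketch below; the dataset identifier is a placeholder, so use the one from the dataset page:

```python
# Loading sketch; "your-org/your-dataset" is a placeholder identifier.
from datasets import load_dataset

ds = load_dataset("your-org/your-dataset", split="train")
print(ds[0])  # a conversation pair plus its 29 property annotations
```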
We provide the prompts used in the annotation process in prompts/, including the pairwise annotation prompts as well as the unused single-response annotation prompts.
In the following, we provide an example guide for the annotation process.
Step 0: Configure your Environment
pip install -r requirements.txt
Step 1: Prepare GPT-4-Turbo References
Note: Set the api_base and api_key in the program before you run it.
python annotation_codes/collect_gpt4turbo_ref.py
The reference file will be written to raw_data/gpt4turbo_references.jsonl.
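For reference, a typical way such scripts set the endpoint looks like the snippet below; this assumes the legacy openai (<1.0) Python client, and the actual variable names are whatever the script header defines:

```python
# Assumed configuration pattern (legacy openai<1.0 client); check the script header.
import openai

openai.api_base = "https://your-endpoint/v1"  # placeholder endpoint
openai.api_key = "sk-your-key"                # placeholder key
```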
Step 2: Annotation
Note: Set the api_base and api_key in the program before you run it.
python annotation_codes/annotate.py
The annotation results will be written to annotation_results/. Sometimes an API call fails and the output field in the annotation results becomes "Failed!"; in that case, use the following script to retry the failed ones.
python annotation_codes/fix_annotation.py
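For illustration, finding the records that still need a retry could look like this sketch; the file layout and field names are assumptions, and annotation_codes/fix_annotation.py implements the actual retry logic:

```python
# Hedged sketch: count annotation records whose output failed.
import json
from pathlib import Path

for path in Path("annotation_results").glob("*.jsonl"):
    records = [json.loads(line) for line in path.open()]
    failed = [r for r in records if r.get("output") == "Failed!"]
    print(f"{path.name}: {len(failed)} record(s) to retry")
```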
Step 3: Resolve the Annotations
python annotation_codes/resolve_collected_data.py
You will get the resolved annotation results in resolved_annotations/.
If you find this project useful or use any of the released resources, please kindly cite our paper:
@article{li2024dissecting,
  title={Dissecting Human and LLM Preferences},
  author={Li, Junlong and Zhou, Fan and Sun, Shichao and Zhang, Yikai and Zhao, Hai and Liu, Pengfei},
  journal={arXiv preprint arXiv:2402.11296},
  year={2024}
}
We thank Yuan Guo, Yiheng Xu, Yuqing Yang, Zhoujun Cheng, Zhihui Xie for their valuable feedback and suggestions! 🤗🤗🤗