This is the official repository for Dissecting Human and LLM Preferences
Interactive Demo | Dataset | Paper | Resources | Citation
- [2024/02/20] We released the paper, code, dataset and an interactive demo for this project.
In this project, we conduct a thorough analysis of human and LLM preferences. Our analysis is based on user-model conversations collected from various real-world scenarios, and we dissect both human preferences and LLM preferences from 32 different models.
Here are some key findings:
- Humans are less sensitive to errors, clearly dislike it when a model admits its limitations, and prefer responses that support their stances.
- Advanced LLMs like GPT-4-Turbo place greater emphasis on correctness, clarity, and harmlessness.
- LLMs of similar sizes exhibit similar preferences regardless of their training methods, and the preferences of a pretrained-only LLM are largely unchanged after alignment.
- Benchmarks that use LLM-as-a-judge are easy to manipulate: experiments on AlpacaEval 2.0 and MT-Bench show that aligning models with the judges' preferences increases scores, whereas diverging from these preferences leads to lower scores.
We release a set of resources for this project:
We provide an interactive demo on Hugging Face Spaces. You can play with the demo to explore:
- Complete Preference Dissection in Paper: shows how differences in properties between a pair of responses influence the preferences of different LLMs (humans included).
- Preference Similarity Matrix: shows the preference similarity among different judges.
- Sample-level SHAP Analysis: applies Shapley values to show how differences in properties between a pair of responses affect the final preference.
- Add a New Model for Preference Dissection: update the preference labels from a new LLM and visualize the results.
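At its core, the Preference Similarity Matrix is a pairwise similarity computation over per-judge preference vectors. Here is a minimal sketch with toy numbers; the judge names and weights below are made up, and the demo derives the real vectors from the released annotations:

```python
import math

# Toy preference vectors: each judge's (hypothetical) weight on three
# properties. The real vectors come from the released annotations.
prefs = {
    "human":       [0.2, 0.5, 0.9],
    "gpt-4-turbo": [0.9, 0.6, 0.1],
    "other-llm":   [0.8, 0.5, 0.2],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

judges = list(prefs)
matrix = [[round(cosine(prefs[a], prefs[b]), 2) for b in judges] for a in judges]
for judge, row in zip(judges, matrix):
    print(judge, row)
```

With these made-up numbers, the two LLM rows come out far more similar to each other (0.99) than either is to the human row (about 0.5).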
You can also find the code for all the analyses in app.py in the Files tab (link) of the demo.
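To build intuition for the sample-level SHAP analysis, here is a self-contained exact Shapley computation over a toy linear preference score. The property names and weights are illustrative only; the demo's actual analysis lives in app.py.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value_fn):
    """Exact Shapley values by enumerating all coalitions (fine for small n)."""
    n = len(features)
    phi = {}
    for f in features:
        rest = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            for subset in combinations(rest, k):
                s = frozenset(subset)
                # Shapley weight for a coalition of size k
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += w * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi

# Hypothetical linear "preference score": how much each property difference
# pushes a judge toward preferring one response over the other.
weights = {"correctness": 0.8, "clarity": 0.5, "length": 0.2}
def score(present):
    return sum(weights[f] for f in present)

phi = shapley_values(list(weights), score)
print({f: round(v, 3) for f, v in phi.items()})
# → {'correctness': 0.8, 'clarity': 0.5, 'length': 0.2}
```

For a linear score, each property's Shapley value equals its weight, which is a useful sanity check; real SHAP tooling approximates this enumeration when there are many features.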
We provide the annotated dataset used in this project. The dataset is based on lmsys/chatbot_arena_conversations and records how each response satisfies the 29 pre-defined properties. Please see the dataset page for more details.
Following the original dataset, the annotated dataset is licensed under CC-BY-NC-4.0.
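Records in a JSONL dataset like this can be inspected with the standard library alone. The field names below are hypothetical placeholders, not the dataset's actual schema; consult the dataset page for the real one.

```python
import json

# A made-up record imitating a pairwise annotation (hypothetical schema):
line = json.dumps({
    "query": "What is 2 + 2?",
    "response_a": "4",
    "response_b": "5, probably",
    "annotations": {
        "correctness": {"a": "yes", "b": "no"},
        "clarity": {"a": "yes", "b": "yes"},
    },
})

record = json.loads(line)
# Properties on which the two responses differ are the interesting ones
# for preference dissection:
differing = [p for p, v in record["annotations"].items() if v["a"] != v["b"]]
print(differing)  # → ['correctness']
```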
We provide the prompts used in the annotation process in prompts/, including the pairwise annotation prompts as well as the unused single-response annotation prompts.
In the following, we provide an example guide for the annotation process.
Step 0: Configure your Environment
pip install -r requirements.txt
Step 1: Prepare GPT-4-Turbo References
Note: Set the api_base and api_key in the program before you run it.
python annotation_codes/collect_gpt4turbo_ref.py
You will get the reference file in raw_data/gpt4turbo_references.jsonl.
Step 2: Annotation
Note: Set the api_base and api_key in the program before you run it.
python annotation_codes/annotate.py
You will get the annotation results in annotation_results/.
Sometimes an API call may fail and the output field in the annotation results becomes "Failed!". In that case, you can use the following script to retry the failed annotations.
python annotation_codes/fix_annotation.py
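As a rough sketch of what the fix-up step has to do, the snippet below scans annotation records for the failure sentinel and collects their indices for re-annotation. Only the output field name and the "Failed!" value come from this README; the record layout is invented.

```python
import json

def find_failed(jsonl_lines):
    """Return the indices of records whose API call failed."""
    failed = []
    for i, line in enumerate(jsonl_lines):
        record = json.loads(line)
        if record.get("output") == "Failed!":
            failed.append(i)
    return failed

lines = [
    json.dumps({"id": 0, "output": "Response A is better because ..."}),
    json.dumps({"id": 1, "output": "Failed!"}),
]
print(find_failed(lines))  # → [1]
```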
Step 3: Resolve the Annotations
python annotation_codes/resolve_collected_data.py
You will get the resolved annotation results in resolved_annotations/.
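The resolving step presumably turns raw model output into structured per-property labels. Below is a hypothetical sketch assuming a simple "property: A/B/Tie" line format; the real format is defined by the prompts in prompts/ and handled by resolve_collected_data.py.

```python
import re

# Made-up raw annotation text (the real output format may differ):
raw = """correctness: A
clarity: Tie
harmlessness: B"""

def resolve(text):
    """Parse 'property: label' lines into a dict of per-property labels."""
    return {m.group(1): m.group(2)
            for m in re.finditer(r"(\w+):\s*(A|B|Tie)\b", text)}

print(resolve(raw))
# → {'correctness': 'A', 'clarity': 'Tie', 'harmlessness': 'B'}
```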
If you find this project useful or use any of the released resources, please kindly cite our paper:
@article{li2024dissecting,
title={Dissecting Human and LLM Preferences},
author={Li, Junlong and Zhou, Fan and Sun, Shichao and Zhang, Yikai and Zhao, Hai and Liu, Pengfei},
journal={arXiv preprint arXiv:2402.11296},
year={2024}
}
We thank Yuan Guo, Yiheng Xu, Yuqing Yang, Zhoujun Cheng, Zhihui Xie for their valuable feedback and suggestions! 🤗🤗🤗