- [24/Feb/2024] 🎉 Our paper is accepted by LREC-COLING 2024 (The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation)!
Reinforcement learning from human feedback (RLHF) is a crucial technique for aligning large language models (LLMs) with human preferences, ensuring these LLMs behave in ways that are beneficial and comprehensible to users. However, a longstanding challenge of reinforcement-learning-based human alignment techniques lies in their inherent complexity and difficulty of training. To address this challenge, we present a simple yet effective Contrastive Learning Framework for Human Alignment (CLHA) that aligns LLMs with human preferences directly. CLHA employs a novel rescoring strategy to evaluate the noise within the data by considering its inherent quality and dynamically adjusting the training process. Simultaneously, CLHA uses a pairwise contrastive loss and an adaptive supervised fine-tuning loss to adaptively modify the likelihood of generating responses, ensuring enhanced alignment with human preferences. In our experiments, CLHA surpasses other alignment algorithms, showing superior performance in terms of reward model scores, automatic evaluations, and human assessments on the widely used “Helpful and Harmless” dataset.
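For intuition, here is a minimal, self-contained sketch of what such an objective can look like: a pairwise contrastive (ranking) loss that pushes the policy's likelihood of the preferred response above that of the rejected one, an SFT term on the preferred response, and a reward-gap-based weight that rescores (down-weights) likely-noisy pairs. The function name, the margin formulation, and the sigmoid weighting are illustrative assumptions, not the authors' exact implementation; see the paper for the actual losses.

```python
import torch
import torch.nn.functional as F

def clha_style_loss(logp_chosen, logp_rejected, reward_chosen, reward_rejected,
                    margin=1.0, beta=1.0):
    """Illustrative CLHA-style objective (not the authors' exact code).

    logp_chosen / logp_rejected: summed token log-probs of each response
        under the policy model, shape (batch,).
    reward_chosen / reward_rejected: reward-model scores used to rescore
        (down-weight) noisy preference pairs, shape (batch,).
    """
    # Rescoring: pairs with a small reward gap (likely noisy labels)
    # contribute less to the loss.
    gap = reward_chosen - reward_rejected
    weight = torch.sigmoid(beta * gap).detach()

    # Pairwise contrastive loss: prefer the chosen response by a margin.
    contrastive = F.relu(margin - (logp_chosen - logp_rejected))

    # Adaptive SFT loss: keep raising the likelihood of the chosen response.
    sft = -logp_chosen

    return (weight * (contrastive + sft)).mean()
```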
We provide the preprocessed data for training and testing, which can be obtained with the following steps:
- Download `data.zip` and unzip it.
- Place the unzipped `data` folder in the root directory of the project (see the sketch below).
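If you prefer to script these two steps, the following minimal Python sketch (assuming `data.zip` has already been downloaded into the project root) unpacks the archive and checks that the `data/` folder ended up in the right place:

```python
import zipfile
from pathlib import Path

root = Path(".")              # project root
archive = root / "data.zip"   # downloaded archive

# Extract data.zip; it should leave a `data/` folder in the project root.
with zipfile.ZipFile(archive) as zf:
    zf.extractall(root)

assert (root / "data").is_dir(), "expected a `data/` folder after unzipping"
```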
We also provide scripts for preprocessing the raw data. Please follow the steps below to prepare the data:
- Create a directory named `data` in the root directory of this project.
- Create a directory named `raw_data` in the `data` directory.
- Download the raw data from HH-RLHF, which should be named `hhrlhf`, and put it in the `data/raw_data` directory.
- Run the following commands to preprocess the data:
```bash
# For HH-RLHF
cd train/hh_preprocess_data
python step_1_process.py
python step_2_get_train_data.py
python step_3_get_test_data.py
```
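For reference, each raw HH-RLHF record is a JSON object containing a preferred (`chosen`) and a dispreferred (`rejected`) dialogue. The sketch below peeks at one record per file; the glob pattern is only a guess at the layout under `data/raw_data/hhrlhf`, so adjust it to your actual paths:

```python
import glob
import gzip
import json

# Each raw HH-RLHF record holds one preferred ("chosen") and one
# dispreferred ("rejected") dialogue. The glob below is a placeholder;
# adjust it to wherever your raw files actually live.
for path in glob.glob("data/raw_data/hhrlhf/**/*.jsonl.gz", recursive=True):
    with gzip.open(path, "rt") as f:
        record = json.loads(f.readline())
    print(path)
    print("chosen:  ", record["chosen"][:200])
    print("rejected:", record["rejected"][:200])
    break
```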
We provide scripts for training the model. For example, you can run the following commands:
```bash
mkdir checkpoints
mkdir logs
# Download your reward models from https://huggingface.co/OpenAssistant/oasst-rm-2.1-pythia-1.4b-epoch-2.5 and https://huggingface.co/OpenAssistant/oasst-rm-2-pythia-6.9b-epoch-1
mkdir rm
cd train
# Train LLMs with HH-RLHF
./train_hh.sh [id_of_exp] hh_train_len2 2
```
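The reward models referenced in the comment above can also be fetched programmatically. This is a minimal sketch using `huggingface_hub.snapshot_download`; the sub-directory names under `rm/` are our own choice, not something the training scripts require:

```python
from huggingface_hub import snapshot_download

# Fetch the two OpenAssistant reward models into the local `rm/` directory.
# The target sub-directory names are illustrative, not fixed by the project.
for repo_id in [
    "OpenAssistant/oasst-rm-2.1-pythia-1.4b-epoch-2.5",
    "OpenAssistant/oasst-rm-2-pythia-6.9b-epoch-1",
]:
    snapshot_download(repo_id=repo_id, local_dir=f"rm/{repo_id.split('/')[-1]}")
```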
The scripts can be easily modified to train LLMs with different datasets.
The following command can be used to test the model:
```bash
# Test LLMs with HH-RLHF
cd eval_hh
./run_infer_main_dist.sh
```
Note: Before running, the `id_of_exp` and the corresponding ranking length (used during training) in `run_infer_main_dist.sh` have to be specified.
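If you want to sanity-check generations against the reward models outside of the provided scripts, the following hypothetical sketch scores a single prompt/response pair with one of the OpenAssistant reward models. The local path, the `<|prompter|>`/`<|assistant|>` template, and `trust_remote_code=True` are assumptions based on the model card, not part of the CLHA code:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical local path; adjust to wherever the reward model was downloaded.
model_path = "rm/oasst-rm-2.1-pythia-1.4b-epoch-2.5"
tokenizer = AutoTokenizer.from_pretrained(model_path)
reward_model = AutoModelForSequenceClassification.from_pretrained(
    model_path, trust_remote_code=True  # the OASST RMs ship custom model code
)

prompt = "How do I bake bread at home?"
response = "Start by mixing flour, water, salt and yeast, then let the dough rise..."

# The OASST pythia reward models are assumed to expect this dialogue template.
text = f"<|prompter|>{prompt}<|endoftext|><|assistant|>{response}<|endoftext|>"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    score = reward_model(**inputs).logits[0].item()
print(f"reward score: {score:.3f}")
```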
This project was inspired by PRO [DAMO-ConvAI]. We appreciate the original work done by its authors.
If this work is helpful to you, please cite our paper as:
```bibtex
@article{fang2024clha,
  title={CLHA: A Simple yet Effective Contrastive Learning Framework for Human Alignment},
  author={Fang, Feiteng and Zhu, Liang and Yang, Min and Feng, Xi and Hou, Jinchang and Zhao, Qixuan and Li, Chengming and Hu, Xiping and Xu, Ruifeng},
  journal={arXiv preprint arXiv:2403.16649},
  year={2024}
}
```