
Post-Instruction

Contents

Overview

Examples

Current instruction-following data generally place the task instruction before the input sentence (referred to as "Pre-Ins") for sequence generation tasks (e.g., machine translation). We observe that LLMs may forget the frontmost task instruction when the input sentence is long, so we propose to simply place the task instruction after the input sentence (referred to as "Post-Ins"). Both our theoretical and experimental analyses show that Post-Ins directs more attention to the task instruction and strengthens the model's instruction-following capability, yielding consistent performance improvements across two common sequence generation tasks. For more details, please refer to our technical report.
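To make the two formats concrete, below is a minimal Python sketch contrasting how a single translation pair could be laid out under each format; the instruction text and templates are illustrative assumptions, not the repository's exact prompt format.

    # Illustrative sketch of the two data layouts (templates are assumptions,
    # not the repository's exact format).

    instruction = "Translate the following text from English to German."
    source = "The weather in Munich is lovely today, and the streets are full of people."

    # Pre-Ins: the instruction comes first, followed by the (possibly long) input.
    pre_ins_prompt = f"{instruction}\n{source}\n"

    # Post-Ins: the input comes first and the instruction is appended at the end,
    # so it sits right before the position where generation starts.
    post_ins_prompt = f"{source}\n{instruction}\n"

    print(pre_ins_prompt)
    print(post_ins_prompt)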

(Figure: self-attention heatmaps of models trained with the Pre-Ins and Post-Ins formats.)

Above are self-attention visualizations of models trained with the two data formats. We find that Pre-Ins mainly focuses on the source input, while Post-Ins pays more attention to the specific task instruction. Here is an example script to plot the heatmap above.
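For illustration only, the following is a minimal sketch of such a plotting script using transformers and matplotlib; it is not the repository's script, and the model path and prompt are placeholders.

    import torch
    import matplotlib.pyplot as plt
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_path = "path/to/your/fine-tuned/model"  # placeholder path
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(model_path)
    model.eval()

    # A Post-Ins style prompt: source sentence first, instruction last (illustrative).
    prompt = "The weather in Munich is lovely today.\nTranslate the text above from English to German."
    inputs = tokenizer(prompt, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs, output_attentions=True)

    # outputs.attentions is a tuple with one tensor per layer,
    # each of shape (batch, num_heads, seq_len, seq_len).
    attn = outputs.attentions[-1][0].mean(dim=0)  # average heads of the last layer

    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    fig, ax = plt.subplots(figsize=(8, 8))
    ax.imshow(attn.float().numpy(), cmap="viridis")
    ax.set_xticks(range(len(tokens)))
    ax.set_yticks(range(len(tokens)))
    ax.set_xticklabels(tokens, rotation=90, fontsize=6)
    ax.set_yticklabels(tokens, fontsize=6)
    plt.tight_layout()
    plt.savefig("attention_heatmap.png", dpi=300)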

(Equations: conditional modeling of the response under the Pre-Ins and Post-Ins formats.)

As shown in the equations above, "inp", "inst", and "res" are abbreviations for "source input", "instruction", and "response", respectively. We observe that the post-instruction format naturally encourages the model to pay more attention to the task instruction, while the pre-instruction format places more emphasis on modeling coverage.
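For reference, here is a minimal LaTeX sketch of the two factorizations under the standard left-to-right chain rule, using the abbreviations above; the exact formulation and notation in the paper may differ.

    % Pre-Ins: the instruction precedes the source input in the context.
    P_{\text{Pre-Ins}}(\mathrm{res} \mid \mathrm{inst}, \mathrm{inp})
      = \prod_{t=1}^{|\mathrm{res}|} P\big(\mathrm{res}_t \mid \mathrm{inst}, \mathrm{inp}, \mathrm{res}_{<t}\big)

    % Post-Ins: the source input precedes the instruction, so the instruction
    % sits immediately before the response tokens.
    P_{\text{Post-Ins}}(\mathrm{res} \mid \mathrm{inp}, \mathrm{inst})
      = \prod_{t=1}^{|\mathrm{res}|} P\big(\mathrm{res}_t \mid \mathrm{inp}, \mathrm{inst}, \mathrm{res}_{<t}\big)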

Requirements

  • transformers>=4.28.0.dev0
  • python>=3.8.0
  • torch>=1.10
  • deepspeed>=0.8.3
  • datasets>=2.9.0

Quick Start

  • Organizing original data into the Post-Ins format
    We provide the processed training data used in our experiments here, and you can also convert your own sentence pairs into the Post-Ins format with the following script (a hedged Python sketch of this conversion is given after this list):

    sh scripts/organize_data.sh    # replace the data files in this script with your own
  • Fine-tuning LLMs

    sh train/train_wmt.sh    # taking machine translation as an example
  • Testing

    sh test/test_wmt.sh   # taking machine translation as an example
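
As referenced in the first step above, here is a minimal Python sketch of the Post-Ins conversion; the file paths, instruction text, and JSON field names are illustrative assumptions rather than the exact schema produced by scripts/organize_data.sh.

    import json

    # Illustrative instruction; replace with the instruction for your task or language pair.
    INSTRUCTION = "Translate the following text from English to German."

    def to_post_ins(src_path: str, tgt_path: str, out_path: str) -> None:
        """Convert a parallel corpus (one sentence per line) into Post-Ins records."""
        with open(src_path, encoding="utf-8") as fs, \
             open(tgt_path, encoding="utf-8") as ft, \
             open(out_path, "w", encoding="utf-8") as fo:
            for src, tgt in zip(fs, ft):
                record = {
                    # Post-Ins: source sentence first, task instruction appended after it.
                    "input": f"{src.strip()}\n{INSTRUCTION}",
                    "output": tgt.strip(),
                }
                fo.write(json.dumps(record, ensure_ascii=False) + "\n")

    if __name__ == "__main__":
        to_post_ins("train.en", "train.de", "train_post_ins.jsonl")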

Experiments

We provide all model outputs for both the machine translation and text summarization tasks for easy comparison. Below are partial results of the experiments:

(Table: results on WMT22 for machine translation.)

(Table: results on CNN/DailyMail for long text summarization.)

Citation

Please cite this paper if you find this repo useful.

@article{liu2023instruction,
  title={Instruction Position Matters in Sequence Generation with Large Language Models},
  author={Liu, Yijin and Zeng, Xianfeng and Meng, Fandong and Zhou, Jie},
  journal={arXiv preprint arXiv:2308.12097},
  year={2023}
}

Contact

Please feel free to contact us (yijinliu@tencent.com) with any further questions.
