
Post-Instruction

Contents

Overview

Examples

Current instruction-following data generally place the task instruction before the input sentence (referred to as "Pre-Ins") for sequence generation tasks (e.g., machine translation). We observe that LLMs may forget the frontmost task instruction when the input sentence is long, so we propose to simply place the task instruction after the input sentence (referred to as "Post-Ins"). Both our theoretical and experimental analyses show that Post-Ins directs more attention to the task instruction and strengthens the model's instruction-following capability, yielding consistent performance improvements across two common sequence generation tasks. For more details, please refer to our technical report.
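To make the two formats concrete, below is a minimal Python sketch contrasting how a single translation pair could be laid out under each format; the instruction text and templates are illustrative assumptions, not the repository's exact prompt format.

    # Illustrative sketch of the two data layouts (templates are assumptions,
    # not the repository's exact format).

    instruction = "Translate the following text from English to German."
    source = "The weather in Munich is lovely today, and the streets are full of people."

    # Pre-Ins: the instruction comes first, followed by the (possibly long) input.
    pre_ins_prompt = f"{instruction}\n{source}\n"

    # Post-Ins: the input comes first and the instruction is appended at the end,
    # so it sits right before the position where generation starts.
    post_ins_prompt = f"{source}\n{instruction}\n"

    print(pre_ins_prompt)
    print(post_ins_prompt)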

(Figure: self-attention heatmaps of models trained with the Pre-Ins and Post-Ins formats.)

Above are self-attention visualizations of models trained with the two data formats. We find that Pre-Ins mainly focuses on the source input, while Post-Ins pays more attention to the specific task instruction. Here is an example script to plot the heatmap above.
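For illustration only, the following is a minimal sketch of such a plotting script using transformers and matplotlib; it is not the repository's script, and the model path and prompt are placeholders.

    import torch
    import matplotlib.pyplot as plt
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_path = "path/to/your/fine-tuned/model"  # placeholder path
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(model_path)
    model.eval()

    # A Post-Ins style prompt: source sentence first, instruction last (illustrative).
    prompt = "The weather in Munich is lovely today.\nTranslate the text above from English to German."
    inputs = tokenizer(prompt, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs, output_attentions=True)

    # outputs.attentions is a tuple with one tensor per layer,
    # each of shape (batch, num_heads, seq_len, seq_len).
    attn = outputs.attentions[-1][0].mean(dim=0)  # average heads of the last layer

    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    fig, ax = plt.subplots(figsize=(8, 8))
    ax.imshow(attn.float().numpy(), cmap="viridis")
    ax.set_xticks(range(len(tokens)))
    ax.set_yticks(range(len(tokens)))
    ax.set_xticklabels(tokens, rotation=90, fontsize=6)
    ax.set_yticklabels(tokens, fontsize=6)
    plt.tight_layout()
    plt.savefig("attention_heatmap.png", dpi=300)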

(Equations: conditional modeling of the response under the Pre-Ins and Post-Ins formats.)

As shown in the equations above, "inp", "inst", and "res" are abbreviations for "source input", "instruction", and "response", respectively. We observe that the post-instruction format naturally encourages the model to pay more attention to the task instruction, while the pre-instruction format places more emphasis on modeling coverage.
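For reference, here is a minimal LaTeX sketch of the two factorizations under the standard left-to-right chain rule, using the abbreviations above; the exact formulation and notation in the paper may differ.

    % Pre-Ins: the instruction precedes the source input in the context.
    P_{\text{Pre-Ins}}(\mathrm{res} \mid \mathrm{inst}, \mathrm{inp})
      = \prod_{t=1}^{|\mathrm{res}|} P\big(\mathrm{res}_t \mid \mathrm{inst}, \mathrm{inp}, \mathrm{res}_{<t}\big)

    % Post-Ins: the source input precedes the instruction, so the instruction
    % sits immediately before the response tokens.
    P_{\text{Post-Ins}}(\mathrm{res} \mid \mathrm{inp}, \mathrm{inst})
      = \prod_{t=1}^{|\mathrm{res}|} P\big(\mathrm{res}_t \mid \mathrm{inp}, \mathrm{inst}, \mathrm{res}_{<t}\big)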

Requirements

  • transformers>=4.28.0.dev0
  • python>=3.8.0
  • torch>=1.10
  • deepspeed>=0.8.3
  • datasets>=2.9.0

Quick Start

  • Organizing original data into the Post-Ins format
    We provide the processed training data used in our experiments here, and you can also convert your own sentence pairs into the Post-Ins format with the following script (a hedged Python sketch of this conversion is given after this list):

    sh scripts/organize_data.sh    # replace the data files in this script with your own
  • Fine-tuning LLMs

    sh train/train_wmt.sh    # taking machine translation as an example
  • Testing

    sh test/test_wmt.sh   # taking machine translation as an example
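
As referenced in the first step above, here is a minimal Python sketch of the Post-Ins conversion; the file paths, instruction text, and JSON field names are illustrative assumptions rather than the exact schema produced by scripts/organize_data.sh.

    import json

    # Illustrative instruction; replace with the instruction for your task or language pair.
    INSTRUCTION = "Translate the following text from English to German."

    def to_post_ins(src_path: str, tgt_path: str, out_path: str) -> None:
        """Convert a parallel corpus (one sentence per line) into Post-Ins records."""
        with open(src_path, encoding="utf-8") as fs, \
             open(tgt_path, encoding="utf-8") as ft, \
             open(out_path, "w", encoding="utf-8") as fo:
            for src, tgt in zip(fs, ft):
                record = {
                    # Post-Ins: source sentence first, task instruction appended after it.
                    "input": f"{src.strip()}\n{INSTRUCTION}",
                    "output": tgt.strip(),
                }
                fo.write(json.dumps(record, ensure_ascii=False) + "\n")

    if __name__ == "__main__":
        to_post_ins("train.en", "train.de", "train_post_ins.jsonl")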

Experiments

We provide all model outputs for both the machine translation and text summarization tasks for easy comparison. Below are partial results of the experiments:

(Table: results on WMT22 for machine translation.)

(Table: results on CNN/DailyMail for long text summarization.)

Citation

Please cite this paper if you find this repo useful.

@article{liu2023instruction,
  title={Instruction Position Matters in Sequence Generation with Large Language Models},
  author={Liu, Yijin and Zeng, Xianfeng and Meng, Fandong and Zhou, Jie},
  journal={arXiv preprint arXiv:2308.12097},
  year={2023}
}

Contact

Please feel free to contact us (yijinliu@tencent.com) with any further questions.
