DialogVED

Code and released pre-trained model for our ACL 2022 paper: DialogVED: A Pre-trained Latent Variable Encoder-Decoder Model for Dialog Response Generation.

News

  • Fixed bugs in dailydialog and updated the training and evaluation scripts. (2022.06.19)
  • Optimized the code structure and removed redundant code. (2022.05.29)
  • Pre-trained checkpoints of DialogVED have been released! (2022.05.17)

TODO

  • An fp16 version of DialogVED (about 700 MB) will be released.
  • Pre-training scripts are scheduled to be released.

Requirements

  • python==3.7
  • torch==1.3.0
  • fairseq==0.9.0
  • tensorboardX==1.7
  • pytorch_transformers
  • sklearn
  • nltk==3.5
Install the system dependencies, then the Python packages:

sudo apt install default-jdk
curl https://install.meteor.com/ | sh

pip install -r requirements.txt
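
If you prefer to set up the environment by hand, here is a minimal sketch assuming conda is available (the environment name is arbitrary, and scikit-learn is the pip package providing sklearn):

# create and activate an isolated environment (name is arbitrary)
conda create -n dialogved python=3.7 -y
conda activate dialogved

# pin the versions listed above
pip install torch==1.3.0 fairseq==0.9.0 tensorboardX==1.7 nltk==3.5
pip install pytorch_transformers scikit-learn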

Pre-trained Models

We have released the following checkpoints for the pre-trained models described in the DialogVED paper. Download a pre-trained checkpoint and set the load-from-pretrained-model parameter in the fine-tuning command to its path.

Note: DialogVED-VAE-Standard uses a latent size of 32, whereas DialogVED-VAE-Large uses a latent size of 64. DialogVED-Seq2Seq has no latent variable; it is a pure seq2seq model trained with the same settings as DialogVED. It may perform better in scenarios where response diversity is less important.
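
As an illustration of how a downloaded checkpoint is used (the path below is a placeholder), its location is simply passed to the fine-tuning script described in the next section, which in turn sets the load-from-pretrained-model parameter:

# placeholder path; point this at wherever you saved the checkpoint
CKPT=/path/to/models/dialogved_large.pt

# -p forwards the checkpoint path to the fine-tuning command
bash train.sh -p ${CKPT} -t dialogved_large -d dailydialog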

Fine-tuning

Data preparation

We fine-tune DialogVED on three datasets: DailyDialog, PersonaChat, and DSTC7-AVSD. You can download them following the instructions in PLATO, or run our script as follows.

bash preprocess/get_data.sh

Preprocess

bash preprocess/process.sh

Binarization

bash preprocess/binarize.sh

Training

The script train.sh takes three parameters: -p, -t, and -d.

  • p: pretrained model path
  • t: pretrained model type (dialogved_standard, dialogved_large or dialogved_seq2seq)
  • d: fine-tuned dataset (dailydialog, personachat or dstc7avsd)
bash train.sh -p /remote-home/models/dialogved_standard.pt -t dialogved_standard -d dailydialog
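
Another example, fine-tuning the large model on PersonaChat (the checkpoint path is a placeholder):

bash train.sh -p /path/to/dialogved_large.pt -t dialogved_large -d personachat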

Inference

The script infer.sh takes two parameters: -d and -s.

  • d: fine-tuned dataset (dailydialog, personachat or dstc7avsd)
  • s: decoding strategy (greedy, beam or sampling)
bash infer.sh -d dailydialog -s beam
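
For example, to decode PersonaChat with sampling instead of beam search:

bash infer.sh -d personachat -s sampling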

Evaluation

The script eval.sh takes one parameter: -d.

  • d: fine-tuned dataset (dailydialog, personachat or dstc7avsd)
bash eval.sh -d dailydialog
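
Putting the steps together, a minimal end-to-end run on DailyDialog with the standard model looks like this (assuming the checkpoint has already been downloaded; its path is a placeholder):

bash preprocess/get_data.sh
bash preprocess/process.sh
bash preprocess/binarize.sh
bash train.sh -p /path/to/dialogved_standard.pt -t dialogved_standard -d dailydialog
bash infer.sh -d dailydialog -s beam
bash eval.sh -d dailydialog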

How to Cite

If you extend or use this work, please cite the paper where it was introduced:

@inproceedings{chen-etal-2022-dialogved,
    title = "{DialogVED: A Pre-trained Latent Variable Encoder-Decoder Model for Dialog Response Generation",
    author = "Chen, Wei and Gong, Yeyun and Wang, Song and Yao, Bolun and Qi, Weizhen and Wei, Zhongyu and Hu, Xiaowu and Zhou, Bartuer and Mao, Yi and Chen, Weizhu and Cheng, Biao and Duan, Nan",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = may,
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.acl-long.333",
    doi = "10.18653/v1/2022.acl-long.333",
    pages = "4852--4864",
    abstract = "Dialog response generation in open domain is an important research topic where the main challenge is to generate relevant and diverse responses. In this paper, we propose a new dialog pre-training framework called DialogVED, which introduces continuous latent variables into the enhanced encoder-decoder pre-training framework to increase the relevance and diversity of responses. With the help of a large dialog corpus (Reddit), we pre-train the model using the following 4 tasks, used in training language models (LMs) and Variational Autoencoders (VAEs) literature: 1) masked language model; 2) response generation; 3) bag-of-words prediction; and 4) KL divergence reduction. We also add additional parameters to model the turn structure in dialogs to improve the performance of the pre-trained model. We conduct experiments on PersonaChat, DailyDialog, and DSTC7-AVSD benchmarks for response generation. Experimental results show that our model achieves the new state-of-the-art results on all these datasets.",
}
