
HDSA-Dialog

This is the code and data for the ACL 2019 long paper "Semantically Conditioned Dialog Response Generation via Hierarchical Disentangled Self-Attention". The up-to-date version is at http://arxiv.org/abs/1905.12866.

The full architecture is shown below:

The architecture consists of two components:

  • Dialog act predictor (Fine-tuned BERT model)
  • Response generator (Hierarchical Disentangled Self-Attention Network)

The basic idea of the paper is to enable controlled response generation within the Transformer framework. We construct a dialog act graph to represent the semantic space of the MultiWOZ tasks, and then bind the attention heads at different levels to specific nodes in the dialog act graph. For example, the picture above demonstrates the merge of two dialog acts, "hotel->inform->location" and "hotel->inform->name": the generated sentence is controlled to deliver the name and location of a recommended hotel.
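To make this concrete, below is a minimal PyTorch sketch of the head-gating idea (not the repo's actual TableSemanticDecoder): a binary mask over attention heads activates exactly the heads bound to the active dialog-act nodes, and merging two acts amounts to taking the union of their masks. The head-to-node assignment in the usage example is illustrative.

```python
import torch
import torch.nn as nn

class DisentangledSelfAttention(nn.Module):
    """Self-attention whose heads can be switched on/off by a dialog-act mask."""
    def __init__(self, d_model, num_heads):
        super().__init__()
        assert d_model % num_heads == 0
        self.h, self.d_k = num_heads, d_model // num_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x, head_mask):
        # x: (batch, seq, d_model); head_mask: (batch, num_heads), 1 = node active
        B, T, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(B, T, self.h, self.d_k).transpose(1, 2) for t in (q, k, v))
        att = torch.softmax(q @ k.transpose(-2, -1) / self.d_k ** 0.5, dim=-1)
        ctx = att @ v                               # (batch, heads, seq, d_k)
        ctx = ctx * head_mask[:, :, None, None]     # disentangle: zero out inactive heads
        return self.out(ctx.transpose(1, 2).reshape(B, T, self.h * self.d_k))

# Illustrative binding: head 0 = "hotel", 1 = "inform", 2 = "name", 3 = "location".
layer = DisentangledSelfAttention(d_model=128, num_heads=4)
x = torch.randn(2, 10, 128)
mask = torch.tensor([[1., 1., 1., 0.],    # hotel->inform->name
                     [1., 1., 1., 1.]])   # merged with hotel->inform->location
y = layer(x, mask)                        # (2, 10, 128)
```

In the paper the three levels of the act graph (domain, action, slot) gate heads in different layers of the decoder; the single-layer version above is only meant to show the gating mechanism.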

Requirements

Please see the instructions to install the required packages before running experiments.

Folders

  • data: all the needed training/evaluation/testing data
  • transformer: all the baseline and proposed models, including the hierarchical disentangled self-attention (class TableSemanticDecoder)
  • preprocessing: the code for pre-processing the database and original downloaded data

1. Dialog Act Predictor

This module is used to predict the next-step dialog acts based on the conversation history. Here we adopt BERT, the state-of-the-art NLU model, to get the best prediction accuracy. Make sure you install pytorch-pretrained-bert beforehand; it will automatically download the pre-trained model into your /tmp folder.
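As intuition for this step, here is a minimal sketch of a BERT-based multi-label act predictor built on pytorch-pretrained-bert; the classifier head and the number of act nodes (44) are illustrative assumptions, not the repo's exact code.

```python
import torch
import torch.nn as nn
from pytorch_pretrained_bert import BertModel

class ActPredictor(nn.Module):
    """Multi-label classifier: one sigmoid per node of the dialog act graph."""
    def __init__(self, num_acts):
        super().__init__()
        self.bert = BertModel.from_pretrained('bert-base-uncased')
        self.classifier = nn.Linear(768, num_acts)   # 768 = BERT-base hidden size

    def forward(self, input_ids, attention_mask=None):
        # pooled: the [CLS] representation of the dialog history
        _, pooled = self.bert(input_ids, attention_mask=attention_mask,
                              output_all_encoded_layers=False)
        return self.classifier(pooled)

model = ActPredictor(num_acts=44)                    # 44 nodes is an assumption
logits = model(torch.randint(0, 30000, (2, 32)))     # fake token ids
loss = nn.BCEWithLogitsLoss()(logits, torch.zeros(2, 44))
```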

Download the pre-trained models and delex.json (needed for calculating the inform/request success rate)

sh collect_data.sh

Prepare data (optional; the processed files are already in the GitHub repo)

python preprocess_data_for_predictor.py

Training (if you use multiple GPUs, the batch size can be enlarged)

rm -r checkpoints/predictor/
CUDA_VISIBLE_DEVICES=0 python3.5 train_predictor.py --do_train --do_eval --train_batch_size 6 --eval_batch_size 6

Testing (using the model saved at step xxx)

CUDA_VISIBLE_DEVICES=0 python3.5 train_predictor.py --do_eval --test_set dev --load_dir /tmp/output/save_step_xxx
CUDA_VISIBLE_DEVICES=0 python3.5 train_predictor.py --do_eval --test_set test --load_dir /tmp/output/save_step_xxx

The output values are saved in data/BERT_dev_prediction.json and data/BERT_test_prediction.json; these two files must be kept for generator training.
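For reference, here is a hedged sketch of how these files might be consumed downstream; the JSON layout assumed here (one list of per-node probabilities per turn) is an illustration, not a documented format.

```python
import json

# Load the predictor's saved probabilities for the dev set.
with open('data/BERT_dev_prediction.json') as f:
    predictions = json.load(f)

# Threshold each node probability into the binary act vector that
# will gate the generator's attention heads.
act_vectors = [[1 if p >= 0.5 else 0 for p in turn] for turn in predictions]
```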

2. Response Generator

This module is used to control language generation based on the output of the pre-trained act predictor. The training data is already preprocessed and placed in the data/ folder (train.json, val.json, and test.json).

Training

CUDA_VISIBLE_DEVICES=0 python3.5 train_generator.py --option train --model BERT_dim128_w_domain_exp --batch_size 512 --max_seq_length 50 --field

Delexicalized Testing (the entities are normalized into placeholders like [restaurant_name])

CUDA_VISIBLE_DEVICES=0 python3.5 train_generator.py --option test --model BERT_dim128_w_domain_exp --batch_size 512 --max_seq_length 50 --field

Non-Delexicalized Testing (the entities need to be restored from the database record; a sketch of this step follows the command below)

CUDA_VISIBLE_DEVICES=0 python3.5 train_generator.py --option postprocess --output_file /tmp/results.txt.pred.BERT_dim128_w_domain_exp.pred --model BERT --non_delex
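A minimal sketch of what this restoration step does (not the repo's post-processing code): placeholders such as [restaurant_name] are filled in from the matching database record; the record dict below is made up.

```python
import re

def lexicalize(delex_response, db_record):
    """Replace [slot_name] placeholders with values from a database record."""
    def fill(match):
        slot = match.group(1)
        return str(db_record.get(slot, match.group(0)))  # keep unknown slots as-is
    return re.sub(r'\[([a-z_]+)\]', fill, delex_response)

print(lexicalize('[restaurant_name] is located in [restaurant_area] .',
                 {'restaurant_name': 'Pizza Hut', 'restaurant_area': 'the centre'}))
# -> Pizza Hut is located in the centre .
```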

3. Reproducibility

  • We release the pre-trained predictor model: put the zip file into checkpoints/predictor and unzip it to get the save_step_15120 folder.
CUDA_VISIBLE_DEVICES=0 python3.5 train_predictor.py --do_eval --test_set test --load_dir /tmp/output/save_step_15120
  • We already put the pre-trained generator model under checkpoints/generator; with it you can obtain 23.6 BLEU on the delexicalized test set.
CUDA_VISIBLE_DEVICES=0 python3.5 train_generator.py --option test --model BERT_dim128_w_domain --batch_size 512 --max_seq_length 50 --field
CUDA_VISIBLE_DEVICES=0 python3.5 train_generator.py --option postprocess --output_file /tmp/results.txt.pred.BERT_dim128_w_domain.pred --model BERT --non_delex

Acknowledgements

We sincerely thank the University of Cambridge and PolyAI for releasing the dataset and code.

License

MIT License

