zhangjiekui / dialogue-generation

:speech_balloon: General purpose conversational agent with pretrained XLNet and GPT-2 in PyTorch.


Dialogue generation

Currently under development. Contributions are welcome, either as pull requests or by filing an issue.

Implementation of a neural dialogue generator model with pretrained XLNet (Yang et al., 2019) and GPT-2 (Radford et al., 2019) architectures on the DailyDialog dataset (Li et al., 2017); additional datasets are coming soon. Top-k sampling (Fan et al., 2018) and nucleus decoding (Holtzman et al., 2019) are available as decoding techniques.
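For reference, both decoding techniques truncate the next-token distribution before sampling: top-k keeps the k most probable tokens, while nucleus (top-p) sampling keeps the smallest set of tokens whose cumulative probability reaches p. A minimal, dependency-free sketch (function names are illustrative, not taken from this repository's code):

```python
# Illustrative sketch of top-k and nucleus (top-p) filtering over a
# next-token probability distribution; not the repository's actual code.

def top_k_filter(probs, k):
    """Zero out everything below the k-th highest probability, renormalize."""
    threshold = sorted(probs, reverse=True)[k - 1]
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]

def nucleus_filter(probs, p):
    """Keep the smallest prefix of tokens (by probability) summing to >= p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    filtered = [probs[i] if i in kept else 0.0 for i in range(len(probs))]
    total = sum(filtered)
    return [q / total for q in filtered]
```

After filtering, the next token is drawn from the renormalized distribution; in the real model this runs on logits with a temperature term as well.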

Usage

The model uses mixed-precision training from nvidia/apex. Note that apex is not required and is only used if it is available. For an installation guide, see the official instructions.
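This kind of optional dependency is typically handled with a guarded import; a sketch of the pattern (the repository's actual training code may structure this differently):

```python
# Optional-dependency pattern: use apex's mixed-precision utilities when
# installed, otherwise fall back to full-precision training.
try:
    from apex import amp  # provided by nvidia/apex
    APEX_AVAILABLE = True
except ImportError:
    amp = None
    APEX_AVAILABLE = False

def maybe_initialize_amp(model, optimizer):
    """Wrap model/optimizer with amp when available, else return them unchanged."""
    if APEX_AVAILABLE:
        # "O1" is apex's standard mixed-precision optimization level.
        return amp.initialize(model, optimizer, opt_level="O1")
    return model, optimizer
```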

To train the model, clone this repository and install its dependencies. The project uses Cython to assemble batches for a faster input pipeline.

pip install -r requirements.txt

python setup.py build_ext --inplace
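The build_ext step compiles the project's Cython sources in place. A typical setup.py for this pattern looks roughly like the following (a generic sketch; the module path is illustrative, not this repository's actual layout):

```python
# Generic sketch of a setup.py that compiles Cython sources in place.
from setuptools import setup
from Cython.Build import cythonize

setup(
    name="dialogue-generation",
    # Compile every .pyx module under src/ into a C extension;
    # the glob pattern here is an assumed, illustrative layout.
    ext_modules=cythonize("src/*.pyx"),
)
```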

The model can be trained with the following commands. Note that <data_dir> and <model_dir> are optional; defaults are provided, but you can customize the locations of the model and data directories with these arguments.

python -m src.train --model MODEL

For distributed multi-GPU training, the training script should be launched like this:

python -m torch.distributed.launch --nproc_per_node=NUM_GPUS src/train.py --model MODEL
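torch.distributed.launch starts one process per GPU and passes each process a --local_rank argument (the launcher's convention); a training script consumes it roughly like this (a sketch, not the repository's actual argument parsing):

```python
import argparse

# torch.distributed.launch supplies --local_rank to each spawned process;
# the script uses it to select its GPU. The default of -1 marks a
# single-process (non-distributed) run.
parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=-1)

# Example: the launcher would effectively invoke the process for GPU 2
# with arguments equivalent to ["--local_rank", "2"].
args = parser.parse_args(["--local_rank", "2"])
```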

Available models are xlnet-base-cased, xlnet-large-cased, gpt2, gpt2-medium and gpt2-large. Currently the only available dataset option is dailydialog. An interactive evaluation mode is available for the trained model by running the interact script:

python -m src.interact --model MODEL

Training the model is fast and easy on Google Colaboratory: create a new colab file in your Google Drive and run it with the snippets below. It is important to set the runtime type to GPU with a Tesla T4 unit, as it can fully leverage mixed-precision training and is much faster than the older K80. You can check the current GPU type by running the following line in a cell of your colab:

!nvidia-smi

Copy and run the following code in a cell of your colab file to install the model.

!git clone https://github.com/bme-chatbots/dialogue-generation.git
!python -m pip install --upgrade pip

# installing apex
!git clone https://github.com/NVIDIA/apex
!cd apex; pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .

# building the cython code
!cd dialogue-generation; python setup.py build_ext --inplace

# installing the required packages
!cd dialogue-generation; pip install -r requirements.txt

The training loss and accuracy are logged with TensorboardX and can also be tracked in the colab file if the code below is run before the training cell.

%load_ext tensorboard
%tensorboard --logdir "dialogue-generation/model"

The model can then be trained by simply running the training module with the default parameters:

!cd dialogue-generation; python -m src.train

Results

These results are from xlnet-base-cased trained with default parameters for 14 epochs and --max_hist 1, as training with longer context currently produces worse models (actively looking for a solution).

what is your name ?
Susan Ann.


how old are you ?
about three and a half, right?


where do you live ?
in California.


what is your favourite colour ?
I like red, both in both of them!


do you have a favourite movie ?
sure. It was quite popular among children.


what is the title of this movie ?
it's called'Is it a new film '


you are not funny
I am not joking, but you deserve to change your mind.


License: MIT

