
BiSon: Bidirectional Sequence Generation

This repository contains code for bidirectional sequence generation (BiSon), which generates text from pre-trained BERT models.

The results were published in the following paper (please cite it if you use this repository):

Carolin Lawrence, Bhushan Kotnis, and Mathias Niepert. 2019. Attending to Future Tokens For Bidirectional Sequence Generation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Hong Kong, China. https://arxiv.org/abs/1908.05915

Content

| Section | Description |
| --- | --- |
| Installation | How to install the package |
| Overview | Overview of the package |
| Implementing a new dataset | How to add support for a new dataset |
| General Notes | Additional useful information |

Installation

Required external libraries: numpy, torch>=0.4.1, tqdm, boto3, requests, regex

This repository is compatible with the following three projects, which are required to reproduce the results reported in the paper:

  • Huggingface's BERT models: Should be compatible with any version of pytorch-pretrained-bert; the results in the paper were produced with a fork of commit a5b3a89545bfce466dd977a9c6a7b15554b193b1.
  • BLEU script: To get BLEU evaluation scores, download the script, place it at bison/evals/multi-bleu.perl and give it execution rights (e.g. chmod u+x bison/evals/multi-bleu.perl).
  • ShARC evaluation script: Download the official ShARC evaluation script and place it at bison/evals/evaluator_sharc.py. It can be found in the CodaLab worksheet of the ShARC dataset (https://worksheets.codalab.org/worksheets/0xcd87fe339fa2493aac9396a3a27bbae8/); search for "evaluator.py".

Overview

  • Example files that call BiSon for training or prediction on two datasets (ShARC and Daily Dialog) can be found in example_files. Be sure to adjust the variable REPO_DIR so that it points to the path of your repository.

  • BiSon specific implementations:

    • arguments.py: Specifies all possible arguments for BiSon.
    • bison_handler.py: Calls all necessary functions for BiSon training and prediction.
    • masking.py: Handles the masking procedure. Get a masker by calling get_masker and passing BisonArguments (see the sketch after this overview list). Currently one masker is implemented:
      • GenerationMasking.py: Places masks in Part B, where Part A is the conditioning input and Part B consists only of placeholder tokens ([MASK]) at prediction time. Masks can be placed either using a Bernoulli distribution (--masking_strategy bernoulli) with a specified mean (--distribution_mean) or using a Gaussian distribution (--masking_strategy gaussian) with a specified mean (--distribution_mean) and standard deviation (--distribution_stdev).
    • model_helper.py: Sets up some general BiSon settings.
    • predict.py: Handles BiSon prediction.
    • train.py: Handles BiSon training.
    • util.py: Some utility functions, e.g. for reading and writing files.
  • Several implemented datasets. Get a data handler by calling get_data_handler from datasets_factory.py and passing BisonArguments (see the sketch after this overview list).

    The general class that all other datasets should inherit from:

    • datasets_bitext.py: Implements all necessary functions a data handler should have. It assumes tab-separated files as input, where everything before the tab becomes Part A and everything after the tab becomes Part B. At prediction time, BiSon aims to predict Part B.

    Dialogue datasets:

  • Main Python file:

    • run_bison.py: Main entry point for any BiSon training and prediction.
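
The following is a minimal sketch of how the factory functions above fit together. The import paths and the way BisonArguments is instantiated here are assumptions for illustration only; in practice run_bison.py parses the command-line flags and wires everything up.

```python
# Sketch only: import paths and argument construction are assumptions, not the exact BiSon API.
from arguments import BisonArguments            # arguments.py
from datasets_factory import get_data_handler   # datasets_factory.py
from masking import get_masker                  # masking.py

bison_args = BisonArguments()                   # normally populated from command-line flags

# Data handler, e.g. the bitext handler that reads tab-separated files where
# "Part A<TAB>Part B" forms one example and Part B is what BiSon learns to generate.
data_handler = get_data_handler(bison_args)

# Masker, e.g. generation masking with --masking_strategy bernoulli or gaussian.
masker = get_masker(bison_args)
```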

Implementing a new dataset

To implement a new dataset, ensure that it inherits from BitextHandler. See the documentation of each function to determine whether your dataset needs to overwrite that functionality.
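
As a rough illustration, a new handler might look like the sketch below. The module path and the overridden method name are hypothetical; check the docstrings in datasets_bitext.py for the functions your dataset actually needs to overwrite, and register the new handler in datasets_factory.py so that get_data_handler can return it.

```python
# Hypothetical sketch of a new dataset handler; the method name read_examples is illustrative only.
from datasets_bitext import BitextHandler


class MyDatasetHandler(BitextHandler):
    """Overwrite only the behaviour that differs from the tab-separated default."""

    def read_examples(self, input_file, is_training=True):
        # e.g. parse a JSON file instead of tab-separated text, producing
        # Part A (conditioning input) and Part B (the sequence BiSon should generate).
        raise NotImplementedError
```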

General Notes

  • The learning rate depends on the number of epochs; see the warmup_linear function in optimizer.py (sketched at the end of these notes): at the last update step the learning rate is 0. If a run finishes with its highest score in the last epoch, simply increasing the epoch count does not necessarily help, because it changes the learning rate schedule throughout training.

  • Training cannot simply be restarted from a saved model because Adam's parameters are not saved.

  • When using the parameter --gradient_accumulation_steps, the value of --train_batch_size should be the truly desired batch size. For example, if we want a batch size of 16 but only 6 examples fit into GPU RAM, then:

    --train_batch_size 16 --gradient_accumulation_steps 3
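
The arithmetic behind this example, assuming BiSon follows the same convention as the Huggingface BERT example scripts (an assumption; check train.py for the exact handling):

```python
# Illustration only: how the two flags relate to what must fit into GPU RAM.
train_batch_size = 16            # --train_batch_size: the effective batch size we want
gradient_accumulation_steps = 3  # --gradient_accumulation_steps

# Examples processed per forward/backward pass (this is what must fit into GPU memory):
per_step_batch = train_batch_size // gradient_accumulation_steps  # 16 // 3 = 5

# Gradients are accumulated over 3 such passes before each optimizer update,
# so the effective batch size per update stays roughly 16.
```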
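
For reference, the linear warmup/decay schedule in older pytorch-pretrained-bert releases looks roughly like the sketch below, where x is the fraction of training steps completed; it shows why the learning rate reaches 0 exactly at the last update and why changing the number of epochs rescales the entire schedule. Check optimizer.py in this repository for the exact version BiSon uses.

```python
# Approximate form of warmup_linear (older pytorch-pretrained-bert releases).
def warmup_linear(x, warmup=0.002):
    # x = global_step / total_steps, so x == 1.0 at the final update step
    if x < warmup:
        return x / warmup   # linear warmup phase
    return 1.0 - x          # linear decay, reaching 0 at the last step
```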
