I. Preparation
pip install -r requirements.txt
python setup.py develop
II. Explanation of `run_slurm.py`
(i) Preliminaries
The authors used `run_slurm.py` to run experiments.
The file contains six segments of code, each headed by `if False`. The six segments correspond to the following (this information is also given in comments in `run_slurm.py`):
- (1) training vanilla models (RNNs and LSTMs)
- (2) evaluating vanilla models using regular decoding algorithms (for the list of algorithms, please refer to the decoding algorithms section below)
- (3) evaluating vanilla models using consistent sampling algorithms
- (4) training self-terminating RNN models
- (5) training self-terminating LSTM models
- (6) evaluating self-terminating RNN+LSTM models
Note that by default, we train each model using 10 different random seeds. The number of random seeds can be adjusted in `run_slurm.py`.
At the end of the segments, we show how to modify parameters to run with a BPE-tokenized dataset.
(ii) Things to do before using `run_slurm.py`
- Set the values in `user_folders` per the example in `run_slurm.py`.
- Adjust the `partition` choices in `run_slurm.py`, and adjust the corresponding GPU options (to find the spot to do so, one can search for `args.partition`).
- After training and before final evaluation, one should adjust `sweep_dirs` in the evaluation segments of the code to refer to the absolute locations of the checkpoint folders; examples are included in `run_slurm.py`.
(iii) How to use `run_slurm.py`
To use one segment, users can set the corresponding `if False` to `if True` and run `python run_slurm.py`.
Alternatively, users who are not in a slurm environment, or who prefer to run our code from the command line, can print out the actual python commands by including the flag `--print-commands`.
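The toggle-and-print workflow described above can be pictured with a minimal sketch. It is not the contents of `run_slurm.py`: the script name `train.py`, the `--model` and `--seed` flags, and the helpers `build_commands` and `launch` are hypothetical and only illustrate how one segment might expand into one command per random seed and then either print or submit them.

```python
import shlex

# Hypothetical stand-ins; the real segments and flags live in run_slurm.py.
SEEDS = range(10)  # the README states that 10 random seeds are used by default


def build_commands(script, base_args, seeds):
    """Return one shell command string per random seed."""
    commands = []
    for seed in seeds:
        argv = ["python", script, *base_args, "--seed", str(seed)]
        commands.append(shlex.join(argv))
    return commands


def launch(commands, print_commands):
    """Print the commands when print_commands is set; otherwise 'submit' them.

    Returns the number of commands that would have been submitted.
    """
    if print_commands:
        for command in commands:
            print(command)
        return 0
    # A real launcher would hand each command to slurm here (omitted in this sketch).
    return len(commands)


if True:  # flipped from `if False` to enable this (hypothetical) segment
    segment_commands = build_commands("train.py", ["--model", "lstm"], SEEDS)
    submitted = launch(segment_commands, print_commands=True)
```

With `print_commands=True`, nothing is submitted and the ten commands are written to standard output, where they can be copied and run by hand outside a slurm environment.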
III. Decoding algorithms in `evaluate.py`
- When the model is a self-terminating RNN/LSTM, `evaluate.py` only uses greedy decoding.
- When the model is a regular RNN/LSTM:
  - if `--consistent-sampling 0`, then `evaluate.py` uses the following decoding algorithms: greedy decoding, ancestral sampling, beam search with beam size 2 and 4, top-k decoding with k=2 and k=4, and nucleus sampling with mu=0.2 and mu=0.4.
  - if `--consistent-sampling 1`, then `evaluate.py` uses the following decoding algorithms: consistent top-k decoding with k=2 and k=4, and consistent nucleus sampling with mu=0.2 and mu=0.4.
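To make the difference between the regular and consistent variants listed above concrete, here is a minimal sketch over a single next-token distribution. It assumes that the consistent variants differ from plain top-k and nucleus sampling only in always keeping the end-of-sequence token in the candidate set; that is an assumption about the method, not a description of the code in `evaluate.py`. All function names and the toy distribution are hypothetical.

```python
def ranked_indices(probs):
    """Token indices sorted from most to least probable."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)


def top_k_candidates(probs, k):
    """The k most probable tokens."""
    return set(ranked_indices(probs)[:k])


def nucleus_candidates(probs, mu):
    """Smallest set of top-ranked tokens whose total probability reaches mu."""
    chosen, total = set(), 0.0
    for i in ranked_indices(probs):
        chosen.add(i)
        total += probs[i]
        if total >= mu:
            break
    return chosen


def make_consistent(candidates, eos_index):
    """Assumed consistent variant: always keep the EOS token as a candidate."""
    return candidates | {eos_index}


def renormalize(probs, candidates):
    """Zero out non-candidates and rescale the rest to sum to 1."""
    mass = sum(probs[i] for i in candidates)
    return [probs[i] / mass if i in candidates else 0.0 for i in range(len(probs))]


# Toy distribution over 4 tokens; index 3 plays the role of EOS (hypothetical).
probs = [0.5, 0.3, 0.15, 0.05]
EOS = 3

regular_top2 = top_k_candidates(probs, k=2)
consistent_top2 = make_consistent(regular_top2, EOS)
regular_nucleus = nucleus_candidates(probs, mu=0.4)
consistent_nucleus = make_consistent(regular_nucleus, EOS)
consistent_top2_probs = renormalize(probs, consistent_top2)
```

In this toy case, plain top-2 and plain nucleus sampling both assign EOS zero probability, so a sampler restricted to those sets could never stop at this step; the consistent variants keep a small but nonzero chance of terminating.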
IV. GPT-2 experiments
The self-terminating wrapper supports Transformers 3.3.1 (the current version as of Oct 2020).
The `gpt2` folder contains all the necessary wrappers to use the self-terminating layer in a HuggingFace pretrained model.
- (1) Tokenize the wikitext-103 dataset: `prepare-wikitext.py`
- (2) Fine-tune GPT-2 or self-terminating GPT-2: `train_line.py`
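For readers unfamiliar with the self-terminating layer mentioned above, the following is a minimal sketch of one output step. It assumes the formulation in which the probability of not ending is scaled by (1 - epsilon) * sigmoid(eos_logit) at every step, so the EOS probability can only grow over time. The wrappers in the `gpt2` folder may be implemented differently; the function name, argument names, and the value of `epsilon` here are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def self_terminating_step(non_eos_logits, eos_logit, prev_eos_prob, epsilon):
    """One step of an (assumed) self-terminating output layer.

    Returns (probabilities of the non-EOS tokens, probability of EOS).
    """
    # Fraction of the remaining "keep going" mass that survives this step;
    # it is at most 1 - epsilon no matter how large eos_logit is.
    alpha = (1.0 - epsilon) * sigmoid(eos_logit)
    eos_prob = 1.0 - alpha * (1.0 - prev_eos_prob)
    # The remaining mass is shared among non-EOS tokens by a regular softmax.
    rest = softmax(non_eos_logits)
    return [(1.0 - eos_prob) * p for p in rest], eos_prob


# Roll out 200 steps with logits that strongly favour continuing.
eos_history = []
eos_prob = 0.0
for _ in range(200):
    token_probs, eos_prob = self_terminating_step(
        non_eos_logits=[2.0, 1.0, 0.0],
        eos_logit=10.0,
        prev_eos_prob=eos_prob,
        epsilon=0.05,  # hypothetical value
    )
    eos_history.append(eos_prob)
```

Even though the logits never favour stopping, the EOS probability in this rollout increases at every step and approaches 1, which is the property that lets greedy decoding terminate on such a model.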