- An updated version of the repositories `pytorch-qrnn` and `awd-lstm-lm` for PyTorch v1.4.0
- A custom reimplementation of the QRNN model and the experiments from Merity et al., using Allennlp
- An integration of the QRNN encoder-decoder variant into JoeyNMT (Kreuzer et al.)
- An attempt to implement the cold-fusion method proposed by Sriram et al.
Furthermore, all data needed for the different scripts is provided in this repository, except for the GloVe embeddings.
- Python 3.7 (or compatible)
- PyTorch 1.4.0 (or compatible)
- CuPy 7.5.0 (or compatible)
CuPy doesn't seem to work without a CUDA setup, so if you want to run the code in a CPU-only environment, you may want to skip importing CuPy.
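One way to handle this is a guarded import that falls back to a CPU code path when CuPy is unavailable. This is a sketch of the workaround, not code from this repository:

```python
# Guarded CuPy import so the same module also runs in a CPU-only environment.
# CuPy can fail to import when no CUDA installation is present, so we catch
# broadly rather than only ImportError.
try:
    import cupy  # requires a working CUDA setup
    HAS_CUPY = True
except Exception:
    cupy = None
    HAS_CUPY = False  # downstream code should take the pure-PyTorch path
```

Code that uses the CUDA kernels can then check `HAS_CUPY` before dispatching to them.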
You can use the code as suggested in the original repository, except that the `generate.py`, `pointer.py` and `finetune.py` scripts have been removed. See the README file in the `awd-lstm-lm` folder for details.
I have moved the QRNN scripts into the `awd-lstm-lm` folder, so there is no need to install any other package (in the original implementation, the QRNN and AWD-LSTM are separate packages).
- Generally, the Salesforce implementation is not complete. In particular, the `generate.py`, `pointer.py` and `finetune.py` scripts are dysfunctional in the original repository (as stated in its README), so I removed them.
If you care about the details: `RNNModel` is programmed to return contextual embeddings (the activations of the last hidden layer). This works with the `SplitCrossEntropy` loss employed in `main.py`, because that loss has a separate linear layer to produce prediction scores over the vocabulary. The removed scripts, by contrast, expect the output of the model to already be vocabulary prediction scores.
- As far as I can tell, `finetune.py` does nothing different from the main training procedure (except for not working), so I see no reason to use it. Also, because of the lacking documentation, I am not sure what `pointer.py` is meant to do.
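The mismatch can be illustrated with toy numbers (the names below are illustrative, not the repository's API): the model emits a contextual embedding, and the loss applies its own projection to get vocabulary scores, whereas the removed scripts assume the model output already is those scores.

```python
# Toy illustration of "contextual embedding" vs. "vocabulary scores".
hidden = [0.2, -1.0, 0.5]          # last-layer activations for one token
W = [[1.0, 0.0, 0.0],              # decoder weight matrix: vocab_size x hidden
     [0.0, 1.0, 0.0],
     [0.5, 0.5, 0.5]]

# What the loss in main.py does internally: project embeddings to scores.
scores = [sum(w_i * h_i for w_i, h_i in zip(row, hidden)) for row in W]

# The removed scripts treat the model output as `scores` directly, which is
# why they break against a model that returns `hidden`.
```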
- There are a few known minor bugs, which I have also fixed in this code
- The QRNN doesn't train with the proposed hyperparameters. Changing the learning rate helps, but the results are not much different from those of my custom implementation (ca. 110 PPL). This is far worse than reported in the paper and in the Salesforce repository README.
- General structure of `QRNN` and `QRNNLayer`
- CUDA kernel for f-pooling
- Design principle: Can be used as a drop-in replacement for any PyTorch RNN variant
- New: Arbitrary convolutional kernel sizes (the Salesforce implementation only supports kernel widths 1 and 2)
- New: Bidirectional mode
- New: CUDA kernel for ifo-pooling
- New: Implementation of the QRNN-Decoder variant (the Salesforce implementation can only be used for language modelling). This includes the attention scheme proposed in the paper.
- Improved: Uses PyTorch's built-in 1d convolution (instead of a custom operation)
- Improved: No need to explicitly reset the cached previous inputs (unlike in the Salesforce implementation)
- New: Integration into Allennlp and JoeyNMT
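To make the pooling step concrete, here is a minimal pure-Python sketch of the f-pooling recurrence that the CUDA kernel parallelises over the batch and hidden dimensions. Names and the scalar setup are illustrative, not the repository's API:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def f_pool(z, f_pre, h0=0.0):
    """Sequentially apply h_t = f_t * h_{t-1} + (1 - f_t) * z_t.

    z:     candidate activations per time step (from the convolution)
    f_pre: forget-gate pre-activations per time step
    """
    h, out = h0, []
    for z_t, f_t_pre in zip(z, f_pre):
        f_t = sigmoid(f_t_pre)
        h = f_t * h + (1.0 - f_t) * z_t
        out.append(h)
    return out

hs = f_pool([1.0, -0.5, 2.0], [0.0, 1.0, -1.0])
```

ifo-pooling extends this with separate input and output gates; the sequential dependency along the time axis is the same, which is why both benefit from a fused CUDA kernel.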
You need:
- Python 3.7 (or compatible)
- PyTorch 1.4.0 (or compatible)
- CuPy 7.5.0 (or compatible)
- Allennlp 0.9.0 (be sure not to use version 1.x, as this is incompatible)
- gensim 3.8.0 (or compatible)
- any requirements of the above-mentioned libraries
You may want to download the GloVe vectors for review classification by running the `download_data.sh` script.
You can just run:
`python review_classification.py`
`python language_modelling.py`
Be sure that the local `saved_models/lm` and `saved_models/review_classification` directories exist and are empty. If they are not empty, `allennlp` will try to load checkpoint models from them and will not train at all.
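A small helper can enforce this before each run; a sketch using the paths mentioned above (the recreate-before-training behaviour is my workaround, not part of the scripts):

```python
import os
import shutil

# Recreate the checkpoint directories so allennlp always trains from scratch
# instead of resuming from stale checkpoints.
for d in ("saved_models/lm", "saved_models/review_classification"):
    shutil.rmtree(d, ignore_errors=True)  # drop old checkpoints, if any
    os.makedirs(d)                        # leave an empty directory behind
```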
To change hyperparameters, you have to edit the respective Python scripts, but the parameters themselves should be self-explanatory.
You need:
- Python 3.7 (or compatible)
- PyTorch 1.4.0 (or compatible)
- CuPy 7.5.0 (or compatible)
- the JoeyNMT version provided in this repository. You can install it just like the regular JoeyNMT package.
- all requirements of JoeyNMT
You can just run the provided `yaml` configuration file as one would normally run JoeyNMT, e.g.
`python -u -m joeynmt train configs/qrnn.yaml`
Generally, usage is the same as for the normal JoeyNMT distribution, but the configuration options are restricted to the use of QRNNs.
This implementation is experimental: it works, but exposes hardly any configuration options. Also, the translation model doesn't seem to train well, so there may be some shortcomings.
You need:
- Python 3.7 (or compatible)
- PyTorch 1.4.0 (or compatible)
- CuPy 7.5.0 (or compatible)
- Allennlp 0.9.0 (be sure not to use version 1.x, as this is incompatible)
- gensim 3.8.0 (or compatible)
Just run `python cold_fusion.py`.
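For orientation, cold fusion (Sriram et al.) gates a pretrained language model's hidden state into the decoder state before the output projection. The sketch below is a heavily simplified toy version of that fusion step; the per-element gate standing in for the learned linear layers, and all names and dimensions, are my illustrative assumptions, not this repository's implementation:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def cold_fuse(dec_state, lm_state, w_gate):
    # Gate computed from decoder and LM states; a scalar weight per element
    # stands in for the learned gating layer of the paper.
    gates = [sigmoid(w * (d + l))
             for w, d, l in zip(w_gate, dec_state, lm_state)]
    # The gated LM features are concatenated with the decoder state; the
    # fused vector then feeds the output layer.
    return dec_state + [g * l for g, l in zip(gates, lm_state)]

fused = cold_fuse([0.1, 0.2], [1.0, -1.0], [0.5, 0.5])
```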