This is the accompanying code for
DiPietro, Robert, Nassir Navab, and Gregory D. Hager. "Revisiting NARX Recurrent Neural Networks for Long-Term Dependencies." arXiv preprint arXiv:1702.07805 (2017). https://arxiv.org/abs/1702.07805
We ask that you cite the paper if you find the code useful in your research.
- RNNs struggle to learn long-term dependencies because of the vanishing gradient problem, which cannot be solved outright (see Bengio et al., 1994).
- LSTM and GRUs use one specific mechanism to help alleviate this problem. NARX RNNs take an entirely different approach by including direct connections to the past.
- We analyze the vanishing gradient problem for NARX RNNs in detail, and based on this analysis we introduce a new variant which we call MIxed hiSTory RNNs (MIST RNNs).
- We compare simple RNNs, LSTM, GRUs, and MIST RNNs across 4 diverse tasks. MIST RNNs significantly outperform LSTM and GRUs in 2 cases and match their performance in the other 2 cases.
See layers.py and models.py. Here you'll find from-scratch implementations of simple RNNs, LSTM, GRUs, and MIST RNNs. (Note that our LSTM implementation matches results in prior work, and our GRU implementation improves on these results further.)
Why from scratch? MIST RNNs don't fit the typical RNNCell, dynamic_rnn approach because they depend on many states from the past (see the sketch below). Though this is only true for MIST RNNs, we prefer unified code that handles all experiments.
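To make the issue concrete, here is a minimal NumPy sketch of a NARX-style update that reads several exponentially spaced past hidden states, as MIST RNNs do. This is illustrative only, not our TensorFlow implementation; the weight names and the plain summation over delays are assumptions made for the sketch.

import numpy as np

DELAYS = [1, 2, 4, 8]  # exponentially spaced delays, as in MIST RNNs

def narx_step(x_t, history, W_x, W_h):
  """One update that reads multiple past hidden states, not just the last one."""
  pre_act = W_x @ x_t
  for d in DELAYS:
    if d <= len(history):
      pre_act = pre_act + W_h[d] @ history[-d]  # direct connection to h_{t-d}
  h_t = np.tanh(pre_act)
  history.append(h_t)  # the cell must carry a whole history, not a single state
  return h_t

num_in, num_hidden = 4, 8
W_x = np.random.randn(num_hidden, num_in)
W_h = {d: np.random.randn(num_hidden, num_hidden) for d in DELAYS}
history = []
for t in range(20):
  h_t = narx_step(np.random.randn(num_in), history, W_x, W_h)

A standard RNNCell threads a single fixed-size state from one step to the next, which is why this kind of update doesn't fit that interface directly.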
Also, for research we favor concise, somewhat special-purpose code over bulky, general-purpose code. For example, models.py is about 100 lines of code without comments, and it handles classification, regression, and multi-label classification RNNs, in all cases handling both sequence to sequence mappings and sequence to value mappings. Compare this to TensorFlow Learn's dynamic_rnn_estimator.py or Keras's recurrent.py.
That said, we do hope to provide an RNNCell in the future; it should be able to work with TensorFlow's raw_rnn, which right now is in an early testing phase with an API that's not yet stable.
See copyproblem.py, additionproblem.py, timit.py, timitphonemerec.py, and mnist.py. All of these files are executables (for downloading / generating / preprocessing), and all except timit.py are also modules which make it easy to load train, val, and test splits.
Example:
(python3.5)rdipiet2@thin6 mist-rnns $ python3 mnist.py -h
usage: mnist.py [-h] [--data_dir DATA_DIR]
Download MNIST.
optional arguments:
...
All arguments have defaults and are therefore optional. Here, data_dir defaults to ~/Data/MNIST.
After running mnist.py to download the data, we can load it with (for example)
import mnist
outs = mnist.load_split(val=True, permute=True, normalize=True, num_val=2000)
train_images, train_labels, val_images, val_labels = outs
copyproblem.py, additionproblem.py, and mnist.py give immediate access to data for the copy problem, addition problem, and MNIST tasks.
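For readers unfamiliar with these benchmarks, here is a minimal sketch of the kind of data the addition problem involves (a hypothetical generator written for illustration; see additionproblem.py for the actual one). Each input sequence carries a value channel and a marker channel, and the target is the sum of the two marked values.

import numpy as np

def addition_problem_example(length=100):
  # Channel 0: random values in [0, 1). Channel 1: markers, with exactly
  # two positions set to 1. The target is the sum of the two marked values.
  values = np.random.rand(length)
  markers = np.zeros(length)
  i, j = np.random.choice(length, size=2, replace=False)
  markers[[i, j]] = 1.0
  inputs = np.stack([values, markers], axis=1)  # shape: [length, 2]
  target = values[i] + values[j]
  return inputs, target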
Unfortunately we can't give immediate access to TIMIT because it's not freely available. Instead, timit.py processes the NIST Speech Disc CD1-1.1 release to form the standard train, val, and test sets (see the paper or code for details). timitphonemerec.py then processes this data further, producing MFCC coefficients etc. Finally, timitphonemerec provides the same load_split functionality as elsewhere.
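By analogy with mnist.load_split above, loading the processed data might look like the following (the argument list here is an assumption for illustration; check timitphonemerec.py for the actual signature):

import timitphonemerec
outs = timitphonemerec.load_split(val=True)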
See copyproblem_train.py, additionproblem_train.py, timitphonemerec_train.py, and mnist_train.py. Each is an executable for training and exporting summaries / results on the train and val sets.
Example:
(python3.5)rdipiet2@thin6 mist-rnns $ python3 mnist_train.py -h
usage: mnist_train.py [-h] [--data_dir DATA_DIR] [--debug DEBUG]
[--permute PERMUTE] [--layer_type LAYER_TYPE]
[--activation_type ACTIVATION_TYPE]
[--num_hidden_units NUM_HIDDEN_UNITS]
[--optimizer OPTIMIZER] [--learning_rate LEARNING_RATE]
[--optional_bias_shift OPTIONAL_BIAS_SHIFT]
[--num_pre_act_mixture_delays NUM_PRE_ACT_MIXTURE_DELAYS]
[--trial TRIAL]
Train an RNN for sequential (possibly permuted) MNIST recognition.
optional arguments:
...
Again, all arguments have defaults and are therefore optional. layer_type can be any layer from layers.py: SimpleLayer, LSTMLayer, GRULayer, or MISTLayer. Similarly, optimizer can be any optimizer from optimizers.py, which right now includes ClippingGradientDescentOptimizer and ClippingMomentumOptimizer, where in both cases 'clipping' really refers to thresholded scaling (see Pascanu et al., 2013).
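For reference, thresholded scaling rescales the whole gradient when its norm exceeds a threshold, rather than clipping components element-wise. A minimal NumPy sketch of the idea (illustrative; not the code in optimizers.py):

import numpy as np

def scale_by_threshold(grads, threshold):
  # If the global gradient norm exceeds the threshold, rescale all
  # gradients so that the norm equals the threshold; otherwise return
  # them unchanged (Pascanu et al., 2013).
  norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
  if norm > threshold:
    grads = [g * (threshold / norm) for g in grads]
  return grads

grads = [np.random.randn(3, 3), np.random.randn(3)]
grads = scale_by_threshold(grads, threshold=1.0)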
All TensorBoard summaries / models / other status files are written to DATA_DIR/results/dir_with_params_in_the_name. Each run is thus contained in its own directory, and all runs can be analyzed together by navigating to DATA_DIR/results with TensorBoard.
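For example, with the default MNIST data directory, that would be

tensorboard --logdir ~/Data/MNIST/results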
Important note: Do not run training with the default learning rate and expect to see reasonable results. All models are sensitive to the learning rate; see the paper for good choices for each task / method pair.
See batch_runs/, which includes copyproblem_template.sh, additionproblem_template.sh, timitphonemerec_template.sh, and mnist_template.sh. By default, each run uses a learning rate sampled randomly in log space between 10^-4 and 10^1.
See timitphonemerec_test.py and mnist_test.py.
Example:
(python3.5)rdipiet2@thin6 mist-rnns $ python3 mnist_test.py -h
usage: mnist_test.py [-h] [--data_dir DATA_DIR] [--results_dir RESULTS_DIR]
Test an RNN for sequential (possibly permuted) MNIST recognition.
optional arguments:
...
Here, results_dir is required and must be one of the directories created during training, which contains TensorBoard summaries, a saved model, etc.
See utils.py. This contains various helper functions to traverse examples epoch by epoch, to pad sequences, to form batches for full BPTT or truncated BPTT, etc.
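As an example of what these helpers do, padding variable-length sequences so they can be stacked into a single batch array looks roughly like this (a sketch under assumed shapes; see utils.py for the actual implementation and function names):

import numpy as np

def pad_to_same_length(seqs, pad_value=0.0):
  # Pad each [length_i, features] sequence with pad_value so that all
  # sequences share the maximum length, then stack into one array of
  # shape [batch, max_length, features].
  max_len = max(len(s) for s in seqs)
  padded = [np.pad(s, ((0, max_len - len(s)), (0, 0)), mode='constant',
                   constant_values=pad_value) for s in seqs]
  return np.stack(padded)

seqs = [np.random.randn(5, 3), np.random.randn(7, 3)]
batch = pad_to_same_length(seqs)  # shape: (2, 7, 3)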
You may see slow performance if you use CPUs rather than GPUs. In particular, TensorFlow for some reason yields only ~40% CPU utilization, and a trace shows that the bottleneck is element-wise multiplication. If you resolve this issue with compilation options, or if you implement MIST RNNs in Theano, Torch, etc., please let us know.
We used Nvidia K80s for all experiments.
Python 3 is required.
This code was upgraded to be compatible with the official TensorFlow 1.0 release and was tested with that same version, which can be obtained via pip install tensorflow-gpu==1.0.0.
Also, if you want to extract MFCC coefficients etc. for TIMIT, you'll need python-speech-features. We used version 0.4, which you can get via pip install python-speech-features==0.4.