A minimal wrapper that uses cuDNN to implement efficient RNNs on the GPU. Very easy to use.
In what follows, some benchmarks are shown comparing this implementation against TensorFlow, using a GTX 1070 Ti GPU.
Memory used by TensorFlow compared with this implementation, as a function of `hiddenSize`:
Speedup of this implementation with respect to TensorFlow as a function of the sequence length `seqLength`, for both LSTM and GRU cells:
Speedup with respect to TensorFlow as a function of the number of hidden units `hiddenSize`:
Time per iteration in ms as a function of `hiddenSize` for LSTM cells, using static persistent kernels where possible:
The library is contained within the `cudaRNN` namespace. The workflow is very straightforward and similar to TensorFlow's:
- Initialize the structure `cudaRNN::RNNOptions_t`.
- Instantiate `cudaRNN::RNN` using the previous structure. This class is templatized with 2 arguments: the first one refers to the data type of the inputs and targets (`int`, `float`, or `double`), and the second one to the data type of the weights (`__half`, `float`, or `double`).
- Initialize inputs, which should be ordered as `[inLength, nSequences, inVecSize]`, and targets as `[outLength, nSequences, inVecSize]`, by using the methods `setInputs` and `setTargets`.
- Select an optimizer and a loss metric through `setOptimzer` and `setMetrics` (optional).
- Call `train`.
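The steps above might look as follows. This is only a sketch: the `RNNOptions_t` field values, the header name, the dimension constants, and the argument types of `setInputs`/`setTargets` are assumptions, not the library's verified API — consult the actual header for the real signatures.

```cpp
#include <vector>
#include "cudaRNN.h"  // assumed header name

int main() {
    // Hypothetical problem dimensions.
    const int inLength = 32, outLength = 32, nSequences = 64, inVecSize = 128;

    // 1. Fill in the options structure (see the RNNOptions_t fields below).
    cudaRNN::RNNOptions_t opts;

    // 2. Instantiate the RNN. First template argument: data type of inputs
    //    and targets; second: data type of the weights.
    cudaRNN::RNN<float, float> rnn(opts);

    // 3. Set inputs, flattened in [inLength, nSequences, inVecSize] order,
    //    and targets in [outLength, nSequences, inVecSize] order.
    //    (Passing a raw pointer here is an assumption.)
    std::vector<float> inputs(inLength * nSequences * inVecSize);
    std::vector<float> targets(outLength * nSequences * inVecSize);
    rnn.setInputs(inputs.data());
    rnn.setTargets(targets.data());

    // 4. Optionally select an optimizer and a loss metric:
    // rnn.setOptimzer(/* optimizer choice */);
    // rnn.setMetrics(/* loss metric */);

    // 5. Train.
    rnn.train();
}
```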
Public variables of the structure `RNNOptions_t`
This structure contains the following enumerations: