A minimal wrapper that uses cuDNN to implement efficient RNNs on the GPU. Very easy to use.
In what follows, some benchmarks are shown comparing this implementation against TensorFlow, using a GTX 1070 Ti GPU.
Memory used by TensorFlow compared with this implementation, as a function of `hiddenSize`:
Speedup of this implementation with respect to TensorFlow as a function of the sequence length `seqLength`, for both LSTM and GRU cells:
Speedup with respect to TensorFlow as a function of the number of hidden units `hiddenSize`:
Time per iteration in ms as a function of `hiddenSize` for LSTM cells, using static persistent kernels where possible:
The library is contained within the `cudaRNN` namespace. The workflow is very straightforward and similar to TensorFlow's:
- Initialize the structure `cudaRNN::RNNOptions_t`.
- Instantiate `cudaRNN::RNN` using the previous structure. This class is templatized with 2 arguments: the first one refers to the data type of the inputs and targets (`int`, `float`, or `double`), and the second one to the data type of the weights (`__half`, `float`, or `double`).
- Initialize inputs, which should be ordered as `[inLength, nSequences, inVecSize]`, and targets as `[outLength, nSequences, inVecSize]`, by using the methods `setInputs` and `setTargets`.
- Select an optimizer and a loss metric through `setOptimzer` and `setMetrics` (optional).
- Call `train`.
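The steps above might look as follows. This is only a sketch: the `RNNOptions_t` field values, the header name, the dimension constants, and the argument types of `setInputs`/`setTargets` are assumptions, not the library's verified API — consult the actual header for the real signatures.

```cpp
#include <vector>
#include "cudaRNN.h"  // assumed header name

int main() {
    // Hypothetical problem dimensions.
    const int inLength = 32, outLength = 32, nSequences = 64, inVecSize = 128;

    // 1. Fill in the options structure (see the RNNOptions_t fields below).
    cudaRNN::RNNOptions_t opts;

    // 2. Instantiate the RNN. First template argument: data type of inputs
    //    and targets; second: data type of the weights.
    cudaRNN::RNN<float, float> rnn(opts);

    // 3. Set inputs, flattened in [inLength, nSequences, inVecSize] order,
    //    and targets in [outLength, nSequences, inVecSize] order.
    //    (Passing a raw pointer here is an assumption.)
    std::vector<float> inputs(inLength * nSequences * inVecSize);
    std::vector<float> targets(outLength * nSequences * inVecSize);
    rnn.setInputs(inputs.data());
    rnn.setTargets(targets.data());

    // 4. Optionally select an optimizer and a loss metric:
    // rnn.setOptimzer(/* optimizer choice */);
    // rnn.setMetrics(/* loss metric */);

    // 5. Train.
    rnn.train();
}
```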
Public variables of the structure `RNNOptions_t`
This structure contains the following enumerations: