Skuldur / Classical-Piano-Composer


Training is very slow

kevaday opened this issue · comments

Hi Skuldur,

I am using your training script to train on my own MIDI files. I don't know why, but training is going very slowly: it takes about 25 minutes for a single epoch. I'm confused because I have a brand-new GTX 1070 with the drivers installed (including GeForce Experience), CUDA 9.0 (TensorFlow is incompatible with version 9.2 for some reason), and cuDNN. With another training script (for generating text) it took only 1-2 minutes per epoch, but that model only had one LSTM layer, so it may also have been relatively slow.
How long did it take you to train with your midi files, with what GPU?
If you can help, it would be greatly appreciated.

Thanks,
Kevaday

Also, I tried using your text-generating LSTM, but it's also very slow on my GPU; it takes about 1 hour per epoch. Shouldn't my GPU be a lot faster at training?

You can use CuDNNLSTM instead:
from keras.layers import CuDNNLSTM
instead of
from keras.layers import LSTM
Don't forget to change all the LSTM calls in the source code accordingly.

Thanks mehrzeller,
actually, it's a coincidence, because I was just looking at the Keras recurrent layers documentation and saw that such a layer exists.
But thanks anyway!

In newer Keras versions, specifying CuDNNLSTM explicitly is obsolete: Keras automatically selects the fused cuDNN kernel when the layer's configuration allows it. Training is slow here because with recurrent_dropout > 0 the cuDNN implementation cannot be used.

From Keras documentation:
"The requirements to use the cuDNN implementation are:
activation == tanh
recurrent_activation == sigmoid
recurrent_dropout == 0
unroll is False
use_bias is True
Inputs, if use masking, are strictly right-padded.
Eager execution is enabled in the outermost context."

I suggest changing the model by adding a Dropout layer after each LSTM layer instead.