jcjohnson / torch-rnn

Efficient, reusable RNNs and LSTMs for torch

Any info for tweaking training settings for those with little background in LSTMs?

broccolus opened this issue

Hi All,

Not sure if this is the right place to post, but I'm looking for a little extra info on how to choose parameters for model training. I have very little background in neural networks and essentially no programming skills, but I find this software fascinating and have been curious to try a little experiment. I am tech-savvy enough to have installed everything successfully and started training with the default settings. My data set is about 3,000,000 characters. Right now I seem to have hit a point of diminishing returns: the model is consistently underfitting, and the loss value doesn't change much from checkpoint to checkpoint. By underfitting I mean that the samples still contain many gibberish words and erratic sentence structures, despite the structured nature of the data set. A few questions:

  1. How many epochs does training a model generally require to produce effective results? I made it to about 13/50 and it goes quite slowly (CPU mode on a crap computer; reaching this point took me >48 hours of constant running). Am I just being impatient? Could the loss value start to change again even after a perceived plateau? Is loss the be-all and end-all of evaluating a training run, or could the model still be improving even if the loss value doesn't change?

  2. If I am faced with underfitting, which model parameter should I change first to improve it: -rnn_size, -num_layers, -batch_size, or something else?

  3. Can anybody recommend resources designed for beginners that explain the theory behind neural networks, so I can understand exactly what is going on and answer these questions myself?

Thanks all
J

I'd try increasing -rnn_size and/or -num_layers first. Also check whether the loss only drops every -lr_decay_every epochs (i.e. right after each learning-rate decay); if so, the plateau may be down to the learning-rate schedule rather than the model.
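
For instance, a larger-capacity run might look something like this (a sketch, not from this thread; the flag names are as documented in the torch-rnn README, and the data paths are placeholders for your own preprocessed files):

```
# Hypothetical invocation: double the hidden size from the default of 128,
# keeping the default two layers. my_data.h5 / my_data.json stand in for
# your own preprocessed dataset.
th train.lua \
  -input_h5 my_data.h5 \
  -input_json my_data.json \
  -model_type lstm \
  -rnn_size 256 \
  -num_layers 2 \
  -gpu -1
# -gpu -1 selects CPU-only mode, per the README.
```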
CPU training shouldn't take that long for a small network; there are some tricks that might help (a combined example follows the list):

  • Make sure Torch is built with OpenBLAS
  • Adjust number of threads using OMP_NUM_THREADS
  • Increase or decrease rnn_size and wordvec_size by 1 or 2; odd sizes seem to be significantly faster on CPU.
  • Try the optimized LSTM implementation in #187
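
Putting the thread-count and odd-size tips together, a minimal sketch (the thread count of 4 is just a guess to tune against your core count; paths are the same placeholders as above; OMP_NUM_THREADS is the standard OpenMP environment variable honored by OpenBLAS):

```
# Hypothetical CPU run: cap the BLAS thread count and use odd layer sizes.
# 129/65 are the defaults (128/64) nudged to odd values per the tip above.
OMP_NUM_THREADS=4 th train.lua \
  -input_h5 my_data.h5 \
  -input_json my_data.json \
  -rnn_size 129 \
  -wordvec_size 65 \
  -gpu -1
```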

Fantastic info! I'll set up another run tonight with these considerations.