jcjohnson / torch-rnn

Efficient, reusable RNNs and LSTMs for torch

Any info for tweaking training settings for those with little background in LSTMs?

broccolus opened this issue

Hi All,

Not sure if this is the right place to post, but I'm looking for a little extra info on how to choose parameters for model training. I have very little background in neural networks and essentially no programming skills, but I find this software fascinating and have been curious to try a little experiment. I am tech-savvy enough to have installed everything successfully and started training with the default settings. My data set is about 3,000,000 characters. Right now I seem to have hit a point of diminishing returns: the model is consistently underfitting, and the loss value doesn't change much from checkpoint to checkpoint. By underfitting I mean that the samples still contain many gibberish words and erratic sentence structures, despite the structured nature of the data set. A few questions:

  1. How many epochs does training a model generally require to produce effective results? I made it to about 13/50 and it goes quite slowly (CPU mode on a crap computer; reaching this point took me >48 hours of constant running). Am I just being impatient? Could the loss value start to change again even after a perceived plateau? Is loss the be-all and end-all of evaluating a training run, or could the model still be improving even if the loss value doesn't change?

  2. If I am faced with underfitting, which model parameter should I change first to improve it: -rnn_size, -num_layers, -batch_size, or something else?

  3. Can anybody recommend resources designed for beginners that explain the theory behind neural networks, so I can understand exactly what is going on and answer these questions myself?

Thanks all
J

I'd try increasing -rnn_size and/or -num_layers first. Also check whether the loss only drops every -lr_decay_every epochs (i.e. right after each learning-rate decay); if so, the plateau may be down to the learning-rate schedule rather than the model.
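
For instance, a larger-capacity run might look something like this (a sketch, not from this thread; the flag names are as documented in the torch-rnn README, and the data paths are placeholders for your own preprocessed files):

```
# Hypothetical invocation: double the hidden size from the default of 128,
# keeping the default two layers. my_data.h5 / my_data.json stand in for
# your own preprocessed dataset.
th train.lua \
  -input_h5 my_data.h5 \
  -input_json my_data.json \
  -model_type lstm \
  -rnn_size 256 \
  -num_layers 2 \
  -gpu -1
# -gpu -1 selects CPU-only mode, per the README.
```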
CPU training shouldn't take that long for a small network; there are some tricks that might help (a combined example follows the list):

  • Make sure Torch is built with OpenBLAS
  • Adjust number of threads using OMP_NUM_THREADS
  • Increase or decrease rnn_size and wordvec_size by 1 or 2; odd sizes seem to be significantly faster on CPU.
  • Try the optimized LSTM implementation in #187
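
Putting the thread-count and odd-size tips together, a minimal sketch (the thread count of 4 is just a guess to tune against your core count; paths are the same placeholders as above; OMP_NUM_THREADS is the standard OpenMP environment variable honored by OpenBLAS):

```
# Hypothetical CPU run: cap the BLAS thread count and use odd layer sizes.
# 129/65 are the defaults (128/64) nudged to odd values per the tip above.
OMP_NUM_THREADS=4 th train.lua \
  -input_h5 my_data.h5 \
  -input_json my_data.json \
  -rnn_size 129 \
  -wordvec_size 65 \
  -gpu -1
```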

Fantastic info! I'll set up another run tonight with these considerations.