What params should I tweak in order to prevent the model crashing during training?
AliBharwani opened this issue · comments
AliBharwani commented
I'm trying to train this model on an EC2 instance running a c5.xlarge (4 vCPUs, 8 GB of RAM). After setting up the data and preprocessing, I try to train the Tacotron-2 model. It gets as far as printing "Generated 20 test batches of size 32 in 23.134 sec" and then hangs. At this point I've tried sshing in from a different terminal, but that always freezes, and eventually the training terminal prints "Killed". I'm guessing this is the kernel's OOM killer terminating the process because of the limited memory on the machine. Is there any way to get around this?
Antoni Mur commented
@AliBharwani Use a GPU instance with more RAM, for example g2.2xlarge.
Debasish Modak commented
Reduce the batch size for Tacotron training in hparams from 32 to 16.
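A minimal sketch of what that change might look like. The parameter name `tacotron_batch_size` and the file `hparams.py` are assumptions based on common Tacotron-2 repos; check your checkout for the exact names. Halving the batch size roughly halves peak memory per training step, at the cost of noisier gradients and slower convergence.

```python
# Hypothetical excerpt of hparams.py -- exact keys depend on your repo version.
hparams = {
    # Default is 32; dropping to 16 reduces peak memory so training
    # fits on an 8 GB machine instead of being OOM-killed.
    "tacotron_batch_size": 16,
    # Keeping the test/synthesis batch small helps for the same reason.
    "tacotron_synthesis_batch_size": 1,
}

print(hparams["tacotron_batch_size"])
```

If 16 still gets killed, keep halving (8, then 4) until the process survives a few training steps.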