baidu-research / ba-dls-deepspeech


Suggestions for improving dev-set performance.

Feynman27 opened this issue

(I apologize if this question is better suited for StackOverflow, but I figure posting it here will reach the right audience in a shorter amount of time.)

I'm training this CTC-cost model on the LibriSpeech "train-other-500" dataset, which contains 500 hours of speech audio and transcripts. I'm using the "dev-other" dataset for validation; it's apparently a more challenging audio set to model.

I trained the model for 20 epochs; the distribution of CTC costs is plotted below.

[figure: CTC costs over 20 epochs of training]

The weights are updated according to Nesterov momentum.
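For reference, a Nesterov step evaluates the gradient at a look-ahead point before updating the velocity and weights. A minimal NumPy sketch (the names `nesterov_step`, `grad_fn`, `lr`, and `momentum` are illustrative, not this repo's API):

```python
import numpy as np

def nesterov_step(w, v, grad_fn, lr=1e-4, momentum=0.9):
    """One Nesterov-momentum update: take the gradient at the
    look-ahead point w + momentum * v, then step velocity and weights."""
    g = grad_fn(w + momentum * v)  # gradient at the look-ahead point
    v = momentum * v - lr * g      # update the velocity
    return w + v, v                # step the weights

# Toy usage: minimize f(w) = ||w||^2, whose gradient is 2w.
w, v = np.ones(3), np.zeros(3)
for _ in range(100):
    w, v = nesterov_step(w, v, lambda x: 2 * x, lr=0.1)
```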

Since the validation performance plateaus at around iter=25000, I checkpointed the model there and resumed training with an exponential learning-rate decay schedule, decreasing the learning rate after each epoch (starting from iter=25000). The CTC costs under this schedule are shown below after a few epochs, with a sketch of the schedule after the plot:

[figure: CTC costs after resuming with the exponential learning-rate decay schedule]
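Concretely, the schedule I'm applying is a per-epoch exponential decay, roughly like this (the starting rate and `decay_rate` here are placeholder values, not the ones from my run):

```python
def decayed_lr(initial_lr, decay_rate, epoch):
    # lr_t = lr_0 * decay_rate ** epoch, applied once per epoch
    return initial_lr * decay_rate ** epoch

# e.g. decaying from a checkpointed run over the next few epochs
for epoch in range(5):
    print(f"epoch {epoch}: lr = {decayed_lr(1e-4, 0.95, epoch):.2e}")
```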

Unfortunately, this strategy doesn't appear to improve the model performance. Does anyone have any suggestions on how to improve the model other than what I've described above?

From the looks of it, your model has high variance. You could try reducing the initial learning rate, adding regularization (dropout, or augmenting the training audio with noise), or, if those don't help, changing the model architecture. A sketch of the noise-augmentation idea is below.
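One simple form of noise augmentation is mixing white Gaussian noise into the waveform at a target SNR. A minimal sketch (`snr_db` is an illustrative parameter; real pipelines often mix in recorded background noise instead):

```python
import numpy as np

def add_noise(audio, snr_db=20.0):
    """Mix white Gaussian noise into a waveform at a target SNR (in dB)."""
    signal_power = np.mean(audio ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = np.random.normal(0.0, np.sqrt(noise_power), size=audio.shape)
    return audio + noise

# Toy usage on one second of fake 16 kHz audio
noisy = add_noise(np.random.randn(16000), snr_db=20.0)
```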

You can also try training on more data. By default the max wav length is set to 10 seconds (https://github.com/baidu-research/ba-dls-deepspeech/blob/master/data_generator.py#L53-L54), which excludes a good portion of the LibriSpeech corpus. Longer utterances will most likely require more memory, though.
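The filtering itself is just a duration check, so raising the cap is a matter of loosening that threshold. A hypothetical sketch (names and data are illustrative; see the linked lines in data_generator.py for the actual implementation):

```python
MAX_DURATION = 20.0  # seconds; the repo's default cap is 10.0

# (path, duration in seconds, transcript) tuples -- illustrative data
utterances = [
    ("LibriSpeech/a.flac", 8.2, "..."),
    ("LibriSpeech/b.flac", 14.7, "..."),
]

kept = [u for u in utterances if u[1] <= MAX_DURATION]
# With the default 10 s cap only the first clip survives; at 20 s both
# are kept, at the price of more memory per batch.
```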