Why is the chain loss computation so slow?
bliunlpr opened this issue
When I trained the model with ChainObjtiveFunction, I found that the chain loss computation is very slow. For example, data loading takes 0.2 s and the model forward pass takes 0.2 s, but the loss computation takes 8.2 s. Why is the chain loss computation so slow, and how can I accelerate it? Thanks!
Do you enable CUDA like this when invoking ChainObjtiveFunction?
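(The snippet referred to here is not shown in the thread; below is a minimal sketch of what enabling Kaldi's CUDA backend typically looks like through PyKaldi's `kaldi.cudamatrix` bindings. The exact wrapper names may differ depending on your PyKaldi version; this should be called once at startup, before the first call to ChainObjtiveFunction.)

```python
# Sketch: enable Kaldi's GPU backend so the chain/LF-MMI loss runs on CUDA
# instead of the CPU. Assumes PyKaldi-style bindings.
from kaldi.cudamatrix import cuda_available

def maybe_select_gpu():
    if cuda_available():
        from kaldi.cudamatrix import CuDevice
        # Route Kaldi's matrix kernels to the GPU.
        CuDevice.instantiate().select_gpu_id("yes")
        CuDevice.instantiate().allow_multithreading()
    else:
        print("Kaldi was built without CUDA; the chain loss will run on the CPU.")
```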
Thanks for pointing this out. I have not fully tested the chain function yet, as I'm currently working on a few other aspects. At the moment it only works with a batch size of 1. I think @glynpu made a good point; it is possible that I missed this, which would explain why it is slow now. To fully utilize the chain objective, I need to implement another dataloader that can prepare minibatches the way Kaldi does, e.g., 128 sequences of 1.5 seconds of audio with supervisions (see the sketch below). I do not have time to do that yet, but I will work on it as soon as my hands are free.
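(Such a dataloader does not exist in the repository yet; the following is only a rough, hypothetical sketch of the chunking side of it. It cuts each utterance's features into fixed-length chunks so that minibatches of, e.g., 128 x 150 frames (~1.5 s at a 10 ms frame shift) can be formed. Splitting the chain supervisions to match the chunks, which Kaldi handles in nnet3-chain-get-egs, is not shown.)

```python
# Hypothetical chunking dataset for chain-style minibatches (names are illustrative).
import torch
from torch.utils.data import Dataset, DataLoader

class ChunkedChainDataset(Dataset):
    def __init__(self, utterances, chunk_frames=150):
        # utterances: list of (feature_matrix [T, D], supervision) pairs
        self.chunks = []
        for feats, supervision in utterances:
            num_full = feats.shape[0] // chunk_frames
            for i in range(num_full):
                start = i * chunk_frames
                # NOTE: the supervision would need to be split per chunk as well.
                self.chunks.append((feats[start:start + chunk_frames], supervision))

    def __len__(self):
        return len(self.chunks)

    def __getitem__(self, idx):
        feats, supervision = self.chunks[idx]
        return torch.as_tensor(feats, dtype=torch.float32), supervision

def collate(batch):
    feats = torch.stack([f for f, _ in batch])   # [B, chunk_frames, D]
    supervisions = [s for _, s in batch]         # one supervision per chunk
    return feats, supervisions

# loader = DataLoader(ChunkedChainDataset(utterances), batch_size=128,
#                     shuffle=True, collate_fn=collate)
```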
Hi @bliunlpr
I double-checked with my setup, and it takes around 1 s to compute the loss for each utterance. Does the 8.2 s you mentioned correspond to the computation for one utterance?
I have added it; it's in the pull requests. @jzlianglu
Added CuDevice activation for the LF-MMI loss computation, which gives a significant speed improvement.