Why is the chain loss computation so slow?
bliunlpr opened this issue
When I trained the model with ChainObjtiveFunction, I found that the chain loss computation is very slow. For example, data loading takes 0.2 s and the model forward pass takes 0.2 s, but the loss computation takes 8.2 s. Why is the chain loss computation so slow, and how can I accelerate it? Thanks!
Do you enable CUDA like this when invoking ChainObjtiveFunction?
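(The snippet referred to here is not shown in the thread; below is a minimal sketch of what enabling Kaldi's CUDA backend typically looks like through PyKaldi's `kaldi.cudamatrix` bindings. The exact wrapper names may differ depending on your PyKaldi version; this should be called once at startup, before the first call to ChainObjtiveFunction.)

```python
# Sketch: enable Kaldi's GPU backend so the chain/LF-MMI loss runs on CUDA
# instead of the CPU. Assumes PyKaldi-style bindings.
from kaldi.cudamatrix import cuda_available

def maybe_select_gpu():
    if cuda_available():
        from kaldi.cudamatrix import CuDevice
        # Route Kaldi's matrix kernels to the GPU.
        CuDevice.instantiate().select_gpu_id("yes")
        CuDevice.instantiate().allow_multithreading()
    else:
        print("Kaldi was built without CUDA; the chain loss will run on the CPU.")
```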
Thanks for pointing this out. I have not fully tested the chain function yet, as I'm currently working on a few other aspects. At the moment it only works with a batch size of 1. I think @glynpu made a good point; it is possible that I missed this, which would explain why it is slow now. To fully utilize the chain objective, I need to implement another dataloader that can prepare minibatches the way Kaldi does, e.g., 128 sequences of 1.5 seconds of audio with supervisions (see the sketch below). I do not have time to do that yet, but I will work on it as soon as my hands are free.
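(Such a dataloader does not exist in the repository yet; the following is only a rough, hypothetical sketch of the chunking side of it. It cuts each utterance's features into fixed-length chunks so that minibatches of, e.g., 128 x 150 frames (~1.5 s at a 10 ms frame shift) can be formed. Splitting the chain supervisions to match the chunks, which Kaldi handles in nnet3-chain-get-egs, is not shown.)

```python
# Hypothetical chunking dataset for chain-style minibatches (names are illustrative).
import torch
from torch.utils.data import Dataset, DataLoader

class ChunkedChainDataset(Dataset):
    def __init__(self, utterances, chunk_frames=150):
        # utterances: list of (feature_matrix [T, D], supervision) pairs
        self.chunks = []
        for feats, supervision in utterances:
            num_full = feats.shape[0] // chunk_frames
            for i in range(num_full):
                start = i * chunk_frames
                # NOTE: the supervision would need to be split per chunk as well.
                self.chunks.append((feats[start:start + chunk_frames], supervision))

    def __len__(self):
        return len(self.chunks)

    def __getitem__(self, idx):
        feats, supervision = self.chunks[idx]
        return torch.as_tensor(feats, dtype=torch.float32), supervision

def collate(batch):
    feats = torch.stack([f for f, _ in batch])   # [B, chunk_frames, D]
    supervisions = [s for _, s in batch]         # one supervision per chunk
    return feats, supervisions

# loader = DataLoader(ChunkedChainDataset(utterances), batch_size=128,
#                     shuffle=True, collate_fn=collate)
```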
Hi @bliunlpr
I double-checked with my setup, and it takes around 1 s to compute the loss for each utterance. Does the 8.2 s you mentioned correspond to the computation for one utterance?
I have added it; it's in the pull requests. @jzlianglu
Added CuDevice activation for the LF-MMI loss computation, which gives a significant speed improvement.