fatchord / WaveRNN

WaveRNN Vocoder + TTS

Home Page: https://fatchord.github.io/model_outputs/


Inference time is very slow.

chazo1994 opened this issue

I used an RTX 2080 Ti GPU to train the model and run inference. Training is quick, but inference is very slow (I run gen_wavernn.py on a wav file). I saw that after upsampling, the number of frames increases by a factor of several hundred. I thought WaveRNN was supposed to be very fast compared with other neural vocoders.

Without batched generation, the inference speed of the WaveRNN vocoder is slow, just like any other WaveNet-style neural vocoder.

What makes WaveRNN generate audio fast is batched generation, which splits a single utterance into multiple segments and generates them in parallel.
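For intuition, here is a minimal sketch of that folding step, modeled on the repo's fold_with_overlap helper (treat the names and details as illustrative rather than exact): a (1, T, C) conditioning sequence is cut into overlapping windows stacked along the batch dimension, so the autoregressive RNN can generate all segments at once.

```python
import torch
import torch.nn.functional as F

def fold_with_overlap(x, target, overlap):
    """Split a (1, T, C) sequence into overlapping windows stacked along
    the batch dimension for parallel generation (illustrative sketch)."""
    _, total_len, channels = x.size()

    # Each window is target + 2*overlap long; window starts advance by
    # target + overlap, so adjacent windows share `overlap` samples.
    num_folds = (total_len - overlap) // (target + overlap)
    extended_len = num_folds * (target + overlap) + overlap
    remaining = total_len - extended_len

    # Pad the tail along the time axis so the last window is full length.
    if remaining != 0:
        num_folds += 1
        padding = target + 2 * overlap - remaining
        x = F.pad(x, (0, 0, 0, padding))

    folded = torch.zeros(num_folds, target + 2 * overlap, channels)
    for i in range(num_folds):
        start = i * (target + overlap)
        folded[i] = x[0, start:start + target + 2 * overlap, :]
    return folded  # shape: (num_folds, target + 2*overlap, C)
```

After generation, the overlapping regions of adjacent segments are cross-faded and the segments are concatenated back into a single waveform (the repo handles this in a companion unfolding step).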

To enable batched generation and speed up synthesis, set voc_gen_batched=True in hparams.py.

NOTE: batched generation is a trade-off feature: lowering voc_target in hparams.py increases generation speed, but the quality of the generated audio gets worse.
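For reference, the relevant hparams.py settings look roughly like this (the parameter names follow the repo; the values are illustrative, so check your checkout for the actual defaults):

```python
# hparams.py (excerpt) -- illustrative values, not necessarily the defaults
voc_gen_batched = True  # enable batched (parallel-segment) generation
voc_target = 11_000     # samples generated per segment: smaller = faster, lower quality
voc_overlap = 550       # overlap between adjacent segments, used for cross-fading
```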


I already use batched generation, but it is still very slow.

I don't know how much speed you expect, but decreasing voc_target in hparams.py should help speed up inference.
The quality of the synthesized audio will get worse, though.
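If you want to measure the trade-off concretely, a hypothetical timing loop like the one below can help. It assumes model is a loaded WaveRNN and mels is a mel-spectrogram tensor, and it calls model.generate with batched/target/overlap arguments as I understand the repo's API, so verify the signature against your version:

```python
import time

# Hypothetical experiment: sweep the segment length (voc_target) and time
# batched generation. Smaller targets mean more segments run in parallel,
# so synthesis gets faster, but quality tends to degrade.
for target in (11_000, 8_000, 5_000):
    t0 = time.time()
    model.generate(mels, f'out_{target}.wav',
                   batched=True, target=target,
                   overlap=550, mu_law=True)
    print(f'target={target}: {time.time() - t0:.1f}s')
```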