speed of generating speech samples

Question

dengyan opened this issue 7 years ago

I found that SampleRNN needs to be run in parallel to get fast generation. Generating 200 utterances in one batch, each 8 seconds long, takes only about 500 seconds. But generating a single utterance is very slow: more than 40 seconds for 1 second of speech. That does not seem any faster than WaveNet. Does anyone have ideas for speeding it up?

CJ Carr · Answer 1 · Fri Dec 08 2017 07:11:36 GMT+0800 (China Standard Time)

Using an AWS p3.16xlarge instance
NVIDIA Tesla V100
CUDA 9

This appears to run at roughly 10x the speed of dengyan's setup.

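It takes us 1000 seconds to generate 4-minute (240-second) audio files. The figures below assume that time stays the same whether we generate 1 file or 100 in parallel.

If we generate 100 of these in parallel:
that's 24 seconds of generated audio for every 1 second of processing (100 × 240 / 1000)

If we generate 1 of these:
that's 0.24 seconds of generated audio for every 1 second of processing (240 / 1000)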
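
For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the throughput figures quoted in this thread. The function name `audio_per_second` is made up for illustration and is not part of SampleRNN; the inputs are just the numbers from the question and from this answer, and the V100 batch figure assumes the 1000-second wall time does not change with batch size.

```python
def audio_per_second(num_files: int, seconds_each: float, wall_seconds: float) -> float:
    """Seconds of audio produced per second of wall-clock processing.

    Hypothetical helper for this thread; values above 1.0 mean faster than real time.
    """
    return num_files * seconds_each / wall_seconds


# Figures from the question (dengyan's setup).
question_batch = audio_per_second(200, 8, 500)   # 200 utterances of 8 s in ~500 s
question_single = audio_per_second(1, 1, 40)     # upper bound: "more than 40 s" per 1 s

# Figures from this answer (V100), assuming 1000 s regardless of batch size.
answer_batch = audio_per_second(100, 240, 1000)  # 100 files of 4 minutes each
answer_single = audio_per_second(1, 240, 1000)   # 1 file of 4 minutes

if __name__ == "__main__":
    print(f"question, batch of 200: {question_batch:.3f} s audio per s")
    print(f"question, single:       {question_single:.3f} s audio per s")
    print(f"answer, batch of 100:   {answer_batch:.3f} s audio per s")
    print(f"answer, single:         {answer_single:.3f} s audio per s")
    print(f"single-file speedup:    {answer_single / question_single:.1f}x")
```

The single-file ratio comes out at about 9.6, which is where the "roughly 10x" estimate comes from. Even so, single-file generation is still about four times slower than real time; only batching pushes throughput above 1.0.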