soroushmehr / sampleRNN_ICLR2017

SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

Home Page: https://arxiv.org/abs/1612.07837

speed of generating speech samples

dengyan opened this issue

I found that SampleRNN needs to generate in parallel to reach a fast generation speed. Generating 200 utterances, each 8 seconds long, takes only about 500 seconds. Generating a single utterance is very slow, though: more than 40 seconds for 1 second of speech. That does not seem any faster than WaveNet. Does anyone have ideas for speeding it up?
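To put the two figures above on the same scale, here is a minimal sketch that converts them into seconds of audio produced per second of wall-clock time. The function name `audio_per_second` is made up for illustration, and the single-utterance figure uses the "more than 40 seconds" number as a bound, so the real gap may be larger.

```python
def audio_per_second(num_utterances, seconds_each, wall_clock_seconds):
    """Seconds of audio generated per second of wall-clock time."""
    return num_utterances * seconds_each / wall_clock_seconds

# Batched run reported above: 200 utterances of 8 s each in about 500 s.
batched = audio_per_second(200, 8, 500)

# Single-utterance run: 1 s of speech in (at least) 40 s, so this is an upper bound.
single = audio_per_second(1, 1, 40)

# How much more audio per second the batched run yields (a lower bound).
batch_gain = batched / single

print(f"batched: {batched:.3f} s audio per s")
print(f"single:  {single:.3f} s audio per s")
print(f"gain:    {batch_gain:.0f}x")
```

Under these numbers the batched run is faster than real time, while the single-utterance run is far slower than real time.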

Using a p3.16xlarge AWS instance
NVIDIA Tesla V100
CUDA 9

This appears to run at about 10x the speed of dengyan's single-utterance setup.

It takes us 1000 seconds to generate 4-minute (240-second) audio files.

If we generate 100 of these in parallel, that's 24 seconds of generated audio for every 1 second of processing.

If we generate just 1 of these, that's 0.24 seconds of generated audio for every 1 second of processing.
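For anyone who wants to check the arithmetic, a rough sketch follows. It assumes, as the numbers above imply, that a batch of 100 finishes in the same 1000 seconds as a single file; `realtime_factor` is a hypothetical helper name, and the baseline is dengyan's "more than 40 seconds for 1 second of speech" figure taken at exactly 40.

```python
def realtime_factor(audio_seconds, processing_seconds, batch_size=1):
    """Seconds of audio generated per second of processing."""
    return batch_size * audio_seconds / processing_seconds

FILE_SECONDS = 4 * 60      # one 4-minute file
RUN_SECONDS = 1000         # reported processing time per run

# Assumption: the run time does not change with batch size.
v100_single = realtime_factor(FILE_SECONDS, RUN_SECONDS)
v100_batched = realtime_factor(FILE_SECONDS, RUN_SECONDS, batch_size=100)

# dengyan's single-utterance figure: 1 s of speech in about 40 s.
baseline_single = realtime_factor(1, 40)

speedup = v100_single / baseline_single

print(f"single file:     {v100_single:.2f} s audio per s")
print(f"100 in parallel: {v100_batched:.2f} s audio per s")
print(f"vs. baseline:    {speedup:.1f}x")
```

The single-file ratio against the baseline comes out a little under 10, which matches the "about 10x" estimate.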