fatchord / WaveRNN

WaveRNN Vocoder + TTS

Home Page: https://fatchord.github.io/model_outputs/

Speed up quick_start.py by running it with GPU

gerbill opened this issue · comments

commented

Hello!
I've been able to successfully generate audio files with quick_start.py, but using the CPU is pretty slow. If I use CUDA it's still not much faster, and only about 5% of my GPU is utilized. I assume audio generation could be sped up by at least 10x.
Is there an easy solution for this?
My experience with RNNs and ML is pretty limited :(
My GPU is GeForce RTX 2060 if that helps.
Thank you!

Facing the same problem. RNNs can't be parallelized across time steps because of their sequential architecture, so using a GPU won't speed up inference.
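To see why, here is a toy autoregressive loop in plain NumPy (an illustrative stand-in, not WaveRNN's actual cell): sample t can't be computed until sample t-1 exists, so the time loop runs one step at a time no matter how many GPU cores sit idle.

```python
import numpy as np

# Toy autoregressive generator: each sample feeds the next step, so the
# 24,000 iterations below must run strictly in order. A GPU only speeds up
# the math *inside* one step, not the steps themselves.
# (Stand-in only; WaveRNN's real cell is a conditioned GRU.)
hidden = np.zeros(16)
sample = 0.0
audio = []
for t in range(24_000):                               # ~1 second at 24 kHz
    hidden = np.tanh(0.9 * hidden + 0.1 * sample)     # stand-in for the RNN cell
    sample = float(hidden.mean())                     # stand-in for sampling the output
    audio.append(sample)
```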

commented

@zirlman , I see that when I'm using the CPU, it's 100% loaded on all 6 cores, so there is some degree of parallelization going on, I suppose.

Btw, one more question (sorry for going off topic): I've noticed that when I generate audio files, the speech quality is much lower than in the published samples, even when I use the very same text. Could you please advise which settings I might change to improve voice quality?
Thank you!

@gerbill , @zirlman

Did you set voc_gen_batched=True in your hparams.py?

If so, the inference speed of WaveRNN should be fast.
With voc_gen_batched=False I only got about 1,700 samples/sec.

voc_gen_batched=True splits a single utterance into multiple segments and stacks those segments like a batch, so they can be synthesized in parallel.

But this feature comes with a trade-off.

There are two options that control how parallel the synthesis is (see the sketch below):
voc_target decides how many samples go into each segment,
and voc_overlap decides how many samples overlap between adjacent segments.
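A rough NumPy sketch of the idea (my own simplification, not the repo's exact fold/unfold code): the utterance is cut into overlapping segments, each segment is decoded as one batch row, and the overlap regions are linearly cross-faded back together.

```python
import numpy as np

def fold_with_overlap(x, target, overlap):
    """Cut a 1-D sequence into segments of target + 2*overlap samples,
    hopped by target + overlap, padding the tail to fill the last fold."""
    hop = target + overlap
    num_folds = int(np.ceil((len(x) - overlap) / hop))
    padded = np.pad(x, (0, num_folds * hop + overlap - len(x)))
    return np.stack([padded[i * hop : i * hop + target + 2 * overlap]
                     for i in range(num_folds)])

def xfade_and_unfold(folded, target, overlap):
    """Overlap-add the folds, cross-fading the shared regions.
    (Edge fades at the very start and end are ignored for brevity.)"""
    num_folds, seg_len = folded.shape
    fade_in = np.linspace(0.0, 1.0, overlap)
    out = np.zeros(num_folds * (target + overlap) + overlap)
    for i in range(num_folds):
        seg = folded[i].astype(float).copy()
        seg[:overlap] *= fade_in          # fade in where the previous fold fades out
        seg[-overlap:] *= fade_in[::-1]   # fade out where the next fold fades in
        start = i * (target + overlap)
        out[start : start + seg_len] += seg
    return out

x = np.arange(1000.0)
folds = fold_with_overlap(x, target=400, overlap=100)   # shape (2, 600)
y = xfade_and_unfold(folds, target=400, overlap=100)    # interior matches x
```

Since the fade-out of one segment and the fade-in of the next sum to 1 over the overlap, the seam reconstructs the signal exactly when adjacent segments agree; in practice it just hides small discontinuities between independently decoded segments.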

In my experience, lowering voc_target makes inference much faster, but the synthesized audio quality gets worse.
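For reference, all three knobs live in hparams.py; the values below are roughly the repository defaults as I remember them, so double-check your own copy:

```python
# hparams.py -- generation settings (values shown are approximate defaults;
# verify against your own hparams.py before relying on them)
voc_gen_batched = True   # fold the utterance and decode segments in parallel
voc_target = 11_000      # samples per segment: lower = faster, lower quality
voc_overlap = 550        # samples shared between segments for the cross-fade
```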

@gerbill
There are many variables that can make your synthesized audio quality worse:
an under-trained TTS model, an under-trained WaveRNN model, etc.
Could you upload some more information?
(training steps of each model, your hparams.py, etc.)