Streaming text to speech
Omarnabk opened this issue
Firstly, thanks for this great work.
I'm trying to use your model for streaming text-to-speech applications. The quality is good, but the speed is slow. I'm using a Tesla V4 GPU with 16 GB of RAM.
Is there any change to the configuration that could help speed up the process?
Currently, producing 2 seconds of audio takes 14 seconds of processing, from input text to output audio.
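For reference, those numbers can be expressed as a real-time factor (RTF), the usual metric for TTS speed: processing time divided by audio duration. This is a minimal sketch using only the figures reported above (14 s of processing for 2 s of audio); the function name is illustrative, not part of this repo.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration.
    RTF > 1 means slower than real time; streaming needs RTF < 1."""
    return processing_seconds / audio_seconds

# Reported case: 14 s of processing for 2 s of audio.
rtf = real_time_factor(14.0, 2.0)
print(rtf)        # 7.0 -> roughly a 7x speedup is needed for real-time streaming
```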
Hi @Omarnabk, I'm working on streaming text-to-speech for our apps, but I'm not quite sure what a streaming text-to-speech feature involves. Could you explain it?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Hello @Omarnabk, I think the slowness is mainly caused by the WaveRNN vocoder, am I right? Try MelGAN, HiFi-GAN, or WaveGrad instead; all of them should be faster. Also, when processing longer clips or many clips with WaveRNN, you can get some speedup from the parallelized batch processing it uses (not faster than real time, but sometimes close to it).
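To illustrate the batching idea: instead of vocoding one utterance at a time, the input text can be split into sentences and handed to the vocoder as a batch, so the per-call overhead is amortized. This is only a sketch of the flow; `vocode_batch` below is a hypothetical stand-in for a batched vocoder call, not an actual function in this repo.

```python
import re

def vocode_batch(sentences):
    # Hypothetical stand-in for a batched vocoder call (e.g. WaveRNN's
    # batched inference mode); here it just returns one placeholder
    # "clip" per input sentence.
    return [f"<audio: {s}>" for s in sentences]

text = "First sentence. Second sentence. Third sentence."
# Split the text on sentence boundaries so the vocoder can process
# all sentences in one batch rather than sequentially.
sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
clips = vocode_batch(sentences)
print(len(clips))  # 3
```

The same split is also what makes streaming feasible: each sentence's audio can be played back as soon as it is ready, while later sentences are still being synthesized.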