Streaming text to speech
Omarnabk opened this issue
Firstly, thanks for this great work.
I'm trying to use your model for streaming text-to-speech applications. The quality is good, but the speed is slow. I'm using a Tesla V4 GPU with 16 GB of RAM.
Is there any change to the configuration that could help speed up the process?
Currently, producing 2 seconds of audio takes 14 seconds of processing, from input text to output audio.
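For reference, those numbers can be expressed as a real-time factor (RTF), the usual metric for TTS speed: processing time divided by audio duration. This is a minimal sketch using only the figures reported above (14 s of processing for 2 s of audio); the function name is illustrative, not part of this repo.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration.
    RTF > 1 means slower than real time; streaming needs RTF < 1."""
    return processing_seconds / audio_seconds

# Reported case: 14 s of processing for 2 s of audio.
rtf = real_time_factor(14.0, 2.0)
print(rtf)        # 7.0 -> roughly a 7x speedup is needed for real-time streaming
```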
Hi @Omarnabk, I'm working on streaming text-to-speech for our apps, but I'm not quite sure what a streaming text-to-speech feature involves. Could you explain it?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Hello @Omarnabk, I think the slowness is mainly caused by the WaveRNN vocoder, am I right? Try MelGAN, HiFi-GAN, or WaveGrad instead; all of them should be faster. Also, when processing longer clips or many clips with WaveRNN, you can get some speedup from the parallelized batch processing it uses (not faster than real time, but sometimes close to it).
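To illustrate the batching idea: instead of vocoding one utterance at a time, the input text can be split into sentences and handed to the vocoder as a batch, so the per-call overhead is amortized. This is only a sketch of the flow; `vocode_batch` below is a hypothetical stand-in for a batched vocoder call, not an actual function in this repo.

```python
import re

def vocode_batch(sentences):
    # Hypothetical stand-in for a batched vocoder call (e.g. WaveRNN's
    # batched inference mode); here it just returns one placeholder
    # "clip" per input sentence.
    return [f"<audio: {s}>" for s in sentences]

text = "First sentence. Second sentence. Third sentence."
# Split the text on sentence boundaries so the vocoder can process
# all sentences in one batch rather than sequentially.
sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
clips = vocode_batch(sentences)
print(len(clips))  # 3
```

The same split is also what makes streaming feasible: each sentence's audio can be played back as soon as it is ready, while later sentences are still being synthesized.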