Tomiinek / Multilingual_Text_to_Speech

An implementation of Tacotron 2 that supports multilingual experiments with parameter-sharing, code-switching, and voice cloning.

Streaming text to speech

Omarnabk opened this issue

Firstly, thanks for this great work.
I'm trying to use your model for streaming text-to-speech applications. The quality is good, but synthesis is slow. I'm using a Tesla V4 GPU with 16 GB of memory.
Is there any change to the configuration that would help speed up the process?

Currently, producing 2 seconds of audio takes 14 seconds of processing, measured from the input text to the final audio.
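For reference, the numbers above can be expressed as a real-time factor (processing time divided by audio duration). The snippet below is only a small sketch of that arithmetic; the function name `real_time_factor` is made up for illustration and is not part of this repository.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration.

    Values above 1.0 mean synthesis is slower than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Numbers reported in this issue: 14 s of processing for 2 s of audio.
rtf = real_time_factor(14.0, 2.0)
print(f"RTF = {rtf:.1f}")  # RTF = 7.0
```

Streaming playback generally needs this value to stay at or below 1.0, so the reported setup is roughly seven times too slow.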

Hi @Omarnabk, I'm working on streaming text-to-speech for our apps, but I'm not quite sure what a streaming text-to-speech feature is. Could you explain it?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Hello @Omarnabk, I think the slowdown is mainly caused by the WaveRNN vocoder, am I right? Try MelGAN, HiFi-GAN, or WaveGrad; all of them should be faster, I think. Also, when processing longer or multiple audio clips with WaveRNN, you can get some speedup from parallelization and the batch processing it uses (not faster than real time, but sometimes matching it).
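One way to check whether the vocoder really is the bottleneck is to time the two stages separately. The following is a minimal sketch under stated assumptions: `fake_acoustic_model` and `fake_vocoder` are hypothetical placeholders that only sleep, standing in for the Tacotron 2 and WaveRNN calls, and `time_stage` and `profile_pipeline` are illustrative helpers, not functions from this repository.

```python
import time
from typing import Any, Callable, Dict, Tuple


def time_stage(fn: Callable[..., Any], *args: Any) -> Tuple[Any, float]:
    """Run fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    # Note: for real GPU models, synchronize the device (e.g. with
    # torch.cuda.synchronize()) before reading the timer, otherwise
    # asynchronous kernels can make a stage look faster than it is.
    return result, time.perf_counter() - start


def fake_acoustic_model(text: str) -> list:
    """Hypothetical stand-in for text -> mel spectrogram."""
    time.sleep(0.01)
    return [0.0] * 80


def fake_vocoder(mel: list) -> list:
    """Hypothetical stand-in for mel spectrogram -> waveform."""
    time.sleep(0.1)
    return [0.0] * 22050


def profile_pipeline(
    text: str,
    acoustic_model: Callable[[str], Any],
    vocoder: Callable[[Any], Any],
) -> Dict[str, float]:
    """Time each stage of a two-stage TTS pipeline."""
    mel, t_mel = time_stage(acoustic_model, text)
    _, t_voc = time_stage(vocoder, mel)
    return {"acoustic_model": t_mel, "vocoder": t_voc}


timings = profile_pipeline("Hello world.", fake_acoustic_model, fake_vocoder)
slowest = max(timings, key=timings.get)
print(timings, "slowest stage:", slowest)
```

Replacing the two placeholders with the actual model calls should show which stage dominates the 14 seconds, and therefore whether swapping the vocoder is likely to help.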
