🐸 TTS roadmap

Question

🐸 TTS roadmap

erogol opened this issue 3 years ago · comments

These are the main dev plans for 🐸 TTS.

If you want to contribute to 🐸 TTS and don't know where to start you can pick one here and start with our Contribution Guideline. We're also always here to help.

Feel free to pick one or suggest a new one.

Contributions are always welcome 💪 .

v0.1.0 Milestones

v0.2.0 Milestones

Grapheme 2 Phoneme in-house conversion. (Thx to gruut 👍 )
Implement VITS model.

v0.3.0 Milestones

Implement generic ForwardTTS API.
Implement Fast Speech model.
Implement Fast Pitch model.

v0.4.0 Milestones

Trainer API v2 - join the discussion
Multi-speaker VCTK recipes for all the TTS.tts models.

v0.5.0 Milestones

Support for multi-lingual models
YourTTS release 🚀

v0.6.0 Milestones

Add ESpeak support
New Tokenizer and Phonemizer APIs #937
New Model API #1078
Splitting the trainer as a separate repo 👟Trainer
Update VITS model API
Gradient accumulation. #560 (in 👟)

v0.7.0 Milestones

Implement Capacitron 👑 @a-froghyar 👑 @WeberJulian
Release pretrained Capacitron

v0.8.0 Milestones

Separate numpy transforms
Better data sampling for VITS
New Thorsten DE models 👑 @thorstenMueller

🏃‍♀️ Milestones along the way

🤖 New TTS models

Lucas Cassiano · Answer 1 · Tue Mar 23 2021 06:03:43 GMT+0800 (China Standard Time)

great project! Excited to see this growing!

Andrew Barfield · Answer 2 · Sun Apr 18 2021 05:12:21 GMT+0800 (China Standard Time)

I'm learning the code/API and performing experiments. I hope to contribute soon.

I'm also wondering if I can donate (money) to Coqui?

kdavis-coqui · Answer 3 · Sun Apr 18 2021 16:39:37 GMT+0800 (China Standard Time)

I'm learning the code/API and performing experiments. I hope to contribute soon.

I'm also wondering if I can donate (money) to Coqui?

Wow! Thanks! Humbling.

We were setting up GitHub sponsors, but the tax implications were onerous.

We're currently exploring Patreon. So stay tuned!

Agrin Hilmkil · Answer 4 · Mon Apr 26 2021 19:09:35 GMT+0800 (China Standard Time)

@erogol Thanks for sharing the plans!

Do you have any thoughts (or need help to) simplifying the dependencies a bit? I'm thinking that if TTS is used as a lib installed over pip it might be nice to remove visualisation dependencies only used in notebooks, removing test/dev dependencies and moving e.g. tensorflow into extras to reduce the footprint. Personally would love to use this as a dependency rather than maintaining my own fork.

Eren Gölge · Answer 5 · Mon Apr 26 2021 19:19:38 GMT+0800 (China Standard Time)

@agrinh Why do you need to keep your own fork exactly? It'd be better to expand the conversation on gitter if you like.

Agrin Hilmkil · Answer 6 · Mon Apr 26 2021 19:27:36 GMT+0800 (China Standard Time)

@agrinh Why do you need to keep your own fork exactly? It'd be better to expand the conversation on gitter if you like.

Wow, thanks for the super fast reply. Sure, we can move the discussion to gitter.

Sadam Hussain Memon · Answer 7 · Thu May 06 2021 08:27:19 GMT+0800 (China Standard Time)

Please add DC-TTS to the the list of models.

DC-TTS implementation available with MIT Licence code available here
EFFICIENTLY TRAINABLE TEXT-TO-SPEECH SYSTEM BASED ON DEEP CONVOLUTIONAL NETWORKS WITH GUIDED ATTENTION paper
@erogol

Will Rice · Answer 8 · Sat Aug 21 2021 07:05:57 GMT+0800 (China Standard Time)

What were you thinking about the "TensorFlow run-time for training models"? Like giving the user the option of using TensorFlow or PyTorch? I wouldn't mind taking a stab at the TensorFlow part.

Eren Gölge · Answer 9 · Mon Aug 23 2021 19:58:49 GMT+0800 (China Standard Time)

@will-rice the plan is to mirror what we have in torch to TF as much as possible. It'd be great if you initiate the work

Lucas Hideki Ueda · Answer 10 · Mon Aug 30 2021 20:44:05 GMT+0800 (China Standard Time)

Are you guys planning to develop some expressive TTS architectures? I'm currently studying this topic and planning to implement some of them based on Coqui, part of them just controlling latent space using GST Kwon et al 2020 or RE Sorin et al 2020, and others that actually changes the architecture by adding VAE, normalizing flows and gradient reversal

a-froghyar · Answer 11 · Mon Aug 30 2021 20:46:08 GMT+0800 (China Standard Time)

@lucashueda Capacitron VAE: #510

Lucas Hideki Ueda · Answer 12 · Mon Aug 30 2021 21:15:01 GMT+0800 (China Standard Time)

@lucashueda Capacitron VAE: #510

Oh nice, hope to see Capacitron integrated soon. So maybe, in the future I'll be able to contribute with some others expressive architectures

Billy Bob · Answer 13 · Sun Sep 19 2021 05:59:35 GMT+0800 (China Standard Time)

@erogol Look forward to new End-to-End models being implemented, specfically Efficient-TTS! if the paper is accurate, it should blow most 2 stage configurations out of the water, considering it seems to have higher MOS than tacotron2+hifigan, while also seeming to have significantly faster speed than glowtts+fastest vocoder! I have not seen a single repo replicating the EFTS-Wav architecture described in the paper released 10 months ago, it would be amazing to see it in Coqui first!

Eren Gölge · Answer 14 · Sun Sep 19 2021 07:24:19 GMT+0800 (China Standard Time)

@BillyBobQuebec I don't think I will implement these models anytime soon. But as they stand, contributions are welcome

Julian Weber · Answer 15 · Sun Sep 19 2021 07:30:13 GMT+0800 (China Standard Time)

@BillyBobQuebec but you can try VITS which is close to what you're describing :)

Billy Bob · Answer 16 · Sun Sep 19 2021 07:41:10 GMT+0800 (China Standard Time)

@BillyBobQuebec but you can try VITS which is close to what you're describing :)

Agreed, I am currently trying VITS actually, I have some issues training with the coqui implementation unfortunately, I've posted the issue about the bug today and hope I can get it resolved.

HeMath · Answer 17 · Wed Feb 02 2022 14:04:45 GMT+0800 (China Standard Time)

Hi there! Thanks for your great work! I'm looking forward to training YourTTS on other languages. Will training and fine-tuning code of YourTTS be published soon? I would be very grateful if you could tell me an approximate time~ Have a nice day :-D

Ho Kim · Answer 18 · Wed Feb 23 2022 11:12:16 GMT+0800 (China Standard Time)

Hello, thanks for great works! I'm a fan of Coqui TTS.

I'm porting some of the stuffs in the project to the Rust for the following reasons.

Predictable Performance
Static-typed Metadata & Model Management
Multithreaded Server Implementation
Just I love Rust

The VC in the YourTTS has been successfully implemented. And for this purpose, an example of saving/loading a pretrained Vits model has been added in the repo. I write it on Milestones PR because I think my work can be helpful to others :)

Repository (RusTTS): https://github.com/kerryeon/rustts

Eren Gölge · Answer 19 · Wed Feb 23 2022 18:10:47 GMT+0800 (China Standard Time)

@kerryeon great work!! Thanks for sharing!

Paolo · Answer 20 · Sun Feb 27 2022 18:16:12 GMT+0800 (China Standard Time)

Any plan to a port of coqui-ai engine for android? TTS on android is very robotic (espeak, rhvoice, festival lite).

Eren Gölge · Answer 21 · Tue Mar 01 2022 18:35:54 GMT+0800 (China Standard Time)

No immediate plans on that

Danielius · Answer 22 · Wed Mar 30 2022 18:28:26 GMT+0800 (China Standard Time)

Thumbs up for planning ONNX support. Hope it gets prioritized more!

Eren Gölge · Answer 23 · Fri Apr 01 2022 16:19:46 GMT+0800 (China Standard Time)

@Darth-Carrotpie what is your use-case of ONNX? (Just want to get some feedback)

Alexander Korolev · Answer 24 · Tue Apr 05 2022 20:00:31 GMT+0800 (China Standard Time)

@Darth-Carrotpie what is your use-case of ONNX? (Just want to get some feedback)

Personally, for me it sounds like a good way to develop Windows nativ TTS applications without needing a Python runtime and/or the big dependencies like pytorch.

I tried exporting the VITS model to onnx before, but didn't succeed.
There are also other obstacles beside executing the model, like phonemization. ^^

Currently I am using pythonnet to embed the required python functions directly in my C# code. For Python I use the embedded version to make the App distributable.

Danielius · Answer 25 · Thu Apr 07 2022 15:20:36 GMT+0800 (China Standard Time)

@erogol I am trying to run models in Unity. It's environment is in C#, .NET Standard 2.1. Having a universal format model also means in the long run I can not only run models in OS agnostic manner. Of course things like tokenization and phonemization are additional hurdles, but if there are open source examples it's quite doable. For models needing tokenizers I've been using BlingFire succesfully, so I reckon there's similar phonemizer helpers / libraries for other languages beside python, including C#.
Edit:
things that embed python into C#, like pythonnet are convenient, though quite slow. In my case, where I have multiple models loaded and running at the same time (i.e. ~10) means that needless interpreter overhead can become a critical bottleneck. Plus it might add unforeseen debugging issues.

Eren Gölge · Answer 26 · Thu Apr 07 2022 20:10:01 GMT+0800 (China Standard Time)

@Darth-Carrotpie run in unity means in the code or integrate it to Unity editor?

Also better to move this to a separate post under the Discussions

Danielius · Answer 27 · Fri Apr 08 2022 17:56:41 GMT+0800 (China Standard Time)

@Darth-Carrotpie run in unity means in the code or integrate it to Unity editor?

Also better to move this to a separate post under the Discussions

Created a topic on ONNX at Discussions: #1479

Adesina Oluwarotimi · Answer 28 · Thu Apr 21 2022 22:11:28 GMT+0800 (China Standard Time)

Is there a flutter package for using this TTS library? Might be an easy way to get this for use in real-world applications.

I am also very new to development but will like to contribute to this project. Can I work under someone?

Eren Gölge · Answer 29 · Wed Apr 27 2022 16:57:13 GMT+0800 (China Standard Time)

@desh-woes there is no flutter package, unfortunately.

Can you DM me on Gitter or Element (out chat rooms) if you're willing to work on a particular thing?

omkarade · Answer 30 · Sun Jul 03 2022 14:46:03 GMT+0800 (China Standard Time)

how train model using word embedding as input

Eren Gölge · Answer 31 · Tue Jul 05 2022 16:44:31 GMT+0800 (China Standard Time)

@omkarade no support for that yet.

omkarade · Answer 32 · Tue Jul 05 2022 20:22:51 GMT+0800 (China Standard Time)

I want to train a custom Your TTS model on my data set. Can you please share me detailed process.

Julian Weber · Answer 33 · Mon Jul 18 2022 14:29:02 GMT+0800 (China Standard Time)

I want to train a custom Your TTS model on my data set. Can you please share me detailed process.

You can read the relevant documentation here: https://tts.readthedocs.io/en/latest/finetuning.html
Also this is the roadmap thread, please ask for support here or open a new discussion/issue

windowshopr · Answer 34 · Thu Sep 15 2022 13:29:33 GMT+0800 (China Standard Time)

Looking forward to the SSML implementation!

Troy Smith · Answer 35 · Fri Sep 16 2022 04:20:33 GMT+0800 (China Standard Time)

@erogol is the NaturalSpeech paper something you'd think about implementing I could take a crack at it.

Eren Gölge · Answer 36 · Fri Sep 16 2022 16:22:45 GMT+0800 (China Standard Time)

@Kthulu120 sure thing. Feel free to shoot a PR. We are always here to help.

Eric Tulowetzke · Answer 37 · Sun Oct 16 2022 07:54:22 GMT+0800 (China Standard Time)

Will there be a C API to this library like your STT library?

Eren Gölge · Answer 38 · Tue Oct 18 2022 04:10:36 GMT+0800 (China Standard Time)

Not in the roadmap currently

Eren Gölge · Answer 39 · Wed Oct 26 2022 22:30:28 GMT+0800 (China Standard Time)

@JediMaster25 you think the Roadmap is the right place for this convo?

stale · Answer 40 · Sat Nov 26 2022 19:15:41 GMT+0800 (China Standard Time)

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look our discussion channels.

Nikoru · Answer 41 · Sun Nov 27 2022 18:42:42 GMT+0800 (China Standard Time)

It's good to see progress on a propper tts project. I'm running arch and no cuda and I am gonna see if I can convince it to use my cpu instead!
What would be really cool would be if this could work on AVX512 in amd chipsets.

nfaraji2002 · Answer 42 · Sun Dec 18 2022 13:55:27 GMT+0800 (China Standard Time)

Hi
thanks for delightful codes!
I want to use this version of TTS on raspberry pi 4, but I think this version does not support real time processing.
Are there TF utilities provided as in Mozilla TTS to convert trained models to tf-lite?
Can the strategy of quantization work here for real-time processing?
I need some roadmaps in this regard.

Thanks
Neda

jhj0517 · Answer 43 · Mon Jan 16 2023 18:06:12 GMT+0800 (China Standard Time)

Thank you for your great work for TTS.

Is there any progress on Let the user pass a custom text cleaner function. ?
If it's possible, I want to pass my own Korean cleaners.

Eren Gölge · Answer 44 · Tue Jan 17 2023 07:12:18 GMT+0800 (China Standard Time)

You can currently do it by creating your own tokenizer or overloading the class.

stale · Answer 45 · Sat Feb 18 2023 03:08:08 GMT+0800 (China Standard Time)

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look our discussion channels.

Max · Answer 46 · Mon Feb 20 2023 09:40:04 GMT+0800 (China Standard Time)

Marvelous project.
Any ways to donate to core contributors?
I would prefer to use paypal.

stale · Answer 47 · Thu Mar 23 2023 03:23:54 GMT+0800 (China Standard Time)

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look our discussion channels.

Eren Gölge · Answer 48 · Thu Mar 23 2023 19:12:35 GMT+0800 (China Standard Time)

@MaxIakovliev you can use https://coqui.ai/ :)

Eren Gölge · Answer 49 · Thu Mar 23 2023 19:14:23 GMT+0800 (China Standard Time)

This roadmap issue is quite outdated. I'll keep it open to keep the references to some of the issues and models we like to tackle but won't be updating until one day officially becomes 48 hours.

Joao Oliveira · Answer 50 · Thu May 11 2023 00:42:17 GMT+0800 (China Standard Time)

Any update regarding SSML implementation?

Eren Gölge · Answer 51 · Thu May 11 2023 16:48:55 GMT+0800 (China Standard Time)

We are not working on SSML currently, it is back in the list without a precise timeline.

Abhijeet Singh · Answer 52 · Thu May 18 2023 11:19:30 GMT+0800 (China Standard Time)

Please do!!

stale · Answer 53 · Sat Jun 17 2023 16:14:12 GMT+0800 (China Standard Time)

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look our discussion channels.

Crystal Liu · Answer 54 · Thu Sep 21 2023 16:28:00 GMT+0800 (China Standard Time)

Will you support bark-small? Thanks.

Fangjun Kuang · Answer 55 · Sat Nov 11 2023 13:15:34 GMT+0800 (China Standard Time)

Any plan to a port of coqui-ai engine for android? TTS on android is very robotic (espeak, rhvoice, festival lite).

@paolo-caroni

Please take a look at
#3194

You can use sherpa-onnx to run VITS models from Coqui on Android and also embedded devices, e.g., raspberry pi.

We have pre-built Android APKs for the VITS English models from Coqui.
https://k2-fsa.github.io/sherpa/onnx/tts/apk.html

DmitryVN · Answer 56 · Wed Nov 22 2023 04:45:28 GMT+0800 (China Standard Time)

Fix it plz #3039 #3282
The problem persists and because of this, normal correct use is not possible. Also at the moment it kind of breaks off the phrase at the end of each sentence and it turns out a jerky reading.

MarkChrisE2091 · Answer 57 · Sat Dec 23 2023 20:51:30 GMT+0800 (China Standard Time)

Any new update?

Fangjun Kuang · Answer 58 · Sun Dec 31 2023 22:41:13 GMT+0800 (China Standard Time)

Any plan to a port of coqui-ai engine for android? TTS on android is very robotic (espeak, rhvoice, festival lite).

@paolo-caroni

We have supported it in k2-fsa/sherpa-onnx#508

The following is a YouTube video
https://www.youtube.com/watch?v=33QYuVzDORA

You can use all coqui-ai/TTS models and piper models listed in
https://github.com/k2-fsa/sherpa-onnx/releases/tag/tts-models
with k2-fsa/sherpa-onnx#508

ер · Answer 59 · Mon Apr 29 2024 23:19:43 GMT+0800 (China Standard Time)

hi guys, why?

upd: found https://twitter.com/_josh_meyer_/status/1742522906041635166

Nikoru · Answer 60 · Mon Apr 29 2024 23:34:26 GMT+0800 (China Standard Time)

Their ability to exist and be profitable was dependent on how much better their tech was compared to everyone else. It may not feel like it, but we are in the middle of an AI singularity. Coqui's business model might have stood a chance if they started with this tech 5 years earlier, but it was probably too little too late. Eleven labs is probably eating their lunch :/

angadkwatra73 · Answer 61 · Mon Jun 24 2024 17:26:54 GMT+0800 (China Standard Time)

Can we still use their product well, even though they have shut down?

Enno Hermann · Answer 62 · Mon Jun 24 2024 18:30:24 GMT+0800 (China Standard Time)

Can we still use their product well, even though they have shut down?

You can still use the code and models (within the license terms). We also maintain a fork that has a bunch of bug fixes and other small updates.