What Input Parameters are available?
souvikqb opened this issue
This project is impressive! I had the chance to test the Colab Demo, and it's quite remarkable.
- I'm interested in understanding the range of input parameters available to tweak the output audio. Could you provide details on those?
- Is there a maximum duration limit for the audio that the model can generate?
- Can the model perform voice cloning?
- How many languages does the model currently support?
Thanks @souvikqb. This project is still in progress, and we plan to release models trained on larger multilingual datasets, which should enable zero-shot voice cloning.
The current model is trained on LibriTTS.
You can find the input variables in the infer_tts function.
Feel free to play with those, and let me know if you find anything interesting.
Even better if you can open a PR for it!
Got it!
Anything on points 2 and 4?
- It is trained on LibriTTS, and I have clipped it to 10 s max. I haven't experimented with longer speech generation.
- Currently only English. I will be training the same model on larger multilingual datasets and releasing the weights; it should cover more than 10 languages.
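For anyone reading along, here is a rough sketch of what a 10 s cap on audio could look like. It is only an illustration and not the project's actual code: `clip_waveform` and `duration_seconds` are hypothetical helpers, truncation from the start is just one possible reading of "clipped" (dropping longer utterances would be another), and the 24 kHz sample rate is assumed from LibriTTS's native rate, which the model may not use.

```python
from typing import List, Sequence

MAX_SECONDS = 10.0    # the cap mentioned in this thread
SAMPLE_RATE = 24_000  # assumption: LibriTTS's native rate; the model may resample


def clip_waveform(
    samples: Sequence[float],
    sample_rate: int = SAMPLE_RATE,
    max_seconds: float = MAX_SECONDS,
) -> List[float]:
    """Return at most `max_seconds` of audio, taken from the start of `samples`."""
    max_samples = int(sample_rate * max_seconds)
    # Slicing past the end is safe, so shorter clips are returned unchanged.
    return list(samples[:max_samples])


def duration_seconds(samples: Sequence[float], sample_rate: int = SAMPLE_RATE) -> float:
    """Length of a waveform in seconds."""
    return len(samples) / sample_rate


if __name__ == "__main__":
    twelve_seconds = [0.0] * (SAMPLE_RATE * 12)
    clipped = clip_waveform(twelve_seconds)
    print(duration_seconds(twelve_seconds), duration_seconds(clipped))
```

Clips that are already shorter than the cap pass through untouched, so the same helper could be applied to every utterance in a dataset.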
Thanks!
Looking forward to the upcoming developments