What Input Parameters are available?

Question

souvikqb opened this issue 8 months ago

This project is impressive! I had the chance to test the Colab Demo, and it's quite remarkable.

I have a few questions:
1. I'm interested in understanding the range of input parameters available to tweak the output audio. Could you provide details on those?
2. Is there a maximum duration limit for the audio that the model can generate?
3. Can the model perform voice cloning?
4. How many languages does the model currently support?

Jaskaran Singh · Answer 1 · Wed Nov 22 2023 15:32:30 GMT+0800 (China Standard Time)

Thanks @souvikqb! This project is still a work in progress, and we plan to release models trained on larger multilingual datasets, which should enable zero-shot voice cloning.
The current model is trained on LibriTTS.

You can find the input variables in the infer_tts function.
Feel free to play with them, and if you find anything interesting, let me know.
Even better if you can open a PR for it!
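A generic way to see which input variables `infer_tts` accepts, without relying on any assumptions about its actual signature, is Python's standard `inspect.signature`. The sketch below lists each parameter with its default value. The stub `example_infer_tts` and its parameters are hypothetical placeholders; in practice you would pass the real `infer_tts` from the repository instead.

```python
import inspect
from typing import Any, Callable, Dict

REQUIRED = "<required>"  # marker for parameters that have no default value


def list_parameters(func: Callable[..., Any]) -> Dict[str, Any]:
    """Map each parameter name of `func` to its default value (or REQUIRED)."""
    params: Dict[str, Any] = {}
    for name, param in inspect.signature(func).parameters.items():
        # inspect.Parameter.empty means the caller must supply this argument
        params[name] = REQUIRED if param.default is inspect.Parameter.empty else param.default
    return params


# Hypothetical stand-in for infer_tts: the real function's parameters may differ entirely.
def example_infer_tts(text, temperature=1.0, seed=0):
    raise NotImplementedError


if __name__ == "__main__":
    for name, default in list_parameters(example_infer_tts).items():
        print(f"{name} = {default!r}")
```

Calling `help(infer_tts)` in the Colab notebook gives similar information, along with the function's docstring if it has one.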

souvikqb · Answer 2 · Wed Nov 22 2023 15:35:51 GMT+0800 (China Standard Time)

Got it!

Anything on points 2 and 4?

Jaskaran Singh · Answer 3 · Wed Nov 22 2023 15:38:46 GMT+0800 (China Standard Time)

It is trained on LibriTTS, and I have clipped the audio to a maximum of 10 s; I haven't experimented with longer speech generation.
Currently it supports only English. We will train the same model on larger multilingual datasets and release the weights; it should cover 10+ languages.
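The thread does not say how the 10 s limit was applied: longer utterances could have been truncated or simply excluded. Below is a minimal sketch of both options, assuming LibriTTS's 24 kHz sample rate and waveforms stored as sequences of samples (slicing and `len()` behave the same on NumPy arrays). The function names are hypothetical and not taken from the repository.

```python
from typing import List, Sequence

SAMPLE_RATE = 24_000  # LibriTTS audio is distributed at 24 kHz
MAX_SECONDS = 10      # limit mentioned in the thread
MAX_SAMPLES = SAMPLE_RATE * MAX_SECONDS  # 240,000 samples


def truncate(wav: Sequence[float], limit: int = MAX_SAMPLES) -> Sequence[float]:
    """Option A (hypothetical): keep only the first `limit` samples of an utterance."""
    return wav[:limit]


def keep_short(wavs: Sequence[Sequence[float]], limit: int = MAX_SAMPLES) -> List[Sequence[float]]:
    """Option B (hypothetical): drop utterances longer than `limit` samples."""
    return [w for w in wavs if len(w) <= limit]


if __name__ == "__main__":
    short_wav = [0.0] * (5 * SAMPLE_RATE)   # 5 s of silence
    long_wav = [0.0] * (12 * SAMPLE_RATE)   # 12 s of silence
    print(len(truncate(long_wav)))                 # 240000
    print(len(keep_short([short_wav, long_wav])))  # 1
```

Either way, the model only sees utterances of up to about 10 s during training, so output quality beyond that length is untested, as mentioned above.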

souvikqb · Answer 4 · Wed Nov 22 2023 15:42:20 GMT+0800 (China Standard Time)

Thanks!

Looking forward to the upcoming developments.