dubverse-ai / MahaTTS



What Input Parameters are available?

souvikqb opened this issue

This project is impressive! I had the chance to test the Colab Demo, and it's quite remarkable.

  1. I'm interested in understanding the range of input parameters available to tweak the output audio. Could you provide details on those?

  2. Is there a maximum duration limit for the audio that the model can generate?

  3. Can the model perform voice cloning?

  4. How many languages does the model currently support?

Thanks @souvikqb! This project is still in the making, and we are planning to release models trained on larger multilingual datasets that should enable zero-shot voice cloning.
The current model is trained on LibriTTS.

You can find the input parameters in the infer_tts function.
Feel free to experiment with them, and if you find anything interesting, let me know.
It would be even better if you could open a PR for it!

Got it!

Anything on points 2 and 4?

  2. The model is trained on LibriTTS, and I clipped the training audio to a maximum of 10 seconds. I haven't experimented with generating longer speech yet.
  4. Currently only English. We will train the same model on larger multilingual datasets and release the weights; it should cover 10+ languages.
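For readers preparing their own data, the 10-second limit mentioned above can be sketched as a simple length check on each waveform. This is a minimal illustration, not MahaTTS's actual preprocessing: the function name clip_to_max_duration, the drop_longer flag, and the 24 kHz sample rate (LibriTTS's distribution rate; MahaTTS may resample) are all assumptions. The thread also doesn't say whether longer clips were truncated or dropped, so both options are shown.

```python
from typing import Optional, Sequence

# Assumed sample rate: LibriTTS audio is distributed at 24 kHz, but MahaTTS
# may resample, so treat this value as a placeholder.
SAMPLE_RATE = 24_000
MAX_SECONDS = 10.0


def clip_to_max_duration(
    samples: Sequence[float],
    sample_rate: int = SAMPLE_RATE,
    max_seconds: float = MAX_SECONDS,
    drop_longer: bool = False,
) -> Optional[Sequence[float]]:
    """Limit a mono waveform to at most `max_seconds` of audio (hypothetical helper).

    If `drop_longer` is True, clips over the limit are discarded (returns None)
    instead of being truncated.
    """
    max_samples = int(sample_rate * max_seconds)
    if len(samples) <= max_samples:
        return samples  # already short enough, keep unchanged
    if drop_longer:
        return None  # skip over-long clips entirely
    return samples[:max_samples]  # keep only the first `max_seconds`


def duration_seconds(samples: Sequence[float], sample_rate: int = SAMPLE_RATE) -> float:
    """Duration of a waveform in seconds."""
    return len(samples) / sample_rate


if __name__ == "__main__":
    twelve_seconds = [0.0] * (12 * SAMPLE_RATE)
    clipped = clip_to_max_duration(twelve_seconds)
    print(duration_seconds(clipped))  # 10.0
    print(clip_to_max_duration(twelve_seconds, drop_longer=True))  # None
```

Slicing works the same way on a NumPy array, so the helper can be applied directly to loaded audio.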

Thanks!

Looking forward to the upcoming developments.