gostan99 / vixtts-demo

viXTTS Demo

viXTTS is a text-to-speech voice generation tool that offers voice cloning in Vietnamese and other languages. The model is a fine-tuned version of XTTS-v2.0.3, trained on the viVoice dataset. This repository is primarily intended for inference.

The model can be accessed at: viXTTS on Hugging Face

Online usage (Recommended)

For a quick demonstration, please refer to this notebook on Google Colab. Tutorial (Vietnamese): https://youtu.be/pbwEbpOy0m8?feature=shared

Local Usage

This code is designed to run on Ubuntu or WSL2. macOS and Windows are not currently supported (support might be added later).

Hardware Recommendations

  • At least 10GB of free disk space
  • At least 16GB of RAM
  • Nvidia GPU with a minimum of 4GB of VRAM
  • By default, the model uses the GPU. Without a GPU, it falls back to the CPU and runs much more slowly.
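
The disk and GPU recommendations above can be checked before the first run. The following is a minimal sketch, not part of this repository: the helper names `free_disk_gb` and `pick_device` are hypothetical, and it assumes the demo relies on PyTorch for GPU detection, which is typical for XTTS-based models but is not stated here.

```python
import shutil

MIN_FREE_DISK_GB = 10  # taken from the hardware recommendations above


def free_disk_gb(path: str = ".") -> float:
    """Return the free disk space at `path` in GiB."""
    return shutil.disk_usage(path).free / 1024**3


def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the check also works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    free = free_disk_gb()
    if free < MIN_FREE_DISK_GB:
        print(f"Warning: {free:.1f} GB free, at least {MIN_FREE_DISK_GB} GB recommended.")
    print(f"Inference device: {pick_device()}")
```

If the script reports `cpu`, inference should still work but will be noticeably slower, as noted above.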

Required Software

  • Git
  • Python version >=3.9 and <= 3.11. The default version is set to 3.11, but you can modify the Python version in the run.sh file.
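
To confirm that an interpreter falls inside the supported range, a short check like the one below may help. It is an illustrative sketch: the function name `is_supported` is hypothetical, and it assumes that "<= 3.11" covers every 3.11.x patch release, so only the major and minor numbers are compared.

```python
import sys

MIN_VERSION = (3, 9)   # lowest supported major.minor
MAX_VERSION = (3, 11)  # highest supported major.minor


def is_supported(version=None) -> bool:
    """Return True if `version` (default: the running interpreter) is 3.9 to 3.11."""
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])  # ignore the patch level
    return MIN_VERSION <= major_minor <= MAX_VERSION


if __name__ == "__main__":
    current = ".".join(map(str, sys.version_info[:3]))
    status = "supported" if is_supported() else "not supported"
    print(f"Python {current} is {status}")
```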

Usage

git clone https://github.com/thinhlpg/vixtts-demo
cd vixtts-demo
./run.sh
  1. Run run.sh (dependencies are installed automatically on the first run).
  2. Open the Gradio demo link.
  3. Load the model and wait for loading to finish.
  4. Run inference and enjoy 🤗
  5. The result will be saved in output/
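
Since results are written to output/, a small helper can list the most recent files after a session. This is only a sketch: `newest_outputs` is a hypothetical name, and nothing is assumed about the file names or audio format the demo produces.

```python
from pathlib import Path


def newest_outputs(output_dir="output", limit=5):
    """Return up to `limit` files from `output_dir`, most recently modified first."""
    directory = Path(output_dir)
    if not directory.is_dir():
        return []  # nothing has been generated yet
    files = [p for p in directory.iterdir() if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:limit]


if __name__ == "__main__":
    for path in newest_outputs():
        print(path)
```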

Limitations

  • Subpar performance on Vietnamese input sentences shorter than 10 words (inconsistent output and odd trailing sounds).
  • The model was fine-tuned only on Vietnamese. Its effectiveness with other languages has not been tested, and quality may be lower.
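
One way to work around the first limitation is to flag short inputs before sending them to the model. The sketch below is an assumption-laden illustration: `is_short_input` is a hypothetical helper, and it counts whitespace-separated tokens, which for Vietnamese corresponds to syllables rather than dictionary words, so the threshold is approximate.

```python
MIN_WORDS = 10  # threshold taken from the limitation described above


def word_count(text: str) -> int:
    """Count whitespace-separated tokens in `text`."""
    return len(text.split())


def is_short_input(text: str, min_words: int = MIN_WORDS) -> bool:
    """Return True when `text` has fewer than `min_words` tokens."""
    return word_count(text) < min_words


if __name__ == "__main__":
    sample = "Xin chào"
    if is_short_input(sample):
        print("Input is short; the output may be inconsistent or have odd trailing sounds.")
```

A caller could use this check to warn the user or to merge a short sentence with its neighbour before synthesis.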

Acknowledgements

We would like to express our gratitude to all the libraries and resources that have played a role in the development of this demo.

Contact
