WeTTS

Production First and Production Ready End-to-End Text-to-Speech Toolkit

Install

We suggest installing WeTTS with Anaconda or Miniconda. Clone this repo:

git clone https://github.com/wenet-e2e/wetts.git

For CUDA 10.2, run:

conda create -n wetts python=3.8 montreal-forced-aligner pytorch=1.11 \
torchaudio cudatoolkit=10.2 -c pytorch -c conda-forge

For CUDA 11.3, run:

conda create -n wetts python=3.8 montreal-forced-aligner pytorch=1.11 \
torchaudio cudatoolkit=11.3 -c pytorch -c conda-forge
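The two commands above differ only in the cudatoolkit version, so they can be generated from one template. The following is a minimal sketch, not part of WeTTS: the helper name wetts_conda_cmd is hypothetical, and it only accepts the two CUDA versions listed in this README. It prints the command rather than running it.

```shell
# Hypothetical helper (not shipped with WeTTS): print the conda command
# from this README for a given CUDA toolkit version.
wetts_conda_cmd() {
  case "$1" in
    10.2|11.3) ;;  # the only versions this README documents
    *)
      echo "CUDA version not covered by this README: $1" >&2
      return 1
      ;;
  esac
  echo "conda create -n wetts python=3.8 montreal-forced-aligner pytorch=1.11 torchaudio cudatoolkit=$1 -c pytorch -c conda-forge"
}

# Print the command for CUDA 11.3 so it can be reviewed before running.
wetts_conda_cmd 11.3
```

Printing instead of executing keeps the helper safe to try; pipe the output to sh only after checking it.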

Install the other dependencies with:

conda activate wetts
python -m pip install -r requirements.txt
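After installation it can be useful to confirm that the environment sees PyTorch and the GPU. The check below is a sketch rather than an official WeTTS step: the function name check_wetts_env is hypothetical, and it assumes the wetts environment is active so that python resolves to the interpreter created above.

```shell
# Hypothetical sanity check (not part of WeTTS): report whether torch and
# torchaudio are importable and whether CUDA is visible to PyTorch.
check_wetts_env() {
  if ! command -v python >/dev/null 2>&1; then
    echo "python not found on PATH"
    return 1
  fi
  python - <<'EOF'
import importlib.util
import sys

# Look the packages up without importing them, so a missing one is reported cleanly.
missing = [m for m in ("torch", "torchaudio") if importlib.util.find_spec(m) is None]
if missing:
    print("missing packages: " + ", ".join(missing))
    sys.exit(1)

import torch
print("torch", torch.__version__, "CUDA available:", torch.cuda.is_available())
EOF
}

check_wetts_env || echo "environment check failed"
```

If CUDA is reported as unavailable, the installed cudatoolkit version may not match the driver on the machine.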

We mainly focus on production and on-device TTS, and we plan to use:

We also plan to provide reference solutions for:

We plan to support a variety of open-source TTS datasets, including but not limited to:

BZNSYP, the Chinese Standard Mandarin Speech Corpus open-sourced by Data Baker.
AISHELL-3, a large-scale, high-fidelity multi-speaker Mandarin speech corpus.
Opencpop, a Mandarin singing voice synthesis (SVS) corpus open-sourced by NetEase Fuxi.

We plan to support a variety of hardware and platforms, including:

We borrow some code from FastSpeech2 for the FastSpeech2 implementation.
We refer to PaddleSpeech for feature extraction, pinyin lexicon preparation for alignment, and the length regulator in FastSpeech2.


Apache License 2.0
