GANP-TTS
This repository contains the implementation for the paper "GANP-TTS: A GAN-Based Pre-Generated TTS Model with Multi-Loss Functions for More Natural Synthesized Speech".
Audio Samples
Audio samples generated by this implementation can be found here.
Quickstart
Dependencies
You can install the Python dependencies with
pip3 install -r requirements.txt
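If installation appears to succeed but imports later fail, it can help to check which listed packages are actually visible to the interpreter. The following is a minimal sketch, not part of this repository: the helper names `requirement_names`, `missing_packages`, and `check_requirements` are hypothetical, and the parser only handles plain `name==version` style lines (it skips option lines such as `-r other.txt` and does not understand URL or VCS requirements).

```python
import re
from importlib import metadata


def requirement_names(lines):
    """Extract bare distribution names from requirements-style lines."""
    names = []
    for line in lines:
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        # Skip blank lines and pip option lines such as "-r other.txt".
        if not line or line.startswith("-"):
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names that are not installed in the current environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


def check_requirements(path="requirements.txt"):
    """Read a requirements file and return the packages that are missing."""
    with open(path, encoding="utf-8") as handle:
        return missing_packages(requirement_names(handle))
```

Calling `check_requirements("requirements.txt")` from the repository root returns an empty list when every listed package is installed.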
Training
Datasets
The supported datasets are
- [Biaobei](https://www.data-baker.com/open_source.html): a Mandarin TTS dataset of approximately 10,000 short audio samples from a single female speaker, about 12 hours in total.
- AISHELL-3: a Mandarin TTS dataset with 218 male and female speakers, roughly 85 hours in total.
The commands below use AISHELL-3 as the example.
Preprocessing
First, prepare the corpus for alignment by running
python3 prepare_align.py config/preprocess.yaml
As described in the paper, Montreal Forced Aligner (MFA) is used to obtain the alignments between the utterances and the phoneme sequences.
Once the alignments are available, run the preprocessing script with
python3 preprocess.py config/preprocess.yaml
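The two script invocations above can also be driven from Python, which makes the ordering explicit. This is a minimal sketch under stated assumptions: `preprocessing_commands` and `run_step` are hypothetical helpers, not part of this repository, and the MFA alignment step is deliberately left out because its exact command is not given here. The steps are therefore meant to be run one at a time, with alignment done in between.

```python
import subprocess
import sys


def preprocessing_commands(config="config/preprocess.yaml", python=sys.executable):
    """Return the two preprocessing commands in the order they must run."""
    return [
        [python, "prepare_align.py", config],  # step 1: prepare data for alignment
        # (run the Montreal Forced Aligner here, between the two steps)
        [python, "preprocess.py", config],     # step 2: run the preprocessing script
    ]


def run_step(command, dry_run=True):
    """Print the command, or execute it when dry_run is False."""
    if dry_run:
        print(" ".join(command))
        return 0
    # check=True raises CalledProcessError if the script exits with an error.
    return subprocess.run(command, check=True).returncode


prepare, preprocess = preprocessing_commands()
run_step(prepare)  # dry run: only prints the command
```

Passing `dry_run=False` executes the step for real, and a failing script stops the pipeline instead of silently continuing to the next stage.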
Training
Train your model with
python3 train.py -p config/preprocess.yaml -m config/model.yaml -t config/train.yaml
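Because the training command takes three configuration files, a quick check that all of them exist can save a failed launch. The sketch below is illustrative only: `CONFIG_FLAGS`, `missing_configs`, and `train_command` are hypothetical names, while the flags and paths are copied from the command above.

```python
from pathlib import Path

# Flags and paths as they appear in the training command.
CONFIG_FLAGS = {
    "-p": "config/preprocess.yaml",
    "-m": "config/model.yaml",
    "-t": "config/train.yaml",
}


def missing_configs(flags=CONFIG_FLAGS, root="."):
    """Return the config paths that do not exist under root."""
    return [path for path in flags.values() if not (Path(root) / path).is_file()]


def train_command(flags=CONFIG_FLAGS):
    """Build the training command as an argument list."""
    command = ["python3", "train.py"]
    for flag, path in flags.items():
        command += [flag, path]
    return command


if missing_configs():
    print("Missing config files:", ", ".join(missing_configs()))
else:
    print(" ".join(train_command()))
```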
Inference
Test your model with
python3 synthesize.py --text '大数据、云计算、物联网、人工智能等新一代信息技术的应用,给我们带来便利的同时,也带来了新的网络威胁。' --speaker_id 162 --mode single -p config/preprocess.yaml -m config/model.yaml -t config/train.yaml
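To synthesize several sentences, the same command can be generated once per sentence. The following is a minimal sketch: `synthesize_command` and `as_shell_line` are hypothetical helpers, the example sentences are placeholders, and only the flags shown in the command above are used. `shlex.quote` takes care of quoting the Chinese text for the shell.

```python
import shlex

# Config flags copied from the inference command above.
CONFIGS = ["-p", "config/preprocess.yaml", "-m", "config/model.yaml", "-t", "config/train.yaml"]


def synthesize_command(text, speaker_id):
    """Build the single-sentence synthesis command as an argument list."""
    return [
        "python3", "synthesize.py",
        "--text", text,
        "--speaker_id", str(speaker_id),
        "--mode", "single",
        *CONFIGS,
    ]


def as_shell_line(command):
    """Render an argument list as one safely quoted shell line."""
    return " ".join(shlex.quote(part) for part in command)


# Placeholder sentences; speaker 162 is the ID used in the example above.
sentences = ["你好,世界。", "今天天气很好。"]
for sentence in sentences:
    print(as_shell_line(synthesize_command(sentence, 162)))
```

Building the command as a list first means it can be passed directly to `subprocess.run` without any shell quoting, while `as_shell_line` is only needed when the commands are written to a script or pasted into a terminal.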
Citation