A PyTorch implementation of Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
Added Blizzard dataset support.
pip3 install -r requirements.txt
- Hyperparameters.py --- contains all hyperparameters
- Network.py --- encoder and decoder
- Modules.py --- some modules for Tacotron
- Loss.py --- loss function
- Data.py --- loads the dataset
- utils.py --- some utility functions for data I/O
- Synthesis.py --- generates wav files
- Download a multi-speaker dataset.
- Preprocess your data and write your own get_XX_data function in Data.py.
- Adjust the hyperparameters in Hyperparameters.py.
- Make a directory named log, laid out as follows:
--- log
|    |
|    --- log[log_number]
|
--- code
     |
     --- Tacotron
          |
          --- train.py
          |
          --- Network.py
          |
          ......
- Run train.py:
python3 train.py [log_number] [dataset_size] [start_epoch]
[log_number]: the number of the log directory
[dataset_size]: an integer, or all
[start_epoch]: the epoch to start training from (0 to train from scratch)
For example:
python3 train.py 0 all 0
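The three positional arguments and the log[log_number] directory can be sketched in Python as below. This is only an illustration, not the code in train.py: the helper names parse_train_args and log_dir_for are hypothetical, treating all as None is an assumption, and so is reading log[log_number] as a directory name such as log0.

```python
import os


def parse_train_args(argv):
    """Parse [log_number] [dataset_size] [start_epoch] (hypothetical helper)."""
    if len(argv) != 3:
        raise SystemExit(
            "usage: python3 train.py [log_number] [dataset_size] [start_epoch]"
        )
    log_number = int(argv[0])
    # dataset_size is either the literal "all" or an integer.
    # Assumption: None stands in for "all" in this sketch.
    dataset_size = None if argv[1] == "all" else int(argv[1])
    start_epoch = int(argv[2])
    return log_number, dataset_size, start_epoch


def log_dir_for(log_number, log_root):
    """Return log_root/log<log_number>, creating it if it does not exist.

    Assumption: "log[log_number]" means a directory such as "log0".
    """
    path = os.path.join(log_root, "log" + str(log_number))
    os.makedirs(path, exist_ok=True)
    return path


# The arguments from "python3 train.py 0 all 0", passed as a list.
print(parse_train_args(["0", "all", "0"]))
```

Pass the location of your own log directory as log_root when calling log_dir_for; the sketch does not assume where it sits relative to train.py.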
Run generate.py; modify the text in generate.py before running.
Only Chinese is supported.