ming024 / FastSpeech2

An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"


Making dataset

peanut1101 opened this issue · comments

How do I create a dataset?

First of all, you need to collect audio data in wav/mp3 format, then use this or a similar tool to annotate your audio.
Step 2 – export your annotations as a CSV file and a .zip of the wavs (then unzip it).
Step 3 – create configs for your data and run python3 prepare_align.py config/LJSpeech/preprocess.yaml.
This creates a raw_data directory with .wav and .lab files; each .lab file contains the text of the corresponding wav.
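The conversion in step 3 can be sketched as follows. This is not the repo's prepare_align.py, just a minimal stand-in assuming an LJSpeech-style metadata file with pipe-separated id|transcription lines; the raw_data layout (one speaker folder holding paired .wav/.lab files) is taken from the LJSpeech recipe:

```python
import os

def write_lab_files(metadata_path, raw_dir, speaker="LJSpeech"):
    """Write one .lab transcript per clip, mirroring the raw_data
    layout that prepare_align.py produces (layout assumed from the
    LJSpeech recipe: raw_dir/<speaker>/<basename>.lab)."""
    out_dir = os.path.join(raw_dir, speaker)
    os.makedirs(out_dir, exist_ok=True)
    with open(metadata_path, encoding="utf-8") as f:
        for line in f:
            # Assumed metadata format: "<basename>|<transcription>".
            base, text = line.strip().split("|")[:2]
            # Each .lab file holds the plain text of the matching wav.
            lab_path = os.path.join(out_dir, base + ".lab")
            with open(lab_path, "w", encoding="utf-8") as lab:
                lab.write(text)
```

The matching .wav files go into the same speaker folder, so the aligner sees one audio/transcript pair per basename.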
Step 4 – install MFA or another aligner, then:

  • create a pronunciation dictionary for your data using MFA or another tool

  • train the MFA aligner
    example: mfa train raw_data/your_dataset/ lexicon/your_dataset.txt out_dir/aligner_model.zip preprocessed_data/your_dataset/TextGrid
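If you have no pronunciation dictionary at all, one crude fallback is a grapheme lexicon in which every word maps to its letters; MFA accepts the same "WORD p h o n e s" line format either way, though a proper G2P-generated lexicon aligns much better. A minimal sketch (paths are placeholders):

```python
import glob
import os

def build_grapheme_lexicon(raw_dir, lexicon_path):
    """Collect every word from the .lab transcripts under raw_dir and
    map it to its letters, one "word p h o n e s" entry per line.
    Only a fallback: a real G2P-based lexicon gives better alignments."""
    words = set()
    for lab in glob.glob(os.path.join(raw_dir, "**", "*.lab"), recursive=True):
        with open(lab, encoding="utf-8") as f:
            # Lowercase and strip trailing punctuation from each token.
            words.update(w.strip(".,!?;:").lower() for w in f.read().split())
    with open(lexicon_path, "w", encoding="utf-8") as f:
        for w in sorted(words):
            if w:
                f.write("{} {}\n".format(w, " ".join(w)))
```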

Step 5 – run python3 preprocess.py config/LJSpeech/preprocess.yaml. This creates the preprocessed_data directory with the data for training.
Step 6 – run model training
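For reference, the train.txt and val.txt written in this step appear (in the ming024 repo) to use pipe-separated lines of the form basename|speaker|{phoneme sequence}|raw text, where the phonemes come from the MFA TextGrids. A small parser sketch, with the exact field layout assumed from that repo:

```python
def parse_metadata_line(line):
    """Split one train.txt / val.txt line into its fields.
    Format assumed from the ming024/FastSpeech2 preprocessor:
    basename|speaker|{phoneme sequence}|raw text."""
    basename, speaker, text, raw_text = line.strip().split("|")
    # The phoneme field is wrapped in braces, symbols space-separated.
    phones = text.strip("{}").split()
    return basename, speaker, phones, raw_text
```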

Thanks a lot for your contribution, @ruslantau. I have a question: in order to train I also need speakers.json (this one is fine, it is just the mapping of each speaker in the dataset to an id) and stats.json. How do I compute the latter? Moreover, I need the transcription lists (val.txt and train.txt). How do I generate those? I already have train and val lists, but their text is not encoded as the phonemes extracted by MFA. How can I do it?

Hi @alessandropec,

Have you found a way to make stats.json?

@phamkhactu read this part:

with open(os.path.join(self.out_dir, "speakers.json"), "w") as f:
    f.write(json.dumps(speakers))
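In short, speakers.json is just a name-to-index map, and stats.json stores global pitch and energy statistics used for normalization. A rough stdlib-only sketch of how you could compute both yourself; the [min, max, mean, std] field layout is assumed from the repo's preprocessor, and the exact normalization conventions (e.g. whether min/max are taken before or after normalization) should be checked against preprocessor.py:

```python
import statistics

def build_speaker_map(speaker_names):
    # speakers.json: each speaker name mapped to an integer id.
    return {name: i for i, name in enumerate(sorted(set(speaker_names)))}

def build_stats(pitch_per_utt, energy_per_utt):
    # stats.json: [min, max, mean, std] per feature, pooled over all
    # utterances; training uses these to normalize pitch/energy.
    def summarize(per_utt):
        flat = [v for utt in per_utt for v in utt]
        return [min(flat), max(flat),
                statistics.mean(flat), statistics.pstdev(flat)]
    return {"pitch": summarize(pitch_per_utt),
            "energy": summarize(energy_per_utt)}
```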