yl4579 / AuxiliaryASR

Joint CTC-S2S Phoneme-level ASR for Voice Conversion and TTS (Text-Mel Alignment)



How much data did you use to train the model?

Charlottecuc opened this issue

Hi. Thank you for your great work!
I was wondering how much data you used to train the model, and whether you augmented the data.
I notice that you included the LJSpeech dataset as an example, but LJSpeech is sampled at 22,050 Hz (22.05 kHz), so I assume it is not the data you actually used to train the model?

I used LibriTTS, a corpus derived from LibriSpeech with a higher sampling rate (24 kHz). The dataset is too large to upload, and uploading it wouldn't be very useful anyway, since you almost certainly wouldn't retrain the model on it yourself. The uploaded Data folder is only meant as an example.
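Because the example data (LJSpeech, 22,050 Hz) and the training data (LibriTTS, 24 kHz) differ in sampling rate, audio generally needs to be resampled before it is mixed with, or fed into, a model trained at 24 kHz. The sketch below is illustrative only: it assumes the target rate is 24,000 Hz (check the repository's own config to confirm), and `resample_ratio` and `resample_linear` are hypothetical helpers, not part of AuxiliaryASR. The linear-interpolation resampler has no anti-aliasing filter. For real preprocessing, a proper resampler such as `librosa.resample` or `torchaudio.functional.resample` is the safer choice.

```python
import math
from typing import List, Tuple


def resample_ratio(src_sr: int, dst_sr: int) -> Tuple[int, int]:
    """Return reduced (up, down) factors for converting src_sr to dst_sr."""
    g = math.gcd(src_sr, dst_sr)
    return dst_sr // g, src_sr // g


def resample_linear(samples: List[float], src_sr: int, dst_sr: int) -> List[float]:
    """Naive linear-interpolation resampler (hypothetical helper).

    No low-pass filtering is applied, so downsampling can alias;
    this only illustrates how the output length and sample positions change.
    """
    if src_sr == dst_sr or not samples:
        return list(samples)
    n_out = round(len(samples) * dst_sr / src_sr)
    step = src_sr / dst_sr  # input samples advanced per output sample
    out = []
    for i in range(n_out):
        pos = i * step
        j = int(pos)
        frac = pos - j
        if j + 1 < len(samples):
            # Interpolate between the two neighbouring input samples.
            out.append(samples[j] * (1 - frac) + samples[j + 1] * frac)
        else:
            # Past the last input sample: hold the final value.
            out.append(samples[-1])
    return out


up, down = resample_ratio(22050, 24000)
print(up, down)  # 160 147

one_second_lj = [0.0] * 22050  # one second of silence at the LJSpeech rate
print(len(resample_linear(one_second_lj, 22050, 24000)))  # 24000
```

The reduced ratio 160/147 is what a polyphase resampler would use internally (upsample by 160, downsample by 147); the pure-Python loop above simply interpolates at the equivalent fractional positions.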