netease-youdao / EmotiVoice

EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

How many data samples would I need to fine-tune a new voice with a stable prompt?

JacopoMangiavacchi opened this issue

Thank you very much for sharing the recipe for fine-tuning on the LJSpeech dataset. I'm wondering whether I can train a new voice with a smaller dataset. With other model architectures I was able to clone a voice using roughly 1 hour of training data. Would that be enough for EmotiVoice?

Thanks!

Yes, I believe that one hour of training data should be sufficient for EmotiVoice's Voice Cloning.
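As a rough sanity check of how much audio a dataset actually contains, a short Python sketch like the one below can sum the durations of the recordings. The directory name `my_voice_dataset/wavs` is just a placeholder, and the standard `wave` module only reads plain PCM WAV files.

```python
# Rough sanity check: how many hours of audio are in a folder of WAV files?
# Hypothetical helper -- adjust the path to wherever your recordings live.
import wave
from pathlib import Path

def total_duration_hours(wav_dir: str) -> float:
    """Sum the duration of all .wav files under wav_dir, in hours."""
    total_seconds = 0.0
    for wav_path in Path(wav_dir).rglob("*.wav"):
        with wave.open(str(wav_path), "rb") as wf:
            total_seconds += wf.getnframes() / wf.getframerate()
    return total_seconds / 3600.0

print(f"Total audio: {total_duration_hours('my_voice_dataset/wavs'):.2f} hours")
```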

Thank you! I've been fine-tuning a new voice, but I'm having trouble running inference with it. In step 5 of the LJSpeech fine-tuning recipe, when calling python inference_am_vocoder_exp.py, the --logdir parameter is not shown, yet it appears to be a mandatory argument for the script. I'm confused about what value to pass here.

It looks like I can pass '.' to --logdir so that the right path gets concatenated, but then the script complains about a missing config.json file in the exp/LJspeech/tmp/ folder. I can't find this config.json file. What should it contain?

'logdir' is a required argument for 'inference_am_vocoder_joint.py', but it is not utilized in 'inference_am_vocoder_exp.py'.
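For what it's worth, that behavior is consistent with an argparse setup along these lines (a hypothetical sketch, not the actual EmotiVoice code), which is why a placeholder value such as '.' satisfies the parser:

```python
# Hypothetical sketch of a script that declares --logdir as required
# even though the value is never read afterwards (not EmotiVoice's code).
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--logdir", required=True,
                    help="declared required, but not used by this script")
args = parser.parse_args()

# Inference would proceed here without ever touching args.logdir,
# so any placeholder value (e.g. '.') is accepted.
print(f"--logdir received (ignored): {args.logdir}")
```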

Thank you again @syq163. I was able to run inference using the WangZeJun/simbert-base-chinese BERT features; I see the script downloads these directly from the Hugging Face repo. I had only found the content and style subfolders in the exp/LJspeech/tmp/ folder.
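If anyone wants to cache those BERT weights ahead of time instead of letting the script fetch them at run time, a sketch along these lines should work, assuming the `transformers` package is installed (the local directory name is just an example):

```python
# Sketch: pre-download the simbert-base-chinese model from the
# Hugging Face Hub so later runs can reuse the local cache/copy.
from transformers import AutoModel, AutoTokenizer

model_id = "WangZeJun/simbert-base-chinese"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# Optional: save a local copy (example directory name).
tokenizer.save_pretrained("./simbert-base-chinese")
model.save_pretrained("./simbert-base-chinese")
```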