luoyily / MoeTTS

Speech synthesis model/inference GUI repo for galgame characters based on Tacotron2, Hifigan, VITS and Diff-svc

MoeTTS dataset

JoStevens1 opened this issue

Hey, I am interested in training a large English VITS base model to contribute to your project. I was wondering about your dataset: mainly how large it is and what type of data your team used, so that I could potentially replicate it in native English. Sorry if this isn't the right place to ask, but I couldn't find another way to get in touch. Please let me know if you would rather talk somewhere else. Thanks!

Thank you for wanting to contribute models to this project. At the moment, though, I have only adapted the G2P (grapheme-to-phoneme) tool for Chinese and Japanese, so English models won't work well for now.
About the data: this project is mainly for learning and for helping the community produce derivative works. It is fairly amateur, so the data we use varies. The smallest dataset is only 50 minutes of speech from a single speaker, and the largest is over 40 hours across 13 speakers. Dataset quality also varies.
For more information on English datasets, you can refer to the official VITS project (https://github.com/jaywalnut310/vits). I hope that helps.
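
For anyone who wants to compare their own English data against the sizes mentioned above (50 minutes for one speaker, 40+ hours for 13 speakers), here is a minimal sketch that totals audio hours per speaker. It is not code from MoeTTS. It assumes a pipe-separated filelist in the style used by the official VITS repo (`path|text` for single-speaker data, `path|speaker_id|text` for multi-speaker data), uncompressed PCM WAV files, and transcripts that contain no `|` character. The function names are hypothetical.

```python
import wave
from collections import defaultdict


def parse_filelist_line(line):
    """Split one filelist line into (wav_path, speaker_id, text).

    Assumption: 2 fields means single-speaker, 3 fields means multi-speaker.
    Single-speaker lines are assigned the placeholder speaker id "0".
    """
    parts = line.rstrip("\n").split("|")
    if len(parts) == 2:
        path, text = parts
        return path, "0", text
    if len(parts) == 3:
        path, speaker_id, text = parts
        return path, speaker_id, text
    raise ValueError(f"unexpected number of fields: {line!r}")


def wav_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def hours_per_speaker(filelist_path):
    """Sum audio duration per speaker id and return a dict of hours."""
    totals = defaultdict(float)
    with open(filelist_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            path, speaker_id, _text = parse_filelist_line(line)
            totals[speaker_id] += wav_seconds(path)
    return {sid: seconds / 3600 for sid, seconds in totals.items()}
```

Calling `hours_per_speaker("train_filelist.txt")` (a hypothetical file name) returns a mapping from speaker id to hours of audio, which gives a rough sense of where a dataset falls in the range described above.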