Training WaveNet to rap?

Question

constantinethegr8 opened this issue a year ago · comments

I've heard Tacotron 2 needs very little data, around 100-300 sentences, for good-sounding speech. However, it handles tempo badly. I've seen that WaveNet can be trained on music, and I wondered whether the model can be conditioned to do TTS with rhythm. Even if it is possible (hopefully), I have heard it requires large amounts of data, in the tens of GB. Can WaveNet be trained with only 1-2 GB, or at most 4 GB, and still get good results? And if it can, how does one prepare a dataset (that is, condition it)? Do I chop the audio or split it into each line the rapper spoke, or give the full acapella? Do I use one wave file or multiple (and what audio format, number of channels, and sample rate)? Sorry, I am extremely new. Any help would be appreciated. Thanks.

Flavius Valerius Constantinus, The Last Roman Emperor