Comparison with UnivNet
thepowerfuldeez opened this issue
Hi! How does this work compare with UnivNet, for which you have already implemented code: https://github.com/rishikksh20/UnivNet-pytorch
That paper is a little newer, but as far as I know they're more concerned with the generalizability of the model to unseen speakers, while this work focuses on overall quality (especially in the high frequencies).
Can you maybe elaborate?
@thepowerfuldeez Fre-GAN is better than UnivNet
Have you tried training on LJSpeech or your own dataset? How many iterations are needed compared with HiFi-GAN? Do you have checkpoints somewhere?
I tried it on my own dataset; it takes 150k iterations to generate excellent voice, whereas HiFi-GAN usually takes 1M steps for the same quality.
It only takes 2 days to reach 150k iterations.
got it, thanks
Tried it out. I compared the publicly available universal v1 HiFi-GAN (trained for 2.5M iterations on VCTK) with this one trained for 150k iterations on the new Hi-Fi TTS dataset (5 times more data). It sounds great, but I think it should be trained a bit more. Maybe 250k will be enough.
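For anyone who wants to reproduce this kind of side-by-side listening test, here is a minimal sketch, assuming both repos follow HiFi-GAN's checkpoint layout (generator weights stored under a "generator" key). The generator classes, config objects, and checkpoint names in the usage comments are hypothetical placeholders, not files shipped with either repo.

```python
import torch

def load_hifigan_style_generator(generator_cls, config, checkpoint_path, device="cuda"):
    """Load a vocoder generator from a HiFi-GAN-style checkpoint.

    Assumes the checkpoint is a dict with the weights under the "generator" key,
    as the official HiFi-GAN training script saves them; verify this against the
    checkpoints you actually have.
    """
    model = generator_cls(config).to(device)
    state = torch.load(checkpoint_path, map_location=device)
    model.load_state_dict(state["generator"])
    model.eval()
    if hasattr(model, "remove_weight_norm"):
        model.remove_weight_norm()  # HiFi-GAN-style generators expose this for inference
    return model

@torch.no_grad()
def synthesize(model, mel):
    # mel: [1, n_mels, frames] tensor from the same mel front end both vocoders were trained on
    audio = model(mel)  # [1, 1, samples] for HiFi-GAN-style generators
    return audio.squeeze().cpu()

# Hypothetical usage -- classes, configs, and checkpoint paths are placeholders:
# fregan = load_hifigan_style_generator(FreGANGenerator, fregan_config, "fregan_00150000.pt")
# hifigan = load_hifigan_style_generator(HiFiGANGenerator, hifigan_config, "g_02500000")
# Write synthesize(fregan, mel) and synthesize(hifigan, mel) to wav files and listen side by side.
```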
Out of curiosity, how many GPUs did you train with, and which ones?
3x RTX 3090 with batch size 16.
But I can confirm that Fre-GAN trains much faster than HiFi-GAN.
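For context, a rough back-of-the-envelope check of what that setup implies, assuming (purely for illustration) an LJSpeech-sized dataset of about 13,100 clips and treating batch 16 as the per-GPU batch size:

```python
# Back-of-the-envelope estimate only; the dataset size is an assumption for
# illustration, not the Hi-Fi TTS set actually used above.
num_clips = 13_100    # assumed dataset size (roughly LJSpeech-sized)
batch_size = 16       # batch size from the thread, treated here as per-GPU
num_gpus = 3          # 3x RTX 3090 from the thread

steps_per_epoch = num_clips // (batch_size * num_gpus)
print(f"{steps_per_epoch} steps/epoch")                          # ~272
print(f"~{150_000 / steps_per_epoch:.0f} epochs at 150k steps")  # the Fre-GAN quality point above
print(f"~{1_000_000 / steps_per_epoch:.0f} epochs at 1M steps")  # HiFi-GAN's typical budget above
```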