Comparison with UnivNet
thepowerfuldeez opened this issue
Hi! How does this work compare with UnivNet, for which you have already implemented code: https://github.com/rishikksh20/UnivNet-pytorch
That paper is a little newer, but as far as I know they're more concerned with the generalizability of the model to unseen speakers, while this work focuses on overall quality (especially in the high frequencies).
Can you maybe elaborate?
@thepowerfuldeez Fre-GAN is better than UnivNet
Have you tried training on LJSpeech or your own dataset? How many iterations are needed compared with HiFi-GAN? Do you have checkpoints somewhere?
I tried it on my own dataset; it takes 150k iterations to generate excellent voice, whereas HiFi-GAN usually takes 1M steps for the same quality.
It only takes 2 days to reach 150k iterations.
got it, thanks
Tried it out. I compared the publicly available universal v1 HiFi-GAN (trained for 2.5M iterations on VCTK) with this one trained for 150k iterations on the new Hi-Fi TTS dataset (5 times more data). It sounds great, but I think it should be trained a bit more. Maybe 250k will be enough.
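For anyone who wants to reproduce this kind of side-by-side listening test, here is a minimal sketch, assuming both repos follow HiFi-GAN's checkpoint layout (generator weights stored under a "generator" key). The generator classes, config objects, and checkpoint names in the usage comments are hypothetical placeholders, not files shipped with either repo.

```python
import torch

def load_hifigan_style_generator(generator_cls, config, checkpoint_path, device="cuda"):
    """Load a vocoder generator from a HiFi-GAN-style checkpoint.

    Assumes the checkpoint is a dict with the weights under the "generator" key,
    as the official HiFi-GAN training script saves them; verify this against the
    checkpoints you actually have.
    """
    model = generator_cls(config).to(device)
    state = torch.load(checkpoint_path, map_location=device)
    model.load_state_dict(state["generator"])
    model.eval()
    if hasattr(model, "remove_weight_norm"):
        model.remove_weight_norm()  # HiFi-GAN-style generators expose this for inference
    return model

@torch.no_grad()
def synthesize(model, mel):
    # mel: [1, n_mels, frames] tensor from the same mel front end both vocoders were trained on
    audio = model(mel)  # [1, 1, samples] for HiFi-GAN-style generators
    return audio.squeeze().cpu()

# Hypothetical usage -- classes, configs, and checkpoint paths are placeholders:
# fregan = load_hifigan_style_generator(FreGANGenerator, fregan_config, "fregan_00150000.pt")
# hifigan = load_hifigan_style_generator(HiFiGANGenerator, hifigan_config, "g_02500000")
# Write synthesize(fregan, mel) and synthesize(hifigan, mel) to wav files and listen side by side.
```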
Out of curiosity, how many GPUs did you train with, and which ones?
3x RTX 3090 with batch size 16.
But I can confirm that Fre-GAN trains much faster than HiFi-GAN.
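For context, a rough back-of-the-envelope check of what that setup implies, assuming (purely for illustration) an LJSpeech-sized dataset of about 13,100 clips and treating batch 16 as the per-GPU batch size:

```python
# Back-of-the-envelope estimate only; the dataset size is an assumption for
# illustration, not the Hi-Fi TTS set actually used above.
num_clips = 13_100    # assumed dataset size (roughly LJSpeech-sized)
batch_size = 16       # batch size from the thread, treated here as per-GPU
num_gpus = 3          # 3x RTX 3090 from the thread

steps_per_epoch = num_clips // (batch_size * num_gpus)
print(f"{steps_per_epoch} steps/epoch")                          # ~272
print(f"~{150_000 / steps_per_epoch:.0f} epochs at 150k steps")  # the Fre-GAN quality point above
print(f"~{1_000_000 / steps_per_epoch:.0f} epochs at 1M steps")  # HiFi-GAN's typical budget above
```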