rishikksh20 / Fre-GAN-pytorch

Fre-GAN: Adversarial Frequency-consistent Audio Synthesis


comparison with univnet

thepowerfuldeez opened this issue · comments

Hi! How does this work compare with UnivNet, for which you already implemented code: https://github.com/rishikksh20/UnivNet-pytorch
That paper is a bit newer, but AFAIK it is more concerned with generalizability to unseen speakers, while this work focuses on overall quality (especially in the high frequencies).
Can you maybe elaborate?

@thepowerfuldeez Fre-GAN is better than UnivNet

Have you tried training on LJSpeech or your own dataset? How many iterations are needed compared with HiFi-GAN? Do you have checkpoints somewhere?

I tried it on my own dataset: it takes 150k iterations to generate excellent voice, whereas HiFi-GAN usually takes 1M steps for the same quality.

It only takes 2 days to reach 150k iterations.
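For a rough sense of what those figures imply, here is a back-of-the-envelope sketch based only on the numbers quoted in this thread (150k iterations in 2 days, vs. the 1M steps mentioned for HiFi-GAN); these are not measured benchmarks, and "excellent quality" was judged by ear:

```python
# Throughput implied by "150k iterations in 2 days" (numbers from this thread).
iterations = 150_000
days = 2
seconds = days * 24 * 60 * 60        # 172,800 s
its_per_sec = iterations / seconds   # roughly 0.87 it/s

# Training-budget comparison: steps to reach "excellent" quality,
# as reported above (Fre-GAN 150k vs. HiFi-GAN ~1M).
fre_gan_steps = 150_000
hifi_gan_steps = 1_000_000
step_ratio = hifi_gan_steps / fre_gan_steps  # ~6.7x fewer steps

print(f"~{its_per_sec:.2f} it/s, ~{step_ratio:.1f}x fewer steps than HiFi-GAN")
```

Note that steps are not wall-clock time: per-step cost depends on batch size, GPU count, and model size, so the ~6.7x step ratio is only a loose proxy for the actual training speedup.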

got it, thanks

Tried it out. I compared the publicly available universal v1 HiFi-GAN (trained for 2.5M iterations on VCTK) with this one trained for 150k iterations on the new Hi-Fi TTS dataset (5 times more data). It sounds great, but I think it should be trained a bit more; maybe 250k will be enough.

Out of curiosity how many GPUs did you train with, and which ones?

3x 3090 with batch size 16.
But I can confirm that Fre-GAN trains much faster than HiFi-GAN.