rhasspy / larynx

End to end text to speech system using gruut and onnx

hifi_gan-vctk_small vs hifi_gan-vctk_medium (release 2021-03-28)

svenha opened this issue

The naming confuses me a little bit. hifi_gan-vctk_small is larger (and slower) than hifi_gan-vctk_medium.

I wondered about this as well, but the labels follow the pre-trained models in the original repo, where the "medium" one is vctk_v2 and the "small" one is vctk_v3. Based on my understanding of the config files, v2 should be larger/slower than v3.

To make it extra confusing, the small/v3 model uses a different "resblock" but more upscale channels than medium/v2, which uses a similar configuration to the universal_large/v1 model.

I may just flip the medium/small labels though if there is an obvious performance difference between the two. I've focused all my testing on the large vs. small to date.

So I've tested medium and small with a larger number of voices and with both short and long sentences. Small was either as fast as medium or slower (within the error bars, I guess).
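
For anyone who wants to repeat this kind of comparison, here is a minimal timing sketch. It is not the method used in the test above, and it does not call Larynx itself: `time_runs`, `summarize`, and `overlaps` are hypothetical helper names, and the two `lambda` workloads are placeholders to be replaced with whatever code runs synthesis with each vocoder.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_runs(fn: Callable[[], object], repeats: int = 5) -> List[float]:
    """Return the wall-clock seconds taken by each of `repeats` calls to fn."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> Dict[str, float]:
    """Mean and sample standard deviation (0.0 when there is a single run)."""
    mean = statistics.mean(durations)
    stdev = statistics.stdev(durations) if len(durations) > 1 else 0.0
    return {"mean": mean, "stdev": stdev}


def overlaps(a: Dict[str, float], b: Dict[str, float]) -> bool:
    """True if the mean +/- stdev intervals of the two summaries overlap."""
    return abs(a["mean"] - b["mean"]) <= a["stdev"] + b["stdev"]


if __name__ == "__main__":
    # Placeholder workloads (assumption): swap these for calls that
    # synthesize the same sentence with the medium and small vocoders.
    medium = summarize(time_runs(lambda: time.sleep(0.01)))
    small = summarize(time_runs(lambda: time.sleep(0.01)))
    print("medium:", medium)
    print("small: ", small)
    print("within error bars:", overlaps(medium, small))
```

Using one standard deviation as the "error bar" is a rough choice; more repeats and the same input sentences for both vocoders make the comparison more meaningful.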

I ended up swapping the medium/low vocoder labels in v0.5