r9y9 / nnmnkwii

Library to build speech synthesis systems designed for easy and fast prototyping.

Home Page:https://r9y9.github.io/nnmnkwii/latest/

Add link to GAN-TTS and GAN-VC in doc

r9y9 opened this issue

I'm currently exploring GANs for text-to-speech synthesis (TTS) and voice conversion (VC). When I finish my work, I will add a link to the repo in the doc to demonstrate the library's capabilities.

Reference: https://arxiv.org/abs/1709.08041

That will be the first example of minimum generation error (MGE) training using the autograd module.

https://r9y9.github.io/nnmnkwii/latest/references/autograd.html

Progress:

  • MGE training for VC/TTS works reasonably well as a baseline, though it is not significantly better than MSE loss.
  • GAN-VC is working great, as expected. Global variance (GV) is implicitly compensated and speech quality is greatly improved. Very nice!
  • Implemented GAN-TTS, but it is not working yet. I suspect the adversarial loss over multi-stream features (mgc, lf0, vuv and bap) is difficult to optimize. I'm running further experiments. MGE loss (mgc, lf0, vuv and bap) + ADV loss (mgc only) may work?

GAN-VC and GAN-TTS are now both working great. The essential trick to make GAN-TTS work well is to compute the ADV loss on mgc only (without the 0-th coefficient), not on all features.
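
To make the mgc-only trick concrete, here is a minimal pure-Python sketch of the idea. It is not the code from the actual experiments and does not use the nnmnkwii API. The stream sizes, the function names, the `adv_weight` parameter, and the use of plain MSE as a stand-in for the MGE loss are all assumptions made for illustration. The generation loss sees every stream, while the discriminator only sees mgc with the 0-th coefficient dropped.

```python
import math

# Hypothetical stream sizes; real values depend on the feature extraction setup.
# Order follows the streams named in this issue: mgc, lf0, vuv, bap.
STREAM_DIMS = {"mgc": 5, "lf0": 1, "vuv": 1, "bap": 1}


def stream_slices(stream_dims):
    """Map each stream name to its (start, end) column range within a frame."""
    slices, start = {}, 0
    for name, dim in stream_dims.items():
        slices[name] = (start, start + dim)
        start += dim
    return slices


def discriminator_input(frames, stream_dims=STREAM_DIMS):
    """Keep only the mgc columns and drop the 0-th mgc coefficient."""
    start, end = stream_slices(stream_dims)["mgc"]
    return [frame[start + 1:end] for frame in frames]


def generation_loss(predicted, target):
    """Mean squared error over ALL features (assumed stand-in for MGE loss)."""
    total = sum(
        (p - t) ** 2
        for pred_frame, tgt_frame in zip(predicted, target)
        for p, t in zip(pred_frame, tgt_frame)
    )
    count = sum(len(frame) for frame in target)
    return total / count


def adversarial_loss(d_probs):
    """Generator-side loss -mean(log D(x)) for discriminator outputs in (0, 1)."""
    return -sum(math.log(p) for p in d_probs) / len(d_probs)


def generator_loss(predicted, target, discriminator, adv_weight=1.0):
    """Generation loss on all streams plus weighted ADV loss on mgc[1:] only."""
    d_probs = [discriminator(x) for x in discriminator_input(predicted)]
    return generation_loss(predicted, target) + adv_weight * adversarial_loss(d_probs)
```

Here `discriminator` is any callable that maps one sliced frame to a probability; in a real setup it would be a trained network and both losses would be computed on tensors so gradients can flow back to the generator.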