RussellSB / tt-vae-gan

Timbre transfer with variational autoencoding and cycle-consistent adversarial networks. Transfers the timbre of one audio source onto another.


Improvements to results

rohitgupta3 opened this issue

General question, not an issue, so apologies if this is the wrong place for such queries. I was wondering about a couple of things:

  1. In general, are there more or less promising ways to get better results? Many of the voice conversions I've tried with this repo have had strange artifacts. Even in the core VAE-GAN demo, I'd say (subjectively) that the male=>female conversions sound a lot better than the female=>male ones, with the latter containing a lot of warbled speech. Maybe this is too broad a question, but based on your experience I'd be curious how you'd go about improving this. E.g., are there specific hyperparameters you'd change, and/or is it down to the nature of the training data?

  2. How well have you found MelGAN to work compared to WaveNet? I'm wondering whether it's worth diving deeper into training WaveNet, given MelGAN's apparent speed advantages in both training and inference. Along the same lines, have you found any pretrained MelGAN models (from either the implementation you link to or the official one) that are good enough, or do you typically train MelGAN yourself? (For context, the sketch after this list shows roughly how I've been loading a pretrained vocoder.)
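For reference, this is roughly how I've been loading a pretrained MelGAN vocoder via torch.hub. It assumes the public descriptinc/melgan-neurips checkpoint and its `load_melgan` entry point, which may not be the same MelGAN implementation this repo links to, and `example.wav` is just a placeholder input file:

```python
import torch
import torchaudio

# Pretrained multi-speaker MelGAN vocoder from the melgan-neurips repo
# (weights are downloaded on first use). The hub entry point name is an
# assumption based on that public repo, not on this project.
vocoder = torch.hub.load("descriptinc/melgan-neurips", "load_melgan")

# Load a mono utterance; the pretrained checkpoints expect 22.05 kHz audio,
# so resample if needed.
audio, sr = torchaudio.load("example.wav")  # hypothetical input file
if sr != 22050:
    audio = torchaudio.functional.resample(audio, sr, 22050)

# audio -> mel with the vocoder's own front end, so the mel parameters match
# what the generator was trained on, then mel -> waveform.
mel = vocoder(audio)
reconstructed = vocoder.inverse(mel)

torchaudio.save("reconstructed.wav", reconstructed.cpu(), 22050)
```

A round trip like this (audio -> mel -> audio, with no conversion in between) has also been handy for telling vocoder artifacts apart from artifacts introduced by the conversion model itself.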

Appreciate any thoughts here.