Rayhane-mamah / Tacotron-2

DeepMind's Tacotron-2 Tensorflow implementation

improving audio quality? (4-speaker)

byuns9334 opened this issue

Hi, we've been experimenting with TTS in two setups:

  1. tacotron2 for mel-spectrogram + vocoder (4-speaker)

  2. tacotron2 for linear-spectrogram (directly, without post-net) + griffin-lim (4-speaker)

Setup 1 sounds better than setup 2, but it's still not as good as human ground-truth speech (it still has a very slight vibrating noise). I've uploaded samples below; please check them out.

Any ideas on how to improve these further (through post-processing or anything else)? We're fairly confident that each component of each setup is sufficiently trained. Our data is about 25 hours in total.

Thanks in advance!
samples.zip