improving audio quality? (4-speaker)

Question

byuns9334 opened this issue 4 years ago · comments

Hi, we've been experimenting with TTS in two setups:

1) Tacotron 2 for mel-spectrogram + vocoder (4-speaker)
2) Tacotron 2 for linear-spectrogram (directly, without post-net) + Griffin-Lim (4-speaker)

Setup 1) sounds better than 2), but it's still not as good as human ground-truth speech (it still has a very slight vibrating noise).
(I've uploaded samples below; please check them out.)

Any ideas on how to improve these further (with post-processing or anything else)? We're fairly confident that each component of each setup is trained well enough. Our data is about 25 hours in total.

Thanks in advance!
samples.zip