improving audio quality? (4-speaker)
byuns9334 commented
Hi, we've been experimenting with TTS in two setups:
1) Tacotron2 for mel-spectrogram + vocoder (4-speaker)
2) Tacotron2 for linear-spectrogram (directly, without post-net) + Griffin-Lim (4-speaker)
Setup 1) sounds better than 2), but it is still not as good as human ground-truth speech (it still has a very slight vibrating noise).
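For context on setup 2): Griffin-Lim estimates the missing phase iteratively from the magnitude spectrogram alone, which is one place artifacts can come from. Below is a minimal NumPy-only sketch of the algorithm, not our actual code. The function names `stft`, `istft`, and `griffin_lim`, as well as the frame size, hop, iteration count, and the normalization floor, are illustrative assumptions rather than the settings used in these experiments.

```python
import numpy as np

def stft(x, n_fft=256, hop=64):
    """Hann-windowed STFT; returns an array of shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft + 1)[:-1]  # periodic Hann window
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, n_fft=256, hop=64):
    """Inverse STFT by windowed overlap-add, normalized by the summed squared window."""
    window = np.hanning(n_fft + 1)[:-1]
    n_frames = spec.shape[0]
    length = n_fft + hop * (n_frames - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    for i in range(n_frames):
        out[i * hop:i * hop + n_fft] += frames[i] * window
        norm[i * hop:i * hop + n_fft] += window ** 2
    # Assumed floor: avoids amplifying the signal edges where the window is near zero.
    return out / np.maximum(norm, 1e-3)

def griffin_lim(magnitude, n_fft=256, hop=64, n_iter=50, seed=0):
    """Estimate a waveform whose STFT magnitude approximates `magnitude`."""
    rng = np.random.default_rng(seed)
    # Start from random phase, since the model predicts magnitudes only.
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        signal = istft(magnitude * phase, n_fft, hop)
        rebuilt = stft(signal, n_fft, hop)
        # Keep the target magnitude and take only the phase of the resynthesis.
        phase = np.exp(1j * np.angle(rebuilt))
    return istft(magnitude * phase, n_fft, hop)

# Example: recover a two-tone test signal from its magnitude spectrogram.
t = np.arange(4096) / 8000.0
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)
mag = np.abs(stft(x))
y = griffin_lim(mag)
```

In this sketch the phase is never predicted, only iterated toward consistency with the given magnitudes, so the iteration count and the overlap between frames are the main knobs to try on this side of the pipeline.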
(I've uploaded samples below; please check them out.)
Any ideas on how to improve these further (through post-processing or anything else)? We're fairly sure each component of each setup has been trained sufficiently. Our data is about 25 hours in total.
Thanks in advance!
samples.zip