ming024 / FastSpeech2

An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"

Synthesizing outputs a weird sound

DanojaDias opened this issue

Hi All,
I trained the model on a custom dataset for the Sinhala language, using around 14 hours of data, for 100,000 steps. After synthesizing, the output wav file contains a weird sound that is not even close to the expected speech. What could be the possible reason for this?

The TensorBoard audio tab shows audio clips with the suffixes _reconstructed and _synthesized. The clips with the suffix _synthesized sound similar to the clips with the suffix _reconstructed. Does this mean something? I am sorry, I am very new to this area, so I didn't fully understand the TensorBoard output.

I am using the lexicon here: https://raw.githubusercontent.com/google/language-resources/master/si/data/lexicon.tsv . I have extended it with more words. I generated the MFA TextGrids using my speech corpus and this lexicon.
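One sanity check that may be worth running on a setup like this is whether every word in the transcripts is actually covered by the extended lexicon, since uncovered words can end up poorly aligned. The following is only a minimal sketch, not part of the FastSpeech2 repository: it assumes the lexicon is tab-separated with the word in the first column, that transcripts are whitespace-tokenized, and the helper names `load_lexicon_words` and `find_oov` are hypothetical. The demo data are placeholder tokens, not real Sinhala entries.

```python
from collections import Counter


def load_lexicon_words(lines):
    """Collect the word column from lexicon lines.

    Assumption: each line is `word<TAB>pronunciation`; lines without a tab are skipped.
    """
    words = set()
    for line in lines:
        line = line.strip()
        if not line or "\t" not in line:
            continue
        word, _pronunciation = line.split("\t", 1)
        words.add(word)
    return words


def find_oov(transcripts, lexicon_words):
    """Count whitespace-separated tokens that do not appear in the lexicon."""
    counts = Counter()
    for text in transcripts:
        for token in text.split():
            if token not in lexicon_words:
                counts[token] += 1
    return counts


# Placeholder data for illustration only (not real lexicon entries).
lexicon_lines = ["aa\ta a", "ba\tb a"]
transcripts = ["aa ba ca", "ca aa"]

lexicon_words = load_lexicon_words(lexicon_lines)
oov = find_oov(transcripts, lexicon_words)

# Print out-of-vocabulary tokens, most frequent first.
for token, count in oov.most_common():
    print(token, count)
```

For a real corpus, the two lists could be read from the lexicon file and the transcript files opened with `encoding="utf-8"`. A non-empty result would suggest adding the missing words to the lexicon before regenerating the alignments.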

Could someone please help me here?

Thank you