ming024 / FastSpeech2

An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"


How do you train the MFA acoustic model?

SandroChen opened this issue · comments

commented

I followed your tips on training the MFA acoustic model, but I cannot get labels on AISHELL-3 as accurate as the ones you provided.
I see there is an 'sp' in the alignment result, and its position is surprisingly accurate. I compared it with the labels that the AISHELL-3 dataset itself provides, and found that yours are more accurate. For example:

Take the sentence "广州%女大学生%登山%失联%四天%警方%找到%疑似%女尸$" (roughly: "A female college student in Guangzhou went missing for four days while mountain climbing; police found a suspected female body"). After listening to the original wave file, I find there should be a pause between "登" and "山", which the dataset's own labels miss. But in the TextGrid files that you provide, there is an "sp" between the phones "eng1" and "sh". I wonder how you managed to produce such an accurate pause label? The MFA acoustic model that I trained on AISHELL-3 does not produce it.
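For reference, with recent MFA releases the acoustic model is usually trained with something like `mfa train <corpus_dir> <lexicon> <model.zip>` and the TextGrids exported with `mfa align`; the exact commands and flags depend on the MFA version, so treat those as assumptions rather than the recipe used for the released alignments. Below is a minimal sketch (not the author's method) for comparing where the "sp" intervals land in your own TextGrids versus the released ones. It assumes the `tgt` package that this repo's preprocessor already uses, a phone tier named "phones", and the file paths are hypothetical placeholders.

```python
# Sketch: list the short-pause ("sp") intervals in an MFA TextGrid so that a
# locally trained alignment can be compared against the released one.
# Assumptions: `tgt` is installed, the phone tier is named "phones",
# and the paths below are placeholders.
import tgt


def list_pauses(textgrid_path, tier_name="phones"):
    """Return (start, end, previous_phone, next_phone) for every 'sp' interval."""
    textgrid = tgt.io.read_textgrid(textgrid_path)
    tier = textgrid.get_tier_by_name(tier_name)
    intervals = tier.intervals
    pauses = []
    for i, interval in enumerate(intervals):
        if interval.text == "sp":  # add "sil"/"spn" here if your model uses them
            prev_phone = intervals[i - 1].text if i > 0 else None
            next_phone = intervals[i + 1].text if i + 1 < len(intervals) else None
            pauses.append((interval.start_time, interval.end_time, prev_phone, next_phone))
    return pauses


if __name__ == "__main__":
    # Compare the released alignment with a locally trained one for the same
    # utterance (file names are placeholders).
    for label, path in [("released", "released/SSB0005_0001.TextGrid"),
                        ("local", "local/SSB0005_0001.TextGrid")]:
        for start, end, prev_phone, next_phone in list_pauses(path):
            print(f"{label}: sp {start:.2f}-{end:.2f}s between {prev_phone} and {next_phone}")
```

Running this on the same utterance from both sets makes it easy to see whether your model simply drops the pause between "eng1" and "sh" or places it elsewhere.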

commented

I have the same question.