How do you train the MFA acoustic model?
SandroChen opened this issue
I followed your tips for training the MFA acoustic model, but I cannot get labels on aishell3 that are as accurate as the ones you provided.
I see an 'sp' label in your alignment results, and its position is surprisingly accurate. I compared your labels with the ones that ship with the aishell3 dataset itself and found yours to be more accurate. For example:
Take the sentence "广州%女大学生%登山%失联%四天%警方%找到%疑似%女尸$". After listening to the original wav file, I found that there should be a pause between "登" and "山", which the dataset's own labels miss. But in the TextGrid files that you provided, there is an "sp" between the phones "eng1" and "sh". How did you manage to produce such an accurate pause label? The MFA acoustic model that I trained on aishell3 does not produce this label.
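A check like the one described above can be sketched in Python. This is only an illustration: it assumes the phone tier of a TextGrid has already been read into (start, end, label) tuples, the helper name `find_pauses` is hypothetical, and the timings are made up rather than taken from the actual files.

```python
from typing import List, Tuple

# One aligned interval: (start_sec, end_sec, label)
Interval = Tuple[float, float, str]


def find_pauses(phones: List[Interval],
                pause_label: str = "sp") -> List[Tuple[str, str, float]]:
    """Return (previous_phone, next_phone, duration) for each interior pause."""
    pauses = []
    for i, (start, end, label) in enumerate(phones):
        if label != pause_label:
            continue
        # Skip pauses at the very start or end: they have no phone on one side.
        if i == 0 or i == len(phones) - 1:
            continue
        pauses.append((phones[i - 1][2], phones[i + 1][2], round(end - start, 3)))
    return pauses


# Hypothetical phone tier around "登山"; the timings are invented for illustration.
phones = [
    (0.00, 0.08, "d"),
    (0.08, 0.25, "eng1"),
    (0.25, 0.37, "sp"),
    (0.37, 0.49, "sh"),
    (0.49, 0.70, "an1"),
]

print(find_pauses(phones))
```

Running the same function on the phone tier from two different alignments of one utterance shows which pauses appear in one but not the other.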
I have the same question.