Chinese characters are spoken faster than English words, will this model work on Chinese?

Question

zwfcrazy opened this issue 4 years ago

I want to build a dataset of Chinese characters to train this model.
I applied speech recognition on some Chinese news videos (by CCTV).
The recognition itself worked fine, but I found that each Chinese character takes very little time to pronounce because it has only one syllable.
On average, the lip movement for a single Chinese character spans only 5 video frames (at 25 fps), and it can be as few as 2 frames. This is far fewer than the required 29 frames. Obviously, interpolation won't work well in this case.
So I would like to know whether you have considered Chinese. Will this model work? Is there any workaround?
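
To put the numbers above in perspective, here is a minimal sketch of the arithmetic. The 25 fps rate, the 5- and 2-frame spans, and the 29-frame requirement come from the post; the function names are made up for illustration and are not part of the model's code.

```python
# Figures taken from the post above.
FPS = 25              # video frame rate
REQUIRED_FRAMES = 29  # clip length the model expects


def frames_to_seconds(num_frames: int, fps: float) -> float:
    """Duration in seconds of a clip with the given number of frames."""
    return num_frames / fps


def stretch_factor(num_frames: int, target_frames: int) -> float:
    """How many times longer the clip must become to reach target_frames."""
    return target_frames / num_frames


def synthesized_fraction(num_frames: int, target_frames: int) -> float:
    """Share of the stretched clip that would be interpolated, not real, frames."""
    return (target_frames - num_frames) / target_frames


for label, n in [("average character", 5), ("shortest character", 2)]:
    print(
        f"{label}: {n} frames = {frames_to_seconds(n, FPS):.2f} s, "
        f"stretch x{stretch_factor(n, REQUIRED_FRAMES):.1f}, "
        f"{synthesized_fraction(n, REQUIRED_FRAMES):.0%} interpolated"
    )
print(f"required clip: {frames_to_seconds(REQUIRED_FRAMES, FPS):.2f} s")
```

At 25 fps, 5 frames last 0.2 s while 29 frames last 1.16 s, so an average character would have to be stretched by a factor of 5.8, and roughly 83% of the resulting frames would be interpolated rather than observed.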

Hang_Zhou · Answer 1 · Wed Apr 15 2020 17:48:27 GMT+0800 (China Standard Time)

You can remove the recognition and adversarial parts of the model; it can then work regardless of language and input length. Although a crucial part is removed, I think reasonable results with acceptable performance can still be obtained this way. It would be better to load the pretrained weights of our model and then finetune them on your dataset. However, you may need to modify the code (delete several parts, change the input length) for it to work well.
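
A minimal sketch of the weight-loading step suggested above, not the repository's actual code. It assumes the checkpoint is a flat mapping from parameter names to tensors, and the key prefixes `recognizer.` and `discriminator.` are hypothetical placeholders for the recognition and adversarial parts; the real names have to be read from the checkpoint itself.

```python
from typing import Dict, Iterable


def drop_prefixed_entries(
    state: Dict[str, object], prefixes: Iterable[str]
) -> Dict[str, object]:
    """Return a copy of state without keys that start with any given prefix."""
    prefix_tuple = tuple(prefixes)  # str.startswith accepts a tuple of prefixes
    return {k: v for k, v in state.items() if not k.startswith(prefix_tuple)}


# Hypothetical checkpoint contents; strings stand in for weight tensors.
checkpoint = {
    "encoder.conv1.weight": "tensor",
    "decoder.deconv1.weight": "tensor",
    "recognizer.fc.weight": "tensor",       # assumed recognition branch
    "discriminator.fc.weight": "tensor",    # assumed adversarial branch
}

kept = drop_prefixed_entries(checkpoint, ["recognizer.", "discriminator."])
print(sorted(kept))
```

In PyTorch, a filtered mapping like `kept` can be passed to `model.load_state_dict(kept, strict=False)` so that the remaining layers start from the pretrained weights before finetuning. Layers whose shapes depend on the input length would still need to be changed or reinitialized separately.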

ak9250 · Answer 2 · Tue Apr 28 2020 22:22:46 GMT+0800 (China Standard Time)

@zwfcrazy Have you tried this: https://github.com/yiranran/Audio-driven-TalkingFace-HeadPose
It seems to work regardless of language.

@Hangz-nju-cuhk This paper, https://arxiv.org/pdf/2004.12992.pdf, cites this work and can handle head pose and speaker awareness.

Hang_Zhou · Answer 3 · Thu Apr 30 2020 16:38:20 GMT+0800 (China Standard Time)

@ak9250 Thanks for the references. I am familiar with both papers and had even seen their videos before they appeared on arXiv. Both are great work. I would definitely recommend that researchers try the state-of-the-art models, as mine seems a little out of date now.

Wenfei Zhu · Answer 4 · Wed May 06 2020 11:53:49 GMT+0800 (China Standard Time)

@ak9250 @Hangz-nju-cuhk Sorry for the late reply. Thank you both! I will close this issue for now.