FACEGOOD / FACEGOOD-Audio2Face

http://www.facegood.cc

How did you generate the training data (bs values)?

ChairManMeow-SY opened this issue · comments

From what I can find in your Google Drive, the training data is generated from Zishu Mei's audio recordings, which are produced with a TTS algorithm.

So how do you get the blendshape values for Zishu Mei? It confuses me because Zishu Mei is a 3D model.

If the blendshape values are captured from a human with a facial motion capture system, how do we guarantee that the TTS audio matches the blendshape values?

Please correct me if I have misunderstood something.

commented

That is a good question.
In the early stage of the project, we used TTS to create the audio data from text, and we manually created the animation of the Zishu Mei model to match the audio data: like listening to a pronunciation and making the corresponding animation.
For now, we suggest using a facial motion capture system, recording the actor's video and audio at the same time during the performance. The TTS audio is generated with the actor's customized voice.
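Because the audio and the blendshape animation are produced as two separate streams (TTS audio plus hand-made or mocap animation), they only stay in sync because each blendshape frame sits at a known time on the shared timeline. Below is a minimal sketch of how per-frame training pairs could be assembled from an audio clip and its matching blendshape curves; the function name, window size, and frame rate are illustrative assumptions, not the repository's actual preprocessing code.

```python
import numpy as np

def build_training_pairs(audio, sample_rate, bs_frames, fps=30, window_ms=520):
    """Pair each blendshape frame with the audio window centred on it.

    audio      : 1-D float array, mono waveform (e.g. 16 kHz)
    bs_frames  : (num_frames, num_blendshapes) array exported from the
                 animation / mocap at a fixed frame rate `fps`
    window_ms  : length of audio context given to the network per frame
                 (the value here is an assumption, not the repo's setting)
    """
    half_win = int(sample_rate * window_ms / 1000) // 2
    pairs = []
    for i, bs in enumerate(bs_frames):
        # audio sample index aligned with blendshape frame i on the timeline
        center = int(round(i / fps * sample_rate))
        start, end = center - half_win, center + half_win
        # zero-pad at the clip boundaries so every frame gets a full window
        window = np.zeros(2 * half_win, dtype=np.float32)
        src = audio[max(start, 0):min(end, len(audio))]
        window[max(-start, 0):max(-start, 0) + len(src)] = src
        pairs.append((window, bs.astype(np.float32)))
    return pairs
```

The same alignment works whether the blendshape curves come from hand-made animation keyed to TTS audio or from mocap recorded simultaneously with the actor's voice, as long as both streams start at the same timestamp.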


Amazing, great job! The training data is really expensive and valuable. Thank you so much for the reply and for the data.