Audio-driven implicit keypoint prediction
laxyon opened this issue
Has anyone tried audio-driven implicit keypoint prediction? I tried it based on FaceFormer, as mentioned in Appendix C of the paper, but it failed: the loss (MSE between the predicted exp and the ground-truth exp) dropped sharply at the beginning of training, yet the resulting keypoint sequence is weird — the movement has almost disappeared. Could someone give some advice? A rough sketch of my setup is below.
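To make the setup concrete, here is a minimal sketch (not the paper's or FaceFormer's actual code) of the kind of pipeline described above: an autoregressive transformer decoder that maps per-frame audio features to implicit expression parameters, trained with plain MSE against ground truth. The feature dimension (768, as from wav2vec 2.0), the exp dimension (63, assuming 21 implicit keypoints × 3), and all module names are assumptions for illustration.

```python
import torch
import torch.nn as nn

class AudioToExp(nn.Module):
    """Hypothetical audio-to-expression regressor, FaceFormer-style."""
    def __init__(self, audio_dim=768, exp_dim=63, d_model=256, n_layers=4):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, d_model)
        self.exp_proj = nn.Linear(exp_dim, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, exp_dim)

    def forward(self, audio_feat, prev_exp):
        # audio_feat: (B, T, audio_dim) pre-extracted audio features
        # prev_exp:   (B, T, exp_dim) teacher-forced past expressions
        memory = self.audio_proj(audio_feat)
        tgt = self.exp_proj(prev_exp)
        causal = nn.Transformer.generate_square_subsequent_mask(
            tgt.size(1)).to(tgt.device)
        out = self.decoder(tgt, memory, tgt_mask=causal)
        return self.head(out)

model = AudioToExp()
audio = torch.randn(2, 100, 768)    # dummy audio features
gt_exp = torch.randn(2, 100, 63)    # dummy ground-truth exp sequence
# shift exp right by one frame for teacher forcing
prev = torch.cat([torch.zeros(2, 1, 63), gt_exp[:, :-1]], dim=1)
pred = model(audio, prev)
loss = nn.functional.mse_loss(pred, gt_exp)  # the plain MSE described above
loss.backward()
```

With only this per-frame MSE objective, the loss can fall quickly while the predictions still collapse toward the mean expression, which would look exactly like the vanishing motion described above.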
I have encountered the same problem. Does anyone know how to solve it?
Following the idea of VASA-1, we implemented an audio-driven portrait animation method based on LivePortrait. For details, please refer to: https://jdh-algo.github.io/JoyVASA/.