Audio-driven implicit keypoint prediction
laxyon opened this issue
Has anyone tried audio-driven implicit keypoint prediction? I tried it based on FaceFormer, as mentioned in Appendix C of the paper, but it failed: the loss (MSE between the predicted exp and the ground-truth exp) dropped sharply at the beginning of training, yet the resulting keypoint sequence is weird — the movement has almost disappeared. Could someone give some advice? A rough sketch of my setup is below.
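To make the setup concrete, here is a minimal sketch (not the paper's or FaceFormer's actual code) of the kind of pipeline described above: an autoregressive transformer decoder that maps per-frame audio features to implicit expression parameters, trained with plain MSE against ground truth. The feature dimension (768, as from wav2vec 2.0), the exp dimension (63, assuming 21 implicit keypoints × 3), and all module names are assumptions for illustration.

```python
import torch
import torch.nn as nn

class AudioToExp(nn.Module):
    """Hypothetical audio-to-expression regressor, FaceFormer-style."""
    def __init__(self, audio_dim=768, exp_dim=63, d_model=256, n_layers=4):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, d_model)
        self.exp_proj = nn.Linear(exp_dim, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, exp_dim)

    def forward(self, audio_feat, prev_exp):
        # audio_feat: (B, T, audio_dim) pre-extracted audio features
        # prev_exp:   (B, T, exp_dim) teacher-forced past expressions
        memory = self.audio_proj(audio_feat)
        tgt = self.exp_proj(prev_exp)
        causal = nn.Transformer.generate_square_subsequent_mask(
            tgt.size(1)).to(tgt.device)
        out = self.decoder(tgt, memory, tgt_mask=causal)
        return self.head(out)

model = AudioToExp()
audio = torch.randn(2, 100, 768)    # dummy audio features
gt_exp = torch.randn(2, 100, 63)    # dummy ground-truth exp sequence
# shift exp right by one frame for teacher forcing
prev = torch.cat([torch.zeros(2, 1, 63), gt_exp[:, :-1]], dim=1)
pred = model(audio, prev)
loss = nn.functional.mse_loss(pred, gt_exp)  # the plain MSE described above
loss.backward()
```

With only this per-frame MSE objective, the loss can fall quickly while the predictions still collapse toward the mean expression, which would look exactly like the vanishing motion described above.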
I have encountered the same problem. Does anyone know how to solve it?
Following the idea of VASA-1, we implemented an audio-driven portrait animation method based on LivePortrait. For details, please refer to: https://jdh-algo.github.io/JoyVASA/.