Face in output does not match the original face
vardanagarwal opened this issue
First of all, the lip sync is the best I have seen. However, the face is a little distorted and doesn't match the input face. Are there any settings in inference_for_demo_video.py that I can edit for more accurate face matching? I tried playing with the number of diffusion steps, even setting it to 500, but didn't notice any difference.
Thanks for your attention.
In our experiments, we found that the face shape of the speaker in the generated results is distorted toward the face shape of the speaker in the style reference video (style clip). (If you use style clips of different people, you might observe differences in face shape.) This is a limitation of our method. The workaround is to choose a style clip whose speaker has a face shape similar to that of the input speaker.
@YifengMa9 thanks for the quick reply.
Is there a way you would recommend to create the style video using the same face as my input image? I am fine with a neutral expression. Let me know if you have any pointers on that. Otherwise, I am going to try some experiments and will post the results here.
Finding a suitable style clip and extracting its features might be time-consuming and labor-intensive (it requires running the 3DMM extraction code). One simple method I can think of is to set `--cfg_scale` to a smaller number, such as 0 or 0.1. This will reduce the effect of the style clip, which you might want to try.
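For reference, a minimal sketch of what that invocation might look like, assuming the flags and sample asset paths from the repo's demo command (everything other than `--cfg_scale` here is a placeholder; substitute your own files):

```bash
# Run inference with a reduced style-clip influence.
# Paths below are the repo's sample assets (assumed); replace with your own
# audio, style clip, pose sequence, and source image.
python inference_for_demo_video.py \
    --wav_path data/audio/acknowledgement_english.m4a \
    --style_clip_path data/style_clip/3DMM/M030_front_neutral_level1_001.mat \
    --pose_path data/pose/RichardShelby_front_neutral_level1_001.mat \
    --image_path data/src_img/uncropped/male_face.png \
    --cfg_scale 0.1 \
    --output_name low_style_test
```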
Thanks for your help. The `--cfg_scale` flag does help, but not to the extent I wanted. I'll close this issue for now, since, as you said, this is currently a limitation of the method.