Face in output does not match the original face
vardanagarwal opened this issue
First of all, the lip sync is the best I have seen. However, the face is a little distorted and doesn't match the input face. Are there any settings in inference_for_demo_video.py that I can edit for more accurate face matching? I tried playing with the number of diffusion steps, even setting it to 500, but didn't notice any difference.
Thanks for your attention.
In our experiments, we found that the face shape of the speaker in the generated results is distorted toward the face shape of the speaker in the style reference video (style clip). (If you use style clips of different people, you might observe differences in face shape.) This is a limitation of our method. The workaround is to choose a style clip whose speaker has a face shape similar to that of the input speaker.
@YifengMa9 thanks for the quick reply.
Is there a way you would recommend to create the style video using the same face as my input image? I am fine with a neutral expression. Let me know if you have any pointers on that. Otherwise, I am going to try some experiments and will post the results here.
Finding a suitable style clip and extracting its features might be time-consuming and labor-intensive (it requires running the 3DMM extraction code). One simple method I can think of is to set `--cfg_scale` to a smaller number, such as 0 or 0.1. This will reduce the effect of the style clip, which you might want to try.
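For reference, a minimal sketch of what that invocation might look like, assuming the flags and sample asset paths from the repo's demo command (everything other than `--cfg_scale` here is a placeholder; substitute your own files):

```bash
# Run inference with a reduced style-clip influence.
# Paths below are the repo's sample assets (assumed); replace with your own
# audio, style clip, pose sequence, and source image.
python inference_for_demo_video.py \
    --wav_path data/audio/acknowledgement_english.m4a \
    --style_clip_path data/style_clip/3DMM/M030_front_neutral_level1_001.mat \
    --pose_path data/pose/RichardShelby_front_neutral_level1_001.mat \
    --image_path data/src_img/uncropped/male_face.png \
    --cfg_scale 0.1 \
    --output_name low_style_test
```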
Thanks for your help. The `--cfg_scale` flag does help, but not to the extent I wanted. I'll close this issue for now, since, as you said, this is currently a limitation of the method.