Weizhi-Zhong / IP_LAP

Implementation of the CVPR 2023 paper "Identity-Preserving Talking Face Generation With Landmark and Appearance Priors"

large content landmarks

KingStorm opened this issue

Nice work!

I am trying to train IP_LAP with custom data, but in the results the content landmarks are generally larger than the pose landmarks, so there is a mismatch. However, if I use the pretrained model, the size of the resulting content landmarks is correct.

Training dataset: a 5-minute 480 × 640 video.

[Screenshot 2023-06-02 21:23: generated result showing the landmark size mismatch]

Hi, thanks for your interest.
Does the eval_L1_loss decrease to 6e-3, as described in this issue?
If not, what is the final eval_L1_loss of your training with custom data?
Since your training dataset is a single 5-minute 480 × 640 video, I doubt whether it is enough.

Or is your training overfitting? Compare the running loss and the eval loss, for example as sketched below.
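
A quick way to compare them, assuming you log the loss values yourself (the function and variable names here are illustrative, not from the repository):

```python
# Minimal sketch of an overfitting check; assumes you collected the logged
# losses yourself. Nothing here comes from the IP_LAP codebase.
import matplotlib.pyplot as plt

def plot_losses(train_steps, train_loss, eval_steps, eval_loss):
    """Plot running (train) and eval L1 losses on a shared log axis."""
    plt.plot(train_steps, train_loss, label="running L1 loss (train)")
    plt.plot(eval_steps, eval_loss, label="eval L1 loss")
    plt.yscale("log")  # losses near 1e-3 are easier to compare on a log scale
    plt.xlabel("training step")
    plt.ylabel("L1 loss")
    plt.legend()
    plt.show()

# A training loss that keeps dropping while the eval loss plateaus or rises
# is the classic overfitting signature, which is plausible on a 5-minute dataset.
```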

Thanks for your reply. The eval_L1_loss does decrease to the 1e-3 level, so I would consider it fitted enough (if anything, overfitted), and I tested it on the training data.

I drew sketches of the landmarks during training, and they look reasonable:
[image: {epoch}_{step}_pred_sketch, a predicted sketch from training]

However, in inference the sketch turns out to be mismatched between the pose and content landmarks:
[image: inference sketch showing the pose/content landmark mismatch]

> I drew sketches of the landmarks during training

Hi, thanks for your interest.
Does that mean you draw sketches on the training dataset during training, and on the testing dataset during inference?

Hi, I seem to have found a clue about the size mismatch.

I found that the landmarks extracted by preprocess_video.py exactly fill the 128x128 image, with no space left, like:
[image: training sketch where the landmarks reach the image borders]

However, the landmarks extracted by inference_single.py leave some space in the 128x128 image, like:
[image: 0_0_pred_sketch_not_replace_Nl_5k, an inference sketch with margins around the landmarks]

Hi, thanks for your interest.
As shown in the following code:
https://github.com/Weizhi-Zhong/IP_LAP/blob/e5d8fdc1ab01a1426ac4c8cfec461ec5d024050d/preprocess/preprocess_video.py#LL251C19-L251C19
While preprocessing the LRS2 dataset, we add 5 extra pixels to the margin so that the normalized coordinate of the bottom-most landmark is not always 1.
Similarly, in inference:
https://github.com/Weizhi-Zhong/IP_LAP/blob/e5d8fdc1ab01a1426ac4c8cfec461ec5d024050d/inference_single.py#LL282C9-L282C9
we add some (25) pixels so that the landmarks stay within the cropping region.
Depending on your dataset and input videos, you can change the number of pixels added to the margin so that all landmarks fall within the cropping region.
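
To make the effect concrete, here is a minimal sketch of the idea; the function and the numbers are illustrative, not the repository's exact code:

```python
# Minimal sketch (not the repository's exact code) of how a bottom margin
# affects normalized landmark coordinates. All names and values are illustrative.
import numpy as np

def normalize_landmarks(landmarks, ymin, ymax, xmin, xmax, extra_margin=5):
    """Normalize pixel landmarks into [0, 1] relative to the face crop.

    Without extra_margin, the lowest landmark (e.g. the chin) sits exactly
    on the crop border, so its normalized y-coordinate is always 1.0.
    Adding a few pixels below the face leaves headroom, analogous to the
    5 px used in preprocessing and the 25 px used in inference.
    """
    ymax = ymax + extra_margin                # extend the crop downward
    h, w = ymax - ymin, xmax - xmin
    return (landmarks - np.array([xmin, ymin])) / np.array([w, h])

# Example: a chin landmark lying on the crop's bottom edge.
chin = np.array([[64.0, 200.0]])              # (x, y) in frame pixels
print(normalize_landmarks(chin, ymin=72, ymax=200, xmin=0, xmax=128, extra_margin=0))   # y == 1.0
print(normalize_landmarks(chin, ymin=72, ymax=200, xmin=0, xmax=128, extra_margin=25))  # y < 1.0
```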

Hope this can be helpful for you.

Thanks, fair enough.