facebookresearch / multiface

Hosts the Multiface dataset, which is a multi-view dataset of multiple identities performing a sequence of facial expressions.

Error in loading the pretrained model

zhanglonghao1992 opened this issue

commented

When I run the testing script by:
python -m torch.distributed.launch --nproc_per_node=1 test.py --data_dir /path/to/mini_dataset/m--20180227--0000--6795937--GHS --krt_dir /path/to/mini_dataset/m--20180227--0000--6795937--GHS/KRT --framelist_test /path/to/mini_dataset/m--20180227--0000--6795937--GHS/frame_list.txt --test_segment "./mini_test_segment.json"

I got the error:
RuntimeError: Error(s) in loading state_dict for DeepAppearanceVAE:
size mismatch for cc.weight: copying a param with shape torch.Size([75, 3, 1, 1]) from checkpoint, the shape in current model is torch.Size([37, 3, 1, 1]).
size mismatch for cc.bias: copying a param with shape torch.Size([75, 3, 1, 1]) from checkpoint, the shape in current model is torch.Size([37, 3, 1, 1]).

It seems like you used 76 cams for training.
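
A quick way to check what a checkpoint expects is to inspect its state dict directly, without instantiating the model. This is only a sketch: whether the weights sit at the top level or are nested under a key such as model_state_dict depends on how the checkpoint was saved, so the unwrapping below is an assumption.

import torch

# Inspect the per-camera color-correction parameters in a checkpoint
# without building DeepAppearanceVAE first.
sd = torch.load("pretrained_model/6795937_model.pth", map_location="cpu")
if isinstance(sd, dict) and "model_state_dict" in sd:
    sd = sd["model_state_dict"]  # assumption: some checkpoints nest the weights

print(sd["cc.weight"].shape)  # e.g. torch.Size([75, 3, 1, 1]) -> 75 per-camera entries
print(sd["cc.bias"].shape)

If the first dimension of cc.weight does not match the camera count your config produces, loading fails with exactly the size mismatch above.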

commented

Hi,

Sorry for the late reply. I tried

python -m torch.distributed.launch --nproc_per_node=1 test.py --data_dir dataset/m--20180227--0000--6795937--GHS --krt_dir dataset/m--20180227--0000--6795937--GHS/KRT --framelist_test dataset/m--20180227--0000--6795937--GHS/frame_list.txt --test_segment ./mini_test_segment.json --model_path pretrained_model/6795937_model.pth

and the model was correctly loaded. The link to the model can be found at pretrained_model/index.html. Could you provide the model name you loaded as well?

I encountered the same issue when loading the mini pretrained model you provided in INSTALLATION.md.

commented

> I encountered the same issue when loading the mini pretrained model you provided in INSTALLATION.md.

Hi, could you try the full model for now using the command previously mentioned? It's the same size as the mini model and it appears to work fine.
@cwuu Could you check if the cameras used for training the mini model are correct?

Thanks

Yes, I can load the full model. However, the output textures look weird, like these pred_tex images.

The result.txt is:

Best screen loss 0.000000, best tex loss 0.070132,  best vert loss 0.002344, screen loss 0.000000, tex loss 0.070132, vert_loss 0.002344
commented

> Yes, I can load the full model. However, the output textures look weird, like these pred_tex images.
>
> The result.txt is:
>
> Best screen loss 0.000000, best tex loss 0.070132, best vert loss 0.002344, screen loss 0.000000, tex loss 0.070132, vert_loss 0.002344

Hi, this is because the model is conditioned on viewpoint. If a part of the face is occluded during training, there is no supervision on that texture area, which leads to these bright artifacts; this is expected.
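
If you just want cleaner visualizations in the meantime, one possible workaround is to mask out the saturated, never-supervised texels before saving pred_tex. This is purely illustrative and not part of the codebase; the (3, H, W) layout, the [0, 1] value range, and the threshold are all assumptions.

import torch

# Hypothetical post-processing: zero out texels where all three channels
# are near-saturated, i.e. the bright regions that received no supervision.
def mask_unsupervised(pred_tex: torch.Tensor, thresh: float = 0.98) -> torch.Tensor:
    saturated = (pred_tex > thresh).all(dim=0, keepdim=True)  # (1, H, W) bool mask
    return pred_tex.masked_fill(saturated, 0.0)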

Hi there! I was running the pretrained models and hit something similar. Specifically, these model dimension mismatches happen for identities 002643814, 7889059, 5372021, 2183941, and 002914589. I hand-wrote all the camera configs to include every camera that exists in their respective folders, but for these identities the pretrained models seem to expect a large chunk of extra cameras that the dataset doesn't provide (e.g. the dataset gives 40 cameras but the model expects 76). Identities 6795937 and 8870559 ran correctly. Was I accidentally using the wrong network architecture? Anything I should check?
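
For reference, this is the sanity check I hand-rolled, as a sketch; the images/<camera_id> folder layout and the <identity>_model.pth naming are my assumptions:

import os
import torch

# Compare the number of camera folders in a capture with the number of
# per-camera color-correction entries baked into a checkpoint.
# NOTE: per the discussion above, 75 cc entries corresponded to a
# 76-camera setup, so this count may be ncams - 1.
def dataset_num_cams(capture_dir: str) -> int:
    return len(os.listdir(os.path.join(capture_dir, "images")))

def checkpoint_num_cams(model_path: str) -> int:
    sd = torch.load(model_path, map_location="cpu")
    if isinstance(sd, dict) and "model_state_dict" in sd:
        sd = sd["model_state_dict"]
    return sd["cc.weight"].shape[0]

print(dataset_num_cams("dataset/m--20180227--0000--6795937--GHS"))
print(checkpoint_num_cams("pretrained_model/6795937_model.pth"))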

Thanks

Hi @revanj,

We just updated the codebase to include different camera configs for each identity, along with their pretrained models using different architectures (w/o screen loss). Please let me know if it still doesn't work in your case. Thanks.

Hi @cwuu, the mini model checkpoint still has the size mismatch issue.