SizheAn / PanoHead

Hi, thank you for the brilliant work!

May I ask a quick question: are the masks actually used during the PTI inversion? The code in projector_withseg.py seems to read the image directly from the given path without reading or using the provided masks at all. Is this expected?

If so, is it actually possible to use a mask during the inversion? At the moment, inversion seems to fail pretty badly when the image contains a large area of hair.
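As a rough illustration of what "using a mask during the inversion" could mean, here is a minimal NumPy sketch of a masked pixel-wise reconstruction loss: pixels where the mask is 0 (e.g. background) contribute nothing, and the error is averaged only over masked pixels. This is not code from PanoHead. The function name `masked_mse` and the (C, H, W) image / (H, W) mask layout are assumptions, and in practice such a term would have to be combined with whatever losses the projector already optimizes.

```python
import numpy as np


def masked_mse(pred: np.ndarray, target: np.ndarray, mask: np.ndarray,
               eps: float = 1e-8) -> float:
    """Hypothetical masked MSE: squared error counted only where mask > 0.

    pred, target: float arrays of shape (C, H, W).
    mask: float array of shape (H, W) with values in [0, 1].
    """
    m = mask[None, :, :]                  # add a channel axis so the mask broadcasts over C
    sq_err = (pred - target) ** 2 * m     # zero out the error outside the mask
    # Normalise by the number of masked entries (masked pixels x channels),
    # not by the full image size, so the loss scale does not depend on mask area.
    return float(sq_err.sum() / (m.sum() * pred.shape[0] + eps))


# Usage: only the top-left pixel is inside the mask, so the large error
# at the bottom-right pixel is ignored.
target = np.zeros((3, 2, 2))
pred = np.zeros((3, 2, 2))
pred[:, 0, 0] = 1.0
pred[:, 1, 1] = 5.0
mask = np.array([[1.0, 0.0], [0.0, 0.0]])
print(masked_mse(pred, target, mask))
```

Normalising by the masked area rather than the whole image keeps the loss comparable between images whose head regions cover very different fractions of the frame.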

Thank you in advance!

Yeah, I think if you just comment out line 320 of PanoHead/projector_withseg.py (commit 02073a4):

    dataset_kwargs = dnnlib.EasyDict(class_name='training.dataset.MaskLabeledDataset', img_path=target_img, seg_path=target_seg, use_labels=True, max_size=None, xflip=False)

and uncomment line 319 of the same file:

    # dataset_kwargs = dnnlib.EasyDict(class_name='training.dataset.ImageFolderDataset', path=target_fname, use_labels=True, max_size=None, xflip=False)

changing it to

    dataset_kwargs = dnnlib.EasyDict(class_name='training.dataset.ImageFolderDataset', path=target_img, use_labels=True, max_size=None, xflip=False)

then PTI should still work.
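If you switch between the two configurations often, one option is a small flag instead of commenting and uncommenting lines. The sketch below is hypothetical: `make_dataset_kwargs` and `use_mask` are not part of PanoHead, a plain `dict` stands in for `dnnlib.EasyDict` so it runs on its own, and the keyword values are copied from the two lines quoted above. If the projector expects an `EasyDict`, the result would presumably need to be wrapped as `dnnlib.EasyDict(...)`.

```python
def make_dataset_kwargs(target_img, target_seg=None, use_mask=True):
    """Hypothetical helper returning one of the two dataset configs quoted above."""
    # Options shared by both configurations.
    common = dict(use_labels=True, max_size=None, xflip=False)
    if use_mask:
        if target_seg is None:
            raise ValueError("use_mask=True requires a segmentation path (target_seg)")
        # Mask-aware dataset: reads both the image and its segmentation.
        return dict(class_name='training.dataset.MaskLabeledDataset',
                    img_path=target_img, seg_path=target_seg, **common)
    # Image-only dataset: no segmentation is read.
    return dict(class_name='training.dataset.ImageFolderDataset',
                path=target_img, **common)


# Usage (file names are placeholders):
print(make_dataset_kwargs('face.png', use_mask=False)['class_name'])
```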

Using the mask won't ultimately solve this problem, IMO. The inversion itself does not fail to find the closest latent: as you can see, almost all the reconstructed images for frontal faces are still high quality. The problem appears when we change the camera pose to the side or back, which means the pretrained model's learned 3D prior is not good or generalizable enough. This is just my opinion, so feel free to try something with masks and let us know! :)

Thx a lot! Yeah, that makes sense.

Is Mask Actually Used in Inversion?