RenYurui / PIRender

The source code of the ICCV2021 paper "PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering"

Questions about .mat file when preparing training dataset

DaddyJin opened this issue · comments

Thank you for the exciting work.
When preparing the training dataset, I have questions about the .mat file.
The prepare_vox_lmdb.py file indicates that the .mat file should contain the keys 'landmark', 'coeff', and 'transform_params', as shown below.
[screenshot of prepare_vox_lmdb.py showing the expected keys: 'landmark', 'coeff', 'transform_params']
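
For concreteness, the expected structure can be checked like this (a minimal sketch; the file name is hypothetical, and the exact shapes are assumptions inferred from this thread):

from scipy.io import loadmat

file_mat = loadmat('example_clip.mat')     # hypothetical file name
print(file_mat['landmark'].shape)          # per-frame 2D landmarks, e.g. (N, 68, 2)
print(file_mat['coeff'].shape)             # per-frame 3DMM coefficients, (N, 257)
print(file_mat['transform_params'].shape)  # per-frame crop parameters, (N, 5)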

However, in Deep3DFaceRecon_pytorch, the keys in the .mat file are the predicted 257-dimensional coefficients and the 68 projected 2D facial landmarks.

I would like to know the correspondence between those two kinds of 'keys', or exactly how you extract the .mat file corresponding to an mp4 file.

Thanks in advance!

Hi,
We use the TensorFlow implementation of DeepFaceRecon to extract the coefficients of the videos.
The images are cropped before being sent to DeepFaceRecon. Therefore, we use both the crop parameters and the 3DMM parameters to describe the target motions.
The 257-dimensional coefficients (file_mat['coeff']) along with the 5-dimensional crop parameters (file_mat['transform_params']) are extracted.

# the format of the coeff
def split_3dmm_coeff(file_mat):
    coeff_3dmm = file_mat['coeff']        # N*257
    id_coeff = coeff_3dmm[:, :80]         # identity
    ex_coeff = coeff_3dmm[:, 80:144]      # expression
    tex_coeff = coeff_3dmm[:, 144:224]    # texture
    angles = coeff_3dmm[:, 224:227]       # euler angles for pose
    gamma = coeff_3dmm[:, 227:254]        # lighting
    translation = coeff_3dmm[:, 254:257]  # translation
    return id_coeff, ex_coeff, tex_coeff, angles, gamma, translation

# the format of the crop parameter
import numpy as np

def split_crop_parameter(file_mat):
    crop_param = file_mat['transform_params']  # N*5
    org_image_size_w, org_image_size_h, resize_ratio, trans_0, trans_1 = \
        np.hsplit(crop_param.astype(np.float32), 5)
    return org_image_size_w, org_image_size_h, resize_ratio, trans_0, trans_1
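
Putting the two helpers together, a per-frame motion descriptor could be assembled roughly as follows. This is a sketch rather than the repository's dataloader: the name build_motion_descriptor is hypothetical, and normalizing the crop translation by the original image size is an assumption.

import numpy as np

def build_motion_descriptor(file_mat):
    # Hypothetical helper combining the motion-related 3DMM coefficients
    # with the crop parameters, as described above.
    _, ex_coeff, _, angles, _, translation = split_3dmm_coeff(file_mat)
    w0, h0, ratio, t0, t1 = split_crop_parameter(file_mat)
    # Assumption: normalize the crop translation by the original image
    # size so the descriptor is independent of the video resolution.
    crop = np.concatenate([ratio, t0 / w0, t1 / h0], axis=1)  # N*3
    # expression (64) + pose angles (3) + translation (3) + crop (3)
    return np.concatenate([ex_coeff, angles, translation, crop], axis=1)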

The transform_params are obtained by this function.
Good Luck!
Yurui

Hi,
I face a similar situation to @DaddyJin; I also used the PyTorch version of Deep3DFaceRecon for dataset preparation. My question is: is there a significant performance difference if I do not include transform_params as inputs?

Thank you!

Yes, the transform_params are very important.
The network needs to calculate the absolute location of the faces from the transform_params.
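
As a toy illustration of why (all values and names here are hypothetical, not the repository's code): two frames can have identical cropped faces, and hence identical 3DMM coefficients, while their crop windows sit in different parts of the original frame; only the transform_params distinguish them.

import numpy as np

# Identical 3DMM coefficients for two frames whose cropped faces match.
coeff_a = np.zeros((1, 257), dtype=np.float32)
coeff_b = coeff_a.copy()

# Hypothetical crop parameters (w0, h0, resize_ratio, trans_0, trans_1):
# the crop windows are located at different positions in the full frame.
crop_a = np.array([[256., 256., 1.0, 100., 120.]], dtype=np.float32)
crop_b = np.array([[256., 256., 1.0, 160., 90.]], dtype=np.float32)

print(np.allclose(coeff_a, coeff_b))  # True: the 3DMM alone cannot tell the frames apart
print(np.allclose(crop_a, crop_b))    # False: transform_params encode the absolute location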

Hi all,
We provide the 3DMM extraction scripts using DeepFaceRecon_pytorch.
Please check DatasetHelper for more details.
Let me know if you have any problems.
Yurui

Hi,
"The images are cropped before sending to DeepFaceRecon." Which size is used to crop face image from original full-sized video? 224x224 or 256x256?

In addition, extract_kp_videos.py doesn't change the size of the input face image. However, the Deep3DFaceRecon_pytorch used by face_recon_videos.py resizes the input image to 224x224, while in prepare_vox_lmdb.py the image is resized to 256x256.

To summarize, if we follow the data processing in your DatasetHelper, is there any difference between input face videos at 224x224 and 256x256? Which size is used in your pipeline?