jiaxiangshang / MGCNet

Self-Supervised Monocular 3D Face Reconstruction by Occlusion-Aware Multi-view Geometry Consistency [ECCV 2020]


Question about the evaluation of 3D face reconstruction

deepmo24 opened this issue

Hi, thanks for your great work and open-source code!
Sorry to disturb you. I am having trouble preprocessing the 'Florence' (MICC) dataset and calculating the point-to-plane RMSE. I have made some effort but still cannot work it out. I would be very grateful if you could share this part of the code with me. Looking forward to hearing from you. Thank you!

Hi, langyuan,
That part of the code came from a previous colleague of mine, so I cannot release it.
However, we describe the preprocessing pipeline clearly in the paper:

  1. Mark the 5 landmarks on the MICC mesh.
  2. Crop the face region (details in the paper).
  3. Rigidly align using the 5 3D landmarks.
  4. Run ICP.
  5. Compute the point-to-plane RMSE.

All the steps above have public code/libraries. Wish this helps you; a sketch of the cropping step is below.
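As an illustration of step 2, here is a minimal sketch of cropping a mesh to the face region, assuming the 95 mm radius around the nose tip that comes up later in this thread; the function name and inputs are hypothetical, not from the MGCNet code.

```python
import numpy as np

def crop_face_region(vertices, nose_tip, radius_mm=95.0):
    """Keep only the mesh vertices within radius_mm of the nose tip.

    vertices: (N, 3) array of mesh vertices, in millimeters.
    nose_tip: (3,) coordinates of the nose-tip landmark.
    """
    dist = np.linalg.norm(vertices - nose_tip[None, :], axis=1)
    return vertices[dist <= radius_mm]
```

(For a full mesh you would also keep only the faces whose vertices all survive; this sketch just filters the point set.)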

Hi, jiaxiang
Thank you very much for your helpful suggestions. I still have some questions below:

  1. Each subject of MICC has four meshes (frontal1, frontal2, sidel, sider); which mesh should we choose?
  2. Given the cropped mesh of a specific subject from MICC, which image or images should we use to reconstruct the 3D face mesh, so that we can run ICP between them?
  3. Given two point sets A and B, are the 5 landmarks used to calculate the scale factor? After scaling them to the same level, do we then run ICP on the two point sets to get the rigid transformation R, t. Is that so?

Looking forward to hearing from you.

Yes!
This is indeed a problem! The first time I processed this, it was troublesome for me too.

  1. I pick the (no-emotion/neutral) mesh myself from (frontal1, frontal2). To be honest, I did this twice, and it does not greatly affect the final result.
  2. As you can see in our paper, we have two methods: (1) using the MICC video frames, or (2) rendering images.
  3. Yes, the 5 landmarks are used for a similarity transform; the scale is important. (A sketch of this transform is below.)
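For the similarity transform from the 5 landmark correspondences, a minimal sketch using the standard Umeyama method follows; the function name is my own, not from the MGCNet code.

```python
import numpy as np

def similarity_transform(src, dst):
    """Umeyama alignment: find scale s, rotation R, translation t
    such that dst_i ≈ s * R @ src_i + t for corresponding landmarks.

    src, dst: (N, 3) corresponding landmark sets (here N = 5).
    """
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0  # guard against reflections
    R = U @ S @ Vt
    s = np.trace(np.diag(D) @ S) / src_c.var(axis=0).sum()
    t = mu_dst - s * R @ mu_src
    return s, R, t
```

Applying `s * (R @ v) + t` to every predicted vertex brings the prediction into the gt frame before ICP.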

In short, we process the MICC dataset ourselves.

Hi, jiaxiang
Thank you for your helpful reply; I have got the ICP registration working! I still have some questions:

  1. About the two methods of using MICC:
     (1) Video frames: does "the average shape for each video" mean that we average the α and β coefficients over all frames and use the averaged α and β to reconstruct the final average shape S?
     (2) Rendered images: can you tell me the pipeline for obtaining the rendered images?

  2. What is the point-to-plane RMSE formula?

Looking forward to hearing from you.

(1) We average the meshes (not the coefficients).
(2) Emm, you can see in the paper how we define the camera poses; for the rendering pipeline we use tf_mesh_renderer.
2. This is the common definition of the point-to-plane RMSE; various public libs/code implement it (the formula is given below).
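For reference, the common point-to-plane RMSE between ground-truth points p_i and the predicted surface can be written as (my formulation of the standard definition, not copied from the paper):

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \bigl( (p_i - q_i) \cdot n_{q_i} \bigr)^2}
```

where q_i is the closest point to p_i on the predicted mesh and n_{q_i} is the unit surface normal at q_i.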

Hi, I still have some confusion about the video-frame case. By "average mesh", do you mean we need to get the reconstructed face mesh for each frame of a video, run ICP against the MICC mesh for all the reconstructed meshes, and finally average the registered reconstructed meshes?

  1. Sample frames from the video.
  2. Run the test on each sampled frame.
  3. Average the test results (meshes).
  4. Do the evaluation.

This pipeline solves the runtime problem; a sketch of the averaging step is below.
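Since all per-frame reconstructions share the same 3DMM topology, averaging the meshes reduces to averaging vertex positions; a minimal sketch (hypothetical helper name):

```python
import numpy as np

def average_mesh(per_frame_vertices):
    """Average reconstructions that share one mesh topology.

    per_frame_vertices: list of (N, 3) vertex arrays, one per sampled frame.
    Returns the (N, 3) averaged vertex positions.
    """
    return np.stack(per_frame_vertices, axis=0).mean(axis=0)
```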

Aha, thanks for your suggestion.
I have another question about the NME evaluation: AFLW2000-3D doesn't provide bounding-box annotations, so how do you get the bounding box?

They give the ground-truth landmarks; we derive the bbox from those gt landmarks.

What strategy do you use to derive the bbox from the gt landmarks? I want to use the same strategy to keep the comparison fair.

Oh, sorry, I did not make this clear: the strategy is to take the tightest box, i.e., min(landmark_x/y) and max(landmark_x/y).
As this is a common setting in face alignment, I did not give details in the paper. A small sketch is below.
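A minimal sketch of that tightest-box strategy (hypothetical helper name), assuming the landmarks come as an (N, 2) array of x, y image coordinates:

```python
import numpy as np

def bbox_from_landmarks(landmarks):
    """Tightest box enclosing the ground-truth landmarks.

    landmarks: (N, 2) array of (x, y) coordinates.
    Returns (x_min, y_min, x_max, y_max).
    """
    x_min, y_min = landmarks.min(axis=0)
    x_max, y_max = landmarks.max(axis=0)
    return float(x_min), float(y_min), float(x_max), float(y_max)
```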

I get it. Thank you!

Hi, jiaxiangshang,
Great job! I have some questions about the evaluation on MICC:

  1. As you say "5 landmarks used for a similarity transform", do you mean calculating the transform (scale only?) between ground truth and prediction? And how do you calculate the scale from the 5 landmarks: the distance between the two eyes, or ...?
  2. Do I need to label 5 landmarks on all MICC meshes first for cropping and alignment?
  3. When testing rendered images, should I first rotate the MICC 3D face to look at the camera, then render face images with 20 poses?

THX!

1. The similarity transform contains a scale term; the transform is indeed between the gt and predicted faces.
2. Yes.
3. The rendering pipeline is not strict or standardized; as long as the rendered images make sense, it is OK. (One generic pose-sampling sketch follows.)
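As one generic illustration of pose sampling (not the paper's exact pose set; the angle range is my own choice), 20 yaw poses could be generated like this:

```python
import numpy as np

def yaw_rotation(theta):
    """Rotation matrix for an angle theta (radians) about the y (up) axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

# 20 poses sweeping yaw from -60 to +60 degrees (illustrative range only).
rotations = [yaw_rotation(t) for t in np.linspace(-np.pi / 3, np.pi / 3, 20)]
```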

Thanks for your fast reply. I still have some questions:

  1. How can I calculate the similarity transform matrix between two 3D point sets?
  2. Should I crop the predicted mesh at a radius of 95 mm around the nose after alignment (aligning the predicted mesh to the gt)?
  3. Due to the different vertex counts of the gt and the prediction, I need to use point-to-plane ICP rather than point-to-point ICP, right?

  1. The similarity transform is a common setting; public code can be found.
  2. This is stated clearly in our paper.
  3. Yes.

Sorry to disturb you again.
I evaluate with this strategy:

  1. Label 5 landmarks on the MICC data.
  2. Crop the gt mesh at a radius of 95 mm around the nose.
  3. Align the predicted mesh by a similarity transform (leaving the gt unchanged).
  4. Run point(gt)-to-plane(predicted) ICP with the open3d library and calculate the RMSE from each gt point to the plane of the predicted mesh; a sketch of this step follows the list. In the screenshot, the yellow mesh is the prediction and the blue one is the gt.

  [screenshot of the aligned meshes]
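Here is a minimal sketch of step 4 with open3d, assuming the gt points are already cropped and similarity-aligned; the 10 mm correspondence threshold, the file path, and the function name are my own choices, not from the paper:

```python
import numpy as np
import open3d as o3d

def point_to_plane_rmse(gt_pts, pred_mesh_path):
    """Point(gt)-to-plane(predicted) RMSE after point-to-plane ICP refinement.

    gt_pts: (N, 3) cropped, similarity-aligned ground-truth points (mm).
    pred_mesh_path: path to the predicted mesh file.
    """
    pred = o3d.io.read_triangle_mesh(pred_mesh_path)
    pred.compute_vertex_normals()
    target = o3d.geometry.PointCloud(pred.vertices)
    target.normals = pred.vertex_normals

    source = o3d.geometry.PointCloud()
    source.points = o3d.utility.Vector3dVector(gt_pts)

    # Refine the alignment with point-to-plane ICP (10 mm correspondence threshold).
    result = o3d.pipelines.registration.registration_icp(
        source, target, max_correspondence_distance=10.0,
        estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPlane())
    source.transform(result.transformation)

    # Residual of each gt point along the normal of its nearest predicted point.
    tree = o3d.geometry.KDTreeFlann(target)
    pred_pts = np.asarray(target.points)
    pred_nrm = np.asarray(target.normals)
    sq_err = []
    for p in np.asarray(source.points):
        _, idx, _ = tree.search_knn_vector_3d(p, 1)
        d = np.dot(p - pred_pts[idx[0]], pred_nrm[idx[0]])
        sq_err.append(d * d)
    return float(np.sqrt(np.mean(sq_err)))
```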

Are there any mistakes here? And should I remove the forehead of the gt in step 2?

Thanks for your guidance.