jiaxiangshang / MGCNet

Self-Supervised Monocular 3D Face Reconstruction by Occlusion-Aware Multi-view Geometry Consistency [ECCV 2020]


Question about the evaluation of 3D face reconstruction

deepmo24 opened this issue

Hi, thanks for your great work and open-source code!
Sorry to disturb you. I am having trouble preprocessing the 'Florence' (MICC) dataset and calculating the point-to-plane RMSE. I have made some effort but still cannot work it out. I would be very grateful if you could share this part of the code with me. Looking forward to hearing from you. Thank you!

Hi, langyuan,
That part of the code came from a previous colleague of mine, so I cannot release it.
However, we describe the preprocessing pipeline clearly in the paper:

  1. Mark the 5 landmarks on the MICC mesh.
  2. Crop the face region (details in the paper).
  3. Rigidly align using the 5 3D landmarks.
  4. Run ICP.
  5. Compute the point-to-plane RMSE.

All the steps above have public code/libraries. Wish this helps you; a sketch of the cropping step is below.
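As an illustration of step 2, here is a minimal sketch of cropping a mesh to the face region, assuming the 95 mm radius around the nose tip that comes up later in this thread; the function name and inputs are hypothetical, not from the MGCNet code.

```python
import numpy as np

def crop_face_region(vertices, nose_tip, radius_mm=95.0):
    """Keep only the mesh vertices within radius_mm of the nose tip.

    vertices: (N, 3) array of mesh vertices, in millimeters.
    nose_tip: (3,) coordinates of the nose-tip landmark.
    """
    dist = np.linalg.norm(vertices - nose_tip[None, :], axis=1)
    return vertices[dist <= radius_mm]
```

(For a full mesh you would also keep only the faces whose vertices all survive; this sketch just filters the point set.)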

Hi, jiaxiang
Thank you very much for your helpful suggestions. I still have some questions below:

  1. Each subject of MICC has four meshes (frontal1, frontal2, sidel, sider); which mesh should we choose?
  2. Given the cropped mesh of a specific subject from MICC, which image or images should we use to reconstruct the 3D face mesh, so that we can run ICP between them?
  3. Given two point sets A and B, are the 5 landmarks used to calculate the scale factor? After scaling them to the same level, do we then run ICP on the two point sets to get the rigid transformation R, t. Is that so?

Looking forward to hearing from you.

Yes!
This is indeed a problem! The first time I processed this, it was troublesome for me too.

  1. I pick the (no-emotion/neutral) mesh myself from (frontal1, frontal2). To be honest, I did this twice, and it does not greatly affect the final result.
  2. As you can see in our paper, we have two methods: (1) using the MICC video frames, or (2) rendering images.
  3. Yes, the 5 landmarks are used for a similarity transform; the scale is important. (A sketch of this transform is below.)
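For the similarity transform from the 5 landmark correspondences, a minimal sketch using the standard Umeyama method follows; the function name is my own, not from the MGCNet code.

```python
import numpy as np

def similarity_transform(src, dst):
    """Umeyama alignment: find scale s, rotation R, translation t
    such that dst_i ≈ s * R @ src_i + t for corresponding landmarks.

    src, dst: (N, 3) corresponding landmark sets (here N = 5).
    """
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0  # guard against reflections
    R = U @ S @ Vt
    s = np.trace(np.diag(D) @ S) / src_c.var(axis=0).sum()
    t = mu_dst - s * R @ mu_src
    return s, R, t
```

Applying `s * (R @ v) + t` to every predicted vertex brings the prediction into the gt frame before ICP.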

In short, we process the MICC dataset ourselves.

Hi, jiaxiang
Thank you for your helpful reply; I have got the ICP registration working! I still have some questions:

  1. About the two methods of using MICC:
     (1) Video frames: does "the average shape for each video" mean that we average the α and β coefficients over all frames and use the averaged α and β to reconstruct the final average shape S?
     (2) Rendered images: can you tell me the pipeline for obtaining the rendered images?

  2. What is the point-to-plane RMSE formula?

Looking forward to hearing from you.

(1) We average the meshes (not the coefficients).
(2) Emm, you can see in the paper how we define the camera poses; for the rendering pipeline we use tf_mesh_renderer.
2. This is the common definition of the point-to-plane RMSE; various public libs/code implement it (the formula is given below).
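For reference, the common point-to-plane RMSE between ground-truth points p_i and the predicted surface can be written as (my formulation of the standard definition, not copied from the paper):

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \bigl( (p_i - q_i) \cdot n_{q_i} \bigr)^2}
```

where q_i is the closest point to p_i on the predicted mesh and n_{q_i} is the unit surface normal at q_i.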

Hi, I still have some confusion about the video-frame case. By "average mesh", do you mean we need to get the reconstructed face mesh for each frame of a video, run ICP against the MICC mesh for all the reconstructed meshes, and finally average the registered reconstructed meshes?

  1. Sample frames from the video.
  2. Run the test on each sampled frame.
  3. Average the test results (meshes).
  4. Do the evaluation.

This pipeline solves the runtime problem; a sketch of the averaging step is below.
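Since all per-frame reconstructions share the same 3DMM topology, averaging the meshes reduces to averaging vertex positions; a minimal sketch (hypothetical helper name):

```python
import numpy as np

def average_mesh(per_frame_vertices):
    """Average reconstructions that share one mesh topology.

    per_frame_vertices: list of (N, 3) vertex arrays, one per sampled frame.
    Returns the (N, 3) averaged vertex positions.
    """
    return np.stack(per_frame_vertices, axis=0).mean(axis=0)
```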

Aha, thanks for your suggestion.
I have another question about the NME evaluation: AFLW2000-3D doesn't provide bounding-box annotations, so how do you get the bounding box?

They give the ground-truth landmarks; we derive the bbox from those gt landmarks.

What strategy do you use to derive the bbox from the gt landmarks? I want to use the same strategy to keep the comparison fair.

Oh, sorry, I did not make this clear: the strategy is to take the tightest box, i.e., min(landmark_x/y) and max(landmark_x/y).
As this is a common setting in face alignment, I did not give details in the paper. A small sketch is below.
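A minimal sketch of that tightest-box strategy (hypothetical helper name), assuming the landmarks come as an (N, 2) array of x, y image coordinates:

```python
import numpy as np

def bbox_from_landmarks(landmarks):
    """Tightest box enclosing the ground-truth landmarks.

    landmarks: (N, 2) array of (x, y) coordinates.
    Returns (x_min, y_min, x_max, y_max).
    """
    x_min, y_min = landmarks.min(axis=0)
    x_max, y_max = landmarks.max(axis=0)
    return float(x_min), float(y_min), float(x_max), float(y_max)
```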

I get it. Thank you!

Hi, jiaxiangshang,
Great job! I have some questions about the evaluation on MICC:

  1. As you say "5 landmarks used for a similarity transform", do you mean calculating the transform (scale only?) between ground truth and prediction? And how do you calculate the scale from the 5 landmarks: the distance between the two eyes, or ...?
  2. Do I need to label 5 landmarks on all MICC meshes first for cropping and alignment?
  3. When testing rendered images, should I first rotate the MICC 3D face to look at the camera, then render face images with 20 poses?

THX!

1. The similarity transform contains a scale term; the transform is indeed between the gt and predicted faces.
2. Yes.
3. The rendering pipeline is not strict or standardized; as long as the rendered images make sense, it is OK. (One generic pose-sampling sketch follows.)
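As one generic illustration of pose sampling (not the paper's exact pose set; the angle range is my own choice), 20 yaw poses could be generated like this:

```python
import numpy as np

def yaw_rotation(theta):
    """Rotation matrix for an angle theta (radians) about the y (up) axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

# 20 poses sweeping yaw from -60 to +60 degrees (illustrative range only).
rotations = [yaw_rotation(t) for t in np.linspace(-np.pi / 3, np.pi / 3, 20)]
```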

Thanks for your fast reply. I still have some questions:

  1. How can I calculate the similarity transform matrix between two 3D point sets?
  2. Should I crop the predicted mesh at a radius of 95 mm around the nose after alignment (aligning the predicted mesh to the gt)?
  3. Due to the different vertex counts of the gt and the prediction, I need to use point-to-plane ICP rather than point-to-point ICP, right?

  1. The similarity transform is a common setting; public code can be found.
  2. This is stated clearly in our paper.
  3. Yes.

Sorry to disturb you again.
I evaluate with this strategy:

  1. Label 5 landmarks on the MICC data.
  2. Crop the gt mesh at a radius of 95 mm around the nose.
  3. Align the predicted mesh by a similarity transform (leaving the gt unchanged).
  4. Run point(gt)-to-plane(predicted) ICP with the open3d library and calculate the RMSE from each gt point to the plane of the predicted mesh; a sketch of this step follows the list. In the screenshot, the yellow mesh is the prediction and the blue one is the gt.

  [screenshot of the aligned meshes]
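Here is a minimal sketch of step 4 with open3d, assuming the gt points are already cropped and similarity-aligned; the 10 mm correspondence threshold, the file path, and the function name are my own choices, not from the paper:

```python
import numpy as np
import open3d as o3d

def point_to_plane_rmse(gt_pts, pred_mesh_path):
    """Point(gt)-to-plane(predicted) RMSE after point-to-plane ICP refinement.

    gt_pts: (N, 3) cropped, similarity-aligned ground-truth points (mm).
    pred_mesh_path: path to the predicted mesh file.
    """
    pred = o3d.io.read_triangle_mesh(pred_mesh_path)
    pred.compute_vertex_normals()
    target = o3d.geometry.PointCloud(pred.vertices)
    target.normals = pred.vertex_normals

    source = o3d.geometry.PointCloud()
    source.points = o3d.utility.Vector3dVector(gt_pts)

    # Refine the alignment with point-to-plane ICP (10 mm correspondence threshold).
    result = o3d.pipelines.registration.registration_icp(
        source, target, max_correspondence_distance=10.0,
        estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPlane())
    source.transform(result.transformation)

    # Residual of each gt point along the normal of its nearest predicted point.
    tree = o3d.geometry.KDTreeFlann(target)
    pred_pts = np.asarray(target.points)
    pred_nrm = np.asarray(target.normals)
    sq_err = []
    for p in np.asarray(source.points):
        _, idx, _ = tree.search_knn_vector_3d(p, 1)
        d = np.dot(p - pred_pts[idx[0]], pred_nrm[idx[0]])
        sq_err.append(d * d)
    return float(np.sqrt(np.mean(sq_err)))
```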

Are there any mistakes here? And should I remove the forehead of the gt in step 2?

Thanks for your guidance.