gafniguy / 4D-Facial-Avatars

Dynamic Neural Radiance Fields for Monocular 4D Facial Avatar Reconstruction

Reproduce the Transformation Matrices

JeremyCJM opened this issue

Hi Guy,

Thanks for the excellent work! I am using my own face tracker to reproduce the transformation matrices in your provided JSON files. However, I cannot get the same results for either rotation or translation. Moreover, when I substitute my transformation matrices for yours in the JSON files, the validation result is bad. The weird thing is that the debug result (overlaying the transformed face mask on the original image) of my transformation matrices looks good.

I am wondering if you directly utilize the head rotation and translation in the face tracker as the camera-to-world matrix in NeRF. If so, what is the unit of your rotation and translation? Does it matter if my transformation matrix is in a different world coordinate system from yours (e.g. different origins)? Could you provide your debug code that overlays the face mask on the original image so that I could check what is wrong with my own transformation matrices?

Looking forward to hearing from you!

Thanks,
Jeremy

Hi, could you please give more details on how you utilize the head rotation and translation from the face tracker? I also got stuck here. Thanks.

@JeremyCJM, the fact that you can render an overlay properly but it doesn't work with nerf suggests it's not compatible with the coordinate system nerf uses.
I indeed use the rigid transform of the head as the matrix for the camera in nerf. The rotations don't have units (it's just an orthogonal matrix), and then the scene is normalized such that the head is at an average z of 0.5 from the camera. (For positional encoding, the scene should be within [-1, 1).)
To debug it you can use the file real_to_nerf.py; there I have the code to render my masks' overlay (render_debug_matrix()).
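A minimal sketch of that normalization, assuming the per-frame head rigid transforms are stored as (N, 4, 4) camera-to-world matrices; the function and variable names here are illustrative, not the actual repo code:

```python
import numpy as np

def normalize_poses(poses, target_z=0.5):
    """Scale the translations so the head sits at an average z-distance of
    `target_z` from the camera, keeping the scene roughly inside the [-1, 1)
    range the positional encoding expects."""
    poses = np.asarray(poses, dtype=np.float64).copy()  # (N, 4, 4) rigid transforms
    avg_z = np.abs(poses[:, 2, 3]).mean()               # average camera-to-head distance
    scale = target_z / avg_z                            # equals 1 / (2 * avg_z) when target_z = 0.5
    poses[:, :3, 3] *= scale                            # scale only the translation part
    return poses, scale
```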

Thanks a lot for your reply! I have some follow-up questions:

  1. Could I ask which 3DMM model you used? BFM 2009 or 2017 or 2019?
  2. Did you do any scaling (mainly of the translation), inversion, or axis permutation of the head rigid transformation matrix before using it as the camera extrinsic matrix?
  3. "the scene is normalized such that the head is at an average z of 0.5 from the camera". Does this sentence mean that I need to scale the [x, y, z] coordinates of my 3DMM model (or 3D landmarks) by the same magnitude, so that the average of the z values of the 3D head points is 0.5? Can I directly scale the translation vector?
  4. What are the uses of those functions in real_to_nerf.py: custom_seq_presentation_v2(), custom_seq_teaser(), custom_seq_driving()?
  5. What is render_poses for in load_flame.py?
  6. What is the camera_angle_x in JSON files? Is it the Euler rotation angle along x-axis? If not, how to compute it?

Hi Guy, I found one question that might be critical: why do the intrinsics in the JSON files have a negative fx, i.e., [-fx, fy, cx, cy]?

  1. It's a 3DMM tweaked off one of the Basel models; it's part of Face2Face (Thies et al.), which is not open-sourced.
  2. Yes, scale to normalize the avg z such that the head is on average 0.5 units from the camera. The tracker assumes a different coordinate system, so I have to take the rigid transformations it outputs, rotate 180 deg around the y-axis, and then flip the image horizontally (that's why there is a minus on the fx). This is in read_rigid_poses() and in read_intrinsics(); see the sketch after this list.
  3. Yes, and yes. (I calculate the average z coordinate over all rigid transformations, and then divide the translations by twice that value.)
  4. They're all just hardcoded sequences of smooth head poses / camera poses, used to generate JSONs for things like a loop around the person, the teaser image, or having a driving actor drive a different avatar with their own poses and expressions (you can see that in the paper's video).
  5. The poses that will be used as camera poses in nerf. You can see them used, e.g., in the eval script.
  6. It used to be the camera FOV, as the original nerf repo did not support different focal lengths for the X and Y axes. I modified it to take intrinsics instead of FOV. It is calculated as camera_angle_x = 2 * np.arctan(im_size[0] / (2 * intrinsics[0])) (also shown in the sketch below).
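A hedged sketch of points 2 and 6: the 180° rotation around y, the sign flip on fx, and the camera_angle_x formula come from the answers above, but the function names and the exact multiplication convention here are illustrative rather than the actual read_rigid_poses() / read_intrinsics() code:

```python
import numpy as np

# 180-degree rotation about the y-axis, as a homogeneous 4x4 matrix.
ROT_Y_180 = np.diag([-1.0, 1.0, -1.0, 1.0])

def tracker_pose_to_nerf(pose):
    """Rotate a 4x4 head rigid transform from the tracker's coordinate system
    into the one nerf expects (180 deg around y). Whether the rotation is
    pre- or post-multiplied depends on your tracker's convention."""
    return ROT_Y_180 @ np.asarray(pose)

def tracker_intrinsics_to_nerf(fx, fy, cx, cy, im_width):
    """Fold the horizontal image flip into the intrinsics as a negative fx
    (hence the [-fx, fy, cx, cy] entries in the JSONs), and compute
    camera_angle_x from the horizontal focal length and image width."""
    intrinsics = np.array([-fx, fy, cx, cy])
    camera_angle_x = 2.0 * np.arctan(im_width / (2.0 * fx))
    return intrinsics, camera_angle_x
```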

Thanks a lot for your detailed answers, Guy! I finally found that the bug lies in the negative fx; my tracker works well with a positive fx. Another 'bug' is that your data loader loads the intrinsics of the test data by default (just mentioning this here in case it helps others).

Glad it works now!

Thanks for pointing it out. I later rewrote a data loader that doesn't load everything to RAM, to be able to train on longer sequences anyhow :)

Cool! That would be great. Thanks!

Hi Guy, I am wondering if you did any temporal smoothing of the generated transformation matrices during or after face tracking? I have severe jittering in my results.

What about the expressions? Are they temporally consistent enough that you did not do any smoothing?

Hi, @JeremyCJM and @gafniguy I'm also interested in using NeRFace with my own video. Could you please point me to the tracker that you used and what changes need to be made to the transformation matrix in order to use it in NeRFace?

I am currently using the tracker from AD-NeRF. It seems the rotation matrix is correct, but the intrinsics are wrong. Please see the original frame, the frame rendered by render_debug_camera_matrix, and the frame synthesized by a pre-trained NeRFace model below:

[Images: original frame (f_0355), render_debug_camera_matrix output (render), pre-trained NeRFace output (rgb)]

Thanks so much for your time!

@JeremyCJM nope, no smoothing on expressions either

@sunshineatnoon yeah, the rotation looks fine, but we are seeing artifacts coming from the positional encoding. This suggests your camera frustum is viewing areas outside of (-1, 1), so the encoding repeats periodically. Either your focal length is wrong, or your model/camera poses are not normalized properly (the z of the translation should be around 0.5).
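For anyone hitting the same artifacts, a quick sanity check along those lines, assuming `poses` is the (N, 4, 4) array of camera-to-world matrices loaded from the JSON (the tolerance is arbitrary):

```python
import numpy as np

def check_pose_normalization(poses, target_z=0.5, tol=0.15):
    """Print the range of camera z-translations and warn if they are far from
    ~0.5, which would push sample points outside the (-1, 1) range of the
    positional encoding and cause periodic artifacts like the ones above."""
    z = np.abs(np.asarray(poses)[:, 2, 3])
    print(f"translation z: mean={z.mean():.3f}, min={z.min():.3f}, max={z.max():.3f}")
    if abs(z.mean() - target_z) > tol:
        print(f"-> poses look unnormalized; try rescaling translations by {target_z / z.mean():.3f}")
```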

Hi, @gafniguy Thanks for your response. Could you please tell me how to correctly call the render_debug_camera_matrix function? I tried render_debug_camera_matrix(poses[0], hwf[-1]) with the cameras provided by NeRFace, but the rendering is not correct. Do the pose and intrinsics need to be processed before rendering?