gafniguy / 4D-Facial-Avatars

Dynamic Neural Radiance Fields for Monocular 4D Facial Avatar Reconstruction

Reproduce the Transformation Matrices

JeremyCJM opened this issue

Hi Guy,

Thanks for the excellent work! I am using my own face tracker to reproduce the transformation matrices in your provided JSON files. However, I cannot get the same results for either rotation or translation. Moreover, when I substitute my transformation matrices for yours in the JSON files, the validation result is bad. The weird thing is that the debug result (overlaying the transformed face mask on the original image) of my transformation matrices looks good.

I am wondering if you directly utilize the head rotation and translation in the face tracker as the camera-to-world matrix in NeRF. If so, what is the unit of your rotation and translation? Does it matter if my transformation matrix is in a different world coordinate system from yours (e.g. different origins)? Could you provide your debug code that overlays the face mask on the original image so that I could check what is wrong with my own transformation matrices?

Looking forward to hearing from you!

Thanks,
Jeremy

Hi, could you please give more details on how you utilize the head rotation and translation from the face tracker? I also got stuck here. Thanks.

@JeremyCJM, the fact that you can render an overlay properly but it doesn't work with nerf suggests it's not compatible with the coordinate system nerf uses.
I indeed use the rigid transform of the head as the matrix for the camera in nerf. The rotations don't have units (it's just an orthogonal matrix), and then the scene is normalized such that the head is at an average z of 0.5 from the camera. (For positional encoding, the scene should be within [-1, 1).)
To debug it you can use the file real_to_nerf.py; there I have the code to render my masks' overlay (render_debug_matrix()).
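A minimal sketch of that normalization, assuming the per-frame head rigid transforms are stored as (N, 4, 4) camera-to-world matrices; the function and variable names here are illustrative, not the actual repo code:

```python
import numpy as np

def normalize_poses(poses, target_z=0.5):
    """Scale the translations so the head sits at an average z-distance of
    `target_z` from the camera, keeping the scene roughly inside the [-1, 1)
    range the positional encoding expects."""
    poses = np.asarray(poses, dtype=np.float64).copy()  # (N, 4, 4) rigid transforms
    avg_z = np.abs(poses[:, 2, 3]).mean()               # average camera-to-head distance
    scale = target_z / avg_z                            # equals 1 / (2 * avg_z) when target_z = 0.5
    poses[:, :3, 3] *= scale                            # scale only the translation part
    return poses, scale
```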

Thanks a lot for your reply! I have some follow-up questions:

  1. Could I ask which 3DMM model you used? BFM 2009 or 2017 or 2019?
  2. Did you do any scaling (mainly of the translation), inversion, or axis permutation of the head rigid transformation matrix before using it as the camera extrinsic matrix?
  3. "the scene is normalized such that the head is at an average z of 0.5 from the camera". Does this sentence mean that I need to scale the [x, y, z] coordinates of my 3DMM model (or 3D landmarks) by the same magnitude, so that the average of the z values of the 3D head points is 0.5? Can I directly scale the translation vector?
  4. What are the uses of those functions in real_to_nerf.py: custom_seq_presentation_v2(), custom_seq_teaser(), custom_seq_driving()?
  5. What is render_poses for in load_flame.py?
  6. What is the camera_angle_x in JSON files? Is it the Euler rotation angle along x-axis? If not, how to compute it?

Hi Guy, I found one question that might be critical: why do the intrinsics in the JSON files have a negative fx, i.e., [-fx, fy, cx, cy]?

  1. It's a 3DMM tweaked off one of the Basel models; it's part of Face2Face (Thies et al.), which is not open-sourced.
  2. Yes, scale to normalize the avg z such that the head is on average 0.5 units from the camera. The tracker assumes a different coordinate system, so I have to take the rigid transformations it outputs, rotate 180 deg around the y-axis, and then flip the image horizontally (that's why there is a minus on the fx). This is in read_rigid_poses() and in read_intrinsics(); see the sketch after this list.
  3. Yes, and yes. (I calculate the average z coordinate over all rigid transformations, and then divide the translations by twice that value.)
  4. They're all just hardcoded sequences of smooth head poses / camera poses, used to generate JSONs for things like a loop around the person, the teaser image, or having a driving actor drive a different avatar with their own poses and expressions (you can see that in the paper's video).
  5. The poses that will be used as camera poses in nerf. You can see them used, e.g., in the eval script.
  6. It used to be the camera FOV, as the original nerf repo did not support different focal lengths for the X and Y axes. I modified it to take intrinsics instead of FOV. It is calculated as camera_angle_x = 2 * np.arctan(im_size[0] / (2 * intrinsics[0])) (also shown in the sketch below).
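A hedged sketch of points 2 and 6: the 180° rotation around y, the sign flip on fx, and the camera_angle_x formula come from the answers above, but the function names and the exact multiplication convention here are illustrative rather than the actual read_rigid_poses() / read_intrinsics() code:

```python
import numpy as np

# 180-degree rotation about the y-axis, as a homogeneous 4x4 matrix.
ROT_Y_180 = np.diag([-1.0, 1.0, -1.0, 1.0])

def tracker_pose_to_nerf(pose):
    """Rotate a 4x4 head rigid transform from the tracker's coordinate system
    into the one nerf expects (180 deg around y). Whether the rotation is
    pre- or post-multiplied depends on your tracker's convention."""
    return ROT_Y_180 @ np.asarray(pose)

def tracker_intrinsics_to_nerf(fx, fy, cx, cy, im_width):
    """Fold the horizontal image flip into the intrinsics as a negative fx
    (hence the [-fx, fy, cx, cy] entries in the JSONs), and compute
    camera_angle_x from the horizontal focal length and image width."""
    intrinsics = np.array([-fx, fy, cx, cy])
    camera_angle_x = 2.0 * np.arctan(im_width / (2.0 * fx))
    return intrinsics, camera_angle_x
```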

Thanks a lot for your detailed answers, Guy! I finally found that the bug lies in the negative fx; my tracker works well with a positive fx. Another 'bug' is that your data loader loads the intrinsics of the test data by default (just mentioning this here in case it helps others).

Glad it works now!

Thanks for pointing it out. I later rewrote a data loader that doesn't load everything to RAM, to be able to train on longer sequences anyhow :)

Cool! That would be great. Thanks!

Hi Guy, I am wondering if you did any temporal smoothing of the generated transformation matrices during or after face tracking? I have severe jittering in my results.

What about the expressions? Are they temporally consistent enough that you did not do any smoothing?

Hi, @JeremyCJM and @gafniguy I'm also interested in using NeRFace with my own video. Could you please point me to the tracker that you used and what changes need to be made to the transformation matrix in order to use it in NeRFace?

I am currently using the tracker from AD-NeRF. It seems the rotation matrix is correct, but the intrinsics are wrong. Please see the original frame, the frame rendered by render_debug_camera_matrix, and the frame synthesized by a pre-trained NeRFace model below:

[Images: original frame (f_0355), render_debug_camera_matrix output (render), pre-trained NeRFace output (rgb)]

Thanks so much for your time!

@JeremyCJM nope, no smoothing on expressions either

@sunshineatnoon yeah, the rotation looks fine, but we are seeing artifacts coming from the positional encoding. This suggests your camera frustum is viewing areas outside of (-1, 1), so the encoding repeats periodically. Either your focal length is wrong, or your model/camera poses are not normalized properly (the z of the translation should be around 0.5).
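For anyone hitting the same artifacts, a quick sanity check along those lines, assuming `poses` is the (N, 4, 4) array of camera-to-world matrices loaded from the JSON (the tolerance is arbitrary):

```python
import numpy as np

def check_pose_normalization(poses, target_z=0.5, tol=0.15):
    """Print the range of camera z-translations and warn if they are far from
    ~0.5, which would push sample points outside the (-1, 1) range of the
    positional encoding and cause periodic artifacts like the ones above."""
    z = np.abs(np.asarray(poses)[:, 2, 3])
    print(f"translation z: mean={z.mean():.3f}, min={z.min():.3f}, max={z.max():.3f}")
    if abs(z.mean() - target_z) > tol:
        print(f"-> poses look unnormalized; try rescaling translations by {target_z / z.mean():.3f}")
```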

Hi, @gafniguy Thanks for your response. Could you please tell me how to correctly call the render_debug_camera_matrix function? I tried render_debug_camera_matrix(poses[0], hwf[-1]) with the cameras provided by NeRFace, but the rendering is not correct. Do the pose and intrinsics need to be processed before rendering?