Yzmblog / MonoHuman

MonoHuman: Animatable Human Neural Field from Monocular Video (CVPR 2023)

Annotation files for Human3.6m dataset

Dipankar1997161 opened this issue · comments

Hello @Yzmblog,

Thank you for the awesome work.

Since you are using HumanNeRF, I had some questions about processing the Human3.6M dataset.

I have the entire dataset, and I tried generating the SMPL parameters with ROMP (I also used their processed file from Google Drive), but for some reason the rendering is just terrible.

Maybe I am using the wrong camera values or something. ROMP does not provide intrinsic and extrinsic values, so I am using:
Extrinsic: np.eye(4)
Intrinsic: fx, fy = 443.4 (their value given in config.py); for cx and cy, I am using the H3.6M values.
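
For reference, here is roughly how I am building the camera right now (just a sketch of my setup; the principal-point values below are placeholders for the actual H3.6M calibration numbers):

import numpy as np

# Sketch of my current (possibly wrong) camera setup.
fx = fy = 443.4          # ROMP's default focal length from config.py
cx, cy = 512.0, 512.0    # placeholders; I take the real cx, cy from the H3.6M calibration

K = np.array([[fx, 0., cx],
              [0., fy, cy],
              [0., 0., 1.]])

E = np.eye(4)            # identity extrinsics, i.e. camera at the world origin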

Could you tell me how to process the H3.6M files for 3D reconstruction? I would love to use the 3D ground truth provided by the Human3.6M dataset instead of extracting keypoints from the videos with OpenPose (if that is possible and accurate).

Kindly do let me know; one of my friends suggested this repo to me for this kind of work.

Thank you once again.

Hi, thanks for your attention! NeRF-based methods are sensitive to the camera parameters, so accurate camera values are essential for these models. I am not familiar with ROMP, so I don't know how to obtain the camera from their method. But I suggest projecting the SMPL vertices back onto the images to check whether the camera is set correctly.
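
A rough sketch of that sanity check could look like the following (just an illustration, not code from this repo; verts, K, and E stand for whatever SMPL vertices, intrinsics, and extrinsics you currently use):

import numpy as np
import cv2

def project_vertices(verts, K, E):
    # Project (N, 3) SMPL vertices to pixel coordinates with a pinhole camera.
    # K: (3, 3) intrinsics, E: (4, 4) world-to-camera extrinsics.
    verts_h = np.concatenate([verts, np.ones((verts.shape[0], 1))], axis=1)  # (N, 4) homogeneous
    cam_pts = (E @ verts_h.T).T[:, :3]                                       # points in the camera frame
    uv = (K @ cam_pts.T).T
    return uv[:, :2] / uv[:, 2:3]                                            # perspective divide -> (N, 2)

# Overlay the projected vertices on the frame; they should land on the person.
img = cv2.imread("frame.png")        # any frame from your sequence
uv = project_vertices(verts, K, E)   # verts, K, E: your current SMPL vertices and camera
for u, v in uv.astype(int):
    if 0 <= u < img.shape[1] and 0 <= v < img.shape[0]:
        cv2.circle(img, (int(u), int(v)), 1, (0, 255, 0), -1)
cv2.imwrite("projection_check.png", img)

If the green dots do not lie on the person, the camera (or the SMPL translation) is off.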

Hello @Yzmblog,
thank you once again for the response.
ROMP just provides camera values, similar to PARE's "pred_cam", but I am unsure how to use them.
Apart from these, I also get the "trans" value.

Could you tell me how I can use these 6 values for rendering?

Here are all the values ROMP provides alongside the SMPL parameters:

image_path
| - subject_0
| - | - cam (3,) # 3 camera parameters of weak-perspective camera, (scale, translation_x, translation_y)
| - | - pose (72,) # 72 SMPL pose parameters.
| - | - betas (10,) # 10 SMPL shape parameters.
| - | - j3d_all54 (54, 3) # 3D keypoint coordinates regressed from the estimated body mesh.
| - | - j3d_smpl24 (24, 3) # 3D pose results in SMPL format
| - | - j3d_spin24 (24, 3) # 3D pose results in SPIN format
| - | - j3d_op25 (25, 3) # 3D pose results in Openpose format
| - | - verts (6890, 3) # 3D coordinates of 3D human mesh.
| - | - pj2d (54, 2) # 2D coordinates of 2D keypoints in padded input image.
| - | - pj2d_org (54, 2) # 2D coordinates of 2D keypoints in original input image.
| - | - trans (3,) # rough 3D translation converted from the estimated camera parameters.
| - | - center_conf (1,) # confidence value of the detected person on centermap.
| - subject_1

Now could you tell me how to use these values? In another comment you mentioned converting (s, tx, ty) into (tx, ty, tz).

How do I get tz from s, tx, and ty?

I would appreciate your response on this.

Hi, sorry for the late reply. You can use the following code to convert them:

import numpy as np


def convert_weak_perspective_to_perspective(
    weak_perspective_camera,
    focal_length=5000.,
    img_res=224,
):
    # Convert a weak-perspective camera [s, tx, ty] into a camera translation
    # [tx, ty, tz] in 3D, given the bounding-box size.
    # This camera translation can be used in a full-perspective projection.
    perspective_camera = np.stack(
        [
            weak_perspective_camera[1],
            weak_perspective_camera[2],
            2 * focal_length / (img_res * weak_perspective_camera[0] + 1e-9),
        ],
        axis=-1,
    )

    return perspective_camera

Then the extrinsic parameters will be:

E = [[1, 0, 0, tx],
     [0, 1, 0, ty],
     [0, 0, 1, tz],
     [0, 0, 0, 1]]
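
For example (a quick sketch; the focal length and image resolution below are placeholders and should match whatever you actually use for rendering):

import numpy as np

cam = np.array([0.9, 0.02, 0.15])   # example ROMP weak-perspective camera [s, tx, ty]

tx, ty, tz = convert_weak_perspective_to_perspective(
    cam,
    focal_length=443.4,   # placeholder; use your rendering focal length
    img_res=512,          # placeholder; use the resolution the camera was estimated on
)

E = np.array([[1., 0., 0., tx],
              [0., 1., 0., ty],
              [0., 0., 1., tz],
              [0., 0., 0., 1.]])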

Thanks a lot, this helped.