It seems that joint_image can't be utilized as 2D joints

Question

Javacr opened this issue 8 months ago · comments

Hi, SMPLer-X already provides 3D information, but I still want to obtain 2D joints for my work (I would rather not add a separate 2D joint detector). Here is my modification (based on InterWild).
SMPLer_X.py

def project_to_body_space(part_name, bbox):
    # bbox is (x1, y1, x2, y2), so these are the hand box width and height.
    hand_bbox_w = bbox[:, None, 2] - bbox[:, None, 0]
    hand_bbox_h = bbox[:, None, 3] - bbox[:, None, 1]
    # Scale this part's joints from hand-heatmap units to the box size (in place).
    joint_img[:, smpl_x.pos_joint_part[part_name], 0] *= (
        hand_bbox_w / cfg.output_hand_hm_shape[2])
    joint_img[:, smpl_x.pos_joint_part[part_name], 1] *= (
        hand_bbox_h / cfg.output_hand_hm_shape[1])
    # Shift by the top-left corner of the box.
    joint_img[:, smpl_x.pos_joint_part[part_name], 0] += bbox[:, None, 0]
    joint_img[:, smpl_x.pos_joint_part[part_name], 1] += bbox[:, None, 1]
        
for part_name, bbox in (('lhand', lhand_bbox), ('rhand', rhand_bbox)):
    project_to_body_space(part_name, bbox)

# Body joints: body-heatmap units -> input_body_shape units
joint_img[:, smpl_x.pos_joint_part['body'], 0] *= (cfg.input_body_shape[1] / cfg.output_hm_shape[2])
joint_img[:, smpl_x.pos_joint_part['body'], 1] *= (cfg.input_body_shape[0] / cfg.output_hm_shape[1])

# All joints: input_body_shape units -> input_img_shape units
joint_img[:, :, 0] *= (cfg.input_img_shape[1] / cfg.input_body_shape[1])
joint_img[:, :, 1] *= (cfg.input_img_shape[0] / cfg.input_body_shape[0])
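
The two scaling steps above can be checked in isolation. This is only a minimal NumPy sketch: the shape values are made-up placeholders rather than the repository's config, and hm_to_input_img is a hypothetical helper name.

```python
import numpy as np

# Assumed, illustrative shapes; the real values come from cfg.
output_hm_shape = (16, 16, 12)   # (depth, height, width) of the body heatmap
input_body_shape = (256, 192)    # (height, width)
input_img_shape = (512, 384)     # (height, width)


def hm_to_input_img(joint_hm):
    """Scale (x, y) from body-heatmap units to input-image units."""
    joint = np.asarray(joint_hm, dtype=np.float64).copy()
    # Step 1: heatmap units -> input_body_shape units
    joint[..., 0] *= input_body_shape[1] / output_hm_shape[2]
    joint[..., 1] *= input_body_shape[0] / output_hm_shape[1]
    # Step 2: input_body_shape units -> input_img_shape units
    joint[..., 0] *= input_img_shape[1] / input_body_shape[1]
    joint[..., 1] *= input_img_shape[0] / input_body_shape[0]
    return joint


# The far corner of the heatmap should land on the far corner of the image.
corner = np.array([[12.0, 16.0]])
print(hm_to_input_img(corner))
```

With these placeholder shapes the heatmap corner (12, 16) maps to (384, 512), which is a quick way to confirm that the width and height indices are not swapped.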

inference.py

joint_img = out['joint_img'].cpu().numpy()[0]
joint_img_xy1 = np.concatenate((joint_img[:,:2], np.ones_like(joint_img[:,:1])),1)
joint_img = np.round(np.dot(bb2img_trans, joint_img_xy1.transpose(1,0)).transpose(1,0)).astype(int)

lhand_bbox = out['lhand_bbox'].cpu().numpy().reshape(2,2)
hand_bbox_xyl = np.concatenate((lhand_bbox, np.ones_like(lhand_bbox[:,:1])),1)
lhand_bbox = np.round(np.dot(bb2img_trans, hand_bbox_xyl.transpose(1,0)).transpose(1,0)).astype(int)
lhand_bbox = [lhand_bbox[0,0], lhand_bbox[0,1], lhand_bbox[1,0], lhand_bbox[1,1]]  

rhand_bbox = out['rhand_bbox'].cpu().numpy().reshape(2,2)
hand_bbox_xyr = np.concatenate((rhand_bbox, np.ones_like(rhand_bbox[:,:1])),1)
rhand_bbox = np.round(np.dot(bb2img_trans, hand_bbox_xyr.transpose(1,0)).transpose(1,0)).astype(int)  
rhand_bbox = [rhand_bbox[0,0], rhand_bbox[0,1], rhand_bbox[1,0], rhand_bbox[1,1]]
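
The three blocks above repeat the same operation: append a column of ones and multiply by bb2img_trans. A small sketch of that step as one helper, assuming bb2img_trans is a 2x3 affine matrix (the name apply_affine_2d and the example matrix are hypothetical):

```python
import numpy as np


def apply_affine_2d(trans, points_xy):
    """Apply a 2x3 affine matrix to an (N, 2) array of (x, y) points."""
    points_xy = np.asarray(points_xy, dtype=np.float64)
    ones = np.ones((points_xy.shape[0], 1))
    homogeneous = np.concatenate([points_xy, ones], axis=1)  # (N, 3)
    return homogeneous @ np.asarray(trans, dtype=np.float64).T  # (N, 2)


# Hypothetical transform: scale by 2, then shift by (10, 20).
trans = np.array([[2.0, 0.0, 10.0],
                  [0.0, 2.0, 20.0]])
points = np.array([[1.0, 1.0], [3.0, 4.0]])
print(apply_affine_2d(trans, points))
```

The helper returns floats. Rounding to integers, as the code above does, is only needed for drawing and adds up to half a pixel of error per coordinate if the points are later compared with another method.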

When I use the default parameters, the results seem weird: the joints are not accurate.

After modifying the code as below:

# lhand_bbox = restore_bbox(lhand_bbox_center, lhand_bbox_size, cfg.input_hand_shape[1] / cfg.input_hand_shape[0], 2).detach()  
# rhand_bbox = restore_bbox(rhand_bbox_center, rhand_bbox_size, cfg.input_hand_shape[1] / cfg.input_hand_shape[0], 2).detach()
lhand_bbox = restore_bbox(lhand_bbox_center, lhand_bbox_size, cfg.input_hand_shape[1] / cfg.input_hand_shape[0], 1.5).detach()  
rhand_bbox = restore_bbox(rhand_bbox_center, rhand_bbox_size, cfg.input_hand_shape[1] / cfg.input_hand_shape[0], 1.5).detach()
# bbox = process_bbox(mmdet_box_xywh, original_img_width, original_img_height)
bbox = process_bbox(mmdet_box_xywh, original_img_width, original_img_height, ratio=1.1)
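
To see what changing the last argument from 2 to 1.5 might do, here is a sketch that assumes the argument enlarges a box about its center. expand_bbox_xyxy is a hypothetical helper for illustration, not the repository's restore_bbox or process_bbox.

```python
import numpy as np


def expand_bbox_xyxy(bbox, ratio):
    """Scale an (x1, y1, x2, y2) box about its center by `ratio`."""
    x1, y1, x2, y2 = bbox
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    w, h = (x2 - x1) * ratio, (y2 - y1) * ratio
    return np.array([cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0])


box = [10.0, 10.0, 30.0, 50.0]
print(expand_bbox_xyxy(box, 2.0))
print(expand_bbox_xyxy(box, 1.5))
```

Under that assumption, a smaller ratio gives a tighter crop around the hand, so the same heatmap resolution covers fewer image pixels per cell.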

Things get better, but the joints are still not accurate.

How can this be explained?

Javacr · Answer 1 · Fri Nov 10 2023 11:22:10 GMT+0800 (China Standard Time)

Sorry, I misunderstood. These results are actually normal; I should not compare a whole-body method with hand-only joint methods.