OpenDriveLab / PPGeo

Hi PPGeo Team,

The Pose Decoder has two output heads for the axisangle and the translation, i.e. the outputs have shapes like [..., 2, ...]. However, when cam_T_cam is computed, only the first head is ever used:

PPGeo/model.py, lines 122 to 125 (commit bb37f52):

    outputs[("cam_T_cam", 0, -1)] = transformation_from_parameters(
        axisangle1[:, 0], translation1[:, 0], invert=True)
    outputs[("cam_T_cam", 0, 1)] = transformation_from_parameters(
        axisangle2[:, 0], translation2[:, 0], invert=False)

Could you please clarify why the network predicts two heads when only one of them is used? Does the second head serve any particular purpose?
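To make concrete what the two calls in model.py compute, here is a minimal pure-Python sketch of a Monodepth2-style transformation_from_parameters helper. It is a re-sketch under assumptions, not the repository's code: it handles a single sample instead of a torch batch (so passing one 3-vector stands in for axisangle1[:, 0]), and the helper names identity4, matmul4, transpose4 and translation_matrix are hypothetical. With invert=False it builds [R | t]; with invert=True it builds the inverse transform [R^T | -R^T t].

```python
import math
from typing import List, Sequence

# A 4x4 homogeneous matrix stored as a list of rows (no torch, single sample).
Matrix = List[List[float]]


def identity4() -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]


def matmul4(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose4(m: Matrix) -> Matrix:
    return [[m[j][i] for j in range(4)] for i in range(4)]


def rot_from_axisangle(v: Sequence[float]) -> Matrix:
    """Rodrigues' formula: rotate by |v| radians about the unit axis v/|v|."""
    angle = math.sqrt(sum(c * c for c in v))
    if angle < 1e-12:
        return identity4()  # no rotation
    x, y, z = (c / angle for c in v)
    ca, sa = math.cos(angle), math.sin(angle)
    C = 1.0 - ca
    return [
        [ca + x * x * C, x * y * C - z * sa, x * z * C + y * sa, 0.0],
        [y * x * C + z * sa, ca + y * y * C, y * z * C - x * sa, 0.0],
        [z * x * C - y * sa, z * y * C + x * sa, ca + z * z * C, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


def translation_matrix(t: Sequence[float]) -> Matrix:
    m = identity4()
    for i in range(3):
        m[i][3] = t[i]
    return m


def transformation_from_parameters(axisangle: Sequence[float],
                                   translation: Sequence[float],
                                   invert: bool = False) -> Matrix:
    R = rot_from_axisangle(axisangle)
    t = list(translation)
    if invert:
        R = transpose4(R)       # R^T is the inverse of a rotation
        t = [-c for c in t]     # negate the translation
    T = translation_matrix(t)
    # Forward: T @ R = [R | t]. Inverted: R^T @ T(-t) = [R^T | -R^T t].
    return matmul4(R, T) if invert else matmul4(T, R)
```

Multiplying the invert=False and invert=True results for the same parameters gives the identity, which is a quick way to check that invert=True really produces the inverse pose.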

Yes, the second head is never used here. We set num_frames_to_predict_for of the PoseDecoder to 2 to stay consistent with the original Monodepth2 model structure. You can set num_frames_to_predict_for to 1 to keep only one head.
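The answer can be illustrated with a sketch of how a Monodepth2-style PoseDecoder splits its final output: the last layer yields 6 × num_frames_to_predict_for values per sample, which are viewed as num_frames_to_predict_for heads of (axisangle, translation). The function split_pose_heads is hypothetical and works on a flat Python list for one sample; the real decoder also averages over spatial dimensions and scales the output, which is omitted here.

```python
from typing import List, Tuple

Head = Tuple[List[float], List[float]]  # (axisangle, translation)


def split_pose_heads(flat: List[float], num_frames_to_predict_for: int) -> List[Head]:
    """Split one sample's 6*N decoder outputs into N (axisangle, translation) heads.

    Mirrors viewing a (B, 6*N) output as (B, N, 1, 6): head i owns values 6i..6i+5,
    the first three being the axis-angle rotation and the last three the translation.
    """
    if len(flat) != 6 * num_frames_to_predict_for:
        raise ValueError("expected 6 values per head")
    heads: List[Head] = []
    for i in range(num_frames_to_predict_for):
        chunk = flat[6 * i: 6 * i + 6]
        heads.append((chunk[:3], chunk[3:]))
    return heads
```

With num_frames_to_predict_for=2, heads[0] corresponds to what axisangle1[:, 0] and translation1[:, 0] select, and heads[1] is computed but never used; with num_frames_to_predict_for=1, only the head that is actually used remains.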

I see. Thanks a lot for the response.


Why is only the first head used in pose computation?