OpenDriveLab / PPGeo

[ICLR 2023] PyTorch implementation of PPGeo, a fully self-supervised driving policy pre-training framework that learns from unlabeled driving videos.

Why is only the first head used in the pose computation?

shan18 opened this issue · comments

Hi PPGeo Team,

The Pose Decoder outputs two heads for both the axisangle and the translation, i.e. the output shapes look like [.., 2, ...]. But in the calculation of cam_T_cam, only the first head is ever used.

PPGeo/model.py

Lines 122 to 125 in bb37f52

outputs[("cam_T_cam", 0, -1)] = transformation_from_parameters(
    axisangle1[:, 0], translation1[:, 0], invert=True)
outputs[("cam_T_cam", 0, 1)] = transformation_from_parameters(
    axisangle2[:, 0], translation2[:, 0], invert=False)

Could you please clarify why the network predicts two heads when only one of them is used? Does the second head serve any particular purpose?

Yes, the second head is never used here. We set num_frames_to_predict_for in the PoseDecoder this way to stay consistent with the original MonodepthV2 model structure. You can change num_frames_to_predict_for to 1 to keep only one head.
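
To illustrate the shapes involved, here is a minimal NumPy sketch. It is not the PPGeo code: pose_head is a hypothetical stand-in for the decoder's last layer, and it assumes the decoder reshapes its output to (batch, num_frames_to_predict_for, 1, 6) and splits it into axis-angle and translation, as a MonodepthV2-style pose decoder would. Under that assumption, the [:, 0] slice gives the same shape with either setting, so the cam_T_cam code should not need to change.

```python
import numpy as np

def pose_head(features, num_frames_to_predict_for, seed=0):
    """Hypothetical stand-in for the last layer of a pose decoder.

    features: (batch, channels) array of pooled features.
    Returns (axisangle, translation), each of shape
    (batch, num_frames_to_predict_for, 1, 3).
    """
    rng = np.random.default_rng(seed)
    channels = features.shape[1]
    # Each head emits 6 numbers: 3 for axis-angle, 3 for translation.
    weight = rng.standard_normal((channels, 6 * num_frames_to_predict_for))
    out = 0.01 * (features @ weight)
    out = out.reshape(-1, num_frames_to_predict_for, 1, 6)
    return out[..., :3], out[..., 3:]

features = np.ones((4, 16))  # toy batch of 4 feature vectors

# Two heads: the second one (index 1) is computed but never read.
aa_two, t_two = pose_head(features, num_frames_to_predict_for=2)
# One head: only the head that is actually used is computed.
aa_one, t_one = pose_head(features, num_frames_to_predict_for=1)

print(aa_two.shape)        # (4, 2, 1, 3)
print(aa_two[:, 0].shape)  # (4, 1, 3)
print(aa_one[:, 0].shape)  # (4, 1, 3)
```

Note that changing the number of heads changes the size of the final layer, so a checkpoint trained with two heads would presumably not load into a one-head decoder without adapting that layer's weights.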

I see. Thanks a lot for the response.