jhultman / vision3d

Research platform for 3D object detection in PyTorch.

Yaw to rot_y conversion

viraj96 opened this issue

Hi,

I am attempting to convert the output of the inference model to the KITTI 3D Object Detection format. I know that the current output of the model is [x, y, z, w, l, h, yaw]. Since this output is in the velodyne frame (as is evident from the "yaw" angle), I used the KITTI velo_to_cam transform provided in the calib folder to get the corresponding output in the camera2 frame. By doing this I get accurate estimates of [x, y, z], but the rot_y estimate is way off. For example, on the 4th training input of the KITTI 3D detection dataset, the output in the KITTI format in the velodyne frame looks like this,

Car -1 -1 -10 0 0 0 50 1.54 1.64 4.01 38.17 16.24 -0.65 0.00 0.34
Car -1 -1 -10 0 0 0 50 1.52 1.62 3.95 51.39 16.53 -0.53 0.00 0.31

The same output converted to camera2 frame looks like this,

Car -1 -1 -10 0 0 0 50 1.54 1.64 4.01 -15.95 2.69 38.01 0.16 0.34
Car -1 -1 -10 0 0 0 50 1.52 1.62 3.95 -16.15 2.74 51.23 0.42 0.31

The raw inference model output in the same format looks like this,

Car 1.54 1.64 4.01 38.17 14.69 -0.65 1.74
Car 1.52 1.62 3.95 51.39 15.01 -0.53 2.00

The ground truth for the same input for Car class looks like this,

Car 0.00 0 1.96 280.38 185.10 344.90 215.59 1.49 1.76 4.01 -15.71 2.16 38.26 1.57
Car 0.00 0 1.88 365.14 184.54 406.11 205.20 1.38 1.80 3.41 -15.89 2.23 51.17 1.58

As you can see, the output in the camera2 frame is very close to the ground truth in all attributes except rot_y. Can you help me figure out whether the yaw angle output by the model is correct?

I used dummy values for the bbox, alpha, truncation, and occlusion attributes.
The KITTI 3D detection format can be found here.
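For reference, the [x, y, z] part of my conversion is roughly equivalent to the sketch below (the function name is just a placeholder; Tr_velo_to_cam and R0_rect come from the calib file of the same frame, following the devkit convention y = R0_rect * Tr_velo_to_cam * x):

```python
import numpy as np

def velo_to_cam_xyz(xyz_velo, Tr_velo_to_cam, R0_rect):
    """Map an (N, 3) array of velodyne-frame points into the rectified
    camera frame using the KITTI calib matrices.

    Tr_velo_to_cam is the 3x4 matrix and R0_rect the 3x3 matrix read
    from the calib file of the same frame.
    """
    ones = np.ones((xyz_velo.shape[0], 1))
    xyz_h = np.hstack([xyz_velo, ones])              # (N, 4) homogeneous
    return (R0_rect @ Tr_velo_to_cam @ xyz_h.T).T    # (N, 3) camera coords
```

I apply this to the box centers before writing out the KITTI-format rows above.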

Hi, were you able to figure this out? It is a persistent problem with this network. I don't know why the authors messed up the angles like this.

Hey @viraj96 @sarimmehdi,
Could you guys tell me how to convert the rotation around the Z axis in the lidar frame to the camera frame? Is there any code?

Thanks in advance.

I think the preprocessing can easily be understood by looking at the code. You can find all preprocessing applied to the angle in the dataset.py file; see this line in particular.

In the KITTI dataset, the authors use a camera coordinate frame with the y-axis pointing down (as is typically done in image coordinate systems), whereas in the lidar coordinate frame the corresponding z-axis points up. The axis of rotation for the yaw angle is therefore reversed in the lidar frame compared to the camera frame (in which the ground truth is provided). This is my reason for negating the yaw angle when converting from the camera frame to the lidar frame. This project does all of its modeling in the lidar frame, so if you want to do KITTI evaluation in the camera frame, you need to convert the model outputs from the lidar frame to the camera frame. I have not written the code to do this.

KITTI coordinate frames:

| sensor   | x       | y    | z       |
| -------- | ------- | ---- | ------- |
| camera   | right   | down | forward |
| velodyne | forward | left | up      |
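For anyone who needs a starting point, below is a rough sketch of one way to do the angle part of that conversion: express the heading as a direction vector in the lidar frame, rotate it into the camera frame with the rotation part of R0_rect @ Tr_velo_to_cam, and re-measure the angle about the camera y-axis. The function name is just a placeholder, and it assumes yaw is measured from the lidar +x axis about +z; since dataset.py applies its own sign convention to the angle, the safest thing is to invert exactly what dataset.py does.

```python
import numpy as np

def yaw_lidar_to_rotation_y(yaw, R_velo_to_cam):
    """Sketch: convert a lidar-frame yaw (about +z, up) into a KITTI
    rotation_y (about the camera +y axis, which points down).

    R_velo_to_cam is the 3x3 rotation part of R0_rect @ Tr_velo_to_cam
    from the frame's calib file.
    """
    # Heading of the box as a unit direction vector in lidar coordinates,
    # assuming yaw is measured from the +x (forward) axis about +z (up).
    heading_velo = np.array([np.cos(yaw), np.sin(yaw), 0.0])
    # The same direction expressed in rectified camera coordinates.
    heading_cam = R_velo_to_cam @ heading_velo
    # KITTI's roty(ry) sends the object's forward axis to (cos ry, 0, -sin ry),
    # so recover ry from the camera-frame x and z components.
    ry = -np.arctan2(heading_cam[2], heading_cam[0])
    # Wrap to [-pi, pi) as expected by the KITTI label format.
    return (ry + np.pi) % (2 * np.pi) - np.pi
```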

Btw, you can use the provided bird's eye view drawer code to verify whether the yaw angle is represented correctly in lidar frame. I am fairly certain there is no problem here. You can let me know if you believe there is a mistake in the code.