jhultman / vision3d

Research platform for 3D object detection in PyTorch.

Yaw to rot_y conversion

viraj96 opened this issue

Hi,

I am attempting to convert the output of the inference model to the KITTI 3D Object Detection format. I know that the current output of the model is [x, y, z, w, l, h, yaw]. Since this output is in the velodyne frame (as is evident from the "yaw" angle), I used the KITTI velo_to_cam transform provided in the calib folder to get the corresponding output in the camera2 frame. By doing this I get accurate estimates of [x, y, z], but the rot_y estimate is way off. For example, on the 4th training input of the KITTI 3D detection dataset, the output in the KITTI format in the velodyne frame looks like this,

Car -1 -1 -10 0 0 0 50 1.54 1.64 4.01 38.17 16.24 -0.65 0.00 0.34
Car -1 -1 -10 0 0 0 50 1.52 1.62 3.95 51.39 16.53 -0.53 0.00 0.31

The same output converted to camera2 frame looks like this,

Car -1 -1 -10 0 0 0 50 1.54 1.64 4.01 -15.95 2.69 38.01 0.16 0.34
Car -1 -1 -10 0 0 0 50 1.52 1.62 3.95 -16.15 2.74 51.23 0.42 0.31

The raw inference model output in the same format looks like this,

Car 1.54 1.64 4.01 38.17 14.69 -0.65 1.74
Car 1.52 1.62 3.95 51.39 15.01 -0.53 2.00

The ground truth for the same input for Car class looks like this,

Car 0.00 0 1.96 280.38 185.10 344.90 215.59 1.49 1.76 4.01 -15.71 2.16 38.26 1.57
Car 0.00 0 1.88 365.14 184.54 406.11 205.20 1.38 1.80 3.41 -15.89 2.23 51.17 1.58

As you can see, the output in the camera2 frame is very close to the ground truth in all attributes except rot_y. Can you help me figure out whether the yaw angle output by the model is correct?

I used dummy values for the bbox, alpha, truncation, and occlusion attributes.
The KITTI 3D detection format can be found here.
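For reference, the [x, y, z] part of my conversion is roughly equivalent to the sketch below (the function name is just a placeholder; Tr_velo_to_cam and R0_rect come from the calib file of the same frame, following the devkit convention y = R0_rect * Tr_velo_to_cam * x):

```python
import numpy as np

def velo_to_cam_xyz(xyz_velo, Tr_velo_to_cam, R0_rect):
    """Map an (N, 3) array of velodyne-frame points into the rectified
    camera frame using the KITTI calib matrices.

    Tr_velo_to_cam is the 3x4 matrix and R0_rect the 3x3 matrix read
    from the calib file of the same frame.
    """
    ones = np.ones((xyz_velo.shape[0], 1))
    xyz_h = np.hstack([xyz_velo, ones])              # (N, 4) homogeneous
    return (R0_rect @ Tr_velo_to_cam @ xyz_h.T).T    # (N, 3) camera coords
```

I apply this to the box centers before writing out the KITTI-format rows above.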

Hi, were you able to figure this out? It is a persistent problem with this network. I don't know why the authors messed up the angles like this.

Hey @viraj96 @sarimmehdi,
Could you guys tell me how to convert the rotation around the Z axis in the lidar frame to the camera frame? Is there any code?

Thanks in advance.

I think the preprocessing can easily be understood by looking at the code. You can find all preprocessing applied to the angle in the dataset.py file; see this line in particular.

In the KITTI dataset, the authors use a camera coordinate frame with the y-axis pointing down (as is typically done in image coordinate systems), whereas in the lidar coordinate frame the corresponding z-axis points up. The axis of rotation for the yaw angle is therefore reversed in the lidar frame compared to the camera frame (in which the ground truth is provided). This is my reason for negating the yaw angle when converting from the camera frame to the lidar frame. This project does all of its modeling in the lidar frame, so if you want to do KITTI evaluation in the camera frame, you need to convert the model outputs from the lidar frame to the camera frame. I have not written the code to do this.

KITTI coordinate frames:

| sensor   | x       | y    | z       |
| -------- | ------- | ---- | ------- |
| camera   | right   | down | forward |
| velodyne | forward | left | up      |
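For anyone who needs a starting point, below is a rough sketch of one way to do the angle part of that conversion: express the heading as a direction vector in the lidar frame, rotate it into the camera frame with the rotation part of R0_rect @ Tr_velo_to_cam, and re-measure the angle about the camera y-axis. The function name is just a placeholder, and it assumes yaw is measured from the lidar +x axis about +z; since dataset.py applies its own sign convention to the angle, the safest thing is to invert exactly what dataset.py does.

```python
import numpy as np

def yaw_lidar_to_rotation_y(yaw, R_velo_to_cam):
    """Sketch: convert a lidar-frame yaw (about +z, up) into a KITTI
    rotation_y (about the camera +y axis, which points down).

    R_velo_to_cam is the 3x3 rotation part of R0_rect @ Tr_velo_to_cam
    from the frame's calib file.
    """
    # Heading of the box as a unit direction vector in lidar coordinates,
    # assuming yaw is measured from the +x (forward) axis about +z (up).
    heading_velo = np.array([np.cos(yaw), np.sin(yaw), 0.0])
    # The same direction expressed in rectified camera coordinates.
    heading_cam = R_velo_to_cam @ heading_velo
    # KITTI's roty(ry) sends the object's forward axis to (cos ry, 0, -sin ry),
    # so recover ry from the camera-frame x and z components.
    ry = -np.arctan2(heading_cam[2], heading_cam[0])
    # Wrap to [-pi, pi) as expected by the KITTI label format.
    return (ry + np.pi) % (2 * np.pi) - np.pi
```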

Btw, you can use the provided bird's eye view drawer code to verify whether the yaw angle is represented correctly in lidar frame. I am fairly certain there is no problem here. You can let me know if you believe there is a mistake in the code.