yzhao520 / CPP

CVPR 2021 "Camera Pose Matters: Improving Depth Prediction by Mitigating Pose Distribution Bias"


Some issues with the calculation process in PDA

shuowang666 opened this issue

[screenshot: the T_rel computation in PDA]
An important step in PDA is the calculation of T_rel. My understanding is that the coordinates in the original camera system should first be transformed to the world coordinate system via the original pose, and then mapped into the converted camera system via the pose of the converted camera. But the implementation of `cam_coord_rgb` in your code seems to have a problem: `cam_coord` holds the 3D coordinates in the original camera coordinate system, so why is it multiplied by R2 first? R2 is the rotation matrix of the converted camera system; the two are not in the same coordinate system at all. What is the meaning of `cam_coord_rgb`?
[screenshot: the cam_coord_rgb implementation]
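For reference, the forward chain the question describes can be written out as follows (a sketch assuming T_i = (R_i, t_i) are the camera-to-world poses of the original and converted views; the repo may use a different pose convention):

```latex
% Forward chain: original camera frame -> world -> converted camera frame.
% T_i = (R_i, t_i) are assumed camera-to-world poses of the two views.
p_w = R_1\, p_{c_1} + t_1, \qquad
p_{c_2} = R_2^{\top}\,(p_w - t_2), \qquad
T_{\mathrm{rel}} = T_2^{-1}\, T_1 .
```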

Sorry for the confusion. cam_coord_rgb is computed in the opposite direction compared to cam_coord_depth. The purpose of this is to use the grid_sample function: essentially, cam_coord_rgb is the flow-field grid from the target image back to the source image, before normalization into [-1, 1]. Please let me know if it is still unclear.
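To illustrate what "flow-field grid before normalizing into [-1, 1]" means, here is a minimal sketch (not the repo's code; function and variable names are illustrative) of how an unnormalized pixel-coordinate grid is prepared for torch.nn.functional.grid_sample:

```python
import torch
import torch.nn.functional as F

def warp_with_grid(src_img, pix_coords):
    """Sample src_img (B, C, H, W) at pix_coords (B, H, W, 2), where
    pix_coords[..., 0] is x (column) and pix_coords[..., 1] is y (row)
    in unnormalized pixel units of the source image."""
    _, _, h, w = src_img.shape
    grid = pix_coords.clone()
    # grid_sample expects coordinates in [-1, 1], so normalize
    # x from [0, W-1] and y from [0, H-1] into that range.
    grid[..., 0] = 2.0 * grid[..., 0] / (w - 1) - 1.0
    grid[..., 1] = 2.0 * grid[..., 1] / (h - 1) - 1.0
    return F.grid_sample(src_img, grid, mode='bilinear',
                         padding_mode='zeros', align_corners=True)
```

Each output pixel reads from the source location stored in the grid, which is exactly the "target back to source" direction that grid_sample requires.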

Sorry, I still don't understand how this works. cam_coord is a 3D coordinate in the original camera system, so my understanding is that only the forward calculation is valid; the reverse formula looks incorrect.

cam_coord represents the 3D coordinates in the original camera system, but your formula seems to treat cam_coord as the 3D coordinates in the converted camera system. These are not equivalent, because the depths at the same pixel position (u, v) differ between the two camera systems.

Yes, pc is the 3D point cloud in the original camera coordinates. Normally, to get the flow-field grid from the target back to the source image, we would need the depth in the target view to warp back, which is not directly available. One potential solution is to compute a forward warp and then invert it to obtain the backward warp field, but the result may not be dense enough and it is computationally expensive. Alternatively, directly splatting RGB values in the forward direction works, but you may see some grid artifacts.
In this released implementation it is a little tricky: since we are dealing with small rotation changes and no translations, we find that directly using this point cloud and warping back works well enough. You are welcome to use the other approaches mentioned above.
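Here is a minimal sketch of this rotation-only trick (assumptions: K is the 3x3 intrinsics, R is the 3x3 rotation mapping the converted camera frame back to the original one, and the source-view depth is reused as a stand-in for the unavailable target-view depth; names are illustrative, not the repo's API):

```python
import torch

def backward_warp_grid(depth, K, R):
    """depth: (H, W) source-view depth; K: (3, 3) intrinsics;
    R: (3, 3) rotation from converted view back to original view.
    Returns (H, W, 2) unnormalized pixel coords into the source image."""
    h, w = depth.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=depth.dtype, device=depth.device),
        torch.arange(w, dtype=depth.dtype, device=depth.device),
        indexing='ij')
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).reshape(3, -1)
    # Back-project the pixel grid with the source depth: this point cloud
    # stands in for the (unavailable) target-view one.
    cam = torch.linalg.inv(K) @ pix * depth.reshape(1, -1)
    cam_rot = R @ cam                          # rotate into the source frame
    proj = K @ cam_rot                         # reproject with the intrinsics
    uv = proj[:2] / proj[2:].clamp(min=1e-6)   # perspective divide
    return uv.permute(1, 0).reshape(h, w, 2)
```

The resulting grid can then be normalized to [-1, 1] and fed to grid_sample as in the earlier sketch; the approximation holds because, with small rotations and no translation, the source and target depths at the same pixel stay close.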

Thanks a lot.