ClementPinard / SfmLearner-Pytorch

Pytorch version of SfmLearner from Tinghui Zhou et al.

questions about the loss function

huagl opened this issue

commented

Hi, Clément Pinard
It's an impressive job you have done in this project, and the code is written beautifully. May I ask you some questions about the loss function?
For the photometric_reconstruction_loss, the predicted depth and pose are used to compute the sampling grid from ref_imgs to tgt_img, so we can use F.grid_sample to generate tgt_img from ref_imgs.
But in theory, we could also generate ref_imgs from tgt_img. Do you have any idea how to realize this operation?

To realize this operation, you actually need depth maps for ref_imgs. The inverse warp is, as hinted by its name, inverse.
There's no way to make a direct warp (constructing ref_img from tgt_img and the depth relative to tgt_img) with bilinear sampling; you need much more complicated machinery such as https://github.com/BachiLi/redner .
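
For concreteness, here is a minimal, self-contained sketch of the inverse warp behind the photometric loss, condensed from the logic of inverse_warp.py (the function name and exact layout are illustrative, not the repo's API):

import torch
import torch.nn.functional as F

def reconstruct_tgt_from_ref(ref_img, tgt_depth, pose_mat, intrinsics):
    """ref_img: [B,3,H,W], tgt_depth: [B,H,W], pose_mat: [B,3,4] (tgt->ref),
    intrinsics: [B,3,3] -> tgt_img reconstructed by sampling ref_img."""
    b, _, h, w = ref_img.size()
    # backproject every tgt pixel to 3D with its predicted depth
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).float()  # [3,H,W]
    cam = torch.inverse(intrinsics) @ pix.view(3, -1)  # [B,3,H*W] pixel rays
    cam = cam * tgt_depth.view(b, 1, -1)               # scale rays by depth
    # move the 3D points into the ref camera frame and project them
    proj = intrinsics @ (pose_mat[:, :, :3] @ cam + pose_mat[:, :, 3:])
    z = proj[:, 2].clamp(min=1e-3)
    u, v = proj[:, 0] / z, proj[:, 1] / z
    # normalize to [-1, 1] and bilinearly sample ref_img at those positions
    grid = torch.stack([2 * u / (w - 1) - 1, 2 * v / (h - 1) - 1], dim=2)
    return F.grid_sample(ref_img, grid.view(b, h, w, 2), align_corners=True)

Reconstructing ref_imgs symmetrically would need depth maps predicted for ref_imgs, as noted above.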

commented

Thanks for your reply. Actually, we already use the depth of tgt_img and the pose to compute the coordinate correspondence between tgt_img and ref_img. Therefore, we could use this grid (tgt_img to ref_img) to get the inverse grid (ref_img to tgt_img). But I found it a little hard to achieve this.

The coordinate correspondence only goes one way, sadly. If you want to reconstruct ref_img points, you need the positions of ref_img's integer pixel coordinates in tgt_img; what you compute with tgt_img's depth is the opposite: the positions of tgt_img's integer pixel coordinates in ref_img.

This problem is the same as inverting optical flow: if you know the optical flow from ref_img to tgt_img, how can you know the optical flow from tgt_img to ref_img?
This is not trivial, because of occlusions for example, and it is not as easily differentiable as bilinear sampling.
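
To make the difficulty concrete, here is a hedged sketch (not from the repo) of the naive inversion by forward "splatting"; pixels nobody lands on stay as holes, collisions are resolved arbitrarily, and the rounding to integer positions breaks straightforward differentiability:

import torch

def invert_flow_nn(flow):
    """flow: [B,2,H,W], ref -> tgt displacements in pixels."""
    b, _, h, w = flow.shape
    inv = torch.zeros_like(flow)
    hit = torch.zeros(b, 1, h, w, dtype=torch.bool)
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
    for bi in range(b):
        u = (xs + flow[bi, 0]).round().long().clamp(0, w - 1)
        v = (ys + flow[bi, 1]).round().long().clamp(0, h - 1)
        inv[bi, 0, v, u] = -flow[bi, 0]  # landing pixel points back to source
        inv[bi, 1, v, u] = -flow[bi, 1]
        hit[bi, 0, v, u] = True
    return inv, hit  # pixels where hit is False are occlusion holes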

commented

I see. I ran into the problems you described when I tried to implement my naive idea. Thanks for your patience~

commented

Hey, can your function work directly on depth maps? I want to use your inverse warp function to transform a depth map from one frame to another.

@soulslicer , sorry, I didn't see your comment before. Depth warping is not trivially the same.

You need to change the depth values before warping them. Imagine you are moving forward: the points change position with respect to the image plane, but their 3D coordinates change too, and thus their depth. For example, a point 10 m in front of the camera is only 9 m away after the camera advances 1 m. Color, in contrast, stays constant.

As such, when you have your 3D points, you need to apply the transformation to them too (and thus change the Z value) before doing the warping.

Here is a little snippet adapting the inverse warp code:

The important part of the code is the cam2depth function (the rest barely changes from the regular inverse_warp).

import torch
import torch.nn.functional as F
# pixel2cam, cam2pixel and check_sizes come from the repo's inverse_warp.py;
# pose_vec2mat is assumed here to take separate rot and translation arguments.


def cam2depth(cam_coords, proj_c2p):
    """Transform depth values in the camera frame to the new frame. Notice the translation is done before rotation this time
    Args:
        cam_coords: pixel coordinates defined in the first camera coordinates system -- [B, 3, H, W]
        proj_c2p: rotation and translation matrix of cameras -- [B, 3, 4]
    Returns:
        array of depth coordinates -- [B, H, W]
    """
    b, _, h, w = cam_coords.size()
    cam_coords_flat = cam_coords.view(b, 3, -1)  # [B, 3, H*W]
    # translation first, then rotation: R @ (X + t)
    pcoords = proj_c2p[:, :, :3].bmm(cam_coords_flat + proj_c2p[:, :, 3:])  # [B, 3, H*W]
    # keep only the Z coordinate: the depth in the new frame
    Z = pcoords[:, 2].clamp(min=1e-3, max=100).view(b, h, w)
    return Z


def inverse_warp_depth(depth_to_warp, ref_depth, intrinsics, intrinsics_inv, rot=None, translation=None):
    """
    Inverse warp a source image to the target image plane.
    Args:
        depth_to_warp: the source depth (where to sample values) -- [B, H, W]
        ref_depth: target depth map of the target image -- [B, H, W]
        pose: 6DoF pose parameters from target to source -- [B, 6]
        intrinsics: camera intrinsic matrix -- [B, 3, 3]
        intrinsics_inv: inverse of the intrinsic matrix -- [B, 3, 3]
    Returns:
        Source depth warped to the target image plane
    """
    check_sizes(depth_to_warp, 'source depth', 'BHW')
    check_sizes(ref_depth, 'ref depth', 'BHW')
    # check_sizes(pose, 'pose', 'B6')
    check_sizes(intrinsics, 'intrinsics', 'B33')
    check_sizes(intrinsics_inv, 'intrinsics', 'B33')

    assert(intrinsics_inv.size() == intrinsics.size())

    batch_size, img_height, img_width = depth_to_warp.size()

    cam_coords = pixel2cam(ref_depth, intrinsics_inv)  # [B,3,H,W]

    depth_to_warp_coords = pixel2cam(depth_to_warp, intrinsics_inv)  # [B,3,H,W]

    pose_mat = pose_vec2mat(rot, translation)  # [B,3,4]

    # negating the parameters inverts the rotation (valid for axis-angle
    # style rotation parameters)
    inverse_rot_mat = pose_vec2mat(-rot, None)  # [B,3,3]

    if translation is not None:
        # cam2depth adds the translation before rotating, so pass -t directly
        inverse_translation = -translation.unsqueeze(-1)
        inverse_pose_mat = torch.cat([inverse_rot_mat, inverse_translation], dim=2)
    else:
        # pad with a zero translation so cam2depth can still index [:, :, 3:]
        zeros = torch.zeros_like(inverse_rot_mat[:, :, :1])
        inverse_pose_mat = torch.cat([inverse_rot_mat, zeros], dim=2)

    # Get projection matrix for tgt camera frame to source pixel frame
    proj_cam_to_src_pixel = intrinsics.bmm(pose_mat)  # [B, 3, 4]

    src_pixel_coords = cam2pixel(cam_coords, proj_cam_to_src_pixel)  # [B,H,W,2]

    # change the depth values of the source depth map to match the new pose;
    # the depth itself is not resampled here
    new_depth = cam2depth(depth_to_warp_coords, inverse_pose_mat)  # [B,H,W]

    projected_depth = F.grid_sample(new_depth.unsqueeze(1), src_pixel_coords)[:, 0]

    return projected_depth
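
In case it helps, a hypothetical usage sketch (shapes are arbitrary; the helpers are assumed to be imported from inverse_warp.py, with the modified pose_vec2mat above):

B, H, W = 4, 128, 416
depth_src = torch.rand(B, H, W) * 10 + 1  # depth in the source view
depth_tgt = torch.rand(B, H, W) * 10 + 1  # depth in the target view
K = torch.eye(3).repeat(B, 1, 1)          # toy intrinsics
rot = torch.zeros(B, 3)                   # rotation parameters, e.g. axis-angle
trans = torch.zeros(B, 3)
warped = inverse_warp_depth(depth_src, depth_tgt, K, torch.inverse(K),
                            rot=rot, translation=trans)  # [B, H, W]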

Hi, ClementPinard
Thank you for posting the inverse_warp_depth code. As shown in your code, the depth inverse-warp procedure needs both depth_to_warp and ref_depth, which means that to generate the projected target depth, the target depth is needed as input. That seems a little strange, because what we want is to generate the target depth from the source depth and the pose motion between source and target.

I found that ref_depth (the depth map of the target image) is only used to generate the pixel coords, so I think ref_depth could just be the image plane rather than the actual depth of the target image:

ref_depth = f*torch.ones(B, H, W)  # f is the focal length of camera

Would you help me to check if this is right?

If you do that you won't have any parallax effects: for instance, in a lateral movement, close points move faster across the screen than far points. If you use a constant ref_depth, they will all move in the image at the same speed, as if you were looking at a textured plane instead of the real world.

I agree that the need for two depth maps is counter-intuitive: you feel that with one depth map and one pose vector you have all the information to compute the new depth map, but this is only true for direct warping. The problem is that direct warping is much more difficult to implement as a differentiable operation than inverse warping.

Inverse warping needs the depth map corresponding to the new pose viewpoint, and the thing you are trying to warp (be it image or depth) corresponding to the original pose viewpoint. That way, contrary to direct warping, each point of the warped image/depth has a coordinate in the source image, and the gradient of pixel color with respect to that coordinate is very easy to compute.
I wrote a piece on that in my thesis (https://pastel.archives-ouvertes.fr/tel-02285215/document , chapter 4).
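
A tiny self-contained check (not from the repo) of that differentiability: gradients flow through F.grid_sample to the sampling coordinates themselves, which is exactly the property inverse warping relies on.

import torch
import torch.nn.functional as F

img = torch.rand(1, 3, 8, 8)
grid = torch.zeros(1, 8, 8, 2, requires_grad=True)  # sample image centers
out = F.grid_sample(img, grid, align_corners=True)
out.sum().backward()
print(grid.grad.shape)  # torch.Size([1, 8, 8, 2]): grads reach coordinates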

If you are searching for direct-warp-based optimization, you can have a look at PyTorch3D. They implement a method for 2D image optimization based on direct rendering of 3D primitives. You just need to convert your depth map to a 3D mesh (which should be easy) and plug in their solution.
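
A rough sketch of such a conversion (an assumption about the workflow, not PyTorch3D's API): backproject pixels with the inverse intrinsics, then split each pixel quad into two triangles; the resulting (verts, faces) can then be fed to a differentiable mesh renderer.

import torch

def depth_to_mesh(depth, intrinsics_inv):
    """depth: [H,W], intrinsics_inv: [3,3] -> verts [H*W,3], faces [F,3]."""
    h, w = depth.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).float().view(3, -1)
    verts = ((intrinsics_inv @ pix) * depth.view(1, -1)).t()  # [H*W, 3]
    idx = torch.arange(h * w).view(h, w)
    tl, tr = idx[:-1, :-1].reshape(-1), idx[:-1, 1:].reshape(-1)
    bl, br = idx[1:, :-1].reshape(-1), idx[1:, 1:].reshape(-1)
    faces = torch.cat([torch.stack([tl, bl, tr], 1),
                       torch.stack([tr, bl, br], 1)], 0)  # two tris per quad
    return verts, faces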