tjqansthd / LapDepth-release

Monocular Depth Estimation Using Laplacian Pyramid-Based Depth Residuals


Getting metric depth from depth images

gv237-07 opened this issue · comments

Hi. First of all thank you for your amazing work.
Is there any way to get the metric distance between the camera and an object from the depth map output by this work?
I understand how to get the distance from a disparity map, but I'm not sure whether the same concept applies here.
Thank you

Hi! Thanks for your interest.

The pixel values of the depth map are defined directly as the distance from the camera to each pixel. For example, for models pre-trained on the KITTI dataset, the pixel values of the output depth map range from 0 to 80, which is the metric distance in meters from the camera (note that the focal length in KITTI is about 721 pixels).
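To make that concrete, here is a minimal sketch of reading a metric distance at a given pixel. The depth map is fabricated for illustration; in practice it would be the model's output, whose values are already meters:

```python
import numpy as np

# Stand-in for a KITTI-pretrained model's output: each pixel value is already
# the metric distance in meters (roughly 0-80 m), so no disparity-to-depth
# conversion is needed.
depth_map = np.random.uniform(0.0, 80.0, size=(352, 1216)).astype(np.float32)

row, col = 200, 600                      # pixel of interest (illustrative)
distance_m = float(depth_map[row, col])  # metric distance from the camera
print(f"Distance at ({row}, {col}): {distance_m:.2f} m")
```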

Hello, thank you for your work and for open-sourcing it.
I'd like to briefly follow up on this question for testing on the KITTI dataset.
Consider this code block:

    if args.pretrained == 'KITTI':
        out = out[int(out.shape[0]*0.18):,:]  # crop the upper (sky) part of the image
        out = out*256.0                       # KITTI convention: depth * 256 stored as uint16
    elif args.pretrained == 'NYU':
        out = out*1000.0                      # NYU convention: depth * 1000 stored as uint16
    out = out.cpu().detach().numpy().astype(np.uint16)
    out = (out/out.max())*255                 # normalize to [0, 255] for display

If I understand correctly, out = (out/out.max())*255 essentially normalizes the pixel values and produces a heatmap image for visualization.
Based on your reply here, the actual depth value of the depth map is just out = (out/out.max())*80.
Please let me know if this is incorrect.

Yes, you are correct.
Since this is demo code, the normalizing block is only for visualization.
In the model's forward pass the final output passes through a sigmoid layer and is multiplied by the max_depth value of each dataset, so the final output already holds actual depth values. Everything that normalizes the output after that point is purely for visualization.
Thank you!
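Following that answer, a hedged sketch of keeping the metric values instead of normalizing them. Here out is a fabricated stand-in for the model's final depth map, which per the reply above is already sigmoid output times max_depth:

```python
import numpy as np

max_depth = 80.0  # KITTI; 10.0 for NYU
# Stand-in for the network's final output, which already holds metric depth in meters.
out = np.random.uniform(0.0, 1.0, size=(352, 1216)).astype(np.float32) * max_depth

metric_depth = out.copy()                       # use this for measurements (meters)
vis = (out / out.max() * 255).astype(np.uint8)  # this copy is only for visualization
```

The key point is to save the raw output before any normalization; the normalized copy loses the metric scale.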

Hi, sorry to ask for confirmation: if I want to get the metric depth, do I just change out = (out/out.max())*255 to out = (out/out.max())*80? I ask because after changing it, some results are negative numbers, which is obviously impossible.

I would appreciate it if you could answer me.

Since the final output depth map of the decoder passes through a sigmoid layer and is multiplied by max_depth, values greater than 0 are guaranteed. This can be seen in

    return [(lap_lv5)*self.max_depth, (lap_lv4)*self.max_depth, (lap_lv3)*self.max_depth, (lap_lv2)*self.max_depth, (lap_lv1)*self.max_depth], final_depth*self.max_depth
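A small numerical check of this property (a standalone NumPy sketch, not the repository's code): a sigmoid maps any real-valued logit into (0, 1), so multiplying by max_depth keeps the depth strictly between 0 and max_depth, never negative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

max_depth = 80.0
logits = np.random.randn(4, 4) * 5.0       # decoder logits of either sign
final_depth = sigmoid(logits) * max_depth  # strictly within (0, max_depth)
```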

Thank you!