Brummi / MonoRec

Official implementation of the paper: MonoRec: Semi-Supervised Dense Reconstruction in Dynamic Environments from a Single Moving Camera (CVPR 2021)


DVSO keypoints as metric depth?

morsingher opened this issue

Hi, thanks for sharing this amazing work.

I have a question about the DVSO keypoints you provide in the readme. As far as I understand, they actually contain disparity values, which you convert to depth as follows:

depth = (w * depth / (0.54 * f_x * 65535))

The conversion depth = (baseline * focal) / disparity makes sense to me, but I don't really understand how the width and the factor of 65535 enter this formula. Could you clarify how to obtain metric depth values from these keypoints?

Thank you in advance.

Hi @morsingher, thanks for your interest in our work.

Sorry for the confusion about the conversion. The result you get from this formula is the inverse of the metric depth, so to get the metric depth you just need to invert it.

As for how the values are stored: as you said, the stored values are indeed disparities, but they are normalized by the image width, so

norm_disp = f_x * b / (d * width), where b is the stereo baseline and d is the metric depth.

This way, norm_disp always lies in [0, 1] and is invariant to the image width.
Then, since we store the depth maps as 16-bit png files, we further multiply norm_disp by 65535 to save the values as integers, so

stored_value = 65535 * norm_disp.
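For example, with made-up, roughly KITTI-like numbers (just for illustration, the actual values come from the calibration of each sequence):

```python
# Hypothetical calibration values for illustration only.
f_x = 707.0912   # focal length in pixels
b = 0.54         # stereo baseline in meters
width = 1226     # image width in pixels

depth = 10.0                              # metric depth of a keypoint in meters
disp = f_x * b / depth                    # disparity in pixels, ~38.18
norm_disp = disp / width                  # normalized disparity in [0, 1], ~0.0311
stored_value = round(65535 * norm_disp)   # 16-bit integer written to the png, ~2041
```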

Hope the above explanation helps. Please let us know if you have further questions.

Hi @Yelen719,

thanks for the quick answer. So, just to sum up, I should load the png with PIL and perform the following steps (sketched in code after the list):

  1. norm_disp = stored_value / 65535, which correctly gives values between 0 and 1.
  2. disp = norm_disp * w, and this is measured in pixels.
  3. depth = (f_x * b) / disp, which should be the metric depth.
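
In code, a minimal sketch of these steps (the constants below are placeholders for illustration; the actual f_x, baseline, and width come from the calibration of the sequence):

```python
import numpy as np
from PIL import Image

# Placeholder calibration values; replace with the intrinsics and
# image width of your own sequence.
F_X = 707.0912   # focal length in pixels
BASELINE = 0.54  # stereo baseline in meters
WIDTH = 1226     # image width in pixels

def dvso_png_to_metric_depth(path):
    """Convert a 16-bit DVSO keypoint png into metric depth in meters."""
    stored = np.asarray(Image.open(path), dtype=np.float64)
    norm_disp = stored / 65535.0                   # step 1: values in [0, 1]
    disp = norm_disp * WIDTH                       # step 2: disparity in pixels
    depth = np.full_like(disp, np.nan)
    valid = disp > 0                               # 0 = no keypoint at this pixel
    depth[valid] = (F_X * BASELINE) / disp[valid]  # step 3: metric depth
    return depth, valid
```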

It seems fine now, thanks. The only issue I'm facing is that some of these keypoints (I would say around 15-20% per image on KITTI sequence 04) have depth values much greater than 80 m. Of course I can just ignore them, but I'm wondering: do you also use these during training?

Yes, we also use them for training.

I will close this issue for now.
If you have further questions, feel free to reach out to us.