mattpoggi / mono-uncertainty

CVPR 2020 - On the uncertainty of self-supervised monocular depth estimation

Content of uncertainty map by log method

shawLyu opened this issue · comments

Hi, thanks for your great work. I noticed that two CVPR 2020 works on monocular depth estimation (MDE) use an uncertainty loss: yours and D3VO. Both use the same uncertainty loss (the Log section in your paper), but produce totally different uncertainty maps. I can get uncertainty maps like yours. So I'd like to ask if you know the reason. Looking forward to your reply. Thanks.

Hi @shawLyu,
thanks for pointing it out, that's an interesting question.
The main difference I've found between the two is that D3VO also estimates the brightness transformation parameters between the different frames. This may have an impact.

Hi @mattpoggi
Thanks for your reply, I will do this experiment next.

I forgot to mention that, according to the D3VO paper, "DepthNet also predicts the depth map D_{t^s} of the right image I_{t^s}". This can also make a difference.

Hi @mattpoggi

Thanks for your innovative work. I had the same confusion before, but after running many experiments, I found there might be a potential issue in the implementation (I'm not sure about it, as neither mono-uncertainty nor D3VO released their code).

In my opinion, when calculating the Log loss, the shape of to_optimise should be the same as the shape of the uncertainty.

(Pdb) to_optimise.shape
torch.Size([8, 192, 640])
(Pdb) uncer.shape
torch.Size([8, 1, 192, 640])
(Pdb) (to_optimise / uncer + torch.log(uncer)).shape
torch.Size([8, 8, 192, 640])

However, even though the shapes do not match, the operation is still legal because of broadcasting, as shown above, and could lead to results like yours. On the other hand, D3VO computes the loss with matching shapes and the results look totally different. Note that the following networks use pure monodepth2 and differ only in the shape of the uncertainty; no extra tricks (brightness transformation, right disparity prediction, or augmentation) are used.

[image]
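The shape mismatch described above can be reproduced in isolation. The snippet below is a minimal sketch, not code from either project: `log_loss` is a hypothetical helper, and small stand-in tensors replace the [8, 192, 640] and [8, 1, 192, 640] shapes from the pdb session. It assumes the intended behaviour is an element-wise loss where each sample is paired with its own uncertainty map.

```python
import torch


def log_loss(to_optimise, uncer):
    """Hypothetical helper: residual / sigma + log(sigma), element-wise.

    Assumes to_optimise is [B, H, W] and uncer is [B, 1, H, W]; the singleton
    channel axis is squeezed so both operands end up as [B, H, W].
    """
    if uncer.dim() == 4 and uncer.shape[1] == 1:
        uncer = uncer.squeeze(1)
    # Fail loudly instead of letting broadcasting hide a shape mismatch.
    assert to_optimise.shape == uncer.shape, (to_optimise.shape, uncer.shape)
    return to_optimise / uncer + torch.log(uncer)


# Small stand-in tensors (the thread uses B=8, H=192, W=640).
B, H, W = 2, 4, 6
to_optimise = torch.rand(B, H, W)
uncer = torch.rand(B, 1, H, W) + 0.1  # keep sigma strictly positive

# Unaligned shapes: [B, H, W] is treated as [1, B, H, W] and broadcast
# against [B, 1, H, W], giving a [B, B, H, W] result.
mismatched = to_optimise / uncer + torch.log(uncer)

# Aligned shapes: the result stays [B, H, W].
aligned = log_loss(to_optimise, uncer)

print(mismatched.shape)  # torch.Size([2, 2, 4, 6])
print(aligned.shape)     # torch.Size([2, 4, 6])
```

In the broadcast result, entry `[i, j]` pairs the residual of sample `j` with the uncertainty of sample `i`, so only the diagonal entries `[i, i]` match the aligned loss. Averaging over the full [B, B, H, W] tensor therefore mixes residuals and uncertainties from different images in the batch.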

Please let me know if I have any misunderstanding about your paper, thank you.