Content of uncertainty map by log method

Question

shawLyu opened this issue 4 years ago

Hi, thanks for your great work. I noticed that two CVPR20 works on monocular depth estimation (MDE) use an uncertainty loss: yours and D3VO. Both use the same uncertainty loss (the Log section in your paper), but the resulting uncertainty maps look totally different. I can reproduce an uncertainty map like yours, so I'd like to ask whether you know the reason for the difference. Looking forward to your reply. Thanks.

Matteo Poggi · Answer 1 · Mon Jul 27 2020 16:27:11 GMT+0800 (China Standard Time)

Hi @shawLyu,
thanks for pointing it out, that's an interesting question.
The main difference I've found between the two is that D3VO also estimates the brightness transformation parameters between the different frames. This may have an impact.

shawlyu · Answer 2 · Mon Jul 27 2020 17:18:34 GMT+0800 (China Standard Time)

Hi @mattpoggi
Thanks for your reply, I will do this experiment next.

Matteo Poggi · Answer 3 · Mon Jul 27 2020 17:39:42 GMT+0800 (China Standard Time)

I forgot to mention that, according to the D3VO paper, "DepthNet also predicts the depth map D_{t^s} of the right image I_{t^s}". This can also make a difference.

Justin Wu · Answer 4 · Fri May 28 2021 03:18:15 GMT+0800 (China Standard Time)

Hi @mattpoggi

Thanks for your innovative work. I had the same confusion before, but after conducting many experiments, I found there might be a potential issue in the implementation (I am not sure about it, as neither mono-uncertainty nor D3VO released their code).

In my opinion, when calculating the Log loss, the shape of to_optimise should be the same as the shape of the uncertainty.

(Pdb) to_optimise.shape
torch.Size([8, 192, 640])
(Pdb) uncer.shape
torch.Size([8, 1, 192, 640])
(Pdb) (to_optimise / uncer + torch.log(uncer)).shape
torch.Size([8, 8, 192, 640])

However, even if the shapes do not match exactly, the operation is still legal thanks to broadcasting, as shown above, and could lead to results like yours. On the other hand, D3VO computes the loss with matching shapes, and the results look totally different. Note that the following networks use pure monodepth2 with a different shape of uncertainty; no extra techniques (brightness transformation, right disparity prediction, or augmentation) are used.
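To make the shape behaviour in the pdb session easier to check, here is a minimal sketch of the right-aligned broadcasting rule that PyTorch follows, written in plain Python so it runs without any tensors. `broadcast_shape` is a hypothetical helper introduced only for illustration; it is not code from either repository.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Return the shape produced by broadcasting shapes a and b.

    Hypothetical helper: dimensions are compared from the right, and a
    missing or size-1 dimension is stretched to match the other one.
    """
    out = []
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x != y and 1 not in (x, y):
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
        # Keep the non-singleton size (or the shared size when x == y).
        out.append(y if x == 1 else x)
    return tuple(reversed(out))


to_optimise = (8, 192, 640)    # per-pixel loss term, no channel axis
uncer = (8, 1, 192, 640)       # uncertainty with a singleton channel axis

# Shapes as in the pdb session: the batch axis of to_optimise lines up
# with the channel axis of uncer, giving an 8 x 8 grid of image pairs.
mismatched = broadcast_shape(to_optimise, uncer)

# Assumed fix: give to_optimise an explicit channel axis first.
aligned = broadcast_shape((8, 1, 192, 640), uncer)

print(mismatched)
print(aligned)
```

In the mismatched case, each image's loss term is combined with the uncertainty of every image in the batch, so the per-pixel pairing between a loss value and its own uncertainty is lost. In PyTorch, the shapes could be aligned before the division with `to_optimise.unsqueeze(1)` or `uncer.squeeze(1)`.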

Please let me know if I have any misunderstanding about your paper, thank you.