Do not understand the difference between DSt and Dt

Question

dragon97ytd opened this issue 2 years ago

Thanks for your outstanding work in this area; it can be applied in many places. However, there is one thing in the paper that I do not understand. The sentence is quoted below:

"Specifically, for each image, we predict its depth maps Dt and DSt using
the cost volumes formed by temporal stereo images C and static stereo images CS, respectively."
(The sentence is in Multi-stage Training --> Bootstrapping --> Mask Module. If you cannot find it, you can search for it in your PDF reader.)

Can you tell me what "static stereo images" means and where they are used in your code?
By the way, if it is convenient, could you send me the code for generating the moving object mask?
My email is "tingdong.yu@cripac.ia.ac.cn"; you can send the code there.

Thanks again for your great contribution.
Best wishes

Hardik Shah · Answer 1 · Tue Sep 06 2022 16:17:46 GMT+0800 (China Standard Time)

Hi,
As I understand the paper, "static stereo images" refers to the image captured by the second camera of the stereo rig at the same timestamp. The paper mentions that stereo images are used during the bootstrapping stage of training. The KITTI dataset provides stereo images, i.e. two images are captured at each timestamp, one from the left camera and one from the right. During training, this second view is therefore used to form a cost volume, in addition to the cost volume formed from the temporal stereo images.

Regarding implementation in code, this line in the KITTI dataloader is where the static stereo images are used.

Felix Wimbauer · Answer 2 · Tue Sep 06 2022 16:51:37 GMT+0800 (China Standard Time)

Hi @dragon97ytd,
thank you for your interest in our work!
And thank you @hardik01shah for replying, you are right!

Dt is the depth map predicted from a cost volume that was created from the monocular camera sequence (i.e. same camera, different time steps). In the code, these frames are usually just "frames" in the data dict.

DSt is the depth map predicted from a cost volume that was created from the stereo frame only (i.e. different camera, same time step). In the code, this frame is the "stereo_frame" in the data dict.
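
To make the distinction concrete, here is a minimal sketch of which entries of the data dict feed each cost volume. It is only an illustration, not the repository's actual code: the helper `cost_volume_sources` and the `"keyframe"` key are hypothetical, and plain strings stand in for image tensors. Only the `"frames"` and `"stereo_frame"` keys come from the description above.

```python
from typing import Any, Dict, List


def cost_volume_sources(data: Dict[str, Any], use_static_stereo: bool) -> List[Any]:
    """Return the views that would be matched against the keyframe.

    Hypothetical helper for illustration only; it just selects inputs
    and does not build a cost volume.
    """
    if use_static_stereo:
        # DSt: different camera, same time step.
        return [data["stereo_frame"]]
    # Dt: same camera, different time steps.
    return list(data["frames"])


# Toy sample; the "keyframe" key is an assumption made for this sketch.
sample = {
    "keyframe": "left_t",
    "frames": ["left_t-1", "left_t+1"],
    "stereo_frame": "right_t",
}

temporal_sources = cost_volume_sources(sample, use_static_stereo=False)
static_sources = cost_volume_sources(sample, use_static_stereo=True)
print("Dt sources: ", temporal_sources)
print("DSt sources:", static_sources)
```

In this sketch the keyframe is the same in both cases; only the set of views it is compared against changes.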

I also sent you the code via email.

Best,
Felix