This repository contains the code for Endo-Depth's depth prediction from single images, the photometric and other tracking methods, and the volumetric fusion used in the paper
Localization and Reconstruction in Endoscopic Videos using Depth Networks and Photometric Constraints
David Recasens, José Lamarca, José M. Fácil, José María M. Montiel and Javier Civera
I3A, University of Zaragoza
Full video of Endo-Depth-and-Motion working on the Hamlyn dataset
@misc{recasens2021endodepthandmotion,
title={Endo-Depth-and-Motion: Localization and Reconstruction in Endoscopic Videos using Depth Networks and Photometric Constraints},
author={David Recasens and Jos{\'{e}} Lamarca and Jos{\'{e}} M. F{\'{a}}cil and J. M. M. Montiel and Javier Civera},
year={2021},
eprint={2103.16525},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Endo-Depth-and-Motion is a three-step pipeline. First, pixel-wise depth is predicted on a set of keyframes of the monocular endoscopic video using a deep neural network (Endo-Depth). Second, the motion of each frame with respect to the closest keyframe is estimated by minimizing the photometric error, made more robust with image pyramids and robust error functions. Finally, the depth maps of the keyframes are fused in a Truncated Signed Distance Function (TSDF)-based volumetric representation.
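The photometric step can be illustrated with a minimal NumPy sketch. It is not the code used in this repository: it assumes the frame has already been warped into the keyframe view (the warp with the predicted depth and the candidate pose is omitted), it uses the Huber function as one example of a robust error function, and the names `downsample`, `build_pyramid`, `huber` and `photometric_cost` are hypothetical.

```python
import numpy as np

def downsample(image):
    """Halve the resolution by averaging each 2x2 block of pixels."""
    h, w = image.shape
    h, w = h - h % 2, w - w % 2  # drop an odd trailing row/column
    blocks = image[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.mean(axis=(1, 3))

def build_pyramid(image, levels):
    """Return [full resolution, half, quarter, ...] with `levels` entries."""
    pyramid = [image.astype(np.float64)]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid

def huber(residual, delta):
    """Huber cost: quadratic near zero, linear for large residuals."""
    abs_r = np.abs(residual)
    return np.where(abs_r <= delta, 0.5 * residual**2, delta * (abs_r - 0.5 * delta))

def photometric_cost(keyframe, warped_frame, levels=2, delta=0.1):
    """Mean Huber cost of the intensity differences at every pyramid level.

    Assumption: `warped_frame` is already aligned to the keyframe view.
    """
    kf_pyramid = build_pyramid(keyframe, levels)
    fr_pyramid = build_pyramid(warped_frame, levels)
    return [huber(kf - fr, delta).mean() for kf, fr in zip(kf_pyramid, fr_pyramid)]
```

A tracker would evaluate such a cost from the coarsest level to the finest while updating the pose, so that large motions are resolved at low resolution first.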
We ran our experiments under CUDA 9.1.85, CuDNN 7.6.5 and Ubuntu 18.04. We recommend creating a virtual environment with Python 3.6 using Anaconda
conda create -n edam python=3.6
and installing the dependencies with
conda install -c conda-forge opencv=4.2.0
pip3 install -r path/to/Endo-Depth-and-Motion/requirements.txt
The Hamlyn rectified images and the rectified calibration used to train and test the Endo-Depth models can be found here. The Hamlyn data used to test the tracking and the volumetric fusion is here. The color and depth images are slightly cropped to avoid the small distortions that Endo-Depth produces at the borders of the depth maps. The depth was computed using the stereo Endo-Depth models; it is expressed in [mm] and stored as uint16 images. The saturation depth is 300 [mm]. You can also replace this data with your own.
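As a rough illustration of this depth format, the sketch below converts a uint16 depth image to metres. It assumes that each uint16 value is the depth in whole millimetres and that pixels at or beyond the saturation depth should be treated as unreliable; both are assumptions made for the example, and `depth_mm_to_metres` is a hypothetical helper, not part of this repository.

```python
import numpy as np

SATURATION_DEPTH_MM = 300  # saturation depth of the provided data, in mm

def depth_mm_to_metres(depth_u16):
    """Convert a uint16 depth image in millimetres to float32 metres.

    Pixels at or beyond the saturation depth become NaN
    (assumption: saturated values carry no usable depth).
    """
    depth_mm = depth_u16.astype(np.float32)  # astype returns a copy
    depth_mm[depth_u16 >= SATURATION_DEPTH_MM] = np.nan
    return depth_mm / 1000.0
```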
To predict the depth for one or more images, use
python apps/depth_estimate/__main__.py --image_path path/to/image_folder --model_path path/to/model_folder
You must have already downloaded the Endo-Depth model you want to use. To store the depth predictions in another folder, use the argument --output_path. You can select the output type with --output_type, which defaults to grayscale (grayscale depth images); the alternative is color (colormapped depth images). The saturation depth defaults to 300 [mm] and can be changed with --saturation_depth. The image depth scaling defaults to 52.864: the weighted average baseline of the Hamlyn dataset is 5.2864, and this number is multiplied by 10 because the baseline imposed during training is 0.1. The image extension to search for in the folder can be changed with --ext (jpg by default), and you can disable CUDA with the argument --no_cuda.
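The arithmetic behind the default scaling and the saturation can be sketched as follows. This is an illustration only: it assumes the network output is a depth map expressed relative to the training baseline and that the scaled result is in millimetres, and `scale_and_saturate` is a hypothetical name rather than a function of this repository.

```python
import numpy as np

HAMLYN_BASELINE = 5.2864   # weighted average baseline of the Hamlyn dataset
TRAINING_BASELINE = 0.1    # baseline imposed during training
DEPTH_SCALING = HAMLYN_BASELINE / TRAINING_BASELINE  # approximately 52.864

def scale_and_saturate(network_depth, scaling=DEPTH_SCALING, saturation_depth=300.0):
    """Scale the raw network depth and clip it at the saturation depth."""
    return np.minimum(network_depth * scaling, saturation_depth)
```

With other data, the same ratio between the real stereo baseline and the training baseline would give the scaling to use.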
You can execute our photometric tracking with
python apps/tracking_ours/__main__.py -d cuda:0 -i path/to/hamlyn_tracking_test_data -o apps/tracking_ours/results
where -i is the input path to the folder containing the different video folders and -o is the output path where the odometry is saved in .pkl format. To run the script on the CPU instead of the GPU, remove the argument -d cuda:0. The frame-to-keyframe ratio and the number of pyramid levels are both set to 2 by default, but they can be changed with the arguments -fr and -st, respectively. The output odometries of the Hamlyn test data using our tracking can be found here.
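To inspect a saved odometry file, a generic loader such as the one below may help. The internal layout of the .pkl file is not described here, so the sketch only loads the object and reports its type; `load_odometry` and `describe` are hypothetical helpers. If the file stores custom classes, the modules that define them must be importable, and pickle files should only be loaded from sources you trust.

```python
import pickle
from pathlib import Path

def load_odometry(pkl_path):
    """Load a pickled odometry file and return the stored Python object."""
    with Path(pkl_path).open("rb") as f:
        return pickle.load(f)

def describe(obj):
    """Return a short description of the object without assuming its layout."""
    if isinstance(obj, dict):
        return f"dict with keys {sorted(map(str, obj))}"
    if hasattr(obj, "__len__"):
        return f"{type(obj).__name__} with {len(obj)} entries"
    return type(obj).__name__
```

For example, `describe(load_odometry("apps/tracking_ours/results/test1.pkl"))` would summarize the result of the tracking command above, assuming a video folder named test1.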
Alternatively, to use the tracking methods of Open3D, run
python apps/tracking_open3d/__main__.py -d cuda:0 -i path/to/hamlyn_tracking_test_data -o apps/tracking_open3d/results -t park
The tracking method can be changed by modifying the argument -t: point-to-point (ICP point-to-point), point-to-plane (ICP point-to-plane), steinbrucker (photometric) and park (hybrid photometric and geometric). Additionally, with the argument -r you can execute a global registration with RANSAC to compute an initial transformation between two point clouds before calculating the final transformation with the local registration.
To obtain the refined 3D map, you can fuse the registered pseudo-RGBD keyframes obtained from Endo-Depth and the tracking with
python apps/volumetric_fusion/__main__.py -i apps/tracking_ours/results/test1.pkl -o path/to/hamlyn_tracking_test_data/test1
where -i is the input odometry in .pkl format computed with the tracking. The output 3D meshes of the Hamlyn test data using the volumetric fusion are here.
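The idea behind TSDF-based fusion can be shown with a simplified sketch that updates the voxels along a single camera ray using the standard weighted running average. It is not the implementation used in this repository; the function name `tsdf_update`, the unit weight per observation and the one-dimensional setting are assumptions made to keep the example short.

```python
import numpy as np

def tsdf_update(tsdf, weights, voxel_depths, measured_depth, truncation):
    """Fuse one depth measurement into the voxels along a single camera ray.

    tsdf, weights  : running TSDF values and integration weights per voxel
    voxel_depths   : depth of each voxel centre along the ray
    measured_depth : depth observed for this ray in the current keyframe
    truncation     : truncation distance, in the same units as the depths
    """
    sdf = measured_depth - voxel_depths        # positive in front of the surface
    visible = sdf >= -truncation               # skip voxels far behind the surface
    d = np.clip(sdf / truncation, -1.0, 1.0)   # truncate and normalise to [-1, 1]
    new_weights = weights + visible            # each observation adds weight 1
    # np.maximum only guards the division for voxels that were never observed
    fused = np.where(visible, (tsdf * weights + d) / np.maximum(new_weights, 1), tsdf)
    return fused, new_weights
```

Calling the function once per keyframe averages the noisy depth observations, and the surface is then extracted where the fused TSDF crosses zero.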
Endo-Depth-and-Motion is released under the GPL-3.0 License. The code in the folder apps/depth_estimate is the property of the authors of Monodepth2 and has its own license.