lppllppl920 / SAGE-SLAM

Official repo for the paper "SAGE: SLAM with Appearance and Geometry Prior for Endoscopy" (ICRA 2022)


Training on Custom Video

juseonghan opened this issue · comments

Hello,

I am attempting to run training on my own endoscopy video sequence. I see that the given dataset already includes groundtruth as well as an hdf5 file for the data, and the training code already assumes this data format. What would you recommend for running training and the system on my own image sequence? Thank you.

Hi,

If you want to generate groundtruth data, any method that can produce a reasonable dense depth map and a camera pose for each frame will work. The project I used to create the pseudo-groundtruth data provided in this repository is "Reconstructing Sinus Anatomy from Endoscopic Video -- Towards a Radiation-free Approach for Quantitative Longitudinal Assessment" (MICCAI 2020). That project is based on SfM with a learning-based dense descriptor and volumetric depth fusion.

Thanks for the prompt response. If you use SfM results for the SLAM system, does that mean it cannot ultimately be a real-time system? Or am I misunderstanding something?

The SfM results are only used for network training, and the network can generalize to unseen sequences. Therefore, given enough SfM data for training, the system can run on new sequences in real time.

Thanks for the clarification. If I understand you correctly, the SfM results are used to train the network in the SLAM system, and the SLAM system can then give accurate results on unseen sequences. As a result, we don't need to run the SfM system on every new sequence.

I guess what I'm not understanding is how I can convert my own video sequence into a format that is acceptable to the SLAM system. Is the SfM system not used to generate the input files to the SLAM system, such as the hdf5 and camera pose txt file?

The SLAM system only uses the camera intrinsics, the color images, and the image mask from the hdf5 file. You can check the source code in the system folder to figure out the exact set of data the SLAM system needs to access while running. I was just lazy and put everything inside the hdf5 for the purpose of cross-validation.
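To make the required layout concrete, here is a minimal sketch of packing a custom sequence into an hdf5 file with h5py. The dataset names ("color", "mask", "intrinsics") are illustrative assumptions, not the repository's actual keys; check the system folder's source code for the exact names it reads.

```python
# Hedged sketch: pack a custom endoscopy sequence into an HDF5 file.
# Dataset names ("color", "mask", "intrinsics") are assumptions --
# consult the repo's system/ source for the exact keys it expects.
import h5py
import numpy as np

def pack_sequence(h5_path, images, mask, fx, fy, cx, cy):
    """images: (N, H, W, 3) uint8 color frames; mask: (H, W) uint8,
    255 marking valid (non-black) pixels; fx, fy, cx, cy: pinhole
    intrinsics in pixels."""
    with h5py.File(h5_path, "w") as f:
        f.create_dataset("color", data=np.asarray(images, dtype=np.uint8),
                         compression="gzip")
        f.create_dataset("mask", data=np.asarray(mask, dtype=np.uint8))
        # Store the intrinsics as a standard 3x3 pinhole matrix.
        K = np.array([[fx, 0.0, cx],
                      [0.0, fy, cy],
                      [0.0, 0.0, 1.0]], dtype=np.float32)
        f.create_dataset("intrinsics", data=K)
```

Groundtruth depth and pose can be omitted entirely when only running the SLAM system, per the discussion below.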

I understand, thank you. Is the image mask just a binary image that is black at the edges where the endoscopy video is black? And just to confirm, the groundtruth camera pose text file is not needed for the SLAM system either?

Anything that you would not expect to have in a real scenario before running SLAM is not needed. The mask is a binary image telling the system which region of the frame is black, as you said.
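One simple way to build such a mask is to threshold the dark border across the whole sequence, so a pixel counts as valid only if it is lit in every frame. This is a sketch under assumptions, not the repository's method; the threshold value is illustrative and should be tuned for your footage.

```python
# Hedged sketch: derive a binary validity mask for the circular endoscope
# view by thresholding dark pixels across all frames. The threshold is an
# illustrative default, not a value from the SAGE-SLAM repo.
import numpy as np

def make_mask(frames, dark_thresh=10):
    """frames: (N, H, W, 3) uint8. Returns an (H, W) uint8 mask where 255
    marks pixels brighter than dark_thresh in *every* frame, so transient
    flicker at the black border does not leak into the valid region."""
    brightness = frames.max(axis=-1)                 # (N, H, W) max channel
    valid = (brightness > dark_thresh).all(axis=0)   # lit in all frames
    return (valid.astype(np.uint8)) * 255
```

A morphological erosion of the result (e.g. with scipy.ndimage or OpenCV) can additionally shrink the valid region away from the border, which is a common precaution against vignetting artifacts.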

Thank you so much for the help. This resolves the issue.

Thanks!