lppllppl920 / SAGE-SLAM

Official repo for the paper "SAGE: SLAM with Appearance and Geometry Prior for Endoscopy" (ICRA 2022)


Training on Custom Video

juseonghan opened this issue · comments

Hello,

I am attempting to run training on my own endoscopy video sequence. I see that the given dataset already includes groundtruth as well as an hdf5 file for the data, and the training code already assumes this data format. What would you recommend for running training and the system on my own image sequence? Thank you.

Hi,

If you want to generate groundtruth data, any method that can produce a reasonable dense depth map and a camera pose for each frame will work. The project I used to create the pseudo-groundtruth data provided in this repository is "Reconstructing Sinus Anatomy from Endoscopic Video -- Towards a Radiation-free Approach for Quantitative Longitudinal Assessment" (MICCAI 2020). That project is based on SfM with a learning-based dense descriptor and volumetric depth fusion.

Thanks for the prompt response. If you use SfM results for the SLAM system, does that mean it cannot ultimately be a real-time system? Or am I misunderstanding something?

The SfM results are only used for network training, and the network can generalize to unseen sequences. Therefore, given enough SfM data for training, the system can run on new sequences in real time.

Thanks for the clarification. If I understand you correctly, the SfM results are used to train the network in the SLAM system, and the SLAM system can then give accurate results on unseen sequences. As a result, we don't need to run the SfM system on every new sequence.

I guess what I'm not understanding is how I can convert my own video sequence into a format that is acceptable to the SLAM system. Is the SfM system not used to generate the input files to the SLAM system, such as the hdf5 and camera pose txt file?

The SLAM system only uses the camera intrinsics, the color images, and the image mask from the hdf5 file. You can check the source code in the system folder to figure out the exact set of data the SLAM system needs to access while running. I was just lazy and put everything inside the hdf5 for the purpose of cross-validation.
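To make the required layout concrete, here is a minimal sketch of packing a custom sequence into an hdf5 file with h5py. The dataset names ("color", "mask", "intrinsics") are illustrative assumptions, not the repository's actual keys; check the system folder's source code for the exact names it reads.

```python
# Hedged sketch: pack a custom endoscopy sequence into an HDF5 file.
# Dataset names ("color", "mask", "intrinsics") are assumptions --
# consult the repo's system/ source for the exact keys it expects.
import h5py
import numpy as np

def pack_sequence(h5_path, images, mask, fx, fy, cx, cy):
    """images: (N, H, W, 3) uint8 color frames; mask: (H, W) uint8,
    255 marking valid (non-black) pixels; fx, fy, cx, cy: pinhole
    intrinsics in pixels."""
    with h5py.File(h5_path, "w") as f:
        f.create_dataset("color", data=np.asarray(images, dtype=np.uint8),
                         compression="gzip")
        f.create_dataset("mask", data=np.asarray(mask, dtype=np.uint8))
        # Store the intrinsics as a standard 3x3 pinhole matrix.
        K = np.array([[fx, 0.0, cx],
                      [0.0, fy, cy],
                      [0.0, 0.0, 1.0]], dtype=np.float32)
        f.create_dataset("intrinsics", data=K)
```

Groundtruth depth and pose can be omitted entirely when only running the SLAM system, per the discussion below.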

I understand, thank you. Is the image mask just a binary image that is black at the edges where the endoscopy video is black? And just to confirm, the groundtruth camera pose text file is not needed for the SLAM system either?

Anything that you would not expect to have in a real scenario before running SLAM is not needed. The mask is a binary image telling the system which region of the frame is black, as you said.
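One simple way to build such a mask is to threshold the dark border across the whole sequence, so a pixel counts as valid only if it is lit in every frame. This is a sketch under assumptions, not the repository's method; the threshold value is illustrative and should be tuned for your footage.

```python
# Hedged sketch: derive a binary validity mask for the circular endoscope
# view by thresholding dark pixels across all frames. The threshold is an
# illustrative default, not a value from the SAGE-SLAM repo.
import numpy as np

def make_mask(frames, dark_thresh=10):
    """frames: (N, H, W, 3) uint8. Returns an (H, W) uint8 mask where 255
    marks pixels brighter than dark_thresh in *every* frame, so transient
    flicker at the black border does not leak into the valid region."""
    brightness = frames.max(axis=-1)                 # (N, H, W) max channel
    valid = (brightness > dark_thresh).all(axis=0)   # lit in all frames
    return (valid.astype(np.uint8)) * 255
```

A morphological erosion of the result (e.g. with scipy.ndimage or OpenCV) can additionally shrink the valid region away from the border, which is a common precaution against vignetting artifacts.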

Thank you so much for the help. This resolves the issue.

Thanks!