bborja / mods_evaluation

Marine Obstacle Detection Benchmark - Evaluation and Visualization Scripts

Home Page: http://box.vicos.si/borja/viamaro/index.html


Inference

AbbosAbdullayev opened this issue · comments

Hello, thanks for the good work!
How can we run inference on a custom image or video input?

Hello! This repository contains the code for the evaluation of predictions on the MODS benchmark.

If you wish to make predictions (inference) you will probably want to check out the WaSR network: https://github.com/lojzezust/WaSR


Hello.

It depends on what you want to achieve.

  1. If you wish to use our evaluation protocol on your own image sequences, this will require quite some work. First, take a look at gt_data to see what format of ground truth our protocol expects, and store your ground truth in the same format. Besides ground-truth data, we also expect each image to be synchronized and paired with IMU measurements. This allows us to compute the required horizon masks, as well as to estimate the location of the danger zone. Additionally, each image sequence should be provided with a calibration file, where the camera matrix, etc. are stored.

  2. If you wish to run WaSR inference on your own sequence of images, this is easier. For instructions, please take a look at Section 3 of https://github.com/bborja/wasr_network. Note that in order for WaSR to produce the best results you also need to provide a horizon mask for each image. Alternatively, you can use the "NO IMU" variant, which scores slightly lower and only requires RGB images. The original WaSR linked above is implemented in TensorFlow (version 1.2, which is quite old now). However, a PyTorch re-implementation is also available at the link that @lojzezust shared (https://github.com/lojzezust/WaSR).

Thanks for the quick response!
I got initial results with the PyTorch implementation on a few images using the IMU variant, but the input image size is fixed and a video file is not accepted as input. I also tried the "NO IMU" version, but the model still tries to find an IMU mask for each image. My goal is to use the model for inference on video taken by a USV. Can you give advice on a better way to run inference?
Thanks a lot

If you wish to use the IMU version, you need to also provide horizon masks estimated from the IMU as is done in the examples directory.

If you don't have access to IMU, you can also use a version without IMU. In this case, you have to select the correct architecture using --model (wasr_resnet101 instead of wasr_resnet101_imu) and use the correct weights.
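To give an intuition for what an IMU-derived horizon mask encodes, here is a minimal sketch under a simplified pinhole-camera assumption (ignoring roll, lens distortion, and Earth curvature). The function names and parameters (`fy`, `cy`, `pitch_rad`) are illustrative assumptions, not the actual WaSR mask-generation code, which should be consulted for the exact procedure.

```python
import math

def horizon_row(pitch_rad, fy, cy):
    """Approximate image row of the horizon for a pinhole camera.

    For a camera pitched down by `pitch_rad` (positive = looking down),
    the horizon projects above the principal point by fy * tan(pitch).
    Simplified sketch: roll and distortion are ignored.
    """
    return cy - fy * math.tan(pitch_rad)

def horizon_mask(height, width, pitch_rad, fy, cy):
    """Binary mask: 1 below the estimated horizon (water side), 0 above."""
    row = horizon_row(pitch_rad, fy, cy)
    return [[1 if y > row else 0 for _ in range(width)] for y in range(height)]
```

With pitch = 0 the horizon sits at the principal point row `cy`; pitching the camera down moves it up in the image, which is exactly the per-frame information the IMU supplies.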

To process a video, you will have to do one of the following things:
a) Convert the video to image frames and then run WaSR on them, or
b) Integrate WaSR into your pipeline. I suggest starting from the predict_single.py script, which shows how to run a prediction on a single image.
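For option (a), one common way to convert a video to frames is ffmpeg. A minimal sketch, assuming hypothetical file names; the helper below only builds the command line, which you would then run yourself:

```python
import subprocess

def ffmpeg_extract_cmd(video_path, out_dir, fps=10):
    """Build an ffmpeg command that dumps `video_path` into numbered
    PNG frames in `out_dir`, sampled at `fps` frames per second."""
    return [
        "ffmpeg", "-i", video_path,
        "-vf", f"fps={fps}",          # sample at the desired frame rate
        f"{out_dir}/frame_%06d.png",  # zero-padded output frame names
    ]

# Usage (paths are placeholders):
# subprocess.run(ffmpeg_extract_cmd("usv_video.mp4", "frames"), check=True)
```

The resulting `frames/frame_000001.png`, ... can then be fed to WaSR like any other image sequence.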


Unfortunately, our networks only support images as input. I would suggest extracting frames from the videos or video streams using e.g. https://docs.opencv.org/3.4/d8/dfe/classcv_1_1VideoCapture.html and then running WaSR (predict_single.py) inference on these frames. However, this might be slow, since the model will be loaded anew for each image. To address this, I suggest modifying the following lines https://github.com/lojzezust/WaSR/blob/ae968972dbf4a3666eec2218225c486878f10754/predict_single.py#L66-L94 to read frames from your stream in a loop and compute their predictions.
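The "load once, predict in a loop" restructuring suggested above can be sketched as follows. This is not the actual predict_single.py code: `predict_fn` stands in for the already-loaded model's prediction call, and the OpenCV frame reader is only one possible frame source.

```python
def run_stream_inference(frames, predict_fn):
    """Apply `predict_fn` to each frame from an iterable.

    The model is loaded once (outside this function) and `predict_fn`
    is reused for every frame, avoiding per-image model setup.
    Yields (frame_index, prediction) pairs.
    """
    for i, frame in enumerate(frames):
        yield i, predict_fn(frame)

def video_frames(path):
    """Yield frames from a video file using OpenCV (requires cv2)."""
    import cv2  # imported lazily so the rest of the sketch runs without OpenCV
    cap = cv2.VideoCapture(path)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            yield frame
    finally:
        cap.release()

# Hypothetical usage, assuming `model` is the loaded WaSR network:
# preds = list(run_stream_inference(video_frames("usv_video.mp4"), model))
```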

I am now closing this issue as it is unrelated to the MODS benchmark. If you have further questions, please open a new issue on the correct (WaSR) repository.