thomasfermi / Algorithms-for-Automated-Driving

Each chapter of this (mini-)book guides you in programming one important software component for automated driving.

Home Page: https://thomasfermi.github.io/Algorithms-for-Automated-Driving/Introduction/intro.html

Optimize lane detection execution time

thomasfermi opened this issue · comments

The lane detector in this book is a bit too slow, so Carla simulations do not run at 30 fps.

Ideas for improvements:

  • train/run the lane detector on resized images (512 x 256 instead of 1024 x 512). Not sure whether accuracy would still be good enough; would need to check.
  • use a neural network designed for fast semantic segmentation (e.g. ENet or Fast-SCNN). The downside is that this would not be as easy for readers to implement as an exercise... It would be nice if these architectures were supported by the segmentation models pytorch library.
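The first idea above is mostly about pixel count: halving each image dimension cuts the number of pixels, and thus roughly the segmentation workload, by a factor of four. A minimal sketch of that arithmetic, using a naive strided downscale purely for illustration (a real pipeline would use proper interpolation, e.g. cv2.resize):

```python
import numpy as np

def downscale(img, factor=2):
    # Naive strided downscale: keep every `factor`-th pixel in each dimension.
    # Stand-in for proper resizing; real code should interpolate.
    return img[::factor, ::factor]

full = np.zeros((512, 1024, 3), dtype=np.uint8)   # H x W x C at full resolution
small = downscale(full)

print(small.shape)               # (256, 512, 3)
print(full.size // small.size)   # 4x fewer pixels to segment
```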

I think that to improve performance one shouldn't use PyTorch for inference. ONNX Runtime will easily give a 2-3x performance boost. While it is easy to install and to convert the model to ONNX, the downside is that it can take the reader's attention away from the main goal.

Hey @MankaranSingh, thanks for the input, I did not know about ONNX!

Regarding "taking away attention": that is a very good point. One could add a lane_detector_onnx.py as a sibling file to lane_detector.py in the solutions directory. One would also need to store a .onnx file in the repo. Students could then optionally install onnx and run the lane detector from lane_detector_onnx.py. It could still be a small distraction, but if it allowed for real-time execution of the sample-solution lane detector in Carla, it might be worth it :)

From what I googled, it might be easy to do this using torch.onnx. I might give this a go next week if I find some time, just to find out how large the speed-up will be :)

I tried converting best_model_multi_dice_loss.pth to ONNX, but it contains a custom op named SwishImplementation that causes errors when converting. This is an issue I found.

Anyway, I tried converting the efficientnet-b0 encoder to ONNX with the given solution and the fps improved from 0.2 to 2.5 on CPU. This seems promising, but we may have to retrain the model with memory-efficient swish set to false, as shown in the issue.

Hey @MankaranSingh, thanks for looking into this! I tried

model.encoder.set_swish(memory_efficient=False)

and it worked :) Will try to load that onnx file into the onnxruntime this evening :)

Btw: Retraining was not necessary. I just loaded the pth file.

Ohh, I tried model.set_swish(memory_efficient=False) and thought this method was only available for models in smp. Good to see that.

You can use onnxruntime-gpu with these settings:

import numpy as np
import onnxruntime as ort

options = ort.SessionOptions()
options.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL
options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
ort_session = ort.InferenceSession('model.onnx', options)
ort_session.set_providers(['CUDAExecutionProvider'])

# Inputs are passed as a dict keyed by the model's input name.
input_name = ort_session.get_inputs()[0].name
img = np.random.randn(1, 3, 256, 512).astype(np.float32)  # NCHW, adjust to your model
outs = ort_session.run(None, {input_name: img})[0]

Hi @MankaranSingh, I had to install the CUDA Toolkit and cuDNN to get onnxruntime to work on my machine. Sadly, inference time (+ polyfitting, but that is not the bottleneck) with onnxruntime is 164 ms, while inference time with PyTorch is 60 ms. So it actually gets slower :(
It seems that onnxruntime is not a plug-and-play accelerator...
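Comparisons like this are sensitive to measurement methodology (one-time costs such as lazy CUDA initialization can dominate the first calls). A small stdlib-only timing helper, hypothetical and not from the repo, that warms up before measuring:

```python
import time

def avg_ms(fn, warmup=3, iters=20):
    """Average wall-clock time of fn() in milliseconds, after warmup calls."""
    for _ in range(warmup):
        fn()  # discard warmup runs so one-time setup costs are excluded
    t0 = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - t0) / iters * 1000.0

# Example with a dummy workload; in practice fn would wrap
# ort_session.run(...) or the PyTorch forward pass.
ms = avg_ms(lambda: sum(range(100_000)))
print(f"{ms:.3f} ms per call")
```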

Update: I have some local code with updates that I will commit soon. For segmentation I am now using the fastseg library and MobileV3Small, which is a bit faster than the current smp model.

EDIT: Commit bb090ad introduced the changes to the CameraCalibrationDev branch. Closing this issue for now.