1. lane detector with front camera

PINet based lane detector for various road types and lane directions, trained with tusimple, culane, and bdd100k.

This repo is based on the paper Key Points Estimation and Point Instance Segmentation Approach for Lane Detection, and the original implementation in project PINet.

My related works including

retrained the network with tusimple, culane and BDD100K, significantly extended the performance for various road types and directions. original work uses only tusimple.
rewrite projects
some detailed discussion about the original implementation.
make this model compatible for torch JIT compile
use libtorch and container to deploy this model with 0.03-0.04 second per frame on laptop with NVidia GTX 1080

My results:

from test car, link
culane day and night

2. PINet

The key concept from original PINet paper is not doing pixel segmentation but key points detection with anchor based methods. This idea saved a lot of post-processing works compared with pixel based semantic segmentation, and more robust.

2.1 Architecture

first pass the input image to a resize module
then pass to two hourglasses one by one
output of each hourglass will be used for loss calculation, such as intermediate supervision

2.2 Bottleneck block

Bottleneck block in PINet is a residual block.

There are many discussion about different residual block layout. Some typical layout has been analysed in this paper Identity Mappings in Deep Residual Networks.

original Stacked Hourglass Networks used (e) full pre-activation
PINet uses (c) relu before addition
discussion in Identity Mappings in Deep Residual Networks
- for shallow network, the influence is not significant between (c) and (e), otherwise recommend (e)
- denote y1 = h(x1) + F(x1), x2 = f(y1) where h() is shortcut, F() is residual functions, f() is activation
- recommendation 1: use h() as identity mapping, i.e. h(x1)=x1
- problem of (c), ReLU as last step in residual is always positive and x2= x1 + F(x1), thus x2 is always increasing.
about Layer normalization, Transformers without Tears:Improving the Normalization of Self-Attention, link

2.3 Loss function

PINet has 3 branches output. The confidence and offset is similar to other anchor based detection algorithm. The instance branch mapping keypoints from different lanes into a high dimensional feature space. The output looks like a similarity matrix.

Note, in PINet, the network only learned how to use different feature vectors to distinguish keypoints from different lanes. So we need an extra clustering step to determine the real groups(lanes) for the keypoints based on these feature vectors. From the original paper, the author used mean-shift clustering for this step. The inference is really fast due to small network size, but the current clustering outside network can time consuming, which can be probabliy imporved later.

2.4 Training

I extended the training to tusimple, culane and BDD100k. I did not dive deep into tuning, data augmentation and so on for now. Maybe next kaggle event, if anyone want to team up.

below is an examplar validation log with tusimple rules.

3. Usage

test on culane, result video for day and night

python -m scripts.predict_pinet on_folder --path="{culane_root}/driver_182_30frame/05312333_0003.MP4" --save=True

test on tusimple, result video for testset

python -m scripts.predict_pinet on_folder --path="{tusimple_root}/lane_detection/test_set/clips/0530" --save=True --recursive=True

Repository Author

Zhiliang Zhou, Linkedin

License

The project is derived from PINet

The PINet code is licensed under MIT license.

masszhou / lane_detector