SSD: Single Shot MultiBox Detector On Caltech Pedestrian Dataset

Introduction

In this work we apply the Single Shot MultiBox Detector (SSD) to the Caltech Pedestrian Dataset. In addition to the Caltech dataset, we also used the ETH and TUDBrussels pedestrian datasets. We finetune from the SSD512 model trained on 07++12+COCO. We were able to reach state-of-the-art results at real-time speed.

Results are shown below:

Model	Overall miss-rate	Reasonable miss-rate	FPS (GeForce GTX Titan X)	Input resolution
SSD512 (VGG16) (training from scratch + no hyper-parameter optimization)	65.17%	20.32%	22	640 x 480
SSD512 (VGG16)	54.44%	11.89%	24	512 x 512
SSD640 (VGG16)	53.11%	11.85%	20	640 x 480
F-DNN	50.5%	8.65%	6.25	640 x 480
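
To make the speed column easier to compare, the following minimal sketch converts the FPS figures from the table into per-frame latency and a speed ratio relative to F-DNN. The helper names `latency_ms` and `speedup` are illustrative and are not part of this repository.

```python
# FPS values copied from the results table above (GeForce GTX Titan X).
FPS = {
    "SSD512 (VGG16)": 24.0,
    "SSD640 (VGG16)": 20.0,
    "F-DNN": 6.25,
}


def latency_ms(frames_per_second):
    """Average time spent on one frame, in milliseconds."""
    return 1000.0 / frames_per_second


def speedup(model, baseline, table=FPS):
    """How many times more frames per second `model` processes than `baseline`."""
    return table[model] / table[baseline]


if __name__ == "__main__":
    for name, value in FPS.items():
        print("%s: %.1f ms per frame" % (name, latency_ms(value)))
    print("SSD512 vs F-DNN: %.2fx" % speedup("SSD512 (VGG16)", "F-DNN"))
```

This only restates the table; it does not measure anything on your hardware.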

Fixed-Point 16-bit Quantization

We also quantized the model to dynamic 16-bit fixed point using Caffe Ristretto. The script to test the quantized model is available on the ssd-ristretto branch under models/VGGNet/caltech/SSD_512x512_ft_quantized. (Ristretto does not require changing the .caffemodel file for quantization; only the .prototxt file is modified.) The miss rate changed by less than 0.01 percentage points. We do not report speed for the quantized model: Ristretto simulates 16-bit fixed-point arithmetic using floating-point arithmetic, because the GPU has no hardware support for fixed-point arithmetic. With hardware support, we expect the model to be faster.

Model	Overall miss-rate	Reasonable miss-rate
SSD512 (VGG16) not quantized	54.4362%	11.8868%
SSD512 (VGG16) quantized	54.4374%	11.8937%
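
As a rough illustration of what simulating 16-bit fixed point in floating point means, here is a minimal sketch. It is not Ristretto's code, and Ristretto's actual rounding and range selection may differ. The function names and the rule for choosing the number of fractional bits are assumptions made for this example.

```python
import math


def choose_fraction_bits(values, bit_width=16):
    """Pick fractional bits so the largest magnitude in `values` still fits.

    Assumption for this sketch: one sign bit plus just enough integer bits
    for the largest absolute value; the remaining bits hold the fraction.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return bit_width - 1
    integer_bits = math.floor(math.log2(max_abs)) + 2  # includes the sign bit
    return bit_width - integer_bits


def quantize(value, fraction_bits, bit_width=16):
    """Round to the nearest representable fixed-point value, saturating at the ends.

    The result is returned as a float, which is the sense in which
    fixed-point arithmetic is "simulated" in floating point.
    """
    scale = 2.0 ** fraction_bits
    q_min = -(2 ** (bit_width - 1))
    q_max = 2 ** (bit_width - 1) - 1
    q = round(value * scale)
    q = max(q_min, min(q_max, q))
    return q / scale


if __name__ == "__main__":
    weights = [0.75, -1.5, 0.1]  # made-up values, not taken from the model
    bits = choose_fraction_bits(weights)
    print(bits, [quantize(w, bits) for w in weights])
```

With 16 bits, the rounding error of an in-range value is at most half of the smallest step, which is consistent with the very small change in miss rate reported in the table above.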

Citing

Please cite this paper in your publications if it helps your research:

@inproceedings{feasac2017ssdc,
  title = {An FPGA-Accelerated Design for Deep Learning Pedestrian Detection in Self-Driving Vehicles},
  author = {Moussawi, Abdallah and Haddad, Kamal and Chahine, Anthony},
  booktitle = {FEASAC},
  year = {2017}
}

and of course, please cite the great work done by Wei Liu et al.:

@inproceedings{liu2016ssd,
  title = {{SSD}: Single Shot MultiBox Detector},
  author = {Liu, Wei and Anguelov, Dragomir and Erhan, Dumitru and Szegedy, Christian and Reed, Scott and Fu, Cheng-Yang and Berg, Alexander C.},
  booktitle = {ECCV},
  year = {2016}
}

Installation

Get the code. We will call the directory that you cloned Caffe into $CAFFE_ROOT.

git clone https://github.com/amoussawi/caffe.git
cd caffe
git checkout ssd

Build the code. Please follow the Caffe instructions to install all necessary packages and build it.

# Modify Makefile.config according to your Caffe installation.
cp Makefile.config.example Makefile.config
make -j8
# Make sure to include $CAFFE_ROOT/python in your PYTHONPATH.
make py
make test -j8
# (Optional)
make runtest -j8

Preparation

examples/ssd/ contains Python scripts to train with two initializations: with finetuning (script names end with _ft) and without finetuning.

Download the fully convolutional reduced (atrous) VGGNet if you want to train from scratch, or SSD512 07++12+COCO if you want to finetune. The atrous VGGNet should be stored in $CAFFE_ROOT/models/VGGNet/, and the pretrained SSD512 model in $CAFFE_ROOT/models/VGGNet/VOC0712Plus/SSD_512x512/.
Download the Caltech, ETH, and TUDBrussels pedestrian datasets from Caltech. By default, we assume the data is stored in $HOME/data/caltech_code/.
You need MATLAB to use the Caltech evaluation code, which is available in data/caltech/caltech_code/. We extract 1 frame out of every 5 from the Caltech training dataset, and all frames of the ETH and TUDBrussels datasets. (We also used the external ETH car dataset, consisting of ~2100 pedestrians, but we don't think it made a large difference, so you may not need it.) To extract the datasets, run the extractDatasets.m MATLAB script in data/caltech/caltech_code. This extracts the datasets into ../trainval/ and ../test/. If you want to extract more images from the Caltech dataset, adjust the skip variable of usatrain in dbInfo.m.
Create the LMDB files.

cd $CAFFE_ROOT
# Create the trainval.txt, test.txt, and test_name_size.txt in data/caltech/
./data/caltech/create_list.sh
# It will create lmdb files for trainval and test with encoded original image:
#   - $CAFFE_ROOT/data/caltech/caltech_trainval_lmdb
#   - $CAFFE_ROOT/data/caltech/caltech_test_lmdb
# and make soft links at examples/caltech/
./data/caltech/create_data.sh
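
Before running the two scripts above, it can help to confirm that the directories named in the preparation steps exist. The following is a minimal sketch that checks only those directories; it does not check file names inside them, and the function names are illustrative and not part of this repository.

```python
import os


def expected_dirs(caffe_root, home):
    """Directories named in the preparation steps of this README."""
    return [
        # Atrous VGGNet (training from scratch).
        os.path.join(caffe_root, "models", "VGGNet"),
        # Pretrained SSD512 07++12+COCO (only needed for finetuning).
        os.path.join(caffe_root, "models", "VGGNet", "VOC0712Plus", "SSD_512x512"),
        # Default location assumed for the downloaded datasets.
        os.path.join(home, "data", "caltech_code"),
    ]


def missing_dirs(caffe_root, home):
    """Return the expected directories that do not exist yet."""
    return [p for p in expected_dirs(caffe_root, home) if not os.path.isdir(p)]


if __name__ == "__main__":
    # Assumption: CAFFE_ROOT is exported; otherwise use the current directory.
    root = os.environ.get("CAFFE_ROOT", os.getcwd())
    for path in missing_dirs(root, os.path.expanduser("~")):
        print("missing:", path)
```

If you only train from scratch, a missing SSD_512x512 directory can be ignored.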

Train/Eval

Train your model and evaluate the model on the fly.

# It will create model definition files and save snapshot models in:
#   - $CAFFE_ROOT/models/VGGNet/caltech/SSD_512x512/
# and job file, log file, and the python script in:
#   - $CAFFE_ROOT/jobs/VGGNet/caltech/SSD_512x512/
# and save temporary evaluation results in:
#   - $CAFFE_ROOT/examples/results/SSD_512x512/
# It should reach 11.8* % at 20k iterations.
python examples/ssd/ssd_caltech_512_ft.py

Evaluate the most recent snapshot.

# If you would like to test a model you trained, you can do:
python examples/ssd/score_ssd_caltech_ft.py

Test your model using a webcam. Press Esc to stop.

# If you would like to attach a webcam to a model you trained, you can do:
python examples/ssd/ssd_caltech_webcam_ft.py

Here is a demo video of an SSD512 model running on footage of a car driving through the streets of Beirut.

Models

SSD512 Caltech
