Banus/caffe-demo

Caffe demo

This is a showcase of a small collection of neural networks implemented in Caffe, in order to evaluate their speed and accuracy on videos and images. A webcam interface is also available for live demos.

Demo interface. Image classification with SqueezeNet. Jaguar image by CaptainGiraffehands on Imgur.

Quick start

Download all the models:

cd models && ./download_models.sh all
cd ..

Try the classification interface with SqueezeNet [1]:

python deep_classification.py images/jaguar.jpg squeezenet

Try classification on webcam with GoogleNet:

python deep_classification.py webcam googlenet

Try the detection interface (CoCo classes) with the Yolo-tiny model:

python deep_classification.py images/giraffe.jpg tiny_yolo

And the detection interface with Pascal VOC classes:

python deep_classification.py images/dog.jpg tiny_yolo_voc

Detection interface. Object detection with YOLO-tiny. Image from Darknet by Joseph Chet Redmon.

Installation

To run the demos, you need to compile the toolbox Caffe with Python support; follow the instructions on the BVLC website. You both Python 2.7 and Python 3.5 distributions are supported.

You need also to install the Python interfaces for OpenCV; on Ubuntu:

sudo apt-get install python-opencv

The script searches the current location of the Caffe toolbox in the environmental variable CAFFE_ROOT. Set it to your Caffe installation, e.g:

export CAFFET_ROOT=${HOME}/caffe

You can download automatically all the networks supported by the demo running the following script (beware: it will download around 500 MB):

cd models && ./download_models.sh all

You can also download only the models you plan to use by passing them as parameters, e.g.

./download_models.sh caffenet squeezenet

YOLO models cannot be automatically downloaded from Google Drive and thus you have to download them manually in the models/yolo path. The links to the network weights in Caffe format are here:

tiny_yolo (CoCo classes)
tiny_yolo_voc (Pascal VOC classes)
darknet (ImageNet 1k)
tiny (ImageNet 1k)

Windows

You can run the demos in Windows by installing one of the Caffe for Windows pre-built binaries.

To install the script dependences, such as OpenCV, Anaconda is suggested. If you use Python 3, Caffe currently supports only the version 3.5, so you may have to create a virtual environment and activate it. Type in the command or in the Anaconda console (not Powershell as it has issues with Anaconda environments) if you are using Python 3:

conda create -n caffe python=3.5
activate caffe
conda install scikit-image
conda install -c conda-forge py-opencv protobuf

An Anaconda environment file is avalable in caffe-env.yml. Create the environent with:

conda create -n caffe -f caffe-env.yml

Set CAFFE_ROOT to point to the directory where you unpacked the Caffe distribution.

To download the models, use the Git bash shell:

cd models && sh download_models.sh all

The YOLO models will need to be downloaded manually.

Configuration file

The supported networks are specified in a network.ini configuration file in the same directory of the script. Each section of the configuration file specifies a supported network and you can add new sections or modify the existing ones to support your Caffe networks.

Each section has the form:

[network_name]
type = class
model = path_to_caffe_prototxt
weights = path_to_caffemodel
labels = path_to_list_of_dataset_labels
mean = mean_pixel
anchors = list of floats

The parameter type specifies the kind of network to load; as for now, the supported types are:

class: classification network with a n-way softmax at the last layer named prob
class_yolo: classification network from Darknet, with different pixel scaling and center crop
yolo_detect: YOLO detection network [2] where the last layer specifies at once detected classes, regressed bounding boxes and box confidence

The parameters model and weights point to the Caffe files required to load the network structure (.prototxt) and weights (.caffemodel). All the paths are relative to the configuration file.

The labels parameter points to a file with the name of the recognized classes in the order expected by the model. Currently are available the classes for the ImageNet, Places250, PascalVOC and MSCoCo datasets.

The optional mean parameter specifies the mean pixel value for the dataset as a triple of byte values in BGR format. If the mean is not available, the mean of the input image is used instead.

The optional anchors parameter specifies the bounding box biases in Darknet v2 detection networks and it has to be manuall copied from the corresponding .cfg file.

CPU mode: an additional parameter device in the DEFAULT section specifies if the CPU or the GPU (default) should be used for the model. You can override the parameter for a specific network by specifying a different device in its section.

References

[1] Iandola, F. N., Moskewicz, M. W., Ashraf, et al. (2016). SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size. Arxiv.

[2] Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2015). You Only Look Once: Unified, Real-Time Object Detection. Arxiv.

License

The code is released under the BDS 2-clause license, except for the yolo_detector module which under the Darknet license (public domain).

Banus / caffe-demo