Pytorch Implementation of Faster R-CNN for multi-modal images (up to 5 channels)

Introduction

This project is a PyTorch implementation of Faster R-CNN for fruit detection, suitable for multi-modal images (up to 5 channels). It is based on the implementation of:

jwyang/faster_rcnn.pytorch, developed based on Pytorch + Numpy

This implementation has been used to train and test on the KFuji RGB-DS dataset, which contains images with 3 different modalities: colour (RGB), depth (D), and range-corrected intensity signal (S). Find more information in:
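To make the "up to 5 channels" figure concrete, the sketch below lists the input channels obtained when each modality is enabled. The keyword names mirror the --RGB, --DEPTH and --NIR options used later in this README. The function name, the channel order, and the assumption that --NIR selects the intensity signal (S) are illustrative and are not taken from the repository code.

```python
def input_channels(rgb=False, depth=False, nir=False):
    """Return the channel names for the enabled modalities.

    Hypothetical helper: the channel order is an assumption.
    """
    channels = []
    if rgb:
        channels += ["R", "G", "B"]  # colour contributes three channels
    if depth:
        channels.append("D")         # depth contributes one channel
    if nir:
        channels.append("S")         # intensity signal, assumed to map to --NIR
    return channels


if __name__ == "__main__":
    for flags in ({"rgb": True},
                  {"rgb": True, "depth": True},
                  {"rgb": True, "depth": True, "nir": True}):
        names = input_channels(**flags)
        print(len(names), names)
```

With all three modalities enabled the list has five entries, which matches the maximum number of channels stated above.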

Multi-modal Deep Learning for Fruit Detection Using RGB-D Cameras and their Radiometric Capabilities.

Preparation

First of all, clone the code:

git clone https://github.com/GRAP-UdL-AT/RGBD_fruit_detection_faster-rcnn.pytorch.git

Then, create a folder:

cd RGBD_fruit_detection_faster-rcnn.pytorch && mkdir data

Datasets and models must be stored in the data folder.
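As a convenience, the data folder and the two subfolders that this README refers to (data/kinect_fruits_dataset and data/kinect_fruits_models) could be created in one step. The following is a minimal sketch; the function name is hypothetical and the repository does not ship such a helper.

```python
import os


def make_data_layout(repo_root):
    """Create data/ with the dataset and model subfolders named in this README."""
    created = []
    for name in ("kinect_fruits_dataset", "kinect_fruits_models"):
        folder = os.path.join(repo_root, "data", name)
        if not os.path.isdir(folder):
            os.makedirs(folder)  # also creates data/ itself when missing
        created.append(folder)
    return created


if __name__ == "__main__":
    # Run from the root of the cloned repository.
    for path in make_data_layout("."):
        print(path)
```

Calling the function a second time is harmless, because existing folders are left untouched.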

Prerequisites

Python 2.7
Pytorch 0.2.0
CUDA 8.0 or higher

Data Preparation

KFuji RGB-DS dataset: Save the KFuji RGB-DS dataset (http://www.grap.udl.cat/en/publications/datasets.html) in the data/kinect_fruits_dataset folder. If the data is annotated using Pychet Labeller (https://github.com/imatge-upc/pychetlabeller), it is necessary to run square_annot_from_pychet_rectangle.py.

Pretrained Model

We used a pretrained VGG16 model in our experiments. You can download this model from:

VGG16: Dropbox, VT Server

Download it and put it into data/kinect_fruits_models/.

NOTE: This is not a pretrained Faster R-CNN model. It is only a pretrained VGG16 model, used as the starting point for training the Faster R-CNN part "from scratch".

Compilation

As pointed out by ruotianluo/pytorch-faster-rcnn, choose the right -arch in the lib/make.sh file to compile the CUDA code:

GPU model	Architecture
TitanX (Maxwell/Pascal)	sm_52
GTX 960M	sm_50
GTX 1080 (Ti)	sm_61
Grid K520 (AWS g2.2xlarge)	sm_30
Tesla K80 (AWS p2.xlarge)	sm_37

More details about setting the architecture can be found here or here
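The table above can also be kept as a small lookup, for example when scripting the choice of architecture. The sketch below copies the table entries verbatim; the function name is hypothetical, and the -arch=sm_XX spelling assumes the value is passed to nvcc in that form inside lib/make.sh.

```python
# GPU model -> compute architecture, copied from the table in this README.
GPU_ARCH = {
    "TitanX (Maxwell/Pascal)": "sm_52",
    "GTX 960M": "sm_50",
    "GTX 1080 (Ti)": "sm_61",
    "Grid K520 (AWS g2.2xlarge)": "sm_30",
    "Tesla K80 (AWS p2.xlarge)": "sm_37",
}


def arch_flag(gpu_model):
    """Return the '-arch=sm_XX' value for a GPU listed in the table."""
    if gpu_model not in GPU_ARCH:
        raise ValueError("GPU model not in the table: %r" % (gpu_model,))
    return "-arch=" + GPU_ARCH[gpu_model]


if __name__ == "__main__":
    print(arch_flag("GTX 1080 (Ti)"))
```

A GPU that is not listed raises a ValueError rather than falling back to a guess, since compiling for the wrong architecture is the error this step is meant to avoid.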

Example make.sh files for different GPU architectures are provided in makesh_examples/; copy the appropriate one to lib/make.sh.

Information on the GPU architectures of the imatge server is available at: imatge.upc.edu information

Install all the python dependencies using pip:

pip install -r requirements.txt

Compile the CUDA dependencies using the following commands:

cd lib
srun --gres=gpu:pascal:1,gmem:6G --mem 12G sh make.sh
or
srun --gres=gpu:maxwell:1,gmem:6G --mem 12G sh make.sh

It will compile all the modules you need, including NMS, ROI_Pooling, ROI_Align and ROI_Crop. The default version is compiled with Python 2.7; please compile it yourself if you are using a different Python version.

IMPORTANT NOTE: The srun --gres... options used when running the Python scripts must be the same as those used when compiling with make.sh.

Train/val

Run trainval_net.py to perform training and validation in the same script:

srun --gres=gpu:$architecture:1,gmem:6G --mem 30G -c 2 python trainval_net.py \
              --dataset kinect_fruits --net vgg16_5ch \
              --bs $BATCH_SIZE --lr $LEARNING_RATE \
              --lr_decay_step $DECAY_STEP --RGB --NIR --DEPTH \
              --epochs $NUM_EPOCHS --o $OPTIMIZER \
              --s $SESSION --anchor $ANCHOR_SCALE --cuda

example:

srun --gres=gpu:1,gmem:10G --mem 30G -c 2 python trainval_net.py --dataset kinect_fruits_k --net vgg16_5ch --bs 4  --lr 0.0001 --lr_decay_step 10 --RGB --NIR  --DEPTH  --epochs 45 --o adam --s 60  --anchor 4  --anchor 8  --anchor 16  --cuda

This script only computes the loss.
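When several runs are launched with different hyperparameters, the argument list can be assembled programmatically. The sketch below builds the python trainval_net.py part of the command (without the srun prefix) using only the flags shown in this section; the function name and its parameter names are hypothetical. Note that --anchor is repeated once per anchor scale, as in the example command.

```python
def build_trainval_command(dataset, net, batch_size, lr, lr_decay_step,
                           epochs, optimizer, session, anchors,
                           modalities=("RGB", "NIR", "DEPTH"), cuda=True):
    """Build the argument list for trainval_net.py (no srun prefix)."""
    args = ["python", "trainval_net.py",
            "--dataset", dataset, "--net", net,
            "--bs", str(batch_size), "--lr", str(lr),
            "--lr_decay_step", str(lr_decay_step)]
    args += ["--" + name for name in modalities]  # e.g. --RGB --NIR --DEPTH
    args += ["--epochs", str(epochs), "--o", optimizer, "--s", str(session)]
    for scale in anchors:                         # one --anchor flag per scale
        args += ["--anchor", str(scale)]
    if cuda:
        args.append("--cuda")
    return args


if __name__ == "__main__":
    cmd = build_trainval_command("kinect_fruits_k", "vgg16_5ch", 4, 0.0001,
                                 10, 45, "adam", 60, [4, 8, 16])
    print(" ".join(cmd))
```

Returning a list rather than a single string keeps each argument separate, so the result can be passed directly to subprocess without shell quoting.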

Test

If you want to evaluate the detection performance of a pre-trained Faster R-CNN model on the kinect_fruits test set, run:

srun --gres=gpu:$architecture:1,gmem:6G --mem 30G python test_net.py \
		   --dataset kinect_fruits --net vgg16_5ch \
		   --RGB --DEPTH --NIR --checksession $SESSION \
		   --checkpoint $POINT --anchor $ANCHOR_SCALE \
		   --ovthresh $minIOU --minconfid $minCONFIDENCE \
		   --cuda

example:

srun --gres=gpu:1,gmem:10G --mem 30G -c 2 python test_net.py --dataset kinect_fruits_k --net vgg16_5ch  --RGB  --DEPTH  --NIR   --checksession 60 --checkpoint 309  --anchor 4  --anchor 8  --anchor 16   --ovthresh 0.2 --minconfid 0.4 --minconfid 0.45 --minconfid 0.5  --minconfid 0.55  --minconfid 0.6 --minconfid 0.65 --minconfid 0.7   --cuda

This script computes mean average precision, precision, recall, F1-score and the number of inferred images per second.
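To illustrate how the --ovthresh (minimum IoU) and --minconfid (minimum confidence) thresholds relate to precision, recall and F1-score, here is a minimal single-image sketch. It is not the evaluation code of test_net.py: the function names, the (x1, y1, x2, y2) box format, and the greedy matching of detections to ground-truth boxes in order of decreasing score are all assumptions, and mean average precision is left out.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / float(union) if union > 0 else 0.0


def evaluate(detections, ground_truth, ovthresh, minconfid):
    """Return (precision, recall, f1) for one image.

    detections: list of (score, box); ground_truth: list of boxes.
    """
    # Drop low-confidence detections, then process the most confident first.
    kept = sorted((d for d in detections if d[0] >= minconfid),
                  key=lambda d: d[0], reverse=True)
    matched = set()
    tp = 0
    for _score, box in kept:
        # Find the best ground-truth box that has not been matched yet.
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truth):
            if idx in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= ovthresh:
            matched.add(best_idx)
            tp += 1
    fp = len(kept) - tp          # kept detections without a match
    fn = len(ground_truth) - tp  # ground-truth boxes never detected
    precision = tp / float(tp + fp) if tp + fp else 0.0
    recall = tp / float(tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Raising minconfid usually removes false positives at the cost of missed fruits, which is why the example command above sweeps several --minconfid values at a fixed --ovthresh.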

Demo

If you want to run detection on your own images with a pre-trained model, first download the pretrained model or train your own model, then add the images to the folder $ROOT/images_kinect_fruits and run:

srun --gres=gpu:1,gmem:10G --mem 30G -c 2 python demo.py \
		   --dataset kinect_fruits --net vgg16_5ch \
		   --RGB --DEPTH --NIR --checksession $SESSION \
		   --checkpoint $POINT --checkepoch $epoch \
		   --anchor $ANCHOR_SCALE --minconfid $minCONFIDENCE \
		   --image_dir images_kinect_fruits --cuda

example:

srun --gres=gpu:1,gmem:10G --mem 30G -c 2 python demo.py --dataset kinect_fruits --net vgg16_5ch  --RGB  --DEPTH  --NIR   --checksession 42 --checkpoint 309  --checkepoch 12  --anchor 4  --anchor 8  --anchor 16    --minconfid 0.6  --image_dir images_kinect_fruits  --cuda

Then you will find the detection results in folder $ROOT/images_kinect_fruits.

Authorship

This project is contributed by GRAP-UdL-AT and ImageProcessingGroup-UPC and it is based on the implementation of Jianwei Yang.

To report bugs, please contact the authors at j.gene@eagrof.udl.cat

Citation

If you find this implementation or the analysis conducted in our report helpful, please consider citing:

@article{Gene-Mola2019,
    author = {Gen{\'e}-Mola, Jordi and Vilaplana, Ver{\'o}nica and Rosell-Polo, Joan R and Morros, Josep-Ramon and Ruiz-Hidalgo, Javier  and Gregorio, Eduard},
    title = {Multi-modal Deep Learning for Fruit Detection Using RGB-D Cameras and their Radiometric Capabilities},
    journal = {Computers and Electronics in Agriculture},
    volume = {162},
    pages = {689--698},
    year = {2019},
    publisher = {Elsevier}
}

For convenience, here is the Faster RCNN citation:

@inproceedings{renNIPS15fasterrcnn,
    Author = {Shaoqing Ren and Kaiming He and Ross Girshick and Jian Sun},
    Title = {Faster {R-CNN}: Towards Real-Time Object Detection
             with Region Proposal Networks},
    Booktitle = {Advances in Neural Information Processing Systems ({NIPS})},
    Year = {2015}
}
