Relation Networks for Object Detection

Relation Networks for Object Detection + Flow-Guided Feature Aggregation + Deformable Nets

Forked from Relation Networks for Object Detection, whose major contributors are Dazhi Cheng, Jiayuan Gu, Han Hu and Zheng Zhang.

Joined with Flow-Guided Feature Aggregation (FGFA), whose major contributors, Yuqing Zhu, Shuhao Fu, and Xizhou Zhu, worked on it while interns at MSRA.

Also joined with Deformable ConvNets, whose major contributors are Yuwen Xiong, Haozhi Qi, Guodong Zhang, Yi Li, Jifeng Dai, Bin Xiao, Han Hu and Yichen Wei.

Introduction

Relation Networks for Object Detection is described in a CVPR 2018 oral paper.

Flow-Guided Feature Aggregation (FGFA) is described in an ICCV 2017 paper.

Deformable ConvNets is described in an ICCV 2017 oral paper.

Disclaimer

From the original Relation Networks README

This is an official implementation for Relation Networks for Object Detection based on MXNet. It is worth noting that:

  • This repository is tested on official MXNet v1.1.0@(commit 629bb6). You should be able to use it with any version of MXNet that contains the required operators, such as Deformable Convolution.
  • We trained our model based on the ImageNet pre-trained ResNet-v1-101 using a model converter. The converted model produces slightly lower accuracy (Top-1 Error on ImageNet val: 24.0% vs. 23.6%).
  • This repository is based on Deformable ConvNets.

Our modified code is tested on Ubuntu 16.04 with CUDA 9.1 and MXNet 1.2.1.
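
Since the code depends on operators such as Deformable Convolution, a quick way to confirm that an installed MXNet build exposes them is a check along these lines (a minimal sketch, not part of the repository; recent MXNet builds ship these operators in the contrib namespace):

    # Sanity-check that this MXNet build exposes the operators the repo needs.
    import mxnet as mx

    print("MXNet version:", mx.__version__)
    for op in ("DeformableConvolution", "DeformablePSROIPooling"):
        print("%s available: %s" % (op, hasattr(mx.sym.contrib, op)))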

License

© Microsoft, 2018. Licensed under an MIT license.

Citing

If you find Relation Networks for Object Detection useful in your research, please consider citing:

@article{hu2017relation,
  title={Relation Networks for Object Detection},
  author={Hu, Han and Gu, Jiayuan and Zhang, Zheng and Dai, Jifeng and Wei, Yichen},
  journal={arXiv preprint arXiv:1711.11575},
  year={2017}
} 

If you find Flow-Guided Feature Aggregation useful in your research, please consider citing:

@inproceedings{zhu17fgfa,
    Author = {Xizhou Zhu and Yujie Wang and Jifeng Dai and Lu Yuan and Yichen Wei},
    Title = {Flow-Guided Feature Aggregation for Video Object Detection},
    Conference = {ICCV},
    Year = {2017}
}

@inproceedings{dai16rfcn,
    Author = {Jifeng Dai and Yi Li and Kaiming He and Jian Sun},
    Title = {{R-FCN}: Object Detection via Region-based Fully Convolutional Networks},
    Conference = {NIPS},
    Year = {2016}
}

If you find Deformable ConvNets useful in your research, please consider citing:

@article{dai17dcn,
    Author = {Jifeng Dai and Haozhi Qi and Yuwen Xiong and Yi Li and Guodong Zhang and Han Hu and Yichen Wei},
    Title = {Deformable Convolutional Networks},
    Journal = {arXiv preprint arXiv:1703.06211},
    Year = {2017}
}

@inproceedings{dai16rfcn,
    Author = {Jifeng Dai and Yi Li and Kaiming He and Jian Sun},
    Title = {{R-FCN}: Object Detection via Region-based Fully Convolutional Networks},
    Conference = {NIPS},
    Year = {2016}
}

Main Results

Faster RCNN

| Method | Backbone | Training Data | Testing Data | mAP | mAP@0.5 | mAP@0.75 | mAP@S | mAP@M | mAP@L | Inference Time | Post Processing Time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2FC + nms(0.5) | ResNet-101 | coco trainval35k | coco minival | 31.8 | 53.9 | 32.2 | 10.5 | 35.2 | 51.5 | 0.168s | 0.025s |
| 2FC + softnms(0.6) | ResNet-101 | coco trainval35k | coco minival | 32.3 | 52.8 | 34.1 | 11.1 | 35.9 | 51.8 | 0.200s | 0.060s |
| 2FC + Relation Module + softnms | ResNet-101 | coco trainval35k | coco minival | 34.7 | 55.3 | 37.2 | 13.7 | 38.8 | 53.6 | 0.211s | 0.059s |
| 2FC + Learn NMS | ResNet-101 | coco trainval35k | coco minival | 32.6 | 51.8 | 35.0 | 11.8 | 36.6 | 52.1 | 0.162s | 0.020s |
| 2FC + Relation Module + Learn NMS(e2e) | ResNet-101 | coco trainval35k | coco minival | 35.2 | 55.5 | 38.0 | 15.2 | 39.2 | 54.1 | 0.175s | 0.022s |

Deformable Faster RCNN

| Method | Backbone | Training Data | Testing Data | mAP | mAP@0.5 | mAP@0.75 | mAP@S | mAP@M | mAP@L | Inference Time | NMS Time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2FC + nms(0.5) | ResNet-101 | coco trainval35k | coco minival | 37.2 | 58.1 | 40.0 | 16.4 | 41.3 | 55.5 | 0.180s | 0.022s |
| 2FC + softnms(0.6) | ResNet-101 | coco trainval35k | coco minival | 37.5 | 57.3 | 41.0 | 16.6 | 41.7 | 55.8 | 0.208s | 0.052s |
| 2FC + Relation Module + Learn NMS(e2e) | ResNet-101 | coco trainval35k | coco minival | 38.4 | 57.6 | 41.6 | 18.2 | 43.1 | 56.6 | 0.188s | 0.023s |

FPN

| Method | Backbone | Training Data | Testing Data | mAP | mAP@0.5 | mAP@0.75 | mAP@S | mAP@M | mAP@L | Inference Time | NMS Time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2FC + nms(0.5) | ResNet-101 | coco trainval35k | coco minival | 36.6 | 59.3 | 39.3 | 20.3 | 40.5 | 49.4 | 0.196s | 0.037s |
| 2FC + softnms(0.6) | ResNet-101 | coco trainval35k | coco minival | 36.8 | 57.8 | 40.7 | 20.4 | 40.8 | 49.7 | 0.323s | 0.167s |
| 2FC + Relation Module + Learn NMS(e2e) | ResNet-101 | coco trainval35k | coco minival | 38.6 | 59.9 | 43.0 | 22.1 | 42.3 | 52.8 | 0.232s | 0.022s |

Running time is counted on a single Maxwell Titan X GPU (mini-batch size is 1 in inference).
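
The softnms(0.6) rows above refer to Soft-NMS, which decays the scores of boxes that overlap an accepted detection instead of discarding them outright, at some cost in post-processing time. A minimal NumPy sketch of the linear variant (illustrative only, not the repository's implementation; boxes are [x1, y1, x2, y2]):

    import numpy as np

    def soft_nms_linear(boxes, scores, overlap_thresh=0.6, score_thresh=0.001):
        # Linear Soft-NMS sketch: instead of deleting boxes that overlap the
        # current best detection, scale their scores down by (1 - IoU).
        scores = scores.copy()
        areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
        keep = []
        while scores.max() > score_thresh:
            i = scores.argmax()
            keep.append(i)
            # IoU of the chosen box with every box
            x1 = np.maximum(boxes[i, 0], boxes[:, 0])
            y1 = np.maximum(boxes[i, 1], boxes[:, 1])
            x2 = np.minimum(boxes[i, 2], boxes[:, 2])
            y2 = np.minimum(boxes[i, 3], boxes[:, 3])
            inter = np.maximum(0.0, x2 - x1) * np.maximum(0.0, y2 - y1)
            iou = inter / (areas[i] + areas - inter)
            # Decay overlapping scores; zero the chosen box so it is not revisited.
            scores = np.where(iou > overlap_thresh, scores * (1.0 - iou), scores)
            scores[i] = 0.0
        return keep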

Requirements: Software

  1. MXNet from the official repository. We tested our code on MXNet 1.2.1. Due to the rapid development of MXNet, it is recommended to check out this version if you encounter any issues.

  2. Python 2.7. We recommend using Anaconda2 as it already includes many common packages. Python 3 is not supported yet; if you want to use it, you will need to modify the code yourself.

  3. The following Python packages:

Cython
EasyDict
mxnet-cu91 # changed from mxnet-cu80 used in relation networks code
opencv-python

Requirements: Hardware

Any NVIDIA GPUs with at least 6GB memory should be OK.

Installation

  1. Clone the repository:

     git clone https://github.com/HaydenFaulkner/Relation-Networks-for-Object-Detection-Video.git
     cd Relation-Networks-for-Object-Detection-Video

  2. Run sh ./init.sh. The script will automatically build the Cython modules and create some folders.

  3. Install MXNet:

Quick start

3.1 Install MXNet and all dependencies by

pip install -r requirements.txt

If there is no other error message, MXNet should be installed successfully.
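
To confirm the binding imports cleanly, a quick check (not part of the repository) is:

    # Should print 1.2.1 if the mxnet-cu91 pin from requirements.txt was used.
    import mxnet as mx
    print(mx.__version__)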

If you get an error about not finding libcudart.so even after having your environment variables set, try running (with the correct paths):

sudo sh -c "echo '/usr/local/cuda/lib64\n/usr/local/cuda/lib' >> /etc/ld.so.conf.d/nvidia.conf"
sudo ldconfig

Build from source (alternative way)

3.2 Clone MXNet v1.1.0 by

git clone -b v1.1.0 --recursive https://github.com/apache/incubator-mxnet.git

3.3 Compile MXNet

cd ${MXNET_ROOT}
make -j $(nproc) USE_OPENCV=1 USE_BLAS=openblas USE_CUDA=1 USE_CUDA_PATH=/usr/local/cuda USE_CUDNN=1

3.4 Install the MXNet Python binding by

Note: If you will actively switch between different versions of MXNet, please follow 3.5 instead of 3.4

cd python
sudo python setup.py install

3.5 For advanced users, you may put your Python package into ./external/mxnet/$(YOUR_MXNET_PACKAGE)/mxnet and modify MXNET_VERSION in ./experiments/relation_rcnn/cfgs/*.yaml to $(YOUR_MXNET_PACKAGE). This lets you switch between different versions of MXNet quickly.

  4. Make sure the correct CUDA libraries are on your LD_LIBRARY_PATH.
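
A simple smoke test for the CUDA setup (a sketch that allocates a small array on the first GPU; it fails fast if LD_LIBRARY_PATH points at the wrong CUDA installation):

    import mxnet as mx

    # Allocating on the GPU forces MXNet to load the CUDA runtime libraries.
    x = mx.nd.ones((2, 3), ctx=mx.gpu(0))
    print(x.asnumpy())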

Preparation for Training & Testing

  1. Please download the datasets, and use the following structure:

    1.1 MSCOCO 2017 (18 + 1 + 6 + .241 GB)

    ./data/coco/
    

    1.2 ImageNetDET 2015 (47 + .015 + .0014 GB) (unchanged from the 2014 data), ImageNetLOC 2015 (160 GB) (from Kaggle) and ImageNetVID 2015 (86 GB)

    ./data/ILSVRC2015/
    ./data/ILSVRC2015/Annotations/DET
    ./data/ILSVRC2015/Annotations/LOC
    ./data/ILSVRC2015/Annotations/VID
    ./data/ILSVRC2015/Data/DET
    ./data/ILSVRC2015/Data/LOC
    ./data/ILSVRC2015/Data/VID
    ./data/ILSVRC2015/ImageSets
    

    1.3 PascalVOC 2007 (.439 GB) and PascalVOC 2012 (2 GB)

    ./data/VOCdevkit/VOC2007/
    ./data/VOCdevkit/VOC2012/
    
  2. Please manually download the ImageNet-pretrained ResNet-v1-101 backbone model and the Faster RCNN ResNet-v1-101 model from Relation Backbone OneDrive, and place them as shown below:

    ./models/backbones/resnet_v1_101-0000.params
    

    We use a pretrained Faster RCNN and fix its params when training Faster RCNN with the Learn NMS head. If you want to run such experiments, please also include the pretrained Faster RCNN model from OneDrive, making sure it looks like this:

    ./models/relation/pretrained/coco_resnet_v1_101_rcnn-0008.params
    
  3. For FPN related experiments, we use proposals generated by a pretrained RPN to speed up our experiments. Please download the proposals from Proposals OneDrive and put them under the folder ./proposal/resnet_v1_101_fpn/rpn_data. Make sure it looks like this:

    ./proposal/resnet_v1_101_fpn/rpn_data/COCO_minival2014_rpn.pkl
    ./proposal/resnet_v1_101_fpn/rpn_data/COCO_train2014_rpn.pkl
    ./proposal/resnet_v1_101_fpn/rpn_data/COCO_valminusminival2014_rpn.pkl
    
  4. Download the FGFA backbone FlowNet model, pre-trained on Flying Chairs, from FGFA Backbone OneDrive, and make sure it looks like this:

    ./models/backbones/flownet-0000.params
    

    You can delete the resnet_v1_101-0000.params downloaded here, as it duplicates the one from step 2. A short Python sketch for spot-checking this layout follows below.
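
A minimal sketch for spot-checking the layout above (paths taken from this README; extend the list to the datasets and models you actually downloaded):

    import os

    expected = [
        "./data/coco",
        "./data/ILSVRC2015/Data/VID",
        "./models/backbones/resnet_v1_101-0000.params",
        "./models/backbones/flownet-0000.params",
        "./proposal/resnet_v1_101_fpn/rpn_data",
    ]
    for path in expected:
        # Flag anything missing before kicking off a long training run.
        print("%-55s %s" % (path, "ok" if os.path.exists(path) else "MISSING"))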

Demo Models

Trained models are provided for each of the three tasks.

Relation Networks

  1. To try out our pre-trained relation network models, please download them manually from Relation PreTrained OneDrive, and make sure it looks like this:
    ./models/relation/pretrained/rcnn/coco/resnet_v1_101_coco_trainvalminus_rcnn_end2end_8epoch/train2014_valminusminival2014/rcnn_coco-0008.params
    ./models/relation/pretrained/rcnn/coco/resnet_v1_101_coco_trainvalminus_rcnn_end2end_relation_8epoch/train2014_valminusminival2014/rcnn_coco-0008.params
    ./models/relation/pretrained/rcnn/coco/resnet_v1_101_coco_trainvalminus_rcnn_end2end_learn_nms_3epoch/train2014_valminusminival2014/rcnn_coco-0003.params
    ./models/relation/pretrained/rcnn/coco/resnet_v1_101_coco_trainvalminus_rcnn_end2end_relation_learn_nms_8epoch/train2014_valminusminival2014/rcnn_coco-0008.params
    ./models/relation/pretrained/rcnn/coco/resnet_v1_101_coco_trainvalminus_rcnn_dcn_end2end_8epoch/train2014_valminusminival2014/rcnn_coco-0008.params
    ./models/relation/pretrained/rcnn/coco/resnet_v1_101_coco_trainvalminus_rcnn_dcn_end2end_relation_learn_nms_8epoch/train2014_valminusminival2014/rcnn_coco-0008.params
    ./models/relation/pretrained/rcnn/coco/resnet_v1_101_coco_trainvalminus_rcnn_fpn_8epoch/train2014_valminusminival2014/rcnn_fpn_coco-0008.params
    ./models/relation/pretrained/rcnn/coco/resnet_v1_101_coco_trainvalminus_rcnn_fpn_relation_learn_nms_8epoch/train2014_valminusminival2014/rcnn_fpn_coco-0008.params
    
  2. To run the Faster RCNN with Relation Module and Learn NMS model, run
    python experiments/relation_rcnn/rcnn_test.py --cfg experiments/relation_rcnn/cfgs/resnet_v1_101_coco_trainvalminus_rcnn_end2end_relation_learn_nms_8epoch.yaml --ignore_cache
    
    If you want to try other models, just change the config file. There are ten config files in the ./experiments/relation_rcnn/cfgs folder, eight of which come with pretrained models.

FGFA

  1. Download the FGFA model (trained on ImageNet DET + VID train) from FGFA PreTrained OneDrive, and make sure it looks like this:

    ./models/fgfa/pretrained/rfcn_fgfa_flownet_vid-0000.params
    

    TODO: put this into output directory

  2. Run

    python ./fgfa_rfcn/demo.py
    

Deformable

  1. To use the demo with the pre-trained deformable models, please download them manually from Deformable PreTrained OneDrive or BaiduYun, and put them under the folder ./models/deformable/pretrained/.

    Make sure it looks like this:

    ./models/deformable/pretrained/rfcn_dcn_coco-0000.params
    ./models/deformable/pretrained/rfcn_coco-0000.params
    ./models/deformable/pretrained/fpn_dcn_coco-0000.params
    ./models/deformable/pretrained/fpn_coco-0000.params
    ./models/deformable/pretrained/rcnn_dcn_coco-0000.params
    ./models/deformable/pretrained/rcnn_coco-0000.params
    ./models/deformable/pretrained/deeplab_dcn_cityscapes-0000.params
    ./models/deformable/pretrained/deeplab_cityscapes-0000.params
    ./models/deformable/pretrained/deform_conv-0000.params
    ./models/deformable/pretrained/deform_psroi-0000.params
    
  2. To run the R-FCN demo, run

    python ./rfcn/demo.py --rfcn_only
    
  3. To visualize the offsets of deformable convolution and deformable psroipooling, run

    python ./rfcn/deform_conv_demo.py
    python ./rfcn/deform_psroi_demo.py
    

Usage

  1. All of the experiment settings (GPU #, dataset, etc.) are kept in yaml config files under ./experiments/../cfgs; a short config-loading sketch follows this list.

  2. To perform experiments, run the python scripts with the corresponding config file as input. For example

    2.1 to train and test Faster RCNN with Relation Module and Learn NMS(e2e), use the following command:

    python experiments/relation_rcnn/rcnn_end2end_train_test.py --cfg experiments/relation_rcnn/cfgs/resnet_v1_101_coco_trainvalminus_rcnn_end2end_relation_learn_nms_8epoch.yaml
    

    A cache folder will be created automatically to save the model and the log under models/relation/output/rcnn/.

    The rcnn_end2end_train_test.py script is for the Faster RCNN and Deformable Faster RCNN experiments, which train the RPN together with the RCNN. To train and test FPN, which uses previously generated proposals, use the following command:

    python experiments/relation_rcnn/rcnn_train_test.py --cfg experiments/relation_rcnn/cfgs/resnet_v1_101_coco_trainvalminus_fpn_relation_learn_nms_8epoch.yaml
    

    2.2 To train and test FGFA with R-FCN, use the following command:

    python experiments/fgfa_rfcn/fgfa_rfcn_end2end_train_test.py --cfg experiments/fgfa_rfcn/cfgs/resnet_v1_101_flownet_imagenet_vid_rfcn_end2end_ohem.yaml
    

    A cache folder will be created automatically to save the model and the log under models/fgfa/output/fgfa_rfcn/imagenet_vid/.

    2.3 To perform experiments with just deformable nets, run the python scripts with the corresponding config file as input. For example, to train and test deformable convnets on COCO with ResNet-v1-101, use the following command:

    python experiments/rfcn/rfcn_end2end_train_test.py --cfg experiments/rfcn/cfgs/resnet_v1_101_coco_trainval_rfcn_dcn_end2end_ohem.yaml
    

    A cache folder will be created automatically to save the model and the log under models/deformable/output/rfcn_dcn_coco/.

  3. Please find more details in config files and in the code.
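
The config files are plain YAML, so they can be inspected programmatically before launching a long run. A hypothetical sketch, not the repository's own loader (assumes PyYAML is installed; MXNET_VERSION is mentioned in step 3.5 above, while gpus is an assumption about the config schema):

    import yaml                   # assumes PyYAML is available
    from easydict import EasyDict

    cfg_path = ("experiments/relation_rcnn/cfgs/"
                "resnet_v1_101_coco_trainvalminus_rcnn_end2end_relation_learn_nms_8epoch.yaml")
    with open(cfg_path) as f:
        cfg = EasyDict(yaml.safe_load(f))

    # MXNET_VERSION selects a package under ./external/mxnet (see step 3.5);
    # gpus is assumed here to be the usual comma-separated device list.
    print(cfg.get("MXNET_VERSION"))
    print(cfg.get("gpus"))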

FAQ

Q: I encounter a segmentation fault at the beginning.

A: A compatibility issue has been identified between MXNet and opencv-python 3.0+. We suggest always importing cv2 before importing mxnet in the entry script, as shown below.
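
Concretely, a safe entry-script preamble looks like this:

    # Work around the MXNet / opencv-python 3.0+ incompatibility noted above
    # by importing cv2 first.
    import cv2
    import mxnet as mx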

