potterhsu / easy-fpn.pytorch

An easy implementation of FPN (https://arxiv.org/pdf/1612.03144.pdf) in PyTorch.


easy-fpn.pytorch

An easy implementation of FPN in PyTorch based on our easy-faster-rcnn.pytorch project.

Demo

Features

  • Supports PyTorch 0.4.1
  • Supports PASCAL VOC 2007 and MS COCO 2017 datasets
  • Supports ResNet-18, ResNet-50 and ResNet-101 backbones (from the official PyTorch models)
  • Supports ROI Pooling and ROI Align pooling modes
  • Matches the performance reported by the original paper
  • Efficient, with maintainable, readable and clean code

Benchmarking

  • PASCAL VOC 2007

    • Train: 2007 trainval (5011 images)
    • Eval: 2007 test (4952 images)
    | Implementation | Backbone | GPU | Training Speed (FPS) | Inference Speed (FPS) | mAP | image_min_side | image_max_side | anchor_ratios | anchor_scales | pooling_mode | rpn_pre_nms_top_n (train) | rpn_post_nms_top_n (train) | rpn_pre_nms_top_n (eval) | rpn_post_nms_top_n (eval) | learning_rate | momentum | weight_decay | step_lr_size | step_lr_gamma | num_steps_to_finish |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | Ours | ResNet-101 | GTX 1080 Ti | ~ 3.3 | ~ 9.5 | 0.7627 (60k) / 0.7604 (70k) | 800 | 1333 | [(1, 2), (1, 1), (2, 1)] | [1] | align | 12000 | 2000 | 6000 | 1000 | 0.001 | 0.9 | 0.0001 | 50000 | 0.1 | 70000 |

    Scroll to right for more configurations

  • MS COCO 2017

    • Train: 2017 Train, dropping images without any objects (117266 images)
    • Eval: 2017 Val, dropping images without any objects (4952 images)
    | Implementation | Backbone | GPU | Training Speed (FPS) | Inference Speed (FPS) | AP@[.5:.95] | image_min_side | image_max_side | anchor_ratios | anchor_scales | pooling_mode | rpn_pre_nms_top_n (train) | rpn_post_nms_top_n (train) | rpn_pre_nms_top_n (eval) | rpn_post_nms_top_n (eval) | learning_rate | momentum | weight_decay | step_lr_size | step_lr_gamma | num_steps_to_finish |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | Original Paper | ResNet-101 | - | - | - | 0.362 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
    | Ours | ResNet-101 | GTX 1080 Ti | ~ 3.3 | ~ 9.5 | 0.363 | 800 | 1333 | [(1, 2), (1, 1), (2, 1)] | [1] | align | 12000 | 2000 | 6000 | 1000 | 0.001 | 0.9 | 0.0001 | 900000 | 0.1 | 1640000 |

    Scroll to right for more configurations

  • PASCAL VOC 2007 Cat Dog

    • Train: 2007 trainval, dropping categories other than cat and dog (750 images)
    • Eval: 2007 test, dropping categories other than cat and dog (728 images)
  • MS COCO 2017 Person

    • Train: 2017 Train, dropping categories other than person (64115 images)
    • Eval: 2017 Val, dropping categories other than person (2693 images)
  • MS COCO 2017 Car

    • Train: 2017 Train, dropping categories other than car (12251 images)
    • Eval: 2017 Val, dropping categories other than car (535 images)
  • MS COCO 2017 Animal

    • Train: 2017 Train, dropping categories other than bird, cat, dog, horse, sheep, cow, elephant, bear, zebra and giraffe (23989 images)
    • Eval: 2017 Val, dropping categories other than bird, cat, dog, horse, sheep, cow, elephant, bear, zebra and giraffe (1016 images); a sketch of this category filtering follows the list
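
  A minimal sketch of how such category filtering can be expressed with pycocotools, assuming the COCO layout from the Setup section; it only illustrates the selection rule and is not the repository's own code:

    from pycocotools.coco import COCO

    # Count the 2017 Train images that contain at least one 'person'
    # annotation; the same pattern applies to the other subsets above.
    coco = COCO('data/COCO/annotations/instances_train2017.json')
    person_ids = coco.getCatIds(catNms=['person'])
    img_ids = coco.getImgIds(catIds=person_ids)
    print('{} images contain at least one person'.format(len(img_ids)))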

Requirements

  • Python 3.6

  • torch 0.4.1

  • torchvision 0.2.1

  • tqdm

    $ pip install tqdm
    
  • tensorboardX

    $ pip install tensorboardX
    
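  A quick environment check (a minimal sketch; it assumes torchvision exposes __version__, which 0.2.1 does):

    import torch
    import torchvision

    # Verify the versions listed above and that a CUDA device is visible.
    print(torch.__version__)         # expected: 0.4.1
    print(torchvision.__version__)   # expected: 0.2.1
    print(torch.cuda.is_available())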

Setup

  1. Prepare data

    1. For PASCAL VOC 2007

      1. Download dataset

      2. Extract it to the data folder; your folder structure should now look like:

        easy-fpn.pytorch
            - data
                - VOCdevkit
                    - VOC2007
                        - Annotations
                            - 000001.xml
                            - 000002.xml
                            ...
                        - ImageSets
                            - Main
                                ...
                                test.txt
                                ...
                                trainval.txt
                                ...
                        - JPEGImages
                            - 000001.jpg
                            - 000002.jpg
                            ...
                - ...
        
    2. For MS COCO 2017

      1. Download dataset

      2. Extract it to the data folder; your folder structure should now look like:

        easy-fpn.pytorch
            - data
                - COCO
                    - annotations
                        - instances_train2017.json
                        - instances_val2017.json
                        ...
                    - train2017
                        - 000000000009.jpg
                        - 000000000025.jpg
                        ...
                    - val2017
                        - 000000000139.jpg
                        - 000000000285.jpg
                        ...
                - ...
        
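    To verify the layouts above before moving on, a minimal sketch (paths are relative to the project root; check only the dataset you downloaded):

      import os

      # Expected paths from the folder structures in step 1.
      voc_paths = ['data/VOCdevkit/VOC2007/Annotations',
                   'data/VOCdevkit/VOC2007/ImageSets/Main/trainval.txt',
                   'data/VOCdevkit/VOC2007/ImageSets/Main/test.txt',
                   'data/VOCdevkit/VOC2007/JPEGImages']
      coco_paths = ['data/COCO/annotations/instances_train2017.json',
                    'data/COCO/annotations/instances_val2017.json',
                    'data/COCO/train2017',
                    'data/COCO/val2017']
      for path in voc_paths + coco_paths:
          print('{}: {}'.format(path, 'ok' if os.path.exists(path) else 'MISSING'))
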
  2. Build CUDA modules

    1. Define your CUDA architecture code

      $ export CUDA_ARCH=sm_61
      
      • sm_61 works for the GTX 1080 Ti; for other GPUs, look up the architecture code in NVIDIA's CUDA GPUs documentation

      • To find out which GPU you have, list the installed devices:

        $ nvidia-smi -L
        
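      • Alternatively, PyTorch itself can report the compute capability; a minimal sketch assuming a CUDA-enabled build:

        import torch

        # Map the device's compute capability to an -arch flag,
        # e.g. (6, 1) -> sm_61.
        if torch.cuda.is_available():
            major, minor = torch.cuda.get_device_capability(0)
            print('export CUDA_ARCH=sm_{}{}'.format(major, minor))
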
    2. Build Non-Maximum-Suppression module

      $ nvcc -arch=$CUDA_ARCH -c --compiler-options -fPIC -o nms/src/nms_cuda.o nms/src/nms_cuda.cu
      $ python nms/build.py
      $ python -m nms.test.test_nms
      
      • Result after unit testing

    3. Build ROI-Align module (modified from RoIAlign.pytorch)

      $ nvcc -arch=$CUDA_ARCH -c --compiler-options -fPIC -o roi/align/src/cuda/crop_and_resize_kernel.cu.o roi/align/src/cuda/crop_and_resize_kernel.cu
      $ python roi/align/build.py
      
  3. Install pycocotools for MS COCO 2017 dataset

    1. Clone and build COCO API

      $ git clone https://github.com/cocodataset/cocoapi
      $ cd cocoapi/PythonAPI
      $ make
      

      This step does not need to be performed inside the project directory

    2. If an error occurs with the message pycocotools/_mask.c: No such file or directory, install cython and try again

      $ pip install cython
      
    3. Copy pycocotools into project

      $ cp -R pycocotools /path/to/project
      
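      After copying, a quick import check from the project root (a minimal sketch; it also exercises the COCO annotations from the Setup section):

        from pycocotools.coco import COCO

        # Loading the val annotations confirms both the build and the layout.
        coco = COCO('data/COCO/annotations/instances_val2017.json')
        print('{} images indexed'.format(len(coco.getImgIds())))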

Usage

  1. Train

    • To apply default configuration (see also config/)

      $ python train.py -s=coco2017 -b=resnet101
      
    • To apply custom configuration (see also train.py)

      $ python train.py -s=coco2017 -b=resnet101 --pooling_mode=align
      
  2. Evaluate

    • To apply default configuration (see also config/)

      $ python eval.py -s=coco2017 -b=resnet101 /path/to/checkpoint.pth
      
    • To apply custom configuration (see also eval.py)

      $ python eval.py -s=coco2017 -b=resnet101 --pooling_mode=align /path/to/checkpoint.pth
      
  3. Infer

    • To apply default configuration (see also config/)

      $ python infer.py -c=/path/to/checkpoint.pth -s=coco2017 -b=resnet101 /path/to/input/image.jpg /path/to/output/image.jpg
      
    • To apply custom configuration (see also infer.py)

      $ python infer.py -c=/path/to/checkpoint.pth -s=coco2017 -b=resnet101 -p=0.9 /path/to/input/image.jpg /path/to/output/image.jpg
      

Notes

  • Illustration for feature pyramid (see forward in model.py)

    # Bottom-up pathway
    c1 = self.conv1(image)
    c2 = self.conv2(c1)
    c3 = self.conv3(c2)
    c4 = self.conv4(c3)
    c5 = self.conv5(c4)
    
    # Top-down pathway and lateral connections
    p5 = self.lateral_c5(c5)
    p4 = self.lateral_c4(c4) + F.interpolate(input=p5, size=(c4.shape[2], c4.shape[3]), mode='nearest')
    p3 = self.lateral_c3(c3) + F.interpolate(input=p4, size=(c3.shape[2], c3.shape[3]), mode='nearest')
    p2 = self.lateral_c2(c2) + F.interpolate(input=p3, size=(c2.shape[2], c2.shape[3]), mode='nearest')
    
    # Reduce the aliasing effect
    p4 = self.dealiasing_p4(p4)
    p3 = self.dealiasing_p3(p3)
    p2 = self.dealiasing_p2(p2)
    
    # Coarsest pyramid level, obtained by subsampling p5
    p6 = F.max_pool2d(input=p5, kernel_size=2)
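
    The lateral and de-aliasing layers above are 1x1 and 3x3 convolutions that project every level to a common width (256 channels in the FPN paper). A minimal sketch with ResNet-50/101 channel counts; the exact definitions live in model.py:

    from torch import nn

    # Assumed backbone widths: c2=256, c3=512, c4=1024, c5=2048.
    lateral_c5 = nn.Conv2d(2048, 256, kernel_size=1)
    lateral_c4 = nn.Conv2d(1024, 256, kernel_size=1)
    lateral_c3 = nn.Conv2d(512, 256, kernel_size=1)
    lateral_c2 = nn.Conv2d(256, 256, kernel_size=1)

    # 3x3 convolutions smooth the aliasing introduced by upsampling.
    dealiasing_p4 = nn.Conv2d(256, 256, kernel_size=3, padding=1)
    dealiasing_p3 = nn.Conv2d(256, 256, kernel_size=3, padding=1)
    dealiasing_p2 = nn.Conv2d(256, 256, kernel_size=3, padding=1)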

  • Illustration for "find labels for each anchor_bboxes" in region_proposal_network.py

  • Illustration for NMS CUDA
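
    A plain-PyTorch reference for what the CUDA kernel computes: greedily keep the highest-scoring box, then drop the remaining boxes whose IoU with it exceeds a threshold. This sketch is sequential and illustrative only; the CUDA version parallelizes the pairwise-overlap test:

    import torch

    def nms_reference(bboxes, scores, threshold=0.7):
        # bboxes: (N, 4) as [x1, y1, x2, y2]; returns indices of kept boxes.
        keep = []
        _, order = scores.sort(descending=True)
        while order.numel() > 0:
            i = order[0].item()
            keep.append(i)
            if order.numel() == 1:
                break
            rest = order[1:]
            tl = torch.max(bboxes[i, :2], bboxes[rest, :2])
            br = torch.min(bboxes[i, 2:], bboxes[rest, 2:])
            inter = (br - tl).clamp(min=0).prod(dim=1)
            area_i = (bboxes[i, 2:] - bboxes[i, :2]).prod()
            areas = (bboxes[rest, 2:] - bboxes[rest, :2]).prod(dim=1)
            ious = inter / (area_i + areas - inter)
            order = rest[ious <= threshold]
        return keep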

About


License: MIT License


Languages

Python 83.8%, C 9.0%, Cuda 7.2%