YOLOv4-PyTorch

Overview

The inspiration for this project comes from ultralytics/yolov3 && AlexeyAB/darknet Thanks.

This project is a YOLOv4 object detection system. Development framework by PyTorch.

The goal of this implementation is to be simple, highly extensible, and easy to integrate into your own projects. This implementation is a work in progress -- new features are currently being implemented.

About YOLOv4
Installation
Usage
- Train
- Test
- Inference
Train on Custom Dataset
Credit

About YOLOv4

There are a huge number of features which are said to improve Convolutional Neural Network (CNN) accuracy. Practical testing of combinations of such features on large datasets, and theoretical justification of the result, is required. Some features operate on certain models exclusively and for certain problems exclusively, or only for small-scale datasets; while some features, such as batch-normalization and residual-connections, are applicable to the majority of models, tasks, and datasets. We assume that such universal features include Weighted-Residual-Connections (WRC), Cross-Stage-Partial-connections (CSP), Cross mini-Batch Normalization (CmBN), Self-adversarial-training (SAT) and Mish-activation. We use new features: WRC, CSP, CmBN, SAT, Mish activation, Mosaic data augmentation, CmBN, DropBlock regularization, and CIoU loss, and combine some of them to achieve state-of-the-art results: 43.5% AP (65.7% AP50) for the MS COCO dataset at a realtime speed of ~65 FPS on Tesla V100. Source code is at this https URL.

Installation

Clone and install requirements

git clone https://github.com/Lornatang/YOLOv4-PyTorch.git
cd YOLOv4-PyTorch/
pip install -r requirements.txt

Download pre-trained weights

cd weights/
bash download_weights.sh

Download PascalVoc2007

cd data/
bash get_voc_dataset.sh

Download COCO2014

cd data/
bash get_coco2014_dataset.sh

Download COCO2017

cd data/
bash get_coco2017_dataset.sh

Usage

Train

Example (COCO2017)

To train on COCO2014/COCO2017 run:

python train.py --config-file configs/COCO-Detection/yolov5-small.yaml --data data/coco2017.yaml --weights ""

Example (VOC2007+2012)

To train on VOC07+12 run:

python train.py --config-file configs/PascalVOC-Detection/yolov5-small.yaml --data data/voc2007.yaml --weights ""

Other training methods

Normal Training: python train.py --config-file configs/COCO-Detection/yolov5-small.yaml --data data/coco2014.yaml --weights "" to begin training after downloading COCO data with data/get_coco2014_dataset.sh. Each epoch trains on 117,263 images from the train and validate COCO sets, and tests on 5000 images from the COCO validate set.

Resume Training: python train.py --config-file configs/COCO-Detection/yolov5-small.yaml --data data/coco2014.yaml --resume to resume training from weights/checkpoint.pth.

Test

All numbers were obtained on local machine servers with 2 NVIDIA GeForce RTX 2080 SUPER GPUs & NVLink. The software in use were PyTorch 1.5.1, CUDA 10.2, cuDNN 7.6.5.

Example (COCO2017)

To train on COCO2014/COCO2017 run:

python test.py --config-file configs/COCO-Detection/yolov5-small.yaml --data data/coco2017.yaml --weights weights/COCO-Detection/yolov5-small.pth

Example (VOC2007+2012)

To train on VOC07+12 run:

python test.py --config-file configs/PascalVOC-Detection/yolov5-small.yaml --data data/voc2007.yaml --weights weights/PascalVOC-Detection/yolov5-small.pth

Common Settings for VOC Models

All VOC models were trained on voc2007_trainval + voc2012_trainval and evaluated on voc2007_test.
The default settings are not directly comparable with YOLOv4's standard settings. The default settings are not directly comparable with Detectron's standard settings. For example, our default training data augmentation uses scale jittering in addition to horizontal flipping.
For YOLOv3/YOLOv4, we provide baselines based on 2 different backbone combinations:
- Darknet-53: Use a ResNet+VGG backbone with standard conv and FC heads for mask and box prediction, respectively.
- CSPDarknet-53: Use a ResNet+CSPNet backbone with standard conv and FC heads for mask and box prediction, respectively. It obtains the best speed/accuracy tradeoff, but the other two are still useful for research.

Pascal VOC Object Detection Baselines

Model	train time (s/iter)	inference time (ms/im)	train mem (GB)	AP^test	AP₅₀	fps	params	FLOPs	download
MobileNet-v1	9.6	2.5	4.0	31.2	61.3	400	4.95M	11.3B	model
VGG16	-	-	-	-	-	-	-	-	-
YOLOv3-Tiny	10.5	1.5	3.7	24.3	53.1	667	7.96M	10.5B	model
YOLOv3	2.4	6.7	6.4	57.9	82.6	149	61.79M	155.6B	model
YOLOv3-SPP	2.4	6.7	6.3	59.7	83.3	149	62.84M	156.5B	model
YOLOv4-Tiny	12.3	1.5	2.7	20.0	46.0	667	3.10M	6.5B	model
YOLOv4	2.1	7.5	6.7	61.4	83.7	133	60.52M	131.6B	model
YOLOv5-small	5.4	2.3	1.7	49.3	75.9	435	7.31M	17.0B	model
YOLOv5-medium	3.6	3.8	3.1	56.5	80.3	263	21.56M	51.7B	model
YOLOv5-large	2.8	6.1	5.1	59.4	81.6	164	47.50M	116.4B	model
YOLOv5-xlarge	1.4	10.8	7.2	60.2	82.6	93	88.56M	220.6B	model

Common Settings for COCO Models

All COCO models were trained on train2017 and evaluated on val2017.
The default settings are not directly comparable with YOLOv4's standard settings. The default settings are not directly comparable with Detectron's standard settings. For example, our default training data augmentation uses scale jittering in addition to horizontal flipping.
For YOLOv3/YOLOv4, we provide baselines based on 3 different backbone combinations:
- Darknet-53: Use a ResNet+VGG backbone with standard conv and FC heads for mask and box prediction, respectively.
- CSPDarknet-53: Use a ResNet+CSPNet backbone with standard conv and FC heads for mask and box prediction, respectively. It obtains the best speed/accuracy tradeoff, but the other two are still useful for research.
- GhostDarknet-53 : Use a ResNet+Ghost backbone with standard conv and FC heads for mask and box prediction, respectively.

COCO Object Detection Baselines

Model	train time (s/iter)	inference time (ms/im)	train mem (GB)	AP^test	AP₅₀	fps	params	FLOPs	download
MobileNet-v1	-	-	-	-	-	-	-	-	-
VGG16	-	-	-	-	-	-	-	-	-
YOLOv3-Tiny	-	-	-	-	-	-	-	-	-
YOLOv3	-	-	-	-	-	-	-	-	-
YOLOv3-SPP	-	-	-	-	-	-	-	-	-
YOLOv4	-	-	-	-	-	-	-	-	-
YOLOv4-Tiny	-	-	-	-	-	-	-	-	-
YOLOv5-small	-	-	-	-	-	-	-	-	-
YOLOv5-medium	-	-	-	-	-	-	-	-	-
YOLOv5-large	-	-	-	-	-	-	-	-	-
YOLOv5-xlarge	-	-	-	-	-	-	-	-	-

Inference

detect.py runs inference on any sources:

python detect.py --cfg configs/COCO-Detection/yolov5-small.yaml  --data data/coco2014.yaml --weights weights/COCO-Detection/yolov5-small.pth  --source ...

Image: --source file.jpg
Video: --source file.mp4
Directory: --source dir/
Webcam: --source 0
HTTP stream: --source https://v.qq.com/x/page/x30366izba3.html

Train on Custom Dataset

Run the commands below to create a custom model definition, replacing your-dataset-num-classes with the number of classes in your dataset.

# move to configs dir
cd configs/
create custom model 'yolov3-custom.yaml'. (In fact, it is OK to modify two lines of parameters, see `create_model.sh`)                              
bash create_model.sh your-dataset-num-classes

Data configuration

Add class names to data/custom.yaml. This file should have one row per class name.

Image Folder

Move the images of your dataset to data/custom/images/.

Annotation Folder

Move your annotations to data/custom/labels/. The dataloader expects that the annotation file corresponding to the image data/custom/images/train.jpg has the path data/custom/labels/train.txt. Each row in the annotation file should define one bounding box, using the syntax label_idx x_center y_center width height. The coordinates should be scaled [0, 1], and the label_idx should be zero-indexed and correspond to the row number of the class name in data/custom/classes.names.

Define Train and Validation Sets

In data/custom/train.txt and data/custom/val.txt, add paths to images that will be used as train and validation data respectively.

Training

To train on the custom dataset run:

python train.py --config-file configs/yolov3-custom.yaml --data data/custom.yaml --epochs 100

Credit

YOLOv4: Optimal Speed and Accuracy of Object Detection

Alexey Bochkovskiy, Chien-Yao Wang, Hong-Yuan Mark Liao

Abstract
There are a huge number of features which are said to improve Convolutional Neural Network (CNN) accuracy. Practical testing of combinations of such features on large datasets, and theoretical justification of the result, is required. Some features operate on certain models exclusively and for certain problems exclusively, or only for small-scale datasets; while some features, such as batch-normalization and residual-connections, are applicable to the majority of models, tasks, and datasets. We assume that such universal features include Weighted-Residual-Connections (WRC), Cross-Stage-Partial-connections (CSP), Cross mini-Batch Normalization (CmBN), Self-adversarial-training (SAT) and Mish-activation. We use new features: WRC, CSP, CmBN, SAT, Mish activation, Mosaic data augmentation, CmBN, DropBlock regularization, and CIoU loss, and combine some of them to achieve state-of-the-art results: 43.5% AP (65.7% AP50) for the MS COCO dataset at a realtime speed of ~65 FPS on Tesla V100. Source code is at this https URL.

[Paper] [Project Webpage] [Authors' Implementation]

@article{yolov4,
  title={YOLOv4: Optimal Speed and Accuracy of Object Detection},
  author={Alexey Bochkovskiy, Chien-Yao Wang, Hong-Yuan Mark Liao},
  journal = {arXiv},
  year={2020}
}

hyaihjq / YOLOv4-PyTorch