This is the implementation of "CSPNet: A New Backbone that can Enhance Learning Capability of CNN" using the Darknet framework.
To install the Darknet framework, refer to darknet (AlexeyAB).
Model | #Parameter | BFLOPs | Top-1 | Top-5 | cfg | weight |
---|---|---|---|---|---|---|
DarkNet-53 [1] | 41.57M | 18.57 | 77.2 | 93.8 | cfg | weight |
CSPDarkNet-53 | 27.61M (-34%) | 13.07 (-30%) | 77.2 (=) | 93.6 (-0.2) | cfg | weight |
CSPDarkNet-53-Elastic | - | 7.74 (-58%) | 76.1 (-1.1) | 93.3 (-0.5) | cfg | weight |
ResNet-50 [2] | 22.73M | 9.74 | 75.8 | 92.9 | cfg | weight |
CSPResNet-50 | 21.57M (-5%) | 8.97 (-8%) | 76.6 (+0.8) | 93.3 (+0.4) | cfg | weight |
CSPResNet-50-Elastic | - | 9.36 (-4%) | 76.8 (+1.0) | 93.5 (+0.6) | cfg | weight |
ResNeXt-50 [3] | 22.19M | 10.11 | 77.8 | 94.2 | cfg | weight |
CSPResNeXt-50 | 20.50M (-8%) | 7.93 (-22%) | 77.9 (+0.1) | 94.0 (-0.2) | cfg | weight |
CSPResNeXt-50-Elastic | - | 5.45 (-46%) | 77.2 (-0.6) | 93.8 (-0.4) | cfg | weight |
HarDNet-138s [4] | 35.5M | 13.4 | 77.8 | - | - | - |
DenseNet-264-32 [5] | 27.21M | 11.03 | 77.8 | 93.9 | - | - |
ResNet-152 [2] | 60.2M | 22.6 | 77.8 | 93.6 | - | - |
DenseNet-201-Elastic [6] | 19.48M | 8.77 | 77.9 | 94.0 | - | - |
CSPDenseNet-201-Elastic | 20.17M (+4%) | 7.13 (-19%) | 77.9 (=) | 94.0 (=) | - | - |
Res2NetLite-72 [7] | - | 5.19 | 74.7 | 92.1 | cfg | weight |
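
The parenthesised percentages in the table above appear to be relative changes with respect to the baseline row of each group (for example, CSPDarkNet-53 against DarkNet-53). The sketch below assumes that interpretation and rounding to whole percent; the helper name `relative_change` is hypothetical and not part of this repository.

```python
def relative_change(baseline: float, value: float) -> int:
    """Relative change of `value` with respect to `baseline`, in whole percent.

    Assumption: the table rounds to the nearest integer percent.
    """
    return round((value - baseline) / baseline * 100)


# CSPDarkNet-53 vs. DarkNet-53, numbers copied from the table above.
params_delta = relative_change(41.57, 27.61)  # parameters, in millions
bflops_delta = relative_change(18.57, 13.07)  # BFLOPs
print(f"{params_delta:+d}% parameters, {bflops_delta:+d}% BFLOPs")
# prints "-34% parameters, -30% BFLOPs"
```
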

Model | #Parameter | BFLOPs | Top-1 | Top-5 | cfg | weight |
---|---|---|---|---|---|---|
PeleeNet [8] | 2.79M | 1.017 | 70.7 | 90.0 | - | - |
PeleeNet-swish | 2.79M | 1.017 | 71.5 | 90.7 | - | - |
PeleeNet-swish-SE | 2.81M | 1.017 | 72.1 | 91.0 | - | - |
CSPPeleeNet | 2.83M (+1%) | 0.888 (-13%) | 70.9 (+0.2) | 90.2 (+0.2) | - | - |
CSPPeleeNet-swish | 2.83M (+1%) | 0.888 (-13%) | 71.7 (+0.2) | 90.8 (+0.1) | - | - |
CSPPeleeNet-swish-SE | 2.85M (+1%) | 0.888 (-13%) | 72.4 (+0.3) | 91.0 (=) | - | - |
SparsePeleeNet [9] | 2.39M | 0.904 | 69.6 | 89.3 | - | - |
EfficientNet-B0* [10] | 4.81M | 0.915 | 71.3 | 90.4 | cfg | weight |
EfficientNet-B0 (official) [10] | - | - | 70.0 | 88.9 | - | - |
MobileNet-v2 [11] | 3.47M | 0.858 | 67.0 | 87.7 | cfg | weight |
CSPMobileNet-v2 | 2.51M (-28%) | 0.764 (-11%) | 67.7 (+0.7) | 88.3 (+0.6) | cfg | weight |
Darknet Ref. [12] | 7.31M | 0.96 | 61.1 | 83.0 | cfg | weight |
CSPDenseNet Ref. | 3.48M (-52%) | 0.886 (-8%) | 65.7 (+4.6) | 86.6 (+3.6) | - | - |
CSPPeleeNet Ref. | 4.10M (-44%) | 1.103 (+15%) | 68.9 (+7.8) | 88.7 (+5.7) | - | - |
CSPDenseNetb Ref. | 1.38M (-81%) | 0.631 (-34%) | 64.2 (+3.1) | 85.5 (+2.5) | - | - |
CSPPeleeNetb Ref. | 2.01M (-73%) | 0.897 (-7%) | 67.8 (+6.7) | 88.1 (+5.1) | - | - |
ResNet-10 [2] | 5.24M | 2.273 | 63.5 | 85.0 | cfg | weight |
CSPResNet-10 | 2.73M (-48%) | 1.905 (-16%) | 65.3 (+1.8) | 86.5 (+1.5) | - | - |
MixNet-M-GPU | - | 1.065 | 71.5 | 90.5 | - | - |
※EfficientNet* is implemented in the Darknet framework.
※EfficientNet (official) is trained with the official code using a batch size of 256.
※Swish activation function is presented by [13].
※Squeeze-and-excitation (SE) network is presented by [14].
※MixNet-M-GPU is modified from MixNet-M [21].
- Activation function
Model | Activation | Top-1 | Top-5 |
---|---|---|---|
PeleeNet | LReLU | 70.7 | 90.0 |
PeleeNet | Swish | 71.5 (+0.8) | 90.7 (+0.7) |
PeleeNet | Mish | 71.4 (+0.7) | 90.4 (+0.4) |
CSPPeleeNet | LReLU | 70.9 | 90.2 |
CSPPeleeNet | Swish | 71.7 (+0.8) | 90.8 (+0.6) |
CSPPeleeNet | Mish | 71.2 (+0.3) | 90.3 (+0.1) |
CSPResNeXt-50 | LReLU | 77.9 | 94.0 |
CSPResNeXt-50 | Mish | 78.9 (+1.0) | 94.5 (+0.5) |
※The Swish activation function is not suitable for ResNeXt-based models; details are given in the Mish paper [22].
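
For reference, the three activation functions compared above can be written in a few lines. This is a minimal scalar sketch using only the standard library, following the usual definitions from [13] and [22]. The Leaky ReLU slope of 0.1 is an assumption here, and the function names are illustrative rather than Darknet's own.

```python
import math


def leaky_relu(x: float, slope: float = 0.1) -> float:
    """Leaky ReLU. The default slope of 0.1 is an assumption."""
    return x if x > 0 else slope * x


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softplus(x: float) -> float:
    """ln(1 + e^x), in a numerically stable form."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def swish(x: float) -> float:
    """Swish [13]: x * sigmoid(x)."""
    return x * sigmoid(x)


def mish(x: float) -> float:
    """Mish [22]: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


for value in (-2.0, 0.0, 1.0):
    print(value, leaky_relu(value), swish(value), mish(value))
```
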
- Data augmentation
Model | Augmentation | Top-1 | Top-5 |
---|---|---|---|
CSPResNeXt-50 | Normal | 77.9 | 94.0 |
CSPResNeXt-50 | Mixup | 77.2 | 94.0 |
CSPResNeXt-50 | CutMix | 78.0 | 94.3 |
CSPResNeXt-50 | CutMix+Mixup | 77.7 | 94.4 |
CSPResNeXt-50 | Mosaic | 78.1 | 94.5 |
CSPResNeXt-50 | Blur | 77.5 | 94.8 |
※Mixup is presented by [23] and used by [24].
※CutMix is presented by [25].
※The implementations of Mixup and CutMix still have to be checked.
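
As a point of comparison for that check, Mixup as defined in [23] blends two samples and their one-hot labels with a weight λ drawn from Beta(α, α). The sketch below follows that textbook definition on plain Python lists; it is not the implementation used for the table, and the function name, the `lam` override, and the default α = 1.0 are assumptions.

```python
import random


def mixup(x1, y1, x2, y2, alpha=1.0, lam=None):
    """Blend two (input, one-hot label) pairs as in mixup [23].

    `lam` can be passed explicitly for reproducibility; otherwise it is
    sampled from Beta(alpha, alpha). Both are illustrative parameters.
    """
    if lam is None:
        lam = random.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


# Two toy "images" (flattened) with one-hot labels for classes 0 and 1.
mixed_x, mixed_y, used_lam = mixup([1.0, 0.0], [1.0, 0.0],
                                   [0.0, 1.0], [0.0, 1.0], lam=0.25)
print(mixed_x, mixed_y, used_lam)
```

The mixed label remains a valid probability distribution because the two weights sum to one.
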
- Other
Model | Method | Top-1 | Top-5 |
---|---|---|---|
CSPResNeXt-50 | Normal | 77.9 | 94.0 |
CSPResNeXt-50 | Smooth | 78.1 | 94.4 |
※Smooth means label smoothing, which is presented by [26].
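
Label smoothing as described in [26] replaces the one-hot target with a mixture of the one-hot vector and a uniform distribution over the K classes. A minimal sketch, assuming the standard formulation; the smoothing factor ε = 0.1 is the value used in [26], and whether the experiment above used the same value is not stated here. The function name is hypothetical.

```python
def smooth_labels(target: int, num_classes: int, epsilon: float = 0.1):
    """Return the smoothed target distribution for class index `target`.

    q(k) = (1 - epsilon) * [k == target] + epsilon / num_classes
    """
    uniform = epsilon / num_classes
    return [
        (1.0 - epsilon) + uniform if k == target else uniform
        for k in range(num_classes)
    ]


# Toy example with 4 classes and ground-truth class 2.
print(smooth_labels(2, 4))
```
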
Model | Size | 1080ti fps | AP | AP50 | AP75 | cfg | weight |
---|---|---|---|---|---|---|---|
CSPResNeXt50-PANet-SPP | 512×512 | 44 | 38.0 | 60.0 | 40.8 | cfg | weight |
CSPResNeXt50-PANet-SPP-GIoU | 512×512 | 44 | 39.4 | 59.4 | 42.5 | cfg | weight |
※PANet is presented by [15].
※SPP is presented by [16].
※GIoU is presented by [17].
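
To make the GIoU row concrete, the metric from [17] extends IoU with a penalty based on the smallest box C enclosing both boxes: GIoU = IoU - (|C| - |A ∪ B|) / |C|, and the loss is 1 - GIoU. The sketch below assumes axis-aligned boxes in `(x1, y1, x2, y2)` form with positive area; it illustrates the definition and is not the Darknet code path.

```python
def box_area(box) -> float:
    """Area of an (x1, y1, x2, y2) box; empty boxes have area 0."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def giou(a, b) -> float:
    """Generalized IoU [17] of two boxes with positive area."""
    inter = box_area((max(a[0], b[0]), max(a[1], b[1]),
                      min(a[2], b[2]), min(a[3], b[3])))
    union = box_area(a) + box_area(b) - inter
    # Smallest axis-aligned box enclosing both a and b.
    hull = box_area((min(a[0], b[0]), min(a[1], b[1]),
                     max(a[2], b[2]), max(a[3], b[3])))
    return inter / union - (hull - union) / hull


def giou_loss(a, b) -> float:
    """Loss form used for regression: 1 - GIoU, in [0, 2]."""
    return 1.0 - giou(a, b)


print(giou((0, 0, 2, 2), (1, 1, 3, 3)))  # partial overlap
print(giou((0, 0, 1, 1), (2, 0, 3, 1)))  # disjoint boxes give a negative value
```

Unlike plain IoU, the value keeps changing when the boxes are disjoint, so the loss still provides a gradient signal in that case.
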
Model | Size | 9900K fps | AP | AP50 | AP75 | cfg | weight |
---|---|---|---|---|---|---|---|
YOLOv3-tiny [1] | 416×416 | 54 | - | 33.1 | - | cfg | weight |
YOLOv3-tiny-PRN [18] | 416×416 | 71 | - | 33.1 | - | cfg | weight |
SNet49-ThunderNet* [19] | 320×320 | 47 | 19.1 | 33.7 | 19.6 | - | - |
Ours | 320×320 | 102 | 15.3 | 34.2 | 12.0 | - | - |
SNet146-ThunderNet* [19] | 320×320 | 32 | 23.6 | 40.2 | 24.5 | - | - |
Ours | 320×320 | 52 | 19.4 | 40.0 | 17.0 | - | - |
Pelee** [7] | 304×304 | 7 | 22.4 | 38.3 | 22.9 | - | - |
RefineDetLite** [20] | 320×320 | 8 | 26.8 | 46.6 | 27.4 | - | - |
※SNet49-ThunderNet* and SNet146-ThunderNet* were tested on a Xeon E5-2682v4.
※Pelee** and RefineDetLite** were tested on an i7-6700.
- NMS threshold
Model | Size | Threshold | AP | AP50 | AP75 | APS | APM | APL |
---|---|---|---|---|---|---|---|---|
CSPResNeXt50-PANet-SPP | 512×512 | 0.45 | 38.0 | 60.0 | 40.8 | 19.7 | 41.4 | 49.9 |
CSPResNeXt50-PANet-SPP | 512×512 | 0.50 | 38.2 | 60.2 | 41.1 | 19.8 | 41.6 | 50.1 |
CSPResNeXt50-PANet-SPP | 512×512 | 0.55 | 38.4 | 60.1 | 41.3 | 20.0 | 41.7 | 50.3 |
CSPResNeXt50-PANet-SPP | 512×512 | 0.60 | 38.5 | 60.0 | 41.7 | 20.1 | 41.9 | 50.4 |
CSPResNeXt50-PANet-SPP | 512×512 | 0.65 | 38.6 | 59.7 | 42.1 | 20.1 | 41.9 | 50.4 |
CSPResNeXt50-PANet-SPP | 512×512 | 0.70 | 38.5 | 59.2 | 42.4 | 20.1 | 41.9 | 50.4 |
CSPResNeXt50-PANet-SPP-GIoU | 512×512 | 0.45 | 39.4 | 59.4 | 42.5 | 20.4 | 42.6 | 51.4 |
CSPResNeXt50-PANet-SPP-GIoU | 512×512 | 0.50 | 39.7 | 59.5 | 42.7 | 20.5 | 42.5 | 51.7 |
CSPResNeXt50-PANet-SPP-GIoU | 512×512 | 0.55 | 39.8 | 59.5 | 43.0 | 20.7 | 43.1 | 51.9 |
CSPResNeXt50-PANet-SPP-GIoU | 512×512 | 0.60 | 40.0 | 59.3 | 43.4 | 20.8 | 43.2 | 52.0 |
CSPResNeXt50-PANet-SPP-GIoU | 512×512 | 0.65 | 40.1 | 59.0 | 43.8 | 20.9 | 43.4 | 52.1 |
CSPResNeXt50-PANet-SPP-GIoU | 512×512 | 0.70 | 40.1 | 58.6 | 44.2 | 20.9 | 43.4 | 52.1 |
CSPResNeXt50-PANet-SPP-GIoU | 512×512 | aware | 40.0 | 59.5 | 43.4 | 20.8 | 43.2 | 52.0 |
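
The numeric thresholds in the table above are IoU cut-offs for non-maximum suppression: a higher threshold keeps more overlapping boxes. The following sketch shows plain greedy, class-agnostic NMS under that assumption, with boxes in `(x1, y1, x2, y2)` form. It does not cover the "aware" row, and the function names are illustrative rather than Darknet's.

```python
def iou(a, b) -> float:
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms(boxes, scores, threshold=0.45):
    """Greedy NMS: visit boxes by descending score and keep a box only if
    its IoU with every already kept box is at most `threshold`."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores, threshold=0.45))  # [0, 2]
print(nms(boxes, scores, threshold=0.70))  # [0, 1, 2]
```

The first two boxes overlap with an IoU of about 0.68, so the second one is suppressed at 0.45 but survives at 0.70.
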
- Activation function
Model | Size | Activation | AP | AP50 | AP75 | APS | APM | APL |
---|---|---|---|---|---|---|---|---|
CSPPeleeNet-PRN | 416×416 | Leaky ReLU | 23.1 | 44.5 | 22.0 | 6.6 | 24.4 | 35.3 |
CSPPeleeNet-PRN | 416×416 | Swish | 24.1 | 45.8 | 23.3 | 6.8 | 26.1 | 35.5 |
- Loss function
Model | Size | Loss | AP | AP50 | AP75 | APS | APM | APL |
---|---|---|---|---|---|---|---|---|
CSPResNeXt50-PANet-SPP | 512×512 | MSE | 38.0 | 60.0 | 40.8 | 19.7 | 41.4 | 49.9 |
CSPResNeXt50-PANet-SPP | 512×512 | GIoU | 39.4 | 59.4 | 42.5 | 20.4 | 42.6 | 51.4 |
CSPResNeXt50-PANet-SPP | 512×512 | DIoU | - | - | - | - | - | - |
CSPResNeXt50-PANet-SPP | 512×512 | CIoU | - | - | - | - | - | - |
※DIoU and CIoU are presented by [27].
[1] YOLOv3: An Incremental Improvement
[2] Deep Residual Learning for Image Recognition (CVPR 2016)
[3] Aggregated Residual Transformations for Deep Neural Networks (CVPR 2017)
[4] HarDNet: A Low Memory Traffic Network (ICCV 2019)
[5] Densely Connected Convolutional Networks (CVPR 2017)
[6] ELASTIC: Improving CNNs with Dynamic Scaling Policies (CVPR 2019)
[7] RefineDetLite: A Lightweight One-stage Object Detection Framework for CPU-only Devices
[8] Pelee: A Real-Time Object Detection System on Mobile Devices (NeurIPS 2018)
[9] Sparsely Aggregated Convolutional Networks (ECCV 2018)
[10] EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks (ICML 2019)
[11] MobileNetV2: Inverted Residuals and Linear Bottlenecks (CVPR 2018)
[12] https://pjreddie.com/darknet/tiny-darknet/
[13] Searching for Activation Functions
[14] Squeeze-and-Excitation Networks (CVPR 2018)
[15] Path Aggregation Network for Instance Segmentation (CVPR 2018)
[16] Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition (TPAMI 2015)
[17] Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression (CVPR 2019)
[18] Enriching Variety of Layer-wise Learning Information by Gradient Combination (ICCVW 2019)
[19] ThunderNet: Towards Real-time Generic Object Detection (ICCV 2019)
[20] RefineDetLite: A Lightweight One-stage Object Detection Framework for CPU-only Devices
[21] MixConv: Mixed Depthwise Convolutional Kernels
[22] Mish: A Self Regularized Non-Monotonic Neural Activation Function
[23] mixup: Beyond Empirical Risk Minimization (ICLR 2018)
[24] Bag of Freebies for Training Object Detection Neural Networks
[25] CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features (ICCV 2019)
[26] Rethinking the Inception Architecture for Computer Vision (CVPR 2016)
[27] Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression (AAAI 2020)