ShaneShen / MutualNet

MutualNet: Adaptive ConvNet via Mutual Learning from Network Width and Resolution (ECCV'20 Oral)

Home Page: https://arxiv.org/abs/1909.12978

This work proposes a method for training a single network that can run under dynamic resource constraints (e.g., FLOPs) at runtime. The proposed mutual learning scheme for input resolution and network width significantly improves the accuracy-efficiency tradeoff over Slimmable Networks on tasks such as image classification, object detection, and instance segmentation. The method can also serve as a plug-and-play strategy to boost the accuracy of a single network. On ImageNet with ResNet-50, it outperforms AutoAugment in both efficiency (GPU search hours: 0 vs. 15000) and accuracy (top-1: 78.6% vs. 77.6%).
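To make the idea of a runtime FLOPs budget concrete, here is a minimal sketch that lists which (width, resolution) pairs fit under a budget. It is not taken from this repository. It assumes the common rule of thumb that convolutional cost grows roughly with the square of the width multiplier and the square of the input resolution. The function names, the candidate lists, and the 600M full-network cost are all hypothetical.

```python
from itertools import product


def relative_flops(width_mult, resolution, base_resolution=224):
    """Rough cost relative to width 1.0 at base_resolution.

    Assumption: cost scales with width^2 * resolution^2, which is only
    an approximation for convolutional networks.
    """
    return (width_mult ** 2) * (resolution / base_resolution) ** 2


def feasible_configs(budget, widths, resolutions, full_flops):
    """Return (estimated FLOPs, width, resolution) tuples within budget,
    sorted from most to least expensive."""
    configs = []
    for width, resolution in product(widths, resolutions):
        flops = full_flops * relative_flops(width, resolution)
        if flops <= budget:
            configs.append((flops, width, resolution))
    return sorted(configs, reverse=True)


# Hypothetical numbers, for illustration only.
candidates = feasible_configs(
    budget=150e6,
    widths=[0.25, 0.5, 0.75, 1.0],
    resolutions=[128, 160, 192, 224],
    full_flops=600e6,
)
for flops, width, resolution in candidates:
    print(f"width={width:.2f} resolution={resolution} ~{flops / 1e6:.0f}M FLOPs")
```

Several configurations usually fit the same budget. Choosing among them would require measured validation accuracy, which this sketch does not model.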

Install

  • PyTorch 1.0.1, torchvision 0.2.2, NumPy, PyYAML 5.1.
  • Follow the PyTorch example to prepare the ImageNet dataset.

Run

Training

To train MobileNet v1, run the command below:

python train.py app:apps/mobilenet_v1.yml

Training hyperparameters are set in the .yml files. During training, width_mult_list only controls which network widths are reported in the training logs. During testing, you can assign any width between the lower and upper width bounds. To train other models, use the corresponding .yml files.
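As an illustration of what "any width between the lower and upper bound" could mean in practice, the sketch below checks a requested width multiplier against the bounds and scales a list of channel counts. It is not the repository's code. The function names, the default bounds of 0.25 and 1.0, and rounding to multiples of 8 are assumptions made for the example.

```python
def make_divisible(value, divisor=8):
    """Round value to the nearest multiple of divisor, but never below divisor."""
    return max(divisor, int(value + divisor / 2) // divisor * divisor)


def scaled_channels(base_channels, width_mult, lower=0.25, upper=1.0):
    """Scale per-layer channel counts by width_mult.

    lower and upper are hypothetical defaults; use the bounds the
    model was actually trained with.
    """
    if not lower <= width_mult <= upper:
        raise ValueError(
            f"width_mult {width_mult} is outside the range [{lower}, {upper}]"
        )
    return [make_divisible(c * width_mult) for c in base_channels]


# Example: halve the width of three hypothetical layers.
print(scaled_channels([32, 64, 128], 0.5))
```

Rounding to a fixed multiple keeps layer sizes hardware-friendly, at the price of the effective width differing slightly from the requested multiplier.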

Testing

In the .yml file, change test_only: False to test_only: True to enable testing.

Set pretrained: /PATH/TO/YOUR/WEIGHTS to point to the trained weights.

Edit width_mult_list to test more network widths.

python train.py app:apps/mobilenet_v1.yml
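The three edits above can also be applied programmatically. The sketch below works on a plain dictionary that stands in for the parsed .yml file. The keys test_only, pretrained, and width_mult_list come from this README, while the helper name make_test_config and the example values are hypothetical. Reading and writing the file itself (for example with PyYAML's yaml.safe_load and yaml.safe_dump) is left out to keep the example dependency-free.

```python
import copy


def make_test_config(cfg, weights_path, widths):
    """Return a copy of cfg switched to test mode; the input is not modified."""
    test_cfg = copy.deepcopy(cfg)
    test_cfg["test_only"] = True               # enable testing
    test_cfg["pretrained"] = weights_path      # trained weights to load
    test_cfg["width_mult_list"] = list(widths) # widths to evaluate
    return test_cfg


# Hypothetical stand-in for a parsed training config.
train_cfg = {
    "test_only": False,
    "pretrained": "",
    "width_mult_list": [0.25, 1.0],
}
test_cfg = make_test_config(
    train_cfg, "/PATH/TO/YOUR/WEIGHTS", [0.25, 0.5, 0.75, 1.0]
)
print(test_cfg)
```

Working on a copy keeps the training configuration intact, so the same file can serve both runs.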

Results and model weights

For those without access to Google Drive, all model weights are also available on [BaiduYun]. The extraction code is 4y6m.

Performance over the whole FLOPs spectrum

Comparison with US-Net under different backbones on ImageNet.

Model weights: [MobileNet v1], [MobileNet v2]

[Figure: results compared with US-Net]

Scaling up the model compared with EfficientNet

The best model scaling on MobileNet v1 compared with EfficientNet

| Model | Best Model Scaling | FLOPs | Top-1 Acc |
|---|---|---|---|
| EfficientNet | d=1.4, w=1.2, r=1.3 | 2.3B | 75.6% |
| MutualNet (Model) | w=1.6, r=1.3 | 2.3B | 77.1% |

Boosting performance of a single network

Top-1 accuracy on CIFAR-10 and CIFAR-100

| WideResNet-28-10 | GPU search hours | CIFAR-10 | CIFAR-100 |
|---|---|---|---|
| Baseline | 0 | 96.1% | 81.2% |
| Cutout | 0 | 96.9% | 81.6% |
| Mixup | 0 | 97.3% | 82.5% |
| AutoAugment | 5000 | 97.4% | 82.9% |
| Fast AutoAugment | 3.5 | 97.3% | 82.7% |
| MutualNet | 0 | 97.3% | 83.8% |

Comparison with state-of-the-art performance-boosting methods on ImageNet

| ResNet-50 | Additional Cost | Top-1 Acc |
|---|---|---|
| Baseline | \ | 76.5% |
| Cutout | \ | 77.1% |
| Mixup | \ | 77.9% |
| CutMix | \ | 78.6% |
| KD | Teacher Network | 76.5% |
| SENet | SE Block | 77.6% |
| AutoAugment | 15000 GPU search hours | 77.6% |
| Fast AutoAugment | 450 GPU search hours | 77.6% |
| MutualNet (Model) | \ | 78.6% |

Reference

- The code is based on the implementation of Slimmable Networks.

License: MIT License

