MutualNet: Adaptive ConvNet via Mutual Learning from Network Width and Resolution (ECCV'20 Oral) [arXiv]

This work proposes a method to train a network that can run under dynamic resource constraints (e.g., FLOPs) at runtime. The proposed mutual learning scheme for input resolution and network width significantly improves the accuracy-efficiency tradeoff over Slimmable Networks on tasks such as image classification, object detection and instance segmentation. The method can also serve as a plug-and-play strategy to boost the performance of a single network. It substantially outperforms AutoAugment in both efficiency (GPU search hours: 15000 vs. 0) and accuracy (ImageNet top-1: 77.6% vs. 78.6%).
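To make the mutual learning scheme more concrete, the sketch below samples the (width, resolution) pairs for one training iteration. It assumes, following the paper's description, that the full-width network is trained at the largest resolution while the smallest width and two random widths each get a randomly chosen resolution. The function name `sample_configs`, the width bounds and the resolution list are illustrative assumptions and are not read from this repository's code or .yml files.

```python
import random


def sample_configs(min_width, max_width, resolutions, num_random=2, rng=random):
    """Return (width_mult, resolution) pairs for one training iteration.

    Assumed scheme (not taken from this repo's code):
      * the full-width network always uses the largest resolution;
      * the smallest width and `num_random` random widths each use
        a resolution drawn at random from `resolutions`.
    """
    configs = [(max_width, max(resolutions))]
    sub_widths = [min_width] + [
        round(rng.uniform(min_width, max_width), 3) for _ in range(num_random)
    ]
    for width in sub_widths:
        configs.append((width, rng.choice(resolutions)))
    return configs


if __name__ == "__main__":
    # Hypothetical bounds and resolutions, chosen only for illustration.
    for width, resolution in sample_configs(0.25, 1.0, [224, 192, 160, 128]):
        print("width_mult=%.3f resolution=%d" % (width, resolution))
```

In an actual training loop, each sampled pair would select a sub-network and an input size for one forward/backward pass; that part is omitted here.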

Install

PyTorch 1.0.1, torchvision 0.2.2, NumPy, pyyaml 5.1.
Follow the PyTorch example to prepare the ImageNet dataset.

Run

Training

To train MobileNet v1, run the command below:

python train.py app:apps/mobilenet_v1.yml

Training hyperparameters are set in the .yml files. During training, width_mult_list is only used to print training logs for the corresponding network widths. During testing, you can assign any desired width between the lower and upper width bounds. To train other models, use the corresponding .yml files.
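Since any width between the lower and upper bounds can be used, a small helper can generate the values to paste into width_mult_list. This is a minimal sketch; the function name `evenly_spaced_widths` and the example bounds are assumptions, so check the bounds in the .yml file of the model you are using.

```python
def evenly_spaced_widths(lower, upper, count):
    """Return `count` width multipliers evenly spaced in [lower, upper]."""
    if count < 2:
        raise ValueError("count must be at least 2")
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    step = (upper - lower) / (count - 1)
    # Round to suppress floating-point noise in the printed list.
    return [round(lower + i * step, 4) for i in range(count)]


if __name__ == "__main__":
    # Hypothetical bounds; prints a list usable as width_mult_list.
    print(evenly_spaced_widths(0.25, 1.0, 4))
```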

Testing

Modify test_only: False to test_only: True in the .yml file to enable testing.

Modify pretrained: /PATH/TO/YOUR/WEIGHTS to point to your trained weights.

Modify width_mult_list to test more network widths.

python train.py app:apps/mobilenet_v1.yml
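If you prefer to script the three edits above instead of changing the file by hand, the sketch below rewrites top-level keys in the text of a .yml file. It assumes each key sits on a single unindented line (a scalar or an inline list) and does not parse YAML; the helper name `set_top_level_key` and the sample text are hypothetical and do not reproduce the real config file.

```python
import re


def set_top_level_key(yaml_text, key, value):
    """Replace the value of an unindented `key: ...` line in YAML text."""
    pattern = re.compile(r"^(%s\s*:).*$" % re.escape(key), flags=re.MULTILINE)
    # A function replacement avoids backslash handling in `value`.
    new_text, count = pattern.subn(
        lambda match: "%s %s" % (match.group(1), value), yaml_text
    )
    if count == 0:
        raise KeyError(key)
    return new_text


if __name__ == "__main__":
    # Hypothetical config text, for illustration only.
    sample = (
        "test_only: False\n"
        "pretrained: ''\n"
        "width_mult_list: [0.25, 1.0]\n"
    )
    sample = set_top_level_key(sample, "test_only", "True")
    sample = set_top_level_key(sample, "pretrained", "/PATH/TO/YOUR/WEIGHTS")
    sample = set_top_level_key(sample, "width_mult_list", "[0.25, 0.5, 0.75, 1.0]")
    print(sample)
```

Write the result to a copy of the .yml file and pass that copy to train.py so the training configuration stays untouched.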

Results and model weights

For those who do not have access to Google Drive: here is the link to all model weights in [BaiduYun]. The extraction code is 4y6m.

Performance over the whole FLOPs spectrum

Comparison with US-Net under different backbones on ImageNet.

Model weights: [MobileNet v1], [MobileNet v2]

Scaling up the model compared with EfficientNet

The best model scaling on MobileNet v1 compared with EfficientNet

Model	Best Model Scaling	FLOPs	Top-1 Acc
EfficientNet	d=1.4, w=1.2, r=1.3	2.3B	75.6%
MutualNet (Model)	w=1.6, r=1.3	2.3B	77.1%

Boosting performance of a single network

Top-1 accuracy on Cifar-10 and Cifar-100

WideResNet-28-10	GPU search hours	Cifar-10	Cifar-100
Baseline	0	96.1%	81.2%
Cutout	0	96.9%	81.6%
Mixup	0	97.3%	82.5%
AutoAugment	5000	97.4%	82.9%
Fast AutoAugment	3.5	97.3%	82.7%
MutualNet	0	97.3%	83.8%

Comparison with state-of-the-art performance-boosting methods on ImageNet

ResNet-50	Additional Cost	Top-1 Acc
Baseline	\	76.5%
Cutout	\	77.1%
Mixup	\	77.9%
CutMix	\	78.6%
KD	Teacher Network	76.5%
SENet	SE Block	77.6%
AutoAugment	15000 GPU search hours	77.6%
Fast AutoAugment	450 GPU search hours	77.6%
MutualNet (Model)	\	78.6%

Reference

- The code is based on the implementation of Slimmable Networks.

https://arxiv.org/abs/1909.12978

MIT License
