MobileNetV3

An implementation of MobileNetV3 in PyTorch

Theory

You can find the MobileNetV3 paper at Searching for MobileNetV3.

Prepare data

  • CIFAR-10
  • CIFAR-100
  • SVHN
  • Tiny-ImageNet
  • ImageNet: please move the validation images into labeled subfolders; you can use the script here (the sketch below shows the idea).
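
The linked script does this properly; as a minimal illustration only (the placeholder path and the way you build the filename-to-class mapping depend on which ImageNet devkit files you have), the restructuring is just:

import os
import shutil

# `val_labels` maps each validation image filename to its WordNet class id;
# building this mapping depends on your ImageNet devkit files.
val_dir = 'imagenet/val'                                    # placeholder path
val_labels = {'ILSVRC2012_val_00000001.JPEG': 'n01751748'}  # example entry

for fname, wnid in val_labels.items():
    class_dir = os.path.join(val_dir, wnid)
    os.makedirs(class_dir, exist_ok=True)
    shutil.move(os.path.join(val_dir, fname), os.path.join(class_dir, fname))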

Train

  • Train from scratch:
CUDA_VISIBLE_DEVICES=3 python train.py --batch-size=128 --mode=small \
--print-freq=100 --dataset=CIFAR100 --ema-decay=0 --label-smoothing=0.1 \
--lr=0.3 --save-epoch-freq=1000 --lr-decay=cos --lr-min=0 \
--warmup-epochs=5 --weight-decay=6e-5 --num-epochs=200 --width-multiplier=1 \
-nbd -zero-gamma -mixup

where the meanings of the parameters are as follows (sketches of the main tricks appear after the list):

  • batch-size: batch size used for training.
  • mode: use MobileNetV3-Small (if set to small) or MobileNetV3-Large (if set to large).
  • dataset: which dataset to use (CIFAR10, CIFAR100, SVHN, TinyImageNet or ImageNet).
  • ema-decay: decay rate of the exponential moving average (EMA) of model weights; if set to 0, EMA is not used.
  • label-smoothing: the $\epsilon$ used in label smoothing; if set to 0, label smoothing is not used.
  • lr-decay: learning rate decay schedule, step or cos.
  • lr-min: minimum learning rate in the cos schedule.
  • warmup-epochs: number of warmup epochs used with the cos schedule.
  • num-epochs: total number of training epochs.
  • nbd: no bias decay.
  • zero-gamma: initialize the $\gamma$ of the last BN in each block to zero.
  • mixup: use Mixup.
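
For reference, weight EMA keeps a shadow copy of the model whose parameters track an exponential moving average of the trained ones. A minimal sketch (not the repository's exact implementation; BN buffers are omitted for brevity):

import copy
import torch

class EMA:
    def __init__(self, model, decay=0.999):
        self.decay = decay
        self.shadow = copy.deepcopy(model).eval()  # evaluate with these weights
        for p in self.shadow.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model):
        # shadow = decay * shadow + (1 - decay) * current
        for s, p in zip(self.shadow.parameters(), model.parameters()):
            s.mul_(self.decay).add_(p, alpha=1.0 - self.decay)

Call ema.update(model) after each optimizer step and evaluate ema.shadow.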
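
Label smoothing replaces the one-hot target with a mixture of the one-hot target (weight 1 - $\epsilon$) and the uniform distribution (weight $\epsilon$). A minimal sketch:

import torch
import torch.nn.functional as F

def label_smoothing_ce(logits, target, epsilon=0.1):
    # Loss = (1 - eps) * NLL(true class) + eps * mean NLL over all classes,
    # i.e. cross-entropy against the smoothed target distribution.
    log_probs = F.log_softmax(logits, dim=-1)
    nll = -log_probs.gather(dim=-1, index=target.unsqueeze(-1)).squeeze(-1)
    uniform = -log_probs.mean(dim=-1)
    return ((1.0 - epsilon) * nll + epsilon * uniform).mean()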
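
The cos schedule with warmup can be summarized as below (the exact warmup shape used by train.py may differ slightly):

import math

def lr_at_epoch(epoch, base_lr, num_epochs, warmup_epochs=5, lr_min=0.0):
    # Linear warmup for the first warmup_epochs, then cosine decay to lr_min.
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    t = (epoch - warmup_epochs) / max(1, num_epochs - warmup_epochs)
    return lr_min + 0.5 * (base_lr - lr_min) * (1.0 + math.cos(math.pi * t))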
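
No bias decay means weight decay is applied only to multi-dimensional weights (conv/linear kernels), while biases and BN parameters are exempt. A sketch of the optimizer parameter groups (not necessarily how train.py groups them):

import torch

def param_groups_no_bias_decay(model, weight_decay):
    # 1-D parameters (biases, BN gamma/beta) get no weight decay.
    decay, no_decay = [], []
    for p in model.parameters():
        if not p.requires_grad:
            continue
        (no_decay if p.ndim <= 1 else decay).append(p)
    return [{'params': decay, 'weight_decay': weight_decay},
            {'params': no_decay, 'weight_decay': 0.0}]

# e.g. torch.optim.SGD(param_groups_no_bias_decay(model, 6e-5), lr=0.3, momentum=0.9)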
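
Zero gamma initializes the scale of the last BN in each block to zero, so each block's residual branch initially contributes nothing and the block starts close to an identity mapping. A sketch (the attribute name bn3 is a guess; the real name depends on this repository's block definition):

import torch.nn as nn

def zero_last_bn_gamma(model, attr='bn3'):
    # Zero the BN scale (gamma) of the last BN in each block.
    for m in model.modules():
        bn = getattr(m, attr, None)
        if isinstance(bn, nn.BatchNorm2d):
            nn.init.zeros_(bn.weight)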
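
Mixup trains on convex combinations of pairs of examples and their labels. A minimal sketch (alpha=0.2 is a common choice, not necessarily what train.py uses):

import numpy as np
import torch

def mixup_batch(x, y, alpha=0.2):
    # Mix each example with a random partner from the same batch;
    # train with lam * loss(pred, y_a) + (1 - lam) * loss(pred, y_b).
    lam = float(np.random.beta(alpha, alpha))
    index = torch.randperm(x.size(0), device=x.device)
    return lam * x + (1.0 - lam) * x[index], y, y[index], lam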

Pretrained models

We have provided a pretrained MobileNetV3-Small model in the folder pretrained.
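
A minimal loading sketch; the import, constructor signature, and checkpoint filename below are placeholders, so check this repository's model file and the pretrained folder for the actual names:

import torch
from model import MobileNetV3  # hypothetical import

model = MobileNetV3(mode='small', width_multiplier=1.0)  # hypothetical signature
state_dict = torch.load('pretrained/mobilenetv3-small.pth', map_location='cpu')
model.load_state_dict(state_dict)
model.eval()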

Experiments

Training setting

on ImageNet

CUDA_VISIBLE_DEVICES=5 python train.py --batch-size=128 --mode=small --print-freq=2000 --dataset=imagenet \
--ema-decay=0.99 --label-smoothing=0.1 --lr=0.1 --save-epoch-freq=50 --lr-decay=cos --lr-min=0 --warmup-epochs=5 \
--weight-decay=1e-5 --num-epochs=250 --num-workers=2 --width-multiplier=1 -dali -nbd -mixup -zero-gamma -save

on CIFAR-10

CUDA_VISIBLE_DEVICES=1 python train.py --batch-size=128 --mode=small --print-freq=100 --dataset=CIFAR10 \
  --ema-decay=0 --label-smoothing=0 --lr=0.35 --save-epoch-freq=1000 --lr-decay=cos --lr-min=0 \
  --warmup-epochs=5 --weight-decay=6e-5 --num-epochs=400 --num-workers=2 --width-multiplier=1

on CIFAR-100

CUDA_VISIBLE_DEVICES=1 python train.py --batch-size=128 --mode=small --print-freq=100 --dataset=CIFAR100 \
  --ema-decay=0 --label-smoothing=0 --lr=0.35 --save-epoch-freq=1000 --lr-decay=cos --lr-min=0 \
  --warmup-epochs=5 --weight-decay=6e-5 --num-epochs=400 --num-workers=2 --width-multiplier=1

 Using more tricks:

CUDA_VISIBLE_DEVICES=1 python train.py --batch-size=128 --mode=small --print-freq=100 --dataset=CIFAR100 \
  --ema-decay=0.999 --label-smoothing=0.1 --lr=0.35 --save-epoch-freq=1000 --lr-decay=cos --lr-min=0 \
  --warmup-epochs=5 --weight-decay=6e-5 --num-epochs=400 --num-workers=2 --width-multiplier=1 \
  -zero-gamma -nbd -mixup

on SVHN

CUDA_VISIBLE_DEVICES=3 python train.py --batch-size=128 --mode=small --print-freq=1000 --dataset=SVHN \
  --ema-decay=0 --label-smoothing=0 --lr=0.35 --save-epoch-freq=1000 --lr-decay=cos --lr-min=0 \
  --warmup-epochs=5 --weight-decay=6e-5 --num-epochs=20 --num-workers=2 --width-multiplier=1

on Tiny-ImageNet

CUDA_VISIBLE_DEVICES=7 python train.py --batch-size=128 --mode=small --print-freq=100 --dataset=tinyimagenet \
  --data-dir=/media/data2/chenjiarong/ImageData/tiny-imagenet --ema-decay=0 --label-smoothing=0 --lr=0.15 \
  --save-epoch-freq=1000 --lr-decay=cos --lr-min=0 --warmup-epochs=5 --weight-decay=6e-5 --num-epochs=200 \
  --num-workers=2 --width-multiplier=1 -dali

 Using more tricks:

CUDA_VISIBLE_DEVICES=7 python train.py --batch-size=128 --mode=small --print-freq=100 --dataset=tinyimagenet \
  --data-dir=/media/data2/chenjiarong/ImageData/tiny-imagenet --ema-decay=0.999 --label-smoothing=0.1 --lr=0.15 \
  --save-epoch-freq=1000 --lr-decay=cos --lr-min=0 --warmup-epochs=5 --weight-decay=6e-5 --num-epochs=200 \
  --num-workers=2 --width-multiplier=1 -dali -nbd -mixup

MobileNetV3-Large

on ImageNet

               MAdds     Parameters   Top1-acc   Top5-acc
Official 1.0   219 M     5.4 M        75.2%      -
Ours 1.0       216.6 M   5.47 M       -          -

on CIFAR-10

           MAdds     Parameters   Top1-acc   Top5-acc
Ours 1.0   66.47 M   4.21 M       -          -

on CIFAR-100

           MAdds     Parameters   Top1-acc   Top5-acc
Ours 1.0   66.58 M   4.32 M       -          -

MobileNetV3-Small

on ImageNet

               MAdds     Parameters   Top1-acc   Top5-acc
Official 1.0   56.5 M    2.53 M       67.4%      -
Ours 1.0       56.51 M   2.53 M       67.52%     87.58%

 The pretrained model with top-1 accuracy 67.52% is provided in the folder pretrained.

on CIFAR-10 (Average accuracy of 5 runs)

           MAdds     Parameters   Top1-acc   Top5-acc
Ours 1.0   17.51 M   1.52 M       92.97%     -

on CIFAR-100 (Average accuracy of 5 runs)

              MAdds     Parameters   Top1-acc   Top5-acc
Ours 1.0      17.60 M   1.61 M       73.69%     92.31%
More tricks   same      same         76.24%     92.58%

on SVHN (Average accuracy of 5 runs)

           MAdds     Parameters   Top1-acc   Top5-acc
Ours 1.0   17.51 M   1.52 M       97.92%     -

on Tiny-ImageNet (Average accuracy of 5 runs)

              MAdds     Parameters   Top1-acc   Top5-acc
Ours 1.0      51.63 M   1.71 M       59.32%     81.38%
More tricks   same      same         62.62%     84.04%

Dependencies

This project uses Python 3.7 and PyTorch 1.1.0. The MAdds and parameter counts are measured using torchsummaryX.
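
For example, assuming model is a constructed MobileNetV3 (the import and constructor signature below are placeholders; see this repository's model file):

import torch
from torchsummaryX import summary
from model import MobileNetV3  # hypothetical import

model = MobileNetV3(mode='small', width_multiplier=1.0)  # hypothetical signature
summary(model, torch.zeros((1, 3, 224, 224)))  # prints per-layer params and Mult-Adds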