Run, Don’t Walk: Chasing Higher FLOPS for Faster Neural Networks [CVPR 2023]

This is the official PyTorch/PyTorch Lightning implementation of the paper:

Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks
Jierun Chen, Shiu-hong Kao, Hao He, Weipeng Zhuo, Song Wen, Chul-Ho Lee, S.-H. Gary Chan
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023

We propose a simple yet fast and effective partial convolution (PConv), as well as a latency-efficient family of architectures called FasterNet.
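To make the idea concrete, here is a minimal sketch of a partial convolution, assuming PConv applies a regular 3x3 convolution to a fraction of the channels and passes the rest through unchanged. The class name `PartialConv`, the default `n_div=4`, and the weight-sharing demo are illustrative assumptions, not the repository's code. The two forward types mirror the "slicing" and "split_cat" names used in the evaluation note of this README.

```python
import torch
import torch.nn as nn


class PartialConv(nn.Module):
    """Illustrative PConv: a 3x3 conv on the first dim // n_div channels only."""

    def __init__(self, dim: int, n_div: int = 4, forward_type: str = "split_cat"):
        super().__init__()
        if forward_type not in ("split_cat", "slicing"):
            raise ValueError(f"unknown forward_type: {forward_type}")
        self.dim_conv = dim // n_div              # channels that get convolved
        self.dim_untouched = dim - self.dim_conv  # channels passed through as-is
        self.conv = nn.Conv2d(self.dim_conv, self.dim_conv,
                              kernel_size=3, padding=1, bias=False)
        self.forward_type = forward_type

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.forward_type == "slicing":
            # Writes the result back into x in place, so the caller's tensor changes.
            x[:, :self.dim_conv] = self.conv(x[:, :self.dim_conv])
            return x
        # Out-of-place variant: split, convolve one part, concatenate.
        x1, x2 = torch.split(x, [self.dim_conv, self.dim_untouched], dim=1)
        return torch.cat((self.conv(x1), x2), dim=1)


torch.manual_seed(0)
x = torch.randn(2, 8, 16, 16)
split_cat = PartialConv(8, forward_type="split_cat").eval()
slicing = PartialConv(8, forward_type="slicing").eval()
slicing.load_state_dict(split_cat.state_dict())  # share weights for a fair comparison

with torch.no_grad():
    x_a, x_b = x.clone(), x.clone()
    y_a = split_cat(x_a)
    y_b = slicing(x_b)

print(torch.allclose(y_a, y_b))  # do both variants compute the same PConv output?
print(torch.equal(x_a, x))       # is the split_cat input still intact?
print(torch.equal(x_b, x))       # is the slicing input still intact?
```

The in-place write is what makes "slicing" cheap, and it is also why any tensor that aliases the input, such as a residual shortcut, sees the modified values.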

Image Classification

1. Dependency Setup

Create a new conda virtual environment:

conda create -n fasternet python=3.9.12 -y
conda activate fasternet

Clone this repo and install required packages:

git clone https://github.com/JierunChen/FasterNet
cd FasterNet
pip install -r requirements.txt

2. Dataset Preparation

Download the ImageNet-1K classification dataset and structure the data as follows:

/path/to/imagenet-1k/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
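Before launching a run, it can help to confirm that the folder layout matches the structure above. The following is a small standard-library sketch; the helper names `summarize_split` and `check_imagenet_layout` and the accepted file extensions are assumptions made for illustration and are not part of this repository.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpeg", ".jpg", ".png"}  # assumed set of accepted extensions


def summarize_split(root, split):
    """Return {class_name: image_count} for root/split, raising if the layout looks wrong."""
    split_dir = Path(root) / split
    if not split_dir.is_dir():
        raise FileNotFoundError(f"missing split directory: {split_dir}")
    counts = {}
    for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1 for f in class_dir.iterdir() if f.suffix.lower() in IMAGE_SUFFIXES
        )
    if not counts:
        raise ValueError(f"no class folders found in {split_dir}")
    return counts


def check_imagenet_layout(root):
    """Check that train/ and val/ exist and contain the same class folders."""
    train = summarize_split(root, "train")
    val = summarize_split(root, "val")
    if set(train) != set(val):
        raise ValueError("train/ and val/ contain different class folders")
    return train, val
```

Calling `check_imagenet_layout("/path/to/imagenet-1k")` returns the per-class image counts for both splits, or raises with a message pointing at the first problem found.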

3. Pre-trained Models

name	resolution	top-1 acc (%)	#params	FLOPs	model
FasterNet-T0	224x224	71.9	3.9M	0.34G	model
FasterNet-T1	224x224	76.2	7.6M	0.85G	model
FasterNet-T2	224x224	78.9	15.0M	1.90G	model
FasterNet-S	224x224	81.3	31.1M	4.55G	model
FasterNet-M	224x224	83.0	53.5M	8.72G	model
FasterNet-L	224x224	83.5	93.4M	15.49G	model

4. Evaluation

We give an example evaluation command for an ImageNet-1K pre-trained FasterNet-T0 on a single GPU:

python train_test.py -c cfg/fasternet_t0.yaml \
--checkpoint_path model_ckpt/fasternet_t0-epoch=281-val_acc1=71.9180.pth \
--data_dir ../../data/imagenet --test_phase -g 1 -e 125

To evaluate other model variants, change -c and --checkpoint_path accordingly. The pre-trained models are listed in the table above.
For multi-GPU evaluation, set -g to a larger number or to a list of device IDs, e.g., 8 or 0,1,2,3,4,5,6,7. Scale the evaluation batch size -e accordingly, e.g., from 125 to 1000.
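The arithmetic behind the two examples (125 for one GPU, 1000 for eight) can be sketched as below. This assumes that -e scales linearly with the number of GPUs at 125 images per GPU, which is inferred from the examples rather than stated by the authors. The helpers `count_gpus` and `eval_batch_size` are hypothetical and do not exist in train_test.py.

```python
PER_GPU_EVAL_BATCH = 125  # inferred from the single-GPU example (-g 1 -e 125)


def count_gpus(g_value: str) -> int:
    """Interpret a -g value: one integer is a GPU count, a comma list names device IDs."""
    parts = [p.strip() for p in g_value.split(",") if p.strip()]
    if not parts:
        raise ValueError("empty -g value")
    if len(parts) == 1:
        return int(parts[0])
    return len(parts)


def eval_batch_size(g_value: str, per_gpu: int = PER_GPU_EVAL_BATCH) -> int:
    """Suggested -e value under the linear-scaling assumption."""
    return per_gpu * count_gpus(g_value)


print(eval_batch_size("1"))                # 125
print(eval_batch_size("8"))                # 1000
print(eval_batch_size("0,1,2,3,4,5,6,7"))  # 1000
```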

To measure the latency on CPU/ARM and the throughput on GPU (if available), run:

python train_test.py -c cfg/fasternet_t0.yaml \
--checkpoint_path model_ckpt/fasternet_t0-epoch=281-val_acc1=71.9180.pth \
--data_dir ../../data/imagenet --test_phase -g 1 -e 32 --measure_latency --fuse_conv_bn

-e sets the input batch size on GPU; on CPU/ARM the batch size is fixed internally to 1.
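For readers who want to see what such a measurement involves, here is a rough timing sketch. It is not the repository's --measure_latency implementation: the function `time_forward`, the warm-up and iteration counts, and the toy stand-in network are all assumptions chosen for illustration.

```python
import time

import torch
import torch.nn as nn


@torch.no_grad()
def time_forward(model, batch_size, device="cpu", warmup=3, iters=10, resolution=224):
    """Return the average seconds per forward pass on a random input batch."""
    model = model.to(device).eval()
    x = torch.randn(batch_size, 3, resolution, resolution, device=device)

    def sync():
        # CUDA kernels run asynchronously, so wait for them before reading the clock.
        if torch.device(device).type == "cuda":
            torch.cuda.synchronize()

    for _ in range(warmup):  # warm-up passes are excluded from the timing
        model(x)
    sync()
    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    sync()
    return (time.perf_counter() - start) / iters


# Stand-in network; substitute a real FasterNet model built from your config.
toy = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, stride=4, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
)

latency_ms = time_forward(toy, batch_size=1) * 1000       # latency at batch size 1
batch = 32
throughput = batch / time_forward(toy, batch_size=batch)  # images per second
print(f"latency: {latency_ms:.2f} ms, throughput: {throughput:.1f} img/s")
```

Both numbers are measured on CPU here so that the sketch runs anywhere; passing `device="cuda"` would time the GPU instead.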

Note: The v1 paper has two issues related to latency/throughput measurement. They do not affect the conclusion that PConv and FasterNet achieve a better accuracy-latency trade-off, but we clarify them here:

1. PConv and FasterNet use the "slicing" forward type for faster inference and for latency/throughput measurement. However, slicing writes its result into the input tensor in place and thereby implicitly modifies the shortcut, so the computation is inconsistent with the "split_cat" type. To fix this, we may
   - clone the input via x = x.clone() before applying the partial convolution, which adds latency and can defeat the benefit of "slicing" over "split_cat"; or
   - move the shortcut after the PConv operator, which resolves the issue and is likely to preserve the effectiveness. The modified models are being retrained and will be released once finished.
2. Latency and throughput are measured after merging BatchNorm into Conv for all models where applicable. Due to an implementation bug in the initial version, the bias term was wrongly omitted after merging. With the fix, most models, including those from the other works compared, are slightly slower than the numbers reported in the v1 paper. We will update the statistics soon.
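The role of the bias term in BatchNorm merging can be illustrated with the standard folding identity: for an eval-mode BatchNorm with scale γ, shift β, running mean μ, and running variance σ², the fused weight is W·γ/√(σ²+ε) and the fused bias is (b − μ)·γ/√(σ²+ε) + β. The sketch below is a generic implementation of that identity and not necessarily what --fuse_conv_bn does in this repository; the function name `fuse_conv_bn` and the randomized statistics are illustrative assumptions.

```python
import torch
import torch.nn as nn


def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold an eval-mode BatchNorm2d into the preceding Conv2d, keeping the bias term."""
    fused = nn.Conv2d(
        conv.in_channels, conv.out_channels, conv.kernel_size,
        stride=conv.stride, padding=conv.padding, dilation=conv.dilation,
        groups=conv.groups, bias=True,  # a bias is needed even if conv had none
    )
    with torch.no_grad():
        scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)  # per-output-channel scale
        conv_bias = conv.bias if conv.bias is not None else torch.zeros_like(bn.running_mean)
        fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
        fused.bias.copy_((conv_bias - bn.running_mean) * scale + bn.bias)
    return fused


torch.manual_seed(0)
conv = nn.Conv2d(3, 8, kernel_size=3, padding=1, bias=False).eval()
bn = nn.BatchNorm2d(8).eval()
# Give BN non-trivial statistics so that the fused bias is not zero.
bn.running_mean.uniform_(-1, 1)
bn.running_var.uniform_(0.5, 2.0)
nn.init.uniform_(bn.weight, 0.5, 1.5)
nn.init.uniform_(bn.bias, -1, 1)

x = torch.randn(2, 3, 16, 16)
fused = fuse_conv_bn(conv, bn).eval()
with torch.no_grad():
    reference = bn(conv(x))
    with_bias = fused(x)
    # Same fused weights, but with the bias dropped.
    without_bias = nn.functional.conv2d(x, fused.weight, bias=None, padding=1)

print(torch.allclose(with_bias, reference, atol=1e-4))     # fused layer vs. Conv + BN
print(torch.allclose(without_bias, reference, atol=1e-4))  # same check with the bias omitted
```

Dropping the bias removes one addition per output element, which is consistent with the note that correctly fused models run slightly slower than the bias-free measurement.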

5. Training

FasterNet-T0 training on ImageNet-1K with an 8-GPU node:

python train_test.py -g 0,1,2,3,4,5,6,7 --num_nodes 1 -n 4 -b 4096 -e 2000 \
--data_dir ../../data/imagenet --pin_memory --wandb_project_name fasternet \
--model_ckpt_dir ./model_ckpt/$(date +'%Y%m%d_%H%M%S') --cfg cfg/fasternet_t0.yaml

To train other FasterNet variants, change --cfg accordingly. You may also want to adjust the training batch size -b.

Acknowledgement

This repository is built using the timm, poolformer, ConvNeXt, and mmdetection repositories.

Citation

If you find this repository helpful, please consider citing:

@article{chen2023run,
  title={Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks},
  author={Chen, Jierun and Kao, Shiu-hong and He, Hao and Zhuo, Weipeng and Wen, Song and Lee, Chul-Ho and Chan, S-H Gary},
  journal={arXiv preprint arXiv:2303.03667},
  year={2023}
}
