fffasttime / AnyPackingNet


DeepBurning-MixQ

This is part of the DeepBurning project developed for agile neural network accelerator design at the Institute of Computing Technology, Chinese Academy of Sciences. It focuses on software/hardware co-optimization of FPGA-based accelerators for low bit-width, mixed-precision neural network models. On the hardware side, we mainly explore packing methods for various low bit-width convolution operators, so that each primitive DSP in the FPGA can accommodate as many low bit-width operations as possible, thereby improving DSP utilization. On the model side, we mainly use differentiable NAS (neural architecture search) to perform mixed-precision quantization of a given model while also taking the hardware implementation efficiency of the quantized model into account, so that the target convolutional neural network can be deployed efficiently onto an FPGA under given resource constraints.
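As a minimal illustration of the packing idea (a sketch only, not the exact scheme used by the accelerator, which also has to handle signed operands and the DSP port widths), two unsigned 4-bit multiplications that share one activation can be computed with a single wide multiplication by placing the two weights in disjoint bit fields:

# Illustrative DSP-packing sketch in Python. Real designs additionally need
# sign-extension corrections for signed weights and must respect DSP48 operand widths.
def packed_mul(a, w1, w2, shift=8):
    # a, w1, w2 are unsigned 4-bit values (0..15), so each product fits in 8 bits
    packed_w = (w1 << shift) | w2          # put w1 and w2 in disjoint bit fields
    p = a * packed_w                       # one wide multiplication on the DSP
    return (p >> shift) & 0xFF, p & 0xFF   # recover a*w1 and a*w2

a, w1, w2 = 13, 9, 7
assert packed_mul(a, w1, w2) == (a * w1, a * w2)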

This work appears in ICCAD'23; please refer to the paper for more details.

Erjing Luo#, Haitong Huang#, Cheng Liu*, Guoyu Li, Bing Yang, Ying Wang, Huawei Li, Xiaowei Li, "DeepBurning-MixQ: An Open Source Mixed-Precision Neural Network Accelerator Design Framework for FPGAs", ICCAD, 2023. (# equal contribution)

Status

This project mainly explores automatic HW/SW co-optimization of FPGA-based neural network accelerators for mixed-precision neural network models. Currently, the mixed-precision models are fully pipelined across the FPGA, so the framework mainly targets smaller neural network models with a limited number of layers. A hybrid multi-core neural network accelerator that can accommodate generic mixed-precision neural network models will come soon.

Classification Model

Usage

cd cifar/

# 1. Hardware-aware Mixed Precision NAS
python search_train.py --cd 3e-5 --name mix_vggtiny_cifar_cd3e5
# Params:
# --cd  complexity decay coefficient
# --name  checkpoint .pt and .log filename
# --model  mixed-precision supernet model, default is `VGGtiny_MixQ`
# The optimal bit width of each layer converges after a few dozen epochs, for example bitw={8,2,2,2,2,2}, bita={8,3,3,3,6,3}


# 2. Main train
python main_train.py --bitw 822222 --bita 833363 --name vggtiny_cifar_cd3e5
# Trained weights are saved to weights/tiny_cifar_cd3e5.pt


# 3. Test model
python test_acc.py
# Choose tiny_cifar_cd3e5.pt for the test if everything went well

# 4. HLS code generation: 
# The HLS configuration header and weight file can be exported directly from the .pt weight file.
# First, adjust the `simd, pe` parallelization factors of each layer.
vim hls/config_simd_pe.txt
# Exports `config.h` and `weights.hpp` to hls/tiny_cifar_cd3e5/
python export_hls.py


# 5. Model-Level Hardware Simulation
# simulate_hls.py requires the hls/tiny_cifar_cd3e5/model_param.pkl file generated by export_hls.py
python simulate_hls.py
# The output should be consistent with the hardware output or the HLS C-level simulation
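For readers unfamiliar with the bit-width search in step 1, the following minimal sketch shows the idea behind the --cd (complexity decay) term under simplified assumptions; the names (candidate_bits, fake_quant, mixed_act) are illustrative and are not the actual API of search_train.py. Each layer keeps one learnable logit per candidate bit width, the forward pass mixes the differently quantized results with softmax weights, and a complexity penalty scaled by cd pushes the softmax toward cheaper bit widths:

import torch
import torch.nn.functional as F

# Illustrative differentiable bit-width selection for one activation tensor.
candidate_bits = [2, 3, 4, 8]
logits = torch.zeros(len(candidate_bits), requires_grad=True)  # one logit per candidate

def fake_quant(x, bits):
    # uniform fake quantization of x in [0, 1) to the given number of bits
    levels = 2 ** bits - 1
    return torch.round(x * levels) / levels

def mixed_act(x, cd=3e-5):
    probs = F.softmax(logits, dim=0)                              # architecture weights
    y = sum(p * fake_quant(x, b) for p, b in zip(probs, candidate_bits))
    penalty = cd * (probs * torch.tensor(candidate_bits, dtype=torch.float)).sum()
    return y, penalty                                             # penalty is added to the task loss

x = torch.rand(4, 8)
y, penalty = mixed_act(x)
(y.sum() + penalty).backward()   # logits receive gradients; argmax(probs) gives the chosen bit width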

DAC-SDC Object Detection Model

The DAC System Design Contest focused on low-power object detection on an embedded FPGA system: https://www.dac.com/Conference/System-Design-Contest.

The goal of this contest is to optimize the designs' performance in terms of accuracy and power on an Ultra96 v2 FPGA board. The contest was held five times, from 2018 to 2022, and the performance of the winning designs increased from 30 fps to thousands of fps over those years.

Base models for anypacking bitwidth search:

  • UltraNet: https://github.com/heheda365/ultra_net by the BJUT_runner team, 1st place in the 2020 DAC-SDC contest. UltraNet is a VGGNet-like model with far fewer parameters. UltraNet_iSmart is the 2nd-place design of the 2021 DAC-SDC contest by the UIUC iSmart team and achieves much higher throughput through fixed packing optimization.
  • UltraNet_Bypass: https://github.com/heymesut/SJTU_microe by SJTU (2021), 3rd place in the 2021 DAC-SDC contest. A variant of UltraNet with a bypass connection. The bypass connection increases model accuracy but makes a pipelined NN accelerator design more difficult.
  • SkyNet: https://github.com/jiangwx/SkrSkr SkrSkr by SHTECH, 1st place in the 2021 DAC-SDC contest. SkyNet is a MobileNet-like lightweight model.
  • SkyNetk5: SkyNet with a 5x5 depthwise convolution kernel. Since depthwise convolution uses far fewer operations than pointwise convolution, the larger kernel brings higher accuracy at only a slight cost (see the MAC-count sketch after this list).
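A rough back-of-the-envelope MAC count (with made-up layer sizes, not the actual SkyNet configuration) shows why enlarging the depthwise kernel is cheap compared with the pointwise convolution that follows it:

# Hypothetical feature-map and channel sizes, for illustration only.
H, W, C, C_out = 40, 80, 48, 96

dw3 = H * W * C * 3 * 3     # 3x3 depthwise conv MACs
dw5 = H * W * C * 5 * 5     # 5x5 depthwise conv MACs (SkyNetk5)
pw  = H * W * C * C_out     # 1x1 pointwise conv MACs

print(dw3, dw5, pw)               # the pointwise conv dominates the total
print((dw5 - dw3) / (dw3 + pw))   # ~15% extra MACs for the whole dw+pw pair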

Dataset: See https://byuccl.github.io/dac_sdc_2022/info/.

Usage: First cd dacsdc/, then follow the steps below.

1) Hardware-aware Mixed Precision NAS for bit width

# For UltraNet with mixed precision:
python search_train.py --cd 1e-5 --name mix_ultranet_cd1e5

# UltraNet with Bypass:
python search_train.py --cd 1e-5 --name mix_ultranet_bypass_cd1e5 --model UltraNetBypass_MixQ

# SkyNet/SkyNetk5
python search_train.py --cd 1e-5 --name mix_skynet_cd1e5 --model [SkyNet_MixQ | SkyNetk5_MixQ]

2) Main Train

For UltraNet:

# UltraNet_BJTU uses full 4-bit weight quantization
python main_train.py --bitw 444444444 --bita 844444444 --name ultranet_BJTU

# UltraNet_iSmart uses mixed 4-bit/8-bit quantization for weights
python main_train.py --bitw 844444448 --bita 844444444 --name ultranet_iSmart

# Or use searched bitw, bita from search_train.py
python main_train.py --bitw <bitw> --bita <bita> --name ultranet_anypacking

For UltraNet_Bypass/SkyNet/SkyNetk5:

python main_train.py --bitw <bitw> --bita <bita> --name <ckptname> --model [UltraNet_Bypass | SkyNet | SkyNetk5]
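The --bitw / --bita strings assign one bit width per convolution layer (first character = first layer), matching the bit-width lists printed by search_train.py. Below is a minimal sketch of how such a string could map to per-layer uniform weight quantizers; it is illustrative only, and the quantizers in this repo may use different ranges and scale handling:

import torch

def parse_bits(s):
    # "844444448" -> [8, 4, 4, 4, 4, 4, 4, 4, 8], one entry per conv layer
    return [int(c) for c in s]

def quantize_weight(w, bits):
    # symmetric uniform fake quantization to `bits` bits (illustrative scheme)
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    return torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale

bitw = parse_bits("844444448")
w0 = torch.randn(16, 3, 3, 3)          # weights of a hypothetical first conv layer
wq = quantize_weight(w0, bitw[0])      # first layer kept at 8 bits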

3) Test model

python test.py [--model [UltraNet_Bypass_FixQ | SkyNet_FixQ | SkyNetk5_FixQ]]

4) HLS export

# For Ultranet or Ultranet_Bypass
python export_hls.py [--model UltraNet_Bypass_FixQ]
# For SkyNet or SkyNetk5
python export_hls.py [--model SkyNetk5_FixQ]
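export_hls.py turns the quantized .pt checkpoint into C++ headers for the HLS project (config.h and weights.hpp in the cifar example above). The sketch below only conveys the general shape of such an export; the actual array layout depends on the simd/pe factors in hls/config_simd_pe.txt and is not reproduced here:

import numpy as np

# Illustrative export of one layer's already-quantized integer weights to a
# Vivado HLS style header. Not the repo's actual packing format.
def export_layer(f, name, w_int, bits):
    f.write(f"// {name}: {w_int.size} weights, {bits}-bit\n")
    vals = ", ".join(str(int(v)) for v in w_int.flatten())
    f.write(f"const ap_int<{bits}> {name}_w[{w_int.size}] = {{{vals}}};\n")

w_int = np.random.randint(-8, 8, size=(16, 3, 3, 3))   # 4-bit signed example weights
with open("weights_example.hpp", "w") as f:
    export_layer(f, "conv0", w_int, 4)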

5) Model-Level Hardware Simulation

python simulate_hls.py [--model [UltraNet_Bypass_FixQ | SkyNet_FixQ | SkyNetk5_FixQ]]
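The model-level simulation reproduces the accelerator's integer arithmetic in Python, so its outputs can be compared element-by-element against an HLS C-simulation dump. A sketch of such a comparison is shown below; the file names and the one-integer-per-line format are made up for illustration:

import numpy as np

# Hypothetical dumps: one integer per line from each flow.
sim_out = np.loadtxt("sim_output.txt", dtype=np.int64)    # from simulate_hls.py
csim_out = np.loadtxt("csim_output.txt", dtype=np.int64)  # from the HLS C-simulation

if np.array_equal(sim_out, csim_out):
    print("model-level simulation matches HLS csim")
else:
    mismatch = np.flatnonzero(sim_out != csim_out)
    print(f"{mismatch.size} mismatching values, first at index {mismatch[0]}")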
