DD-DuDa / awesome-vit-quantization-acceleration

A list of papers on Vision Transformer quantization and hardware acceleration from recent AI conferences and journals.


Awesome ViT Quantization and Acceleration

🔍 Dive into the cutting edge with this curated list of papers on Vision Transformer (ViT) quantization and hardware acceleration, featured in top-tier AI conferences and journals. This collection is meticulously organized and draws on insights from our comprehensive survey:

[arXiv] Model Quantization and Hardware Acceleration for Vision Transformers: A Comprehensive Survey

Table of Contents

Model Quantization
    Activation Quantization Optimization
    Calibration Optimization For PTQ
    Gradient-based Optimization For QAT
    Binary Quantization
Hardware Acceleration
    Non-linear Operations Acceleration
    Hardware Accelerator
Citation

Model Quantization

Activation Quantization Optimization

Date Title Paper Code
2021.11 “PTQ4ViT: Post-training Quantization for Vision Transformers with Twin Uniform Quantization” [ECCV’22] [code]
2021.11 “FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer” [IJCAI’22] [code]
2022.12 “RepQ-ViT: Scale Reparameterization for Post-Training Quantization of Vision Transformers” [ICCV’23] [code]
2023.03 “Towards Accurate Post-Training Quantization for Vision Transformer” [MM’22] -
2023.05 “TSPTQ-ViT: Two-scaled post-training quantization for vision transformer” [ICASSP’23] -
2023.11 “I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization” [arXiv] [code]
2024.01 “MPTQ-ViT: Mixed-Precision Post-Training Quantization for Vision Transformer” [arXiv] -
2024.01 “LRP-QViT: Mixed-Precision Vision Transformer Quantization via Layer-wise Relevance Propagation” [arXiv] -
2024.02 “RepQuant: Towards Accurate Post-Training Quantization of Large Transformer Models via Scale Reparameterization” [arXiv] -
2024.04 “Instance-Aware Group Quantization for Vision Transformers” [arXiv] -
2024.05 “P^2-ViT: Power-of-Two Post-Training Quantization and Acceleration for Fully Quantized Vision Transformer” [arXiv] [code]
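
For orientation, the methods above all refine the plain asymmetric uniform quantizer sketched below. This is a minimal NumPy illustration, not code from any listed paper; the function and variable names are ours.

```python
import numpy as np

def uniform_quantize(x, n_bits=8):
    """Baseline asymmetric uniform fake-quantization: quantize, then dequantize."""
    qmin, qmax = 0, 2 ** n_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)   # one scale for the whole tensor
    zero_point = np.round(-x.min() / scale)
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale

# Post-Softmax and post-GELU activations in ViTs are heavily non-uniform, so a
# single (scale, zero_point) pair wastes most of the 8-bit grid -- hence the
# twin, two-scaled, reparameterized, and group-wise schemes in the table above.
x = np.random.randn(197, 768)  # token-by-channel activation map (ViT-B shapes)
print(np.abs(x - uniform_quantize(x)).mean())
```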

Calibration Optimization For PTQ

Date Title Paper Code
2021.06 “Post-Training Quantization for Vision Transformer” [NeurIPS’21] [code]
2021.11 “PTQ4ViT: Post-training Quantization for Vision Transformers with Twin Uniform Quantization” [ECCV’22] [code]
2022.03 “Patch Similarity Aware Data-Free Quantization for Vision Transformers” [ECCV’22] [code]
2022.09 “PSAQ-ViT V2: Towards Accurate and General Data-Free Quantization for Vision Transformers” [TNNLS’23] [code]
2022.11 “NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers” [CVPR’23] -
2023.03 “Towards Accurate Post-Training Quantization for Vision Transformer” [MM’22] -
2023.05 “Finding Optimal Numerical Format for Sub-8-Bit Post-Training Quantization of Vision Transformers” [ICASSP’23] -
2023.08 “Jumping through Local Minima: Quantization in the Loss Landscape of Vision Transformers” [ICCV’23] [code]
2023.10 “LLM-FP4: 4-Bit Floating-Point Quantized Transformers” [EMNLP’23] [code]
2024.05 “P^2-ViT: Power-of-Two Post-Training Quantization and Acceleration for Fully Quantized Vision Transformer” [arXiv] [code]
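
“Calibration” in these works means choosing quantizer parameters from a few unlabeled batches rather than by training. Below is a minimal sketch of the generic recipe, assuming a symmetric quantizer and a plain MSE objective; the papers above substitute Hessian-guided, ranking-aware, or data-free objectives.

```python
import numpy as np

def calibrate_scale(samples, n_bits=8, n_grid=100):
    """Grid-search a symmetric clipping threshold that minimizes quantization MSE."""
    qmax = 2 ** (n_bits - 1) - 1
    max_abs = np.abs(samples).max()
    best_err, best_scale = np.inf, max_abs / qmax
    for i in range(1, n_grid + 1):
        scale = (max_abs * i / n_grid) / qmax     # candidate clipping threshold
        deq = np.clip(np.round(samples / scale), -qmax - 1, qmax) * scale
        err = np.mean((samples - deq) ** 2)
        if err < best_err:
            best_err, best_scale = err, scale
    return best_scale

scale = calibrate_scale(np.random.randn(32, 197, 768))  # calibration activations
```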

Gradient-based Optimization For QAT

Date Title Paper Code
2022.01 “TerViT: An Efficient Ternary Vision Transformer” [arXiv] -
2022.10 “Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer” [NeurIPS’22] [code]
2022.12 “Quantformer: Learning Extremely Low-Precision Vision Transformers” [TPAMI’22] -
2023.02 “Oscillation-free Quantization for Low-bit Vision Transformers” [ICML’23] [code]
2023.05 “Boost Vision Transformer with GPU-Friendly Sparsity and Quantization” [CVPR’23] -
2023.06 “Bit-Shrinking: Limiting Instantaneous Sharpness for Improving Post-Training Quantization” [CVPR’23] -
2023.07 “Variation-aware Vision Transformer Quantization” [arXiv] [code]
2023.12 “PackQViT: Faster Sub-8-bit Vision Transformers via Full and Packed Quantization on the Mobile” [NeurIPS’23] -
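
The methods in this table differ mainly in how they treat gradients through the non-differentiable rounding step. The shared substrate is the straight-through estimator (STE); here is a minimal PyTorch sketch, not any listed paper's exact quantizer:

```python
import torch

def fake_quant_ste(x, scale, n_bits=4):
    """Forward: symmetric fake-quantization. Backward: identity (straight-through)."""
    qmax = 2 ** (n_bits - 1) - 1
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale
    # Equals q in the forward pass, but d/dx is 1, so gradients flow "through"
    # round(); the weight oscillation targeted by several papers above is a
    # known side effect of this surrogate gradient.
    return x + (q - x).detach()

w = torch.randn(768, 768, requires_grad=True)
fake_quant_ste(w, scale=0.05).pow(2).sum().backward()
print(w.grad.abs().mean())  # gradients exist despite the rounding
```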

Binary Quantization

Date Title Paper Code
2022.11 “BiViT: Extremely Compressed Binary Vision Transformer” [ICCV’23] -
2023.05 “BinaryViT: Towards Efficient and Accurate Binary Vision Transformers” [arXiv] -
2023.06 “BinaryViT: Pushing Binary Vision Transformers Towards Convolutional Models” [CVPR’23] [code]
2024.05 “BinaryFormer: A Hierarchical-Adaptive Binary Vision Transformer (ViT) for Efficient Computing” [TII] -
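
Binary quantization takes the same machinery to 1 bit: weights collapse to sign(W) with a real-valued scaling factor, trained with the straight-through estimator. The sketch below follows the classic XNOR-Net style; the papers above add ViT-specific architectural changes on top.

```python
import torch

def binarize_weights(w):
    """1-bit weights: sign(W) scaled by the mean absolute value of the tensor."""
    alpha = w.abs().mean()          # real-valued per-tensor scale
    wb = torch.sign(w) * alpha      # values in {-alpha, +alpha}
    return w + (wb - w).detach()    # straight-through estimator for training
```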

Hardware Acceleration

Non-linear Operations Acceleration

Date Title Paper Code
2021.11 “FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer” [IJCAI’22] [code]
2022.07 “I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference” [ICCV’23] [code]
2023.06 “Practical Edge Kernels for Integer-Only Vision Transformers Under Post-training Quantization” [MLSYS’23] -
2023.10 “SOLE: Hardware-Software Co-design of Softmax and LayerNorm for Efficient Transformer Inference” [ICCAD’23] -
2023.12 “PackQViT: Faster Sub-8-bit Vision Transformers via Full and Packed Quantization on the Mobile” [NeurIPS’23] -
2024.05 “P^2-ViT: Power-of-Two Post-Training Quantization and Acceleration for Fully Quantized Vision Transformer” [arXiv] [code]
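
These works target the floating-point islands (Softmax, GELU, LayerNorm) left between integer GEMMs. A recurring trick is to rewrite exp as a bit-shift plus a small polynomial so the whole pipeline stays in fixed point. Below is a floating-point illustration of that decomposition; the polynomial constants follow the I-BERT-style i-exp fit, while the listed papers use related but distinct approximations.

```python
import numpy as np

def shift_exp(x):
    """exp(x) for x <= 0 via exp(x) = 2^(-z) * exp(p), with p in (-ln2, 0].

    In an integer kernel 2^(-z) is a right bit-shift, and exp(p) needs only a
    second-order polynomial because p lives on a tiny interval.
    """
    z = np.floor(-x / np.log(2.0))
    p = x + z * np.log(2.0)
    return (0.3585 * (p + 1.353) ** 2 + 0.344) * 2.0 ** (-z)

def int_friendly_softmax(x):
    e = shift_exp(x - x.max(axis=-1, keepdims=True))  # max-subtraction keeps x <= 0
    return e / e.sum(axis=-1, keepdims=True)

attn = np.random.randn(8, 197, 197)                    # attention logits
ref = np.exp(attn) / np.exp(attn).sum(-1, keepdims=True)
print(np.abs(int_friendly_softmax(attn) - ref).max())  # small approximation error
```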

Hardware Accelerator

Date Title Paper Code
2022.01 “VAQF: Fully Automatic Software-Hardware Co-Design Framework for Low-Bit Vision Transformer” [arXiv] -
2022.08 “Auto-ViT-Acc: An FPGA-Aware Automatic Acceleration Framework for Vision Transformer with Mixed-Scheme Quantization” [FPL’22] -
2023.10 “An Integer-Only and Group-Vector Systolic Accelerator for Efficiently Mapping Vision Transformer on Edge” [TCAS-I’23] -
2023.10 “SOLE: Hardware-Software Co-design of Softmax and LayerNorm for Efficient Transformer Inference” [ICCAD’23] -
2024.05 “P^2-ViT: Power-of-Two Post-Training Quantization and Acceleration for Fully Quantized Vision Transformer” [arXiv] [code]
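
Whatever the microarchitecture (FPGA overlays, systolic arrays), the datapath these accelerators implement is the same: int8 × int8 products accumulated in int32, then requantized to int8 through a folded output scale. A NumPy sketch of that reference behavior, with illustrative names:

```python
import numpy as np

def int8_gemm_requant(a_q, w_q, s_a, s_w, s_out):
    """int8 x int8 -> int32 accumulate, then requantize the result to int8.

    In hardware the multiplier m becomes a fixed-point multiply plus shift;
    P^2-ViT above constrains scales to powers of two so requantization
    reduces to a pure bit-shift.
    """
    acc = a_q.astype(np.int32) @ w_q.astype(np.int32)  # wide accumulators
    m = (s_a * s_w) / s_out                            # folded output multiplier
    return np.clip(np.round(acc * m), -128, 127).astype(np.int8)

a_q = np.random.randint(-128, 128, (197, 768), dtype=np.int8)
w_q = np.random.randint(-128, 128, (768, 768), dtype=np.int8)
y_q = int8_gemm_requant(a_q, w_q, s_a=0.02, s_w=0.01, s_out=0.1)
```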

Citation

If you find our survey useful or relevant to your research, please cite our paper:

@misc{du2024model,
      title={Model Quantization and Hardware Acceleration for Vision Transformers: A Comprehensive Survey}, 
      author={Dayou Du and Gu Gong and Xiaowen Chu},
      year={2024},
      eprint={2405.00314},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

License: MIT