hustzxd / Awesome-Efficient-LLM

A curated list for Efficient Large Language Models

Awesome-Efficient-LLM

Because many current publications run their experiments on PLMs (such as BERT and BART), a new subdirectory efficient_plm/ has been created to house papers that apply to PLMs but whose effectiveness on LLMs has yet to be verified (this does not imply that they are unsuitable for LLMs).

Knowledge Distillation

Title & Authors Introduction Links
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions
Minghao Wu, Abdul Waheed, Chiyu Zhang, Muhammad Abdul-Mageed, Alham Fikri Aji
image Github paper
Symbolic Chain-of-Thought Distillation: Small Models Can Also "Think" Step-by-Step
Liunian Harold Li, Jack Hessel, Youngjae Yu, Xiang Ren, Kai-Wei Chang, Yejin Choi
image Github
Paper
Specializing Smaller Language Models towards Multi-Step Reasoning
Yao Fu, Hao Peng, Litu Ou, Ashish Sabharwal, Tushar Khot
image Github
Paper
GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model
Shicheng Tan, Weng Lam Tam, Yuanchun Wang, Wenwen Gong, Yang Yang, Hongyin Tang, Keqing He, Jiahao Liu, Jingang Wang, Shu Zhao, Peng Zhang, Jie Tang
image Github
Paper
Knowledge Distillation of Large Language Models
Yuxian Gu, Li Dong, Furu Wei, Minlie Huang
image Github
Paper
Distilling Script Knowledge from Large Language Models for Constrained Language Planning
Siyu Yuan, Jiangjie Chen, Ziquan Fu, Xuyang Ge, Soham Shah, Charles Robert Jankowski, Yanghua Xiao, Deqing Yang
image Github
Paper
SCOTT: Self-Consistent Chain-of-Thought Distillation
Peifeng Wang, Zhengyang Wang, Zheng Li, Yifan Gao, Bing Yin, Xiang Ren
image Paper
DISCO: Distilling Counterfactuals with Large Language Models
Zeming Chen, Qiyue Gao, Antoine Bosselut, Ashish Sabharwal, Kyle Richardson
image Github
Paper
I2D2: Inductive Knowledge Distillation with NeuroLogic and Self-Imitation
Chandra Bhagavatula, Jena D. Hwang, Doug Downey, Ronan Le Bras, Ximing Lu, Lianhui Qin, Keisuke Sakaguchi, Swabha Swayamdipta, Peter West, Yejin Choi
image Github
Paper
Project
Teaching Small Language Models to Reason
Lucie Charlotte Magister, Jonathan Mallinson, Jakub Adamek, Eric Malmi, Aliaksei Severyn
image Paper
Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
Cheng-Yu Hsieh, Chun-Liang Li, Chih-Kuan Yeh, Hootan Nakhost, Yasuhisa Fujii, Alexander Ratner, Ranjay Krishna, Chen-Yu Lee, Tomas Pfister
image Github
Paper
Large Language Model Distillation Doesn't Need a Teacher
Ananya Harsh Jha, Dirk Groeneveld, Emma Strubell, Iz Beltagy
image Github paper
The False Promise of Imitating Proprietary LLMs
Arnav Gudibande, Eric Wallace, Charlie Snell, Xinyang Geng, Hao Liu, Pieter Abbeel, Sergey Levine, Dawn Song
image Paper
Impossible Distillation: from Low-Quality Model to High-Quality Dataset & Model for Summarization and Paraphrasing
Jaehun Jung, Peter West, Liwei Jiang, Faeze Brahman, Ximing Lu, Jillian Fisher, Taylor Sorensen, Yejin Choi
image Github paper
PaD: Program-aided Distillation Specializes Large Models in Reasoning
Xuekai Zhu, Biqing Qi, Kaiyan Zhang, Xingwei Long, Bowen Zhou
image Paper
Can Language Models Teach? Teacher Explanations Improve Student Performance via Theory of Mind
Swarnadeep Saha, Peter Hase, Mohit Bansal
image Github
Paper
RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment
Kevin Yang, Dan Klein, Asli Celikyilmaz, Nanyun Peng, Yuandong Tian
image Paper
Sci-CoT: Leveraging Large Language Models for Enhanced Knowledge Distillation in Small Models for Scientific QA
Yuhan Ma, Haiqi Jiang, Chenyou Fan
image Paper
UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition
Wenxuan Zhou, Sheng Zhang, Yu Gu, Muhao Chen, Hoifung Poon
image Github
Paper
Project
Baby Llama: knowledge distillation from an ensemble of teachers trained on a small dataset with no performance penalty
Inar Timiryasov, Jean-Loup Tastet
image Github
Paper

Network Pruning

Title & Authors Introduction Links
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
Elias Frantar, Dan Alistarh
image Github paper
LLM-Pruner: On the Structural Pruning of Large Language Models
Xinyin Ma, Gongfan Fang, Xinchao Wang
image Github paper
A Simple and Effective Pruning Approach for Large Language Models
Mingjie Sun, Zhuang Liu, Anna Bair, J. Zico Kolter
image Github
Paper
The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter
Ajay Jaiswal, Shiwei Liu, Tianlong Chen, Zhangyang Wang
image Github
Paper

Quantization

Title & Authors Introduction Links
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
Elias Frantar, Saleh Ashkboos, Torsten Hoefler, Dan Alistarh
image Github
Paper
GPTQ-for-LLaMA: 4 bits quantization of LLaMA using GPTQ.
image Github
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
Guangxuan Xiao, Ji Lin, Mickael Seznec, Hao Wu, Julien Demouth, Song Han
image Github
Paper
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Ji Lin, Jiaming Tang, Haotian Tang, Shang Yang, Xingyu Dang, Song Han
image Github
Paper
RPTQ: Reorder-based Post-training Quantization for Large Language Models
Zhihang Yuan, Lin Niu, Jiawei Liu, Wenyu Liu, Xinggang Wang, Yuzhang Shang, Guangyu Sun, Qiang Wu, Jiaxiang Wu, Bingzhe Wu

Github
Paper
QLoRA: Efficient Finetuning of Quantized LLMs
Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer

Github
Paper
ZeroQuant-V2: Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation
Zhewei Yao, Xiaoxia Wu, Cheng Li, Stephen Youn, Yuxiong He
image Paper
SqueezeLLM: Dense-and-Sparse Quantization
Sehoon Kim, Coleman Hooper, Amir Gholami, Zhen Dong, Xiuyu Li, Sheng Shen, Michael W. Mahoney, Kurt Keutzer
image Github
Paper
Outlier Suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling
Xiuying Wei, Yunchen Zhang, Yuhang Li, Xiangguo Zhang, Ruihao Gong, Jinyang Guo, Xianglong Liu
image Paper
Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models
Yijia Zhang, Lingran Zhao, Shijie Cao, Wenqiang Wang, Ting Cao, Fan Yang, Mao Yang, Shanghang Zhang, Ningyi Xu
image Paper
LLM-QAT: Data-Free Quantization Aware Training for Large Language Models
Zechun Liu, Barlas Oguz, Changsheng Zhao, Ernie Chang, Pierre Stock, Yashar Mehdad, Yangyang Shi, Raghuraman Krishnamoorthi, Vikas Chandra
image Paper
SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression
Tim Dettmers, Ruslan Svirschevski, Vage Egiazarian, Denis Kuznedelev, Elias Frantar, Saleh Ashkboos, Alexander Borzunov, Torsten Hoefler, Dan Alistarh
image Github
Paper
OWQ: Lessons learned from activation outliers for weight quantization in large language models
Changhun Lee, Jungyu Jin, Taesu Kim, Hyungjun Kim, Eunhyeok Park
image Github
Paper
Do Emergent Abilities Exist in Quantized Large Language Models: An Empirical Study
Peiyu Liu, Zikang Liu, Ze-Feng Gao, Dawei Gao, Wayne Xin Zhao, Yaliang Li, Bolin Ding, Ji-Rong Wen
image Github
Paper
ZeroQuant-FP: A Leap Forward in LLMs Post-Training W4A8 Quantization Using Floating-Point Formats
Xiaoxia Wu, Zhewei Yao, Yuxiong He
image Paper
QuIP: 2-Bit Quantization of Large Language Models With Guarantees
Jerry Chee, Yaohui Cai, Volodymyr Kuleshov, Christopher De Sa
image Github
Paper
FPTQ: Fine-grained Post-Training Quantization for Large Language Models
Qingyuan Li, Yifan Zhang, Liang Li, Peng Yao, Bo Zhang, Xiangxiang Chu, Yerui Sun, Li Du, Yuchen Xie
image Paper

Inference Acceleration

Title & Authors Introduction Links
Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
Zichang Liu, Jue Wang, Tri Dao, Tianyi Zhou, Binhang Yuan, Zhao Song, Anshumali Shrivastava, Ce Zhang, Yuandong Tian, Christopher Re, Beidi Chen
image Github
Paper
Inference with Reference: Lossless Acceleration of Large Language Models
Nan Yang, Tao Ge, Liang Wang, Binxing Jiao, Daxin Jiang, Linjun Yang, Rangan Majumder, Furu Wei
image Github
paper
SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Token Tree Verification
Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Zeyu Wang, Rae Ying Yee Wong, Zhuoming Chen, Daiyaan Arfeen, Reyna Abhyankar, Zhihao Jia
image Github
paper
Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time
Zichang Liu, Aditya Desai, Fangshuo Liao, Weitao Wang, Victor Xie, Zhaozhuo Xu, Anastasios Kyrillidis, Anshumali Shrivastava
image Paper
Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers
Sotiris Anagnostidis, Dario Pavllo, Luca Biggio, Lorenzo Noci, Aurelien Lucchi, Thomas Hofmann
image Paper
SkipDecode: Autoregressive Skip Decoding with Batching and Caching for Efficient LLM Inference
Luciano Del Corro, Allie Del Giorno, Sahaj Agarwal, Bin Yu, Ahmed Awadallah, Subhabrata Mukherjee
image Paper
Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding
Xuefei Ning, Zinan Lin, Zixuan Zhou, Huazhong Yang, Yu Wang
image Paper
Accelerating LLM Inference with Staged Speculative Decoding
Benjamin Spector, Chris Re
image Paper

Efficient Structure Design

Title & Authors Introduction Links
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré
image Github
Paper
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
Tri Dao
image Github
Paper

Text Compression

Title & Authors Introduction Links
LLMZip: Lossless Text Compression using Large Language Models
Chandra Shekhara Kaushik Valmeekam, Krishna Narayanan, Dileep Kalathil, Jean-Francois Chamberland, Srinivas Shakkottai
image Paper | Unofficial Github
Adapting Language Models to Compress Contexts
Alexis Chevalier, Alexander Wettig, Anirudh Ajith, Danqi Chen
image Github
Paper
In-context Autoencoder for Context Compression in a Large Language Model
Tao Ge, Jing Hu, Xun Wang, Si-Qing Chen, Furu Wei
image Paper
EntropyRank: Unsupervised Keyphrase Extraction via Side-Information Optimization for Language Model-based Text Compression
Alexander Tsvetkov, Alon Kipnis
image Paper

Low-Rank Decomposition

Title & Authors Introduction Links
LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation
Yixiao Li, Yifan Yu, Qingru Zhang, Chen Liang, Pengcheng He, Weizhu Chen, Tuo Zhao
image Github
Paper
TensorGPT: Efficient Compression of the Embedding Layer in LLMs based on the Tensor-Train Decomposition
Mingxue Xu, Yao Lei Xu, Danilo P. Mandic
image Paper

Survey

Title & Authors Introduction Links
A Survey on Model Compression for Large Language Models
Xunyu Zhu, Jian Li, Yong Liu, Can Ma, Weiping Wang
image Paper

Hardware

Others
