ModelTC / awesome-lm-system

Summary of system papers/frameworks/code/tools on training or serving large models


Awesome Large Model (LM) System

This repo collects papers, repos, and tools for large model systems, covering training, inference, serving, and compression.

Papers

Training

| Year | Publisher | Title | Framework |
|------|-----------|-------|-----------|
| 2023 | | Training Diffusion Models with Reinforcement Learning | |
| 2023 | | Extracting Training Data from Diffusion Models | |
| 2023 | ICLR | DySR: Adaptive Super-Resolution via Algorithm and System Co-design | DeepSpeed |
| 2023 | | Scaling Vision-Language Models with Sparse Mixture of Experts | DeepSpeed |
| 2023 | IPDPS | MCR-DL: Mix-and-Match Communication Runtime for Deep Learning | DeepSpeed |
| 2023 | ICS | A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training | DeepSpeed |
| 2023 | OSDI | AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving | Alpa |
| 2023 | MLSys | On Optimizing the Communication of Model Parallelism | Alpa |
| 2023 | | Colossal-Auto: Unified Automation of Parallelization and Activation Checkpoint for Large-scale Models | ColossalAI |
| 2022 | CVPR | Perception Prioritized Training of Diffusion Models | |
| 2022 | | Reducing Activation Recomputation in Large Transformer Models | Megatron-LM |
| 2022 | HiPC | 1-bit LAMB: Communication Efficient Large-Scale Large-Batch Training with LAMB's Convergence Speed | DeepSpeed |
| 2022 | NeurIPS | The Stability-Efficiency Dilemma: Investigating Sequence Length Warmup for Training GPT Models | DeepSpeed |
| 2022 | | Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam | DeepSpeed |
| 2022 | ICML | DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale | DeepSpeed |
| 2022 | | Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model | DeepSpeed |
| 2022 | | Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers | DeepSpeed |
| 2022 | | DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing | DeepSpeed |
| 2022 | OSDI | Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning | Alpa |
| 2022 | ICPP | Tesseract: Parallelize the Tensor Parallelism Efficiently | ColossalAI |
| 2022 | | A Frequency-aware Software Cache for Large Recommendation System Embeddings | ColossalAI |
| 2022 | TPDS | Parallel Training of Pre-Trained Models via Chunk-Based Dynamic Memory Management | ColossalAI |
| 2021 | | Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM | Megatron-LM |
| 2021 | | LoRA: Low-Rank Adaptation of Large Language Models | |
| 2021 | SC | ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning | DeepSpeed |
| 2021 | ICML | 1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed | DeepSpeed |
| 2021 | ATC | ZeRO-Offload: Democratizing Billion-Scale Model Training | DeepSpeed |
| 2021 | PPoPP | DAPPLE: A Pipelined Data Parallel Approach for Training Large Models | |
| 2021 | ICML | TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models | TeraPipe |
| 2021 | ICML | Memory-Efficient Pipeline-Parallel DNN Training | PipeDream |
| 2021 | | An Efficient 2D Method for Training Super-Large Deep Learning Models | ColossalAI |
| 2021 | | Maximizing Parallelism in Distributed Training for Huge Neural Networks | ColossalAI |
| 2021 | | Sequence Parallelism: Long Sequence Training from System Perspective | ColossalAI |
| 2021 | | Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training | ColossalAI |
| 2020 | KDD Tutorial | DeepSpeed: System Optimizations Enable Training Deep Learning Models with Over 100 Billion Parameters | DeepSpeed |
| 2020 | SC | ZeRO: Memory Optimizations Toward Training Trillion Parameter Models | DeepSpeed |
| 2020 | NeurIPS | Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping | DeepSpeed |
| 2020 | | Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | Megatron-LM |
| 2020 | | torchgpipe: On-the-fly Pipeline Parallelism for Training Giant Models | TorchGpipe |
| 2019 | NeurIPS | GPipe: Efficient Training of Giant Neural Networks Using Pipeline Parallelism | TorchGpipe |
| 2019 | SOSP | PipeDream: Generalized Pipeline Parallelism for DNN Training | PipeDream |
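Most of the training systems above combine data, tensor, and pipeline parallelism. As a hedged illustration (not any listed framework's actual API), the sketch below simulates Megatron-style tensor parallelism for a two-layer MLP on a single process, with the per-device weight shards represented as plain tensors; all names and sizes are made up for the example.

```python
# Minimal single-process sketch of Megatron-style tensor parallelism.
# Shards are simulated with plain tensors; a real implementation places
# each shard on its own GPU and uses an all-reduce for the final sum.
import torch

torch.manual_seed(0)
tp_degree = 2          # number of simulated tensor-parallel ranks
d_model, d_ff = 8, 16  # toy hidden sizes

x = torch.randn(4, d_model)      # [batch, d_model]
w1 = torch.randn(d_model, d_ff)  # first MLP weight (column-parallel)
w2 = torch.randn(d_ff, d_model)  # second MLP weight (row-parallel)

# Column-parallel: each rank owns a slice of w1's output columns.
w1_shards = torch.chunk(w1, tp_degree, dim=1)
# Row-parallel: each rank owns the matching slice of w2's input rows.
w2_shards = torch.chunk(w2, tp_degree, dim=0)

partial_outputs = []
for rank in range(tp_degree):
    h = torch.relu(x @ w1_shards[rank])          # local activation shard
    partial_outputs.append(h @ w2_shards[rank])  # local partial result

# The "all-reduce" step: summing partial results recovers the full output.
y_parallel = torch.stack(partial_outputs).sum(dim=0)
y_reference = torch.relu(x @ w1) @ w2
print(torch.allclose(y_parallel, y_reference, atol=1e-5))  # True
```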

Compression

| Year | Publisher | Title | Framework |
|------|-----------|-------|-----------|
| 2023 | | Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge | |
| 2023 | | CBQ: Cross-Block Quantization for Large Language Models | |
| 2023 | | Norm Tweaking: High-performance Low-bit Quantization of Large Language Models | |
| 2023 | | Dual Grained Quantization: Efficient Fine-Grained Quantization for LLM | |
| 2023 | | Atom: Low-bit Quantization for Efficient and Accurate LLM Serving | |
| 2023 | | RPTQ: Reorder-based Post-training Quantization for Large Language Models | |
| 2023 | | SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression | |
| 2023 | | LQ-LoRA: Low-rank Plus Quantized Matrix Decomposition for Efficient Language Model Finetuning | |
| 2023 | | QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models | |
| 2023 | | LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models | |
| 2023 | | AffineQuant: Affine Transformation Quantization for Large Language Models | |
| 2023 | | LLM-QAT: Data-Free Quantization Aware Training for Large Language Models | |
| 2023 | | QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models | |
| 2023 | | LLM-Pruner: On the Structural Pruning of Large Language Models | |
| 2023 | | OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models | |
| 2023 | | SqueezeLLM: Dense-and-Sparse Quantization | |
| 2023 | | A Simple and Effective Pruning Approach for Large Language Models | |
| 2023 | | On Architectural Compression of Text-to-Image Diffusion Models | |
| 2023 | ICML | SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | |
| 2023 | | AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration | |
| 2023 | | OWQ: Lessons Learned from Activation Outliers for Weight Quantization in Large Language Models | |
| 2023 | ICLR | GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers | |
| 2023 | ISCA | OliVe: Accelerating Large Language Models via Hardware-friendly Outlier-Victim Pair Quantization | |
| 2023 | | Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing | |
| 2023 | | ZeroQuant-V2: Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation | |
| 2023 | ICML | SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models | |
| 2023 | ICML | Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases | DeepSpeed |
| 2023 | | Outlier Suppression+: Accurate Quantization of Large Language Models by Equivalent and Optimal Shifting and Scaling | |
| 2023 | | QLoRA: Efficient Finetuning of Quantized LLMs | |
| 2022 | NeurIPS | ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers | DeepSpeed |
| 2022 | NeurIPS | Extreme Compression for Pre-trained Transformers Made Simple and Efficient | DeepSpeed |
| 2022 | NeurIPS | Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models | |
| 2022 | NeurIPS | LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale | |
| 2021 | EMNLP | Understanding and Overcoming the Challenges of Efficient Transformer Quantization | |
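The post-training quantization papers above (GPTQ, AWQ, SmoothQuant, OmniQuant, and others) all start from, and improve on, the round-to-nearest baseline. Below is a minimal sketch of that baseline as per-channel symmetric int8 weight quantization; the function names and settings are illustrative and not taken from any listed work.

```python
# Per-channel symmetric round-to-nearest (RTN) weight quantization,
# the baseline most of the PTQ papers above improve on. Illustrative only.
import torch

def quantize_rtn(weight: torch.Tensor, n_bits: int = 8):
    """Quantize a [out_features, in_features] weight per output channel."""
    qmax = 2 ** (n_bits - 1) - 1                      # e.g. 127 for int8
    scale = weight.abs().amax(dim=1, keepdim=True) / qmax
    scale = scale.clamp(min=1e-8)                     # avoid division by zero
    q = torch.clamp(torch.round(weight / scale), -qmax - 1, qmax)
    return q.to(torch.int8), scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

if __name__ == "__main__":
    w = torch.randn(4096, 4096)
    q, scale = quantize_rtn(w, n_bits=8)
    w_hat = dequantize(q, scale)
    print("mean abs error:", (w - w_hat).abs().mean().item())
```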

Inference

| Year | Publisher | Title | Framework |
|------|-----------|-------|-----------|
| 2023 | | Fast Inference in Denoising Diffusion Models via MMD Finetuning | |
| 2023 | | EnergonAI: An Inference System for 10-100 Billion Parameter Transformer Models | EnergonAI |
| 2023 | | H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models | |
| 2023 | | FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU | |
| 2022 | ICML | DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale | DeepSpeed |
| 2022 | SC | DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale | DeepSpeed |
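Much of the inference work above (H2O, FlexGen, DeepSpeed Inference) centers on managing the attention key/value cache during autoregressive decoding. The snippet below is a minimal, framework-agnostic sketch of that cache for a toy single-head attention layer; it shows the data structure, not any listed system's implementation.

```python
# Minimal sketch of KV caching for autoregressive attention decoding.
# Toy single-head attention; real systems additionally offload, evict,
# or compress these cached tensors.
import torch

d = 16
wq, wk, wv = (torch.randn(d, d) for _ in range(3))
k_cache, v_cache = [], []   # grows by one entry per generated token

def decode_step(x_t: torch.Tensor) -> torch.Tensor:
    """Attend the new token x_t ([1, d]) over all cached keys/values."""
    k_cache.append(x_t @ wk)
    v_cache.append(x_t @ wv)
    K = torch.cat(k_cache, dim=0)                  # [t, d]
    V = torch.cat(v_cache, dim=0)                  # [t, d]
    q = x_t @ wq                                   # [1, d]
    attn = torch.softmax(q @ K.T / d ** 0.5, dim=-1)
    return attn @ V                                # [1, d]

for _ in range(5):                                 # 5 dummy decode steps
    out = decode_step(torch.randn(1, d))
print(len(k_cache), out.shape)                     # 5 torch.Size([1, 16])
```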

Benchmark

| Year | Publisher | Title | Framework |
|------|-----------|-------|-----------|

Survey

| Year | Publisher | Title | Framework |
|------|-----------|-------|-----------|

Frameworks

| Year | Name | Training | Inference | Serving | Comments |
|------|------|----------|-----------|---------|----------|
| 2023 | EnergonAI | | | | |
| 2022 | Alpa | | | | Compilation-based mixed parallelism |
| 2021 | Megatron-DeepSpeed | | | | Adds MoE model training, curriculum learning, and 3D parallelism from DeepSpeed to Megatron |
| 2021 | TeraPipe | | | | |
| 2021 | ColossalAI | | | | |
| 2021 | FasterTransformer | | | | |
| 2020 | DeepSpeed | | | | General support of Transformers and MoE with 3D parallelism |
| 2019 | Megatron-LM | | | | |
| 2019 | PipeDream | | | | |
| 2019 | TorchGpipe | | | | torchgpipe was merged into PyTorch in 2020 |
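Several of the frameworks above (TorchGpipe, PipeDream, DeepSpeed's pipeline engine) are built around micro-batch pipeline schedules. Below is a single-process sketch of GPipe-style micro-batching with the pipeline stages simulated as ordinary modules; a real engine places each stage on its own device and overlaps micro-batches across stages, which this sketch deliberately omits.

```python
# Single-process sketch of GPipe-style micro-batch pipelining.
# Stages run sequentially here; a real pipeline engine assigns each
# stage to its own device and keeps all stages busy at once.
import torch
import torch.nn as nn

stages = nn.ModuleList([nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 8)])

def pipeline_forward(batch: torch.Tensor, n_micro: int = 4) -> torch.Tensor:
    outputs = []
    for micro in torch.chunk(batch, n_micro, dim=0):  # split into micro-batches
        for stage in stages:                          # pass through each "device"
            micro = stage(micro)
        outputs.append(micro)
    return torch.cat(outputs, dim=0)                  # reassemble the full batch

x = torch.randn(16, 32)
y = pipeline_forward(x)
print(y.shape)   # torch.Size([16, 8])
```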


License: Apache License 2.0