MIT HAN Lab's repositories
streaming-llm
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
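A minimal sketch of the attention-sink idea behind this repo (my paraphrase, not the library's actual API): the KV cache keeps a few initial "sink" tokens plus a sliding window of the most recent tokens, so memory stays bounded during arbitrarily long streaming. `n_sink` and `window` are illustrative parameters.

```python
# Hypothetical KV-cache eviction policy with attention sinks:
# retain the first n_sink tokens and the last `window` tokens,
# evicting everything in between.
import torch

def evict_kv_cache(keys: torch.Tensor, values: torch.Tensor,
                   n_sink: int = 4, window: int = 1020):
    """keys/values: [batch, heads, seq_len, head_dim] (illustrative layout)."""
    seq_len = keys.size(2)
    if seq_len <= n_sink + window:
        return keys, values  # cache still fits; nothing to evict
    keys = torch.cat([keys[:, :, :n_sink], keys[:, :, -window:]], dim=2)
    values = torch.cat([values[:, :, :n_sink], values[:, :, -window:]], dim=2)
    return keys, values
```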
temporal-shift-module
[ICCV 2019] TSM: Temporal Shift Module for Efficient Video Understanding
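The shift operation itself is simple enough to sketch. This follows the paper's default of shifting 1/8 of the channels toward each temporal neighbor, with zero padding at the clip boundary; the [N, T, C, H, W] layout is assumed for clarity.

```python
# Sketch of the temporal shift: part of the channels carry features
# from the next/previous frame, giving temporal modeling at zero FLOPs.
import torch

def temporal_shift(x: torch.Tensor, fold_div: int = 8) -> torch.Tensor:
    """x: [N, T, C, H, W]."""
    n, t, c, h, w = x.shape
    fold = c // fold_div
    out = torch.zeros_like(x)
    out[:, :-1, :fold] = x[:, 1:, :fold]                   # shift toward the past
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]   # shift toward the future
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]              # remaining channels unchanged
    return out
```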
once-for-all
[ICLR 2020] Once-for-All: Train One Network and Specialize It for Efficient Deployment
proxylessnas
[ICLR 2019] ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware
data-efficient-gans
[NeurIPS 2020] Differentiable Augmentation for Data-Efficient GAN Training
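A sketch of where differentiable augmentation sits in the GAN training loop: the same transform T is applied to both real and generated images, so the discriminator never sees un-augmented samples, and because T is differentiable, gradients still flow back to the generator. The single brightness op and hinge losses below are illustrative; the actual repo composes color, translation, and cutout augmentations.

```python
import torch

def diff_augment(x: torch.Tensor) -> torch.Tensor:
    # Random brightness: differentiable w.r.t. x, randomized per image.
    return x + (torch.rand(x.size(0), 1, 1, 1, device=x.device) - 0.5)

def d_hinge_loss(D, G, real: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
    fake = G(z).detach()
    return (torch.relu(1.0 - D(diff_augment(real))).mean()
            + torch.relu(1.0 + D(diff_augment(fake))).mean())

def g_hinge_loss(D, G, z: torch.Tensor) -> torch.Tensor:
    # No detach here: gradients pass through diff_augment into G.
    return -D(diff_augment(G(z))).mean()
```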
efficientvit
EfficientViT is a new family of vision models for efficient high-resolution dense prediction.
torchquantum
A PyTorch-based framework for quantum-classical simulation, quantum machine learning, quantum neural networks, and parameterized quantum circuits, with support for easy deployment on real quantum computers.
gan-compression
[CVPR 2020] GAN Compression: Efficient Architectures for Interactive Conditional GANs
torchsparse
TorchSparse: Efficient Training and Inference Framework for Sparse Convolution on GPUs.
smoothquant
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
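SmoothQuant's core trick is a per-input-channel rescaling, s_j = max|X_j|^α / max|W_j|^(1−α), that migrates quantization difficulty from activations to weights while leaving the layer's output mathematically unchanged. A hedged sketch (tensor shapes and names are illustrative, not the repo's API):

```python
import torch

def smooth(activations: torch.Tensor, weight: torch.Tensor, alpha: float = 0.5):
    """activations: [tokens, in_features]; weight: [out_features, in_features]."""
    act_scales = activations.abs().amax(dim=0)           # per input channel
    w_scales = weight.abs().amax(dim=0)
    s = (act_scales.pow(alpha) / w_scales.pow(1 - alpha)).clamp(min=1e-5)
    # X @ W.T == (X / s) @ (W * s).T, so the output is unchanged, but the
    # scaled activations have a flatter range and quantize more accurately.
    return activations / s, weight * s
```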
anycost-gan
[CVPR 2021] Anycost GANs for Interactive Image Synthesis and Editing
tinyengine
[NeurIPS 2020] MCUNet: Tiny Deep Learning on IoT Devices; [NeurIPS 2021] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning; [NeurIPS 2022] MCUNetV3: On-Device Training Under 256KB Memory
fastcomposer
FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention
TinyChatEngine
TinyChatEngine: On-Device LLM Inference Library
tiny-training
[NeurIPS 2022] On-Device Training Under 256KB Memory
distrifuser
[CVPR 2024] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
offsite-tuning
Offsite-Tuning: Transfer Learning without Full Model
flatformer
[CVPR 2023] FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer
patch_conv
Patch convolution to avoid the large GPU memory usage of Conv2D
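A rough sketch of the idea, assuming stride 1, dilation 1, groups=1, and an odd square kernel (the height-wise split and names are illustrative, not the repo's API): the input is padded once, then convolved one horizontal slab at a time so that only a small slice of intermediate activations is live at any moment, producing output identical to a single Conv2d call.

```python
import torch
import torch.nn.functional as F

def patch_conv2d(x: torch.Tensor, conv: torch.nn.Conv2d, n_patches: int = 4):
    k = conv.kernel_size[0]
    p = conv.padding[0]
    x = F.pad(x, (p, p, p, p))            # pad once, then convolve with padding=0
    out_h = x.size(2) - k + 1
    chunk = (out_h + n_patches - 1) // n_patches
    outs = []
    for r0 in range(0, out_h, chunk):
        r1 = min(r0 + chunk, out_h)
        rows = x[:, :, r0:r1 + k - 1]     # input rows needed for output rows r0..r1
        outs.append(F.conv2d(rows, conv.weight, conv.bias))
    return torch.cat(outs, dim=2)
```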
spatten-llm
[HPCA 2021] SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning
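A loose sketch of cascade token pruning, one of SpAtten's two pruning mechanisms: each token accumulates the attention it receives across heads and layers, and the lowest-scoring tokens are dropped for all subsequent layers. The keep ratio and tensor shapes below are illustrative assumptions.

```python
import torch

def prune_tokens(hidden: torch.Tensor, attn_probs: torch.Tensor,
                 cum_scores: torch.Tensor, keep_ratio: float = 0.75):
    """hidden: [tokens, dim]; attn_probs: [heads, q_tokens, k_tokens]."""
    # Accumulate the attention each token receives (summed over heads and queries).
    cum_scores = cum_scores + attn_probs.sum(dim=(0, 1))
    k = max(1, int(keep_ratio * hidden.size(0)))
    keep = torch.topk(cum_scores, k).indices.sort().values  # preserve token order
    return hidden[keep], cum_scores[keep]
```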