Yu Zhang's starred repositories
Efficient-LLMs-Survey
[TMLR 2024] Efficient Large Language Models: A Survey
mamba-chat
Mamba-Chat: A chat LLM based on the state-space model architecture 🐍
datablations
Scaling Data-Constrained Language Models
aisys-building-blocks
Building blocks for foundation models.
fast-weights
🏃 Implementation of Using Fast Weights to Attend to the Recent Past.
causal-conv1d
Causal depthwise conv1d in CUDA, with a PyTorch interface
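The operation this repo accelerates in CUDA can be sketched in a few lines of NumPy. This is only an illustration of the math (left-padded, per-channel convolution), not the repo's API; the function name and argument layout are assumptions.

```python
import numpy as np

def causal_depthwise_conv1d(x, w):
    """Sketch of a causal depthwise 1D convolution.
    x: (channels, seqlen) input; w: (channels, width) one filter per channel.
    Left padding makes output at time t depend only on x[:, t-width+1 .. t]."""
    c, L = x.shape
    _, width = w.shape
    xp = np.pad(x, ((0, 0), (width - 1, 0)))  # pad on the left -> causal
    out = np.empty((c, L))
    for t in range(L):
        # depthwise: each channel is convolved with its own filter only
        out[:, t] = np.sum(xp[:, t:t + width] * w, axis=1)
    return out
```

The real kernel fuses this loop over time into a single CUDA pass; the sliding-window view above is the reference semantics.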
fast-weight-transformers
Official code repository of the paper Linear Transformers Are Secretly Fast Weight Programmers.
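The paper's central observation is that linear attention can be read as a fast weight programmer: each step writes an outer product of value and key into a weight matrix, and the query reads it back out. A minimal NumPy sketch of that view, with an elu+1 feature map; the function name and normalization details here are illustrative assumptions, not the repo's implementation.

```python
import numpy as np

def fast_weight_attention(q, k, v):
    """Linear attention as a fast weight programmer (sketch).
    W_t = W_{t-1} + v_t phi(k_t)^T  ("write" / program step)
    o_t = W_t phi(q_t) / (z_t . phi(q_t))  ("read" step, normalized)"""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1 > 0
    L, d = q.shape
    W = np.zeros((v.shape[1], d))  # the fast weight matrix
    z = np.zeros(d)                # running normalizer sum of phi(k)
    out = np.empty((L, v.shape[1]))
    for t in range(L):
        kt, qt = phi(k[t]), phi(q[t])
        W += np.outer(v[t], kt)          # write v_t under key k_t
        z += kt
        out[t] = (W @ qt) / (z @ qt + 1e-6)  # read with query q_t
    return out
```

At step 1 the read exactly recovers v_1 (the write and read cancel), which is the sanity check for the write/read framing.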
HyperAttention
Triton Implementation of HyperAttention Algorithm
Pushdown-Layers
Code for Pushdown Layers from our EMNLP 2023 paper
transformer-mgk
Public code repository for our paper "Transformer with a Mixture of Gaussian Keys"
cutlass_quant
Playing with quantization
FineTuningStability
Code and data of the EMNLP 2022 paper "Improving Stability of Fine-Tuning Pretrained Language Models via Component-Wise Gradient Norm Clipping"
flash-linear-attention-pytorch
A Python implementation of flash linear attention operators in TransnormerLLM.
transformer-components
Test various xformers with tightly controlled variables to explore the limits of transformers.
flash-fft-conv
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
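The primitive FlashFFTConv accelerates is long convolution via the FFT: convolving in time equals pointwise multiplication in frequency. A NumPy sketch of the baseline it speeds up (zero-padding to avoid circular wrap-around); the function name is an assumption, and the real repo fuses the transforms onto tensor cores.

```python
import numpy as np

def fft_long_conv(u, k):
    """O(L log L) long convolution via FFT (sketch).
    u: (L,) input sequence; k: (L,) long filter.
    Zero-pad to 2L so the circular FFT convolution matches linear convolution,
    then keep the first L outputs."""
    L = u.shape[-1]
    n = 2 * L  # padded length prevents wrap-around
    y = np.fft.irfft(np.fft.rfft(u, n=n) * np.fft.rfft(k, n=n), n=n)
    return y[..., :L]
```

The result matches direct `np.convolve` truncated to the sequence length, which is the correctness check for the padding choice.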