Yu Zhang's starred repositories
DeepSeek-Coder
DeepSeek Coder: Let the Code Write Itself
Informer2020
GitHub repository for the paper "Informer", accepted at AAAI 2021.
Awesome-LLM-Inference
📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.
Skywork
Skywork series models are pre-trained on 3.2 TB of high-quality multilingual (mainly Chinese and English) and code data. The model weights, training data, evaluation data, and evaluation methods have all been open-sourced.
adaptive-span
Transformer training code for sequential tasks
landmark-attention
Landmark Attention: Random-Access Infinite Context Length for Transformers
fairseq-apollo
fairseq repository with the Apollo optimizer
tensor-book
Tensor Computations Tutorials (a tutorial series on tensor computations)
gateloop-transformer
Implementation of the GateLoop Transformer in PyTorch and JAX
LM-Kernel-FT
A Kernel-Based View of Language Model Fine-Tuning https://arxiv.org/abs/2210.05643
icml17_knn
Deriving Neural Architectures from Sequence and Graph Kernels
GPU-Puzzles
Solve puzzles. Learn CUDA.
token-shift-gpt
Implementation of Token Shift GPT, an autoregressive model that relies solely on shifting along the sequence dimension for token mixing
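The token-shift idea behind this repository can be sketched in a few lines: part of each position's feature vector is replaced with the corresponding features of the previous token, so causal information mixes across the sequence without attention. This is a minimal NumPy sketch of that mechanism under assumed shapes, not the repository's actual implementation; the function name and the half-channel split are illustrative choices.

```python
import numpy as np

def token_shift(x: np.ndarray) -> np.ndarray:
    """Shift half the feature channels back by one token.

    x: array of shape (seq_len, dim). The first half of the channels
    stays in place; the second half is taken from the previous token
    (zeros at position 0), giving a causal mixing of the sequence.
    """
    shifted = np.zeros_like(x)
    shifted[1:] = x[:-1]                 # each position sees the previous token
    half = x.shape[1] // 2
    return np.concatenate([x[:, :half], shifted[:, half:]], axis=1)
```

Because the shift only looks backward, the operation is causal by construction, which is what allows an autoregressive model to use it in place of self-attention.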
acdc-torch
ACDC: A Structured Efficient Linear Layer
Recurrent-Linear-Transformers
Implementation of Recurrent Linear Transformers in JAX + Flax.