Zhenyu He's starred repositories
ChunkLlama
[ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models"
awesome-RLHF
A curated list of reinforcement learning with human feedback resources (continually updated)
MEGABYTE-pytorch
Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in Pytorch
DeepSeek-V2
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
infini-transformer-pytorch
Implementation of Infini-Transformer in Pytorch
ring-attention-pytorch
Implementation of Ring Attention, from Liu et al. at Berkeley AI, in Pytorch
mixture-of-depths
An unofficial implementation of "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
Transformer-M
[ICLR 2023] One Transformer Can Understand Both 2D & 3D Molecular Data (official implementation)
EasyContext
Memory optimization and training recipes for extrapolating language models' context length to 1 million tokens, with minimal hardware.