Yu Zhang's starred repositories
llama3-from-scratch
A llama3 implementation, one matrix multiplication at a time.
matmulfreellm
Implementation of the MatMul-free LM.
Phi-3CookBook
A cookbook for getting started with Phi-3, a family of open AI models developed by Microsoft. Phi-3 models are the most capable and cost-effective small language models (SLMs) available, outperforming models of the same size and the next size up across a variety of language, reasoning, coding, and math benchmarks.
DeepSeek-Coder-V2
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
attention-cnn
Source code for "On the Relationship between Self-Attention and Convolutional Layers"
gemma-2B-10M
Gemma 2B with 10M context length using Infini-attention.
StreamSpeech
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
magvit2-pytorch
Implementation of the MagViT2 tokenizer in PyTorch.
Agent-Attention
Official repository of Agent Attention (ECCV 2024).
transformer-sequential
Trains Transformer model variants; data is not shuffled between batches.
block-transformer
Block Transformer: Global-to-Local Language Modeling for Fast Inference (Official Code)
triton-index
A catalog of released Triton kernels.
uncheatable_eval
Evaluating LLMs with dynamic data.
hypernetwork-attention
Official code for the paper "Attention as a Hypernetwork"
GL-DancingMen
Cipher font from "The Adventure of the Dancing Men" in The Return of Sherlock Holmes by Arthur Conan Doyle.