Yu Zhang's starred repositories
Efficient-LLMs-Survey
[TMLR 2024] Efficient Large Language Models: A Survey
mamba-chat
Mamba-Chat: A chat LLM based on the state-space model architecture 🐍
datablations
Scaling Data-Constrained Language Models
aisys-building-blocks
Building blocks for foundation models.
fast-weights
🏃 Implementation of Using Fast Weights to Attend to the Recent Past.
causal-conv1d
Causal depthwise conv1d in CUDA, with a PyTorch interface
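The operation this repo accelerates in CUDA can be sketched in a few lines of NumPy. This is only an illustration of the math (left-padded, per-channel convolution), not the repo's API; the function name and argument layout are assumptions.

```python
import numpy as np

def causal_depthwise_conv1d(x, w):
    """Sketch of a causal depthwise 1D convolution.
    x: (channels, seqlen) input; w: (channels, width) one filter per channel.
    Left padding makes output at time t depend only on x[:, t-width+1 .. t]."""
    c, L = x.shape
    _, width = w.shape
    xp = np.pad(x, ((0, 0), (width - 1, 0)))  # pad on the left -> causal
    out = np.empty((c, L))
    for t in range(L):
        # depthwise: each channel is convolved with its own filter only
        out[:, t] = np.sum(xp[:, t:t + width] * w, axis=1)
    return out
```

The real kernel fuses this loop over time into a single CUDA pass; the sliding-window view above is the reference semantics.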
fast-weight-transformers
Official code repository of the paper Linear Transformers Are Secretly Fast Weight Programmers.
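The paper's central observation is that linear attention can be read as a fast weight programmer: each step writes an outer product of value and key into a weight matrix, and the query reads it back out. A minimal NumPy sketch of that view, with an elu+1 feature map; the function name and normalization details here are illustrative assumptions, not the repo's implementation.

```python
import numpy as np

def fast_weight_attention(q, k, v):
    """Linear attention as a fast weight programmer (sketch).
    W_t = W_{t-1} + v_t phi(k_t)^T  ("write" / program step)
    o_t = W_t phi(q_t) / (z_t . phi(q_t))  ("read" step, normalized)"""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1 > 0
    L, d = q.shape
    W = np.zeros((v.shape[1], d))  # the fast weight matrix
    z = np.zeros(d)                # running normalizer sum of phi(k)
    out = np.empty((L, v.shape[1]))
    for t in range(L):
        kt, qt = phi(k[t]), phi(q[t])
        W += np.outer(v[t], kt)          # write v_t under key k_t
        z += kt
        out[t] = (W @ qt) / (z @ qt + 1e-6)  # read with query q_t
    return out
```

At step 1 the read exactly recovers v_1 (the write and read cancel), which is the sanity check for the write/read framing.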
HyperAttention
Triton Implementation of HyperAttention Algorithm
Pushdown-Layers
Code for Pushdown Layers from our EMNLP 2023 paper
transformer-mgk
Public code repository for our paper "Transformer with a Mixture of Gaussian Keys"
cutlass_quant
Playing with quantization
FineTuningStability
Code and data of the EMNLP 2022 paper "Improving Stability of Fine-Tuning Pretrained Language Models via Component-Wise Gradient Norm Clipping"
flash-linear-attention-pytorch
A Python implementation of flash linear attention operators in TransnormerLLM.
transformer-components
Test various xformers with tightly controlled variables to explore the limits of transformers.
flash-fft-conv
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
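The primitive FlashFFTConv accelerates is long convolution via the FFT: convolving in time equals pointwise multiplication in frequency. A NumPy sketch of the baseline it speeds up (zero-padding to avoid circular wrap-around); the function name is an assumption, and the real repo fuses the transforms onto tensor cores.

```python
import numpy as np

def fft_long_conv(u, k):
    """O(L log L) long convolution via FFT (sketch).
    u: (L,) input sequence; k: (L,) long filter.
    Zero-pad to 2L so the circular FFT convolution matches linear convolution,
    then keep the first L outputs."""
    L = u.shape[-1]
    n = 2 * L  # padded length prevents wrap-around
    y = np.fft.irfft(np.fft.rfft(u, n=n) * np.fft.rfft(k, n=n), n=n)
    return y[..., :L]
```

The result matches direct `np.convolve` truncated to the sequence length, which is the correctness check for the padding choice.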