yzhangcs

Soochow University

Shenzhen, Guangdong

https://yzhang.site

Organizations

SUDA-LA

Yu Zhang's starred repositories

unsloth

Finetune Llama 3, Mistral & Gemma LLMs 2-5x faster with 80% less memory

Language: Python | Apache-2.0 | 10151 74 383

VMamba

VMamba: Visual State Space Models, code is based on mamba

Language: Python | 1659 15 205

math-lm

Language: Python | MIT | 990 16 43

DeepSeek-MoE

Language: Python | MIT | 875 13 32

awesome-mixture-of-experts

A collection of AWESOME things about mixture-of-experts

mamba.py

A simple and efficient Mamba implementation in PyTorch and MLX.

Language: Python | MIT | 657 4 21

ml-aim

This repository provides the code and model checkpoints of the research paper: Scalable Pre-training of Large Autoregressive Image Models

Language: Python | NOASSERTION | 638 20 5

review-2023

Have you finished writing your 2023 year-end summary?

Language: Python | 632 6 30

inferflow

Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).

Language: C++ | MIT | 227 7 13

einx

Tensor Operations in Einstein-Inspired Notation for Python.

Language: Python | MIT | 211 4 7

zero-bubble-pipeline-parallelism

Zero Bubble Pipeline Parallelism

Language: Python | NOASSERTION | 196 5 12

SpeculativeDecodingPapers

📰 Must-read papers and blogs on Speculative Decoding ⚡️

Apache-2.0 | 175 120

lightning-attention

Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models

Language: Python | MIT | 167 11 12

awesome-ssm-ml

MIT | 119 100

accelerated-scan

Accelerated First Order Parallel Associative Scan

Language: Python | MIT | 108 8 4

moe_attention

Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"

Language: Python | MIT | 78 7 2

CLEX

[ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models

Language: Python | MIT | 69 4 7

triton-autodiff

Experiment of using Tangent to autodiff triton

Language: Python | MIT | 66 40

top_k_attention

The accompanying code for "Memory-efficient Transformers via Top-k Attention" (Ankit Gupta, Guy Dar, Shaya Goodman, David Ciprut, Jonathan Berant. SustaiNLP 2021).

Language: Python | 56 20

llm-misinformation-survey

Paper list for the survey "Combating Misinformation in the Age of LLMs: Opportunities and Challenges" and the initiative "LLMs Meet Misinformation"

Collinear-Constrained-Attention

Language: Python | Apache-2.0 | 5000

seq_icl

Language: Jupyter Notebook | Apache-2.0 | 39 4 3

mamba-triton

Language: Python | 36 1 1

why-weight-decay

Why Do We Need Weight Decay in Modern Deep Learning? [arXiv, Oct 2023]

Language: Python | NOASSERTION | 35 20

Highway-Transformer

[ACL'20] Highway Transformer: A Gated Transformer.

Language: Python | Apache-2.0 | 32 30

ADM-ES

[ICLR 2024] Official code for the paper 'Elucidating the Exposure Bias in Diffusion Models'

Language: Python | MIT | 28 1 1

NeurIPS-WANT-submission-efficient-parallelization-layouts

Language: Python | NOASSERTION | 20 50

tangent

Source-to-Source Debuggable Derivatives in Pure Python

Language: Python | Apache-2.0 | 14 10

lecture2

Obsolete version of CUDA-mode repo -- use cuda-mode/lectures instead

Language: Jupyter Notebook | 13 30

Awesome-Simultaneous-Translation

Paper list of simultaneous translation / streaming translation, including text-to-text machine translation and speech-to-text translation.

300