Yu Zhang (yzhangcs)

Company: Soochow University

Location: Nara

Home Page: https://yzhang.site

Twitter: @yzhang_cs

Organizations
SUDA-LA

Yu Zhang's starred repositories

mamba

Mamba SSM architecture

Language: Python | License: Apache-2.0 | Stargazers: 11,814 | Issues: 97 | Issues: 432
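
For orientation, the sketch below shows how a Mamba block is typically instantiated; the `mamba_ssm` package name and the `Mamba` module's keyword arguments follow the upstream state-spaces/mamba README, so treat them as assumptions to check against your installed version:

```python
import torch
from mamba_ssm import Mamba  # assumed package/module name, per the upstream README

batch, length, dim = 2, 64, 16
x = torch.randn(batch, length, dim).to("cuda")

model = Mamba(
    d_model=dim,  # model (channel) dimension
    d_state=16,   # SSM state expansion factor
    d_conv=4,     # width of the local causal convolution
    expand=2,     # block expansion factor
).to("cuda")

y = model(x)  # sequence in, sequence out, same shape
assert y.shape == x.shape
```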

OpenMoE

A family of open-source Mixture-of-Experts (MoE) large language models.

Efficient-LLMs-Survey

[TMLR 2024] Efficient Large Language Models: A Survey

Language: Python | License: Apache-2.0 | Stargazers: 765 | Issues: 12 | Issues: 34

mamba-chat

Mamba-Chat: A chat LLM based on the state-space model architecture 🐍

Language: Python | License: Apache-2.0 | Stargazers: 756 | Issues: 3 | Issues: 16

EAGLE

Official implementation of EAGLE-1 and EAGLE-2, speculative decoding methods for faster LLM inference.

Language: Python | License: Apache-2.0 | Stargazers: 670 | Issues: 12 | Issues: 90

ssm

Bayesian learning and inference for state space models

Language: Jupyter Notebook | License: MIT | Stargazers: 543 | Issues: 38 | Issues: 107
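
If this is the lindermanlab/ssm package (its tagline matches), fitting a Gaussian-emission HMM looks roughly like the sketch below; the constructor and method names follow that package's README and may differ by version:

```python
import ssm  # assuming the lindermanlab/ssm package

T, K, D = 1000, 5, 2  # timesteps, discrete states, observation dimension

# Simulate from a ground-truth HMM, then fit a fresh model with EM.
true_hmm = ssm.HMM(K, D, observations="gaussian")
z_true, y = true_hmm.sample(T)

hmm = ssm.HMM(K, D, observations="gaussian")
lls = hmm.fit(y, method="em")        # per-iteration log likelihoods
z_hat = hmm.most_likely_states(y)    # Viterbi decoding
```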

examples

Fast and flexible reference benchmarks

Language: Shell | License: Apache-2.0 | Stargazers: 428 | Issues: 15 | Issues: 40

datablations

Scaling Data-Constrained Language Models

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 302 | Issues: 36 | Issues: 6

aisys-building-blocks

Building blocks for foundation models.

fast-weights

🏃 Implementation of the paper "Using Fast Weights to Attend to the Recent Past".

Language: Python | License: MIT | Stargazers: 265 | Issues: 13 | Issues: 4

causal-conv1d

Causal depthwise conv1d in CUDA, with a PyTorch interface

Language: Cuda | License: BSD-3-Clause | Stargazers: 236 | Issues: 3 | Issues: 15
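
The kernel computes a depthwise convolution padded only on the left, so position t never sees the future. Below is a pure-PyTorch reference of that semantics, useful for checking outputs; the repo itself exposes a CUDA entry point (commonly `causal_conv1d_fn`), whose exact name and signature should be verified against the installed version:

```python
import torch
import torch.nn.functional as F

# from causal_conv1d import causal_conv1d_fn  # the repo's CUDA entry point (assumed name)

def causal_conv1d_ref(x, weight, bias=None):
    """Reference semantics: depthwise conv1d with left-only padding, so the
    output at position t depends on inputs t-width+1 .. t and never the future.
    x: (batch, dim, seqlen); weight: (dim, width)."""
    dim, width = weight.shape
    x = F.pad(x, (width - 1, 0))  # pad on the left only
    return F.conv1d(x, weight.unsqueeze(1), bias, groups=dim)

batch, dim, seqlen, width = 2, 64, 128, 4
x = torch.randn(batch, dim, seqlen)
w = torch.randn(dim, width)
y = causal_conv1d_ref(x, w)
assert y.shape == (batch, dim, seqlen)
```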

nanoRWKV

RWKV in nanoGPT style

Language: Python | License: MIT | Stargazers: 159 | Issues: 5 | Issues: 0

fast-weight-transformers

Official code repository for the paper "Linear Transformers Are Secretly Fast Weight Programmers".

Language: Jupyter Notebook | License: MIT | Stargazers: 93 | Issues: 5 | Issues: 0
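
The paper's core observation is that unnormalized causal linear attention is exactly a fast-weight memory: each step writes an outer product k_t v_t^T into a weight matrix and reads it back out with the query. A minimal self-contained sketch of that equivalence (identity feature map, no normalization; illustrative, not the repo's code):

```python
import torch

def linear_attention_as_fast_weights(q, k, v):
    """Causal linear attention as a fast-weight recurrence:
    write  W_t = W_{t-1} + k_t v_t^T   (store the new association)
    read   y_t = q_t W_t               (retrieve with the query)."""
    B, T, d = q.shape
    W = q.new_zeros(B, d, d)
    out = []
    for t in range(T):
        W = W + k[:, t].unsqueeze(-1) * v[:, t].unsqueeze(-2)
        out.append(torch.bmm(q[:, t].unsqueeze(1), W))
    return torch.cat(out, dim=1)

B, T, d = 2, 16, 8
q, k, v = (torch.randn(B, T, d) for _ in range(3))
y = linear_attention_as_fast_weights(q, k, v)

# Same result as masked (causal) attention without the softmax.
mask = torch.tril(torch.ones(T, T))
y_ref = (q @ k.transpose(1, 2) * mask) @ v
assert torch.allclose(y, y_ref, atol=1e-5)
```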

HyperAttention

A Triton implementation of the HyperAttention algorithm.

Language: Python | License: Apache-2.0 | Stargazers: 45 | Issues: 3 | Issues: 2

Pushdown-Layers

Code for Pushdown Layers from our EMNLP 2023 paper

transformer-mgk

The public GitHub repository for our paper "Transformer with a Mixture of Gaussian Keys".

Language: Python | License: CC0-1.0 | Stargazers: 25 | Issues: 2 | Issues: 1

cutlass_quant

Playing with quantization

Language: HTML | License: Apache-2.0 | Stargazers: 25 | Issues: 2 | Issues: 0

FineTuningStability

Code and data for the EMNLP 2022 paper "Improving Stability of Fine-Tuning Pretrained Language Models via Component-Wise Gradient Norm Clipping".

Language: Python | License: MIT | Stargazers: 12 | Issues: 2 | Issues: 0

flash-linear-attention-pytorch

A Python implementation of the flash linear attention operators used in TransNormerLLM.

Language: Cuda | License: MIT | Stargazers: 9 | Issues: 2 | Issues: 3
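
"Flash" linear attention implementations typically use a chunk-parallel form: a running state carries contributions from past chunks, while ordinary masked attention handles positions inside each chunk. A pure-PyTorch sketch of that decomposition (unnormalized, identity feature map; illustrative only, not this repo's kernels):

```python
import torch

def chunked_linear_attention(q, k, v, chunk=16):
    """Chunk-parallel causal linear attention: inter-chunk contributions flow
    through the running state S = sum_s k_s v_s^T; intra-chunk contributions
    use ordinary masked attention within the chunk."""
    B, T, d = q.shape
    assert T % chunk == 0
    qc, kc, vc = (t_.view(B, T // chunk, chunk, d) for t_ in (q, k, v))
    mask = torch.tril(torch.ones(chunk, chunk, device=q.device))
    S = q.new_zeros(B, d, d)
    out = []
    for i in range(T // chunk):
        qi, ki, vi = qc[:, i], kc[:, i], vc[:, i]
        out.append(qi @ S + (qi @ ki.transpose(1, 2) * mask) @ vi)
        S = S + ki.transpose(1, 2) @ vi  # fold this chunk into the state
    return torch.cat(out, dim=1)

B, T, d = 2, 64, 8
q, k, v = (torch.randn(B, T, d) for _ in range(3))
mask = torch.tril(torch.ones(T, T))
y_ref = (q @ k.transpose(1, 2) * mask) @ v  # quadratic-time reference
assert torch.allclose(chunked_linear_attention(q, k, v), y_ref, atol=1e-4)
```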

xmixers

Xmixers: a collection of state-of-the-art efficient token/channel mixers.

Language: Python | Stargazers: 7 | Issues: 1 | Issues: 0

transformer-components

Tests various transformer variants ("xformers") under tightly controlled variables to explore the limits of transformers.

Language: Python | License: MIT | Stargazers: 5 | Issues: 2 | Issues: 0

fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Language: Python | License: MIT | Stargazers: 5 | Issues: 0 | Issues: 0

flash-fft-conv

FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores

Language: C++ | License: Apache-2.0 | Stargazers: 2 | Issues: 0 | Issues: 0
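
The pattern FlashFFTConv accelerates is the standard FFT convolution identity: zero-pad to avoid circular wrap-around, multiply in frequency space, and keep the first T outputs, turning a length-T convolution into O(T log T) work. A minimal pure-PyTorch sketch of that identity (not the repo's tensor-core kernels):

```python
import torch

def fft_long_conv(u, h):
    """Causal convolution of a (batch, dim, T) signal with a (dim, T) filter
    via FFT: zero-pad to 2T so the circular convolution matches the linear
    one, multiply in frequency space, keep the first T outputs."""
    T = u.shape[-1]
    n = 2 * T
    u_f = torch.fft.rfft(u, n=n)
    h_f = torch.fft.rfft(h, n=n)  # broadcasts over the batch dimension
    return torch.fft.irfft(u_f * h_f, n=n)[..., :T]

batch, dim, T = 2, 4, 128
u = torch.randn(batch, dim, T)
h = torch.randn(dim, T)  # one length-T filter per channel
y = fft_long_conv(u, h)
assert y.shape == (batch, dim, T)
```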