Yu Zhang's starred repositories
DeepSeek-Coder
DeepSeek Coder: Let the Code Write Itself
Informer2020
GitHub repository for the paper "Informer", accepted at AAAI 2021.
Awesome-LLM-Inference
📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.
Skywork
Skywork series models are pre-trained on 3.2 TB of high-quality multilingual (mainly Chinese and English) and code data. The model weights, training data, evaluation data, and evaluation methods have all been open-sourced.
adaptive-span
Transformer training code for sequential tasks
landmark-attention
Landmark Attention: Random-Access Infinite Context Length for Transformers
fairseq-apollo
fairseq repository with the Apollo optimizer
tensor-book
Tensor Computations Tutorials (a tutorial series on tensor computations)
gateloop-transformer
Implementation of the GateLoop Transformer in PyTorch and JAX
LM-Kernel-FT
A Kernel-Based View of Language Model Fine-Tuning https://arxiv.org/abs/2210.05643
icml17_knn
Deriving Neural Architectures from Sequence and Graph Kernels
GPU-Puzzles
Solve puzzles. Learn CUDA.
token-shift-gpt
Implementation of Token Shift GPT, an autoregressive model that relies solely on shifting along the sequence dimension for token mixing
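The token-shift idea behind this repository can be sketched in a few lines: part of each position's feature vector is replaced with the corresponding features of the previous token, so causal information mixes across the sequence without attention. This is a minimal NumPy sketch of that mechanism under assumed shapes, not the repository's actual implementation; the function name and the half-channel split are illustrative choices.

```python
import numpy as np

def token_shift(x: np.ndarray) -> np.ndarray:
    """Shift half the feature channels back by one token.

    x: array of shape (seq_len, dim). The first half of the channels
    stays in place; the second half is taken from the previous token
    (zeros at position 0), giving a causal mixing of the sequence.
    """
    shifted = np.zeros_like(x)
    shifted[1:] = x[:-1]                 # each position sees the previous token
    half = x.shape[1] // 2
    return np.concatenate([x[:, :half], shifted[:, half:]], axis=1)
```

Because the shift only looks backward, the operation is causal by construction, which is what allows an autoregressive model to use it in place of self-attention.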
acdc-torch
ACDC: A Structured Efficient Linear Layer
Recurrent-Linear-Transformers
Implementation of Recurrent Linear Transformers in JAX + Flax.