Yu Zhang (yzhangcs)

Company: Soochow University

Location: Nara

Home Page: https://yzhang.site

Twitter: @yzhang_cs


Organizations
SUDA-LA

Yu Zhang's starred repositories

sonnet

TensorFlow-based neural network library

Language: Python · License: Apache-2.0 · Stars: 9732 · Issues: 193

Yi

A series of large language models trained from scratch by developers @01-ai

Language: Python · License: Apache-2.0 · Stars: 7506 · Issues: 287

DeepSeek-Coder

DeepSeek Coder: Let the Code Write Itself

Language: Python · License: MIT · Stars: 6143 · Issues: 151

Informer2020

Official repository for the paper "Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting", accepted at AAAI 2021.

Language: Python · License: Apache-2.0 · Stars: 5154 · Issues: 573

llm-awq

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Language: Python · License: MIT · Stars: 2148 · Issues: 159
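The core idea behind AWQ can be sketched in a few lines: weight channels that see large activations are scaled up before rounding so they lose less precision, and the scale is folded back afterwards. This is a minimal NumPy sketch of that idea, not the repo's API; the per-tensor symmetric quantizer and the `alpha` exponent here are illustrative assumptions.

```python
import numpy as np

def awq_quantize(W, act_scale, n_bits=4, alpha=0.5):
    """Activation-aware weight quantization, minimal sketch.

    W: (out_features, in_features) weight matrix.
    act_scale: (in_features,) average activation magnitude per input channel.
    Channels with large activations are scaled up before rounding so their
    weights lose less precision; the scale is folded back afterwards.
    """
    s = act_scale ** alpha                 # per-input-channel scaling factor
    Ws = W * s                             # broadcast scale over input channels
    qmax = 2 ** (n_bits - 1) - 1           # e.g. 7 for 4-bit symmetric
    step = np.abs(Ws).max() / qmax         # per-tensor quantization step
    Wq = np.clip(np.round(Ws / step), -qmax, qmax) * step
    return Wq / s                          # dequantized weights, scale folded back
```

The actual repo additionally searches per-group scales and ships fused INT4 kernels; this sketch only shows why activation-aware scaling changes which weights are preserved.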

Awesome-LLM-Inference

📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.

Skywork

Skywork series models are pre-trained on 3.2 TB of high-quality multilingual (mainly Chinese and English) and code data. The model weights, training data, evaluation data, and evaluation methods are all open-sourced.

Language: Python · License: NOASSERTION · Stars: 1178 · Issues: 63

adaptive-span

Transformer training code for sequential tasks

Language: Python · License: NOASSERTION · Stars: 608 · Issues: 21

landmark-attention

Landmark Attention: Random-Access Infinite Context Length for Transformers

Language: Python · License: Apache-2.0 · Stars: 400 · Issues: 15

fairseq-apollo

fairseq fork with the Apollo optimizer

Language: Python · License: MIT · Stars: 106 · Issues: 8

tensor-book

A tutorial series on tensor computations (张量计算系列教程)

License: MIT · Stars: 85 · Issues: 0

gateloop-transformer

Implementation of GateLoop Transformer in PyTorch and JAX

Language: Python · License: MIT · Stars: 83 · Issues: 1

LM-Kernel-FT

A Kernel-Based View of Language Model Fine-Tuning (https://arxiv.org/abs/2210.05643)

Language: Python · License: MIT · Stars: 68 · Issues: 1

HGRN

[NeurIPS 2023 spotlight] Official implementation of the NeurIPS 2023 paper "Hierarchically Gated Recurrent Neural Network for Sequence Modeling" (HGRN)
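The building block of HGRN is an elementwise gated linear recurrence, which mixes the hidden state and the current input through learned forget gates. A minimal sketch of that recurrence (the function name and the plain sequential loop are illustrative; the repo's actual layers add hierarchical gate bounds and parallel scans):

```python
import numpy as np

def gated_recurrence(x, f):
    """Elementwise gated linear recurrence: h_t = f_t * h_{t-1} + (1 - f_t) * x_t.

    x, f: arrays of shape (seq_len, dim); gates f lie in (0, 1).
    Larger gate values retain more of the past state.
    """
    h = np.zeros(x.shape[1])
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        h = f[t] * h + (1.0 - f[t]) * x[t]
        out[t] = h
    return out
```

With all gates at 0 the layer reduces to the identity on the inputs; with gates near 1 information from early tokens persists across the whole sequence.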

icml17_knn

Deriving Neural Architectures from Sequence and Graph Kernels

FSQ

Keras implementation of Finite Scalar Quantization (FSQ)

Language: Python · License: Apache-2.0 · Stars: 54 · Issues: 0
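FSQ itself is simple enough to state in three lines: bound each latent channel, then round it onto a small fixed grid, giving an implicit codebook of size prod(levels) with no learned embeddings. A minimal NumPy sketch (this repo is Keras-based; the function name and tanh bounding here are illustrative assumptions):

```python
import numpy as np

def fsq(z, levels):
    """Finite scalar quantization sketch: bound each channel to (-1, 1),
    then round it onto a fixed grid of `levels[i]` values per channel.

    z: (..., d) latent; levels: list of d odd integers.
    Implicit codebook size = prod(levels).
    """
    half = (np.asarray(levels) - 1) / 2     # half-width of each channel's grid
    bounded = np.tanh(z) * half             # channel i now lies in (-half_i, half_i)
    return np.round(bounded) / half         # snap to integers, rescale to [-1, 1]
```

Training passes gradients through the non-differentiable round with a straight-through estimator, which the sketch omits.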

GPU-Puzzles

Solve puzzles. Learn CUDA.

Language: Jupyter Notebook · License: MIT · Stars: 49 · Issues: 0

token-shift-gpt

Implementation of Token Shift GPT, an autoregressive model that relies solely on shifting along the sequence dimension for token mixing

Language: Python · License: MIT · Stars: 48 · Issues: 1
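The token-shift operation replaces attention with something far cheaper: copy half of each position's feature channels from the previous token, keep the other half, and let the feed-forward layers do the rest. A minimal sketch of the shift itself (the 50/50 channel split and function name are illustrative assumptions):

```python
import numpy as np

def token_shift(x):
    """Shift half of the feature channels back by one time step, so each
    position mixes its own features with the previous token's (still causal).

    x: (seq_len, dim) array.
    """
    d = x.shape[1] // 2
    out = x.copy()
    out[1:, :d] = x[:-1, :d]   # first half of channels comes from token t-1
    out[0, :d] = 0.0           # no previous token at position 0
    return out
```

Stacking layers widens the effective receptive field by one token per layer, which is why the model needs depth rather than attention to see far back.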

mbr

Minimum Bayes Risk Decoding for Hugging Face Transformers

Language: Python · License: Apache-2.0 · Stars: 48 · Issues: 3
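Minimum Bayes risk decoding replaces "pick the highest-probability sample" with "pick the sample that agrees most with the other samples" under some utility metric. A toy self-contained sketch (the unigram-F1 utility and function names are illustrative stand-ins for metrics like BLEU or COMET, not this library's API):

```python
def unigram_f1(hyp, ref):
    """Toy utility: F1 over unigram sets, a stand-in for BLEU/COMET etc."""
    hs, rs = set(hyp.split()), set(ref.split())
    if not hs or not rs:
        return 0.0
    overlap = len(hs & rs)
    p, r = overlap / len(hs), overlap / len(rs)
    return 2 * p * r / (p + r) if p + r else 0.0

def mbr_select(candidates, utility=unigram_f1):
    """Return the candidate with the highest expected utility, using the
    sampled candidates themselves as pseudo-references."""
    scores = [sum(utility(c, r) for r in candidates) for c in candidates]
    return candidates[scores.index(max(scores))]
```

An outlier sample scores well only against itself, so consensus outputs win even when no single sample has the highest model probability.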

acdc-torch

ACDC: A Structured Efficient Linear Layer

Language: CMake · License: MIT · Stars: 42 · Issues: 0

skill-it

Skill-It! A Data-Driven Skills Framework for Understanding and Training Language Models

Language: Jupyter Notebook · License: Apache-2.0 · Stars: 36 · Issues: 1

Recurrent-Linear-Transformers

Implementation of Recurrent Linear Transformers in JAX and Flax.

Language: Python · License: Apache-2.0 · Stars: 13 · Issues: 2

PCFG-NAT

Code for NeurIPS 2023 paper "Non-autoregressive Machine Translation with Probabilistic Context-free Grammar".

Language: Cuda · License: MIT · Stars: 10 · Issues: 1

RRU

Official TensorFlow implementation of the paper "Gates are not what you need in RNNs"

Language: Python · License: MIT · Stars: 4 · Issues: 0