Yu Zhang (yzhangcs)

Company: Soochow University

Location: Shenzhen, Guangdong

Home Page: https://yzhang.site

Twitter: @yzhang_cs

Organizations
SUDA-LA

Yu Zhang's starred repositories

unsloth

Finetune Llama 3, Mistral & Gemma LLMs 2-5x faster with 80% less memory

Language: Python | License: Apache-2.0 | Stargazers: 10151 | Issues: 74 | Issues: 383

VMamba

VMamba: Visual State Space Models, code is based on Mamba

Language: Python | License: MIT | Stargazers: 990 | Issues: 16 | Issues: 43

awesome-mixture-of-experts

A collection of AWESOME things about mixture-of-experts

mamba.py

A simple and efficient Mamba implementation in PyTorch and MLX.

Language: Python | License: MIT | Stargazers: 657 | Issues: 4 | Issues: 21

ml-aim

This repository provides the code and model checkpoints of the research paper: Scalable Pre-training of Large Autoregressive Image Models

Language: Python | License: NOASSERTION | Stargazers: 638 | Issues: 20 | Issues: 5

review-2023

Have you finished writing your 2023 year-end review yet?

inferflow

Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).

Language: C++ | License: MIT | Stargazers: 227 | Issues: 7 | Issues: 13

einx

Tensor Operations in Einstein-Inspired Notation for Python.

Language: Python | License: MIT | Stargazers: 211 | Issues: 4 | Issues: 7

zero-bubble-pipeline-parallelism

Zero Bubble Pipeline Parallelism

Language: Python | License: NOASSERTION | Stargazers: 196 | Issues: 5 | Issues: 12

SpeculativeDecodingPapers

📰 Must-read papers and blogs on Speculative Decoding ⚡️

License: Apache-2.0 | Stargazers: 175 | Issues: 12 | Issues: 0

lightning-attention

Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models

Language: Python | License: MIT | Stargazers: 167 | Issues: 11 | Issues: 12

accelerated-scan

Accelerated First Order Parallel Associative Scan

Language: Python | License: MIT | Stargazers: 108 | Issues: 8 | Issues: 4

moe_attention

Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"

Language: Python | License: MIT | Stargazers: 78 | Issues: 7 | Issues: 2

CLEX

[ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models

Language: Python | License: MIT | Stargazers: 69 | Issues: 4 | Issues: 7

triton-autodiff

Experiment in using Tangent to autodiff Triton

Language: Python | License: MIT | Stargazers: 66 | Issues: 4 | Issues: 0

top_k_attention

The accompanying code for "Memory-efficient Transformers via Top-k Attention" (Ankit Gupta, Guy Dar, Shaya Goodman, David Ciprut, Jonathan Berant. SustaiNLP 2021).

Language: Python | Stargazers: 56 | Issues: 2 | Issues: 0

llm-misinformation-survey

Paper list for the survey "Combating Misinformation in the Age of LLMs: Opportunities and Challenges" and the initiative "LLMs Meet Misinformation"

Language: Python | License: Apache-2.0 | Stargazers: 50 | Issues: 0 | Issues: 0

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 39 | Issues: 4 | Issues: 3

why-weight-decay

Why Do We Need Weight Decay in Modern Deep Learning? [arXiv, Oct 2023]

Language: Python | License: NOASSERTION | Stargazers: 35 | Issues: 2 | Issues: 0

Highway-Transformer

[ACL'20] Highway Transformer: A Gated Transformer.

Language: Python | License: Apache-2.0 | Stargazers: 32 | Issues: 3 | Issues: 0

ADM-ES

[ICLR 2024] Official code for the paper 'Elucidating the Exposure Bias in Diffusion Models'

Language: Python | License: MIT | Stargazers: 28 | Issues: 1 | Issues: 1

tangent

Source-to-Source Debuggable Derivatives in Pure Python

Language: Python | License: Apache-2.0 | Stargazers: 14 | Issues: 1 | Issues: 0

lecture2

Obsolete version of the CUDA-mode repo; use cuda-mode/lectures instead

Language: Jupyter Notebook | Stargazers: 13 | Issues: 3 | Issues: 0

Awesome-Simultaneous-Translation

Paper list of simultaneous translation / streaming translation, including text-to-text machine translation and speech-to-text translation.

Stargazers: 3 | Issues: 0 | Issues: 0