Wei (w32zhong)

User data from GitHub: https://github.com/w32zhong

Organizations
approach0
t-k-cloud

Wei's repositories


llm-awq

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Language: Python · License: MIT · Stars: 1
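
The AWQ-related entries in this list (llm-awq here, AutoAWQ below) center on low-bit weight quantization. As a rough, hedged illustration of the underlying idea only (not code from either repository; the group size and the symmetric int4 range of [-7, 7] are simplifying assumptions), group-wise 4-bit quantization can be sketched in plain Python:

```python
# Illustrative sketch only: symmetric 4-bit group-wise weight quantization,
# the numeric format that AWQ-style methods target.
def quantize_4bit(weights, group_size=4):
    """Quantize a flat list of floats to int4 groups; return (q, scales)."""
    q, scales = [], []
    for i in range(0, len(weights), group_size):
        group = weights[i:i + group_size]
        scale = max(abs(w) for w in group) / 7 or 1.0  # avoid a zero scale
        scales.append(scale)
        q.append([max(-7, min(7, round(w / scale))) for w in group])
    return q, scales

def dequantize_4bit(q, scales):
    """Reverse the mapping; error is bounded by about half a scale step."""
    return [v * s for group, s in zip(q, scales) for v in group]

weights = [0.12, -0.5, 0.33, 0.07, 1.4, -0.9, 0.2, 0.0]
q, scales = quantize_4bit(weights)
recovered = dequantize_4bit(q, scales)
max_err = max(abs(a - b) for a, b in zip(weights, recovered))
```

Real AWQ additionally scales salient channels using activation statistics before quantizing; this sketch shows only the round-to-int4 step.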

mnn

Manual Neural Networks (MNN) implemented with CuPy, for learning purposes

Language: Python · Stars: 1 · Issues: 1

AutoAWQ

AutoAWQ implements the AWQ algorithm for 4-bit quantization, with a 2x speedup during inference.

Language: Python · License: MIT · Stars: 0

BitBLAS

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

Language: Python · License: MIT · Stars: 0

BitNet

Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch

Language: Python · License: MIT · Stars: 0
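
As a hedged sketch of the idea behind the title above (not this repository's code), BitNet-style weight binarization maps centered weights to {-1, +1}, keeping one scaling factor to preserve magnitude:

```python
# Illustrative sketch only: binarize weights to {-1, +1} around their mean,
# with a single scaling factor (the mean absolute value).
def binarize(weights):
    mean = sum(weights) / len(weights)
    beta = sum(abs(w) for w in weights) / len(weights)  # scaling factor
    signs = [1 if w - mean >= 0 else -1 for w in weights]
    return signs, beta

def dot_1bit(x, signs, beta):
    # With 1-bit weights, a dot product needs only adds/subtracts plus
    # one final multiply by the scaling factor.
    return beta * sum(xi if s > 0 else -xi for xi, s in zip(x, signs))

signs, beta = binarize([0.4, -0.2, 0.1, -0.7])
```

The appeal is that matrix multiplies against binarized weights reduce to additions and subtractions, which is what makes 1-bit transformers cheap at inference time.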

Book-Mathematical-Foundation-of-Reinforcement-Learning

This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning."

Stars: 0

clover

Official Implementation of Clover-1 and Clover-2

License: Apache-2.0 · Stars: 0

cs-self-learning

A self-study guide to computer science (计算机自学指南)

License: MIT · Stars: 0

EAGLE

EAGLE: Lossless Acceleration of LLM Decoding by Feature Extrapolation

Language: Python · License: Apache-2.0 · Stars: 0

EfficientQAT

EfficientQAT: Efficient Quantization-Aware Training for Large Language Models

Language: Python · Stars: 0

grub2-bios-uefi-usb

Create a USB boot drive with support for legacy BIOS and 32/64-bit UEFI in a single partition on Linux

Stars: 0

marlin

FP16×INT4 LLM inference kernel that achieves near-ideal ~4x speedups at batch sizes up to 16-32 tokens.

Language: Python · License: Apache-2.0 · Stars: 0

matmulfreellm

Implementation for MatMul-free LM.

Language: Python · License: Apache-2.0 · Stars: 0

MCSD

Multi-Candidate Speculative Decoding

Language: Python · License: MIT · Stars: 0

Ouroboros

Ouroboros: Speculative Decoding with Large Model Enhanced Drafting

Language: Python · License: Apache-2.0 · Stars: 0

Sequoia

A scalable and robust tree-based speculative decoding algorithm

Language: Python · Stars: 0

ShiftAddLLM

ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization

Language: Python · License: Apache-2.0 · Stars: 0

SpeculativeDecodingPapers

📰 Must-read papers and blogs on Speculative Decoding ⚡️

License: Apache-2.0 · Stars: 0

surya

OCR, layout analysis, reading order, line detection in 90+ languages

Language: Python · License: GPL-3.0 · Stars: 0

tilelang

A domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels

License: MIT · Stars: 0

tiny-gpu

A minimal GPU design in Verilog to learn how GPUs work from the ground up

Stars: 0

tinyllama-bitnet

Train your own small bitnet model

Language: Python · License: MIT · Stars: 0

tkblog

My blog.

Language: PHP · Stars: 0 · Issues: 1

tvm_mlir_learn

A collection of compiler learning resources (TVM/MLIR).

Language: Python · Stars: 0