Wei's repositories
AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
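A minimal quantization sketch, assuming the `awq` package from this repo is installed; the model path and output directory are placeholder examples, and the config values follow common AWQ 4-bit settings:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mistral-7B-v0.1"  # placeholder example model
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the FP16 model, calibrate and quantize the weights to 4-bit, then save.
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized("mistral-7b-awq-4bit")
tokenizer.save_pretrained("mistral-7b-awq-4bit")
```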
BitBLAS
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
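This is not the BitBLAS API, just a NumPy sketch of the computation such a library fuses (the same FP16xINT4 pattern that marlin's kernel below targets): activations stay FP16 while group-quantized INT4 weights are dequantized on the fly. All names here are illustrative:

```python
import numpy as np

def mixed_precision_matmul(a_fp16, w_int4, scales, group_size=128):
    """Reference semantics for an FP16 x INT4 matmul: dequantize, then GEMM.
    A fused kernel performs the dequantization inside the inner loop instead.

    a_fp16: (M, K) float16 activations
    w_int4: (K, N) int8 array holding values in [-8, 7] (the 4-bit range)
    scales: (K // group_size, N) float16 per-group scales
    """
    K = w_int4.shape[0]
    # Expand per-group scales to per-row scales and dequantize the weights.
    row_scales = np.repeat(scales, group_size, axis=0)[:K]
    w_fp16 = w_int4.astype(np.float16) * row_scales
    return a_fp16 @ w_fp16

# Tiny usage example with random data.
M, K, N, g = 4, 256, 8, 128
a = np.random.randn(M, K).astype(np.float16)
w = np.random.randint(-8, 8, size=(K, N), dtype=np.int8)
s = (np.random.rand(K // g, N) * 0.1).astype(np.float16)
print(mixed_precision_matmul(a, w, s, group_size=g).shape)  # (4, 8)
```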
BitNet
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch
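A minimal sketch of the paper's BitLinear idea, assuming PyTorch; this is not the repo's code, just the binarize-and-rescale forward pass with a straight-through estimator:

```python
import torch
import torch.nn as nn

class BitLinear(nn.Linear):
    """Sketch of a 1-bit linear layer in the spirit of BitNet (not the repo's
    implementation). Weights are binarized to +/-1 around their mean and
    rescaled by their mean absolute value; the straight-through estimator
    lets gradients flow to the latent full-precision weights."""
    def forward(self, x):
        w = self.weight
        beta = w.abs().mean()                 # scaling factor
        w_bin = torch.sign(w - w.mean())      # binarize to +/-1
        # Binary weights in the forward pass, full-precision gradients back.
        w_ste = w + (w_bin * beta - w).detach()
        return nn.functional.linear(x, w_ste, self.bias)

layer = BitLinear(16, 8)
print(layer(torch.randn(2, 16)).shape)  # torch.Size([2, 8])
```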
Book-Mathematical-Foundation-of-Reinforcement-Learning
This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning."
clover
Official Implementation of Clover-1 and Clover-2
cs-self-learning
A guide to self-studying computer science
EAGLE
EAGLE: Lossless Acceleration of LLM Decoding by Feature Extrapolation
EfficientQAT
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
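As a sketch of the generic fake-quantization building block that QAT methods rest on (not EfficientQAT's specific algorithm): round weights to a low-bit grid in the forward pass while gradients pass through unchanged:

```python
import torch

def fake_quantize(w, bits=4):
    """Generic QAT fake-quantization (a sketch, not EfficientQAT itself):
    the forward pass sees weights snapped to a b-bit uniform grid, but the
    straight-through estimator routes gradients to the latent FP weights."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    w_q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    return w + (w_q - w).detach()

w = torch.randn(4, 4, requires_grad=True)
fake_quantize(w).sum().backward()
print(w.grad.abs().sum() > 0)  # gradients reach the latent FP weights
```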
grub2-bios-uefi-usb
Create a USB boot drive with support for legacy BIOS and 32/64-bit UEFI in a single partition on Linux
marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups at medium batch sizes of up to 16-32 tokens.
matmulfreellm
Implementation of the MatMul-free LM.
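A toy illustration (not the repo's implementation) of why ternary weights remove multiplications: with weights in {-1, 0, +1}, a matrix-vector product reduces to sums and differences of the inputs:

```python
import numpy as np

def ternary_matvec(x, w_ternary):
    """Conceptual sketch: with ternary weights, each output element is just
    (sum of inputs where w == +1) - (sum of inputs where w == -1)."""
    out = np.zeros(w_ternary.shape[1], dtype=x.dtype)
    for j in range(w_ternary.shape[1]):
        out[j] = x[w_ternary[:, j] == 1].sum() - x[w_ternary[:, j] == -1].sum()
    return out

x = np.array([1.0, 2.0, 3.0])
w = np.array([[1, -1], [0, 1], [-1, 1]])   # ternary weight matrix
print(ternary_matvec(x, w))                # [-2.  4.]
print(x @ w)                               # same result via ordinary matmul
```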
MCSD
Multi-Candidate Speculative Decoding
Ouroboros
Ouroboros: Speculative Decoding with Large Model Enhanced Drafting
Sequoia
A scalable and robust tree-based speculative decoding algorithm
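EAGLE, Clover, MCSD, Ouroboros, and Sequoia all elaborate on the same draft-then-verify loop. A minimal greedy sketch with placeholder model functions (a real verifier scores all draft tokens in one batched target pass, and the tree-based variants verify many candidate branches at once):

```python
def speculative_decode(target_next, draft_next, prompt, k=4, steps=32):
    """Minimal greedy speculative decoding (placeholder model functions).
    target_next(seq) -> next token under the large target model
    draft_next(seq)  -> next token under the small draft model
    The draft proposes k tokens; the target keeps the longest verified
    prefix plus one corrected token, so output matches target-only decoding.
    """
    seq = list(prompt)
    for _ in range(steps):
        # 1) Draft proposes k tokens autoregressively.
        draft = []
        for _ in range(k):
            draft.append(draft_next(seq + draft))
        # 2) Target verifies position by position (sequential here for
        #    clarity; in practice this is one batched forward pass).
        accepted = []
        for i in range(k):
            t = target_next(seq + accepted)
            accepted.append(t)          # always keep the target's token
            if t != draft[i]:
                break                   # mismatch: discard remaining drafts
        seq += accepted
    return seq

# Toy usage: the draft agrees with the target only at even positions.
target = lambda s: (len(s) * 7) % 10
drafty = lambda s: (len(s) * 7) % 10 if len(s) % 2 == 0 else 0
print(speculative_decode(target, drafty, [1, 2, 3], k=4, steps=3))
```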
ShiftAddLLM
ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization
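A conceptual sketch of multiplication-less arithmetic, not the repo's reparameterization: approximate a multiply by decomposing the weight into a short signed sum of powers of two, so each term becomes a bit-shift:

```python
import math

def shift_add_multiply(x_int, w, terms=2):
    """Approximate x * w using only shifts and adds (a conceptual sketch,
    not ShiftAddLLM's algorithm). The weight is greedily decomposed into a
    signed sum of powers of two; each term is a bit-shift of x."""
    acc, residual = 0, w
    for _ in range(terms):
        if residual == 0:
            break
        sign = 1 if residual > 0 else -1
        exp = round(math.log2(abs(residual)))   # nearest power of two
        acc += sign * (x_int << exp if exp >= 0 else x_int >> -exp)
        residual -= sign * 2 ** exp
    return acc

print(shift_add_multiply(10, 5))   # 10<<2 + 10<<0 = 50 (exact, two terms)
print(shift_add_multiply(10, 6))   # 10<<3 - 10<<1 = 60 (exact, two terms)
```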
SpeculativeDecodingPapers
📰 Must-read papers and blogs on Speculative Decoding ⚡️
surya
OCR, layout analysis, reading order, line detection in 90+ languages
tilelang
A domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
tiny-gpu
A minimal GPU design in Verilog to learn how GPUs work from the ground up
tinyllama-bitnet
Train your own small BitNet model
tvm_mlir_learn
A collection of compiler learning resources (TVM and MLIR).