Wei's repositories
causal-conv1d
Causal depthwise conv1d in CUDA, with a PyTorch interface
CS-Drafting
Cascade Speculative Drafting
EAGLE
EAGLE: Lossless Acceleration of LLM Decoding by Feature Extrapolation
filebrowser
📂 Web File Browser
laser
The Truth Is In There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction
marker
Convert PDF to markdown quickly with high accuracy
MCSD
Multi-Candidate Speculative Decoding
Medusa
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
minbpe
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
Ouroboros
Ouroboros: Speculative Decoding with Large Model Enhanced Drafting
Progressive-Hint
This is the official implementation of "Progressive-Hint Prompting Improves Reasoning in Large Language Models"
pytorch-that-I-successfully-built
Tensors and Dynamic neural networks in Python with strong GPU acceleration
rerope
Rectified Rotary Position Embeddings
search_with_lepton
Building a quick conversation-based search demo with Lepton AI.
SeeAct
SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large multimodal models (LMMs) such as GPT-4V(ision).
self-speculative-decoding
Code associated with the paper "Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding"
Sequoia
Scalable and robust tree-based speculative decoding algorithm
toolformer-pytorch
Implementation of Toolformer, Language Models That Can Use Tools, by MetaAI
trl
Train transformer language models with reinforcement learning.
vivado-risc-v
Xilinx Vivado block designs for an FPGA RISC-V SoC running the Debian Linux distribution
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
webarena
Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"