LeiWang1999

Lei Wang's starred repositories

diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.

Language: Python | License: Apache-2.0 | 23306 195 3645

xformers

Hackable and optimized Transformers building blocks, supporting a composable construction.

Language: Python | License: NOASSERTION | 7858 77 489

exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs

Language: Python | License: MIT | 3110 35 334

CTranslate2

Fast inference engine for Transformer models

Language: C++ | License: MIT | 2953 56 646

sglang

SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.

Language: Python | License: Apache-2.0 | 2620 28 262

punica

Serving multiple LoRA-finetuned LLMs as one

Language: Python | License: Apache-2.0 | 867 14 37

dlpack

common in-memory tensor structure

Language: Python | License: Apache-2.0 | 865 47 67

flashinfer

FlashInfer: Kernel Library for LLM Serving

Language: Cuda | License: Apache-2.0 | 721 13 61

ucasproposal

LaTeX Proposal Template for the University of Chinese Academy of Sciences

Language: TeX | 564 9 19

Implementing 32 Verilog Mini Projects. 32 bit adder, Array Multiplier, Barrel Shifter, Binary Divider 16 by 8, Booth Multiplication, CRC Coding, Carry Select and Carry Look Ahead Adder, Carry Skip and Carry Save Adder, Complex Multiplier, Dice Game, FIFO, Fixed Point Adder and Subtractor, Fixed Point Multiplier and Divider, Floating Point IEEE 754 Addition Subtraction, Floating Point IEEE 754 Division, Floating Point IEEE 754 Multiplication, Fraction Multiplier, High Radix Multiplier, I2C and SPI Protocols, LFSR and CFSR, Logarithm Implementation, Mealy and Moore State Machine Implementation of Sequence Detector, Modified Booth Algorithm, Pipelined Multiplier, Restoring and Non Restoring Division, Sequential Multiplier, Shift and Add Binary Multiplier, Traffic Light Controller, Universal_Shift_Register, BCD Adder, Dual Address RAM and Dual Address ROM

Language: Verilog | License: NOASSERTION | 488 8 4

distrifuser

[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

Language: Python | License: MIT | 459 8 12

HIPIFY

HIPIFY: Convert CUDA to Portable C++ Code

Language: C++ | License: MIT | 427 25 232

marlin

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes of 16-32 tokens.

Language: Python | License: Apache-2.0 | 398 13 21

Stable-Diffusion-ONNX-FP16

Example code and documentation on how to get Stable Diffusion running with ONNX FP16 models on DirectML. Can run accelerated on all DirectML supported cards including AMD and Intel.

Language: Python | License: GPL-3.0 | 270 13 35