Lei Wang (LeiWang1999)

Company:Institute of Computing Technology, UCAS

Location:Peking

Home Page:https://leiblog.wang

Twitter:@Lei_Wang_1999


Organizations
microsoft

Lei Wang's starred repositories

diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.

Language:Python | License:Apache-2.0 | Stargazers:23306 | Issues:195 | Issues:3645

xformers

Hackable and optimized Transformers building blocks, supporting a composable construction.

Language:Python | License:NOASSERTION | Stargazers:7858 | Issues:77 | Issues:489

exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs

Language:Python | License:MIT | Stargazers:3110 | Issues:35 | Issues:334

CTranslate2

Fast inference engine for Transformer models

Language:C++ | License:MIT | Stargazers:2953 | Issues:56 | Issues:646

sglang

SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.

Language:Python | License:Apache-2.0 | Stargazers:2620 | Issues:28 | Issues:262

punica

Serving multiple LoRA finetuned LLM as one

Language:Python | License:Apache-2.0 | Stargazers:867 | Issues:14 | Issues:37

dlpack

common in-memory tensor structure

Language:Python | License:Apache-2.0 | Stargazers:865 | Issues:47 | Issues:67

flashinfer

FlashInfer: Kernel Library for LLM Serving

Language:Cuda | License:Apache-2.0 | Stargazers:721 | Issues:13 | Issues:61

ucasproposal

LaTeX Proposal Template for the University of Chinese Academy of Sciences

32-Verilog-Mini-Projects

Implementing 32 Verilog Mini Projects. 32 bit adder, Array Multiplier, Barrel Shifter, Binary Divider 16 by 8, Booth Multiplication, CRC Coding, Carry Select and Carry Look Ahead Adder, Carry Skip and Carry Save Adder, Complex Multiplier, Dice Game, FIFO, Fixed Point Adder and Subtractor, Fixed Point Multiplier and Divider, Floating Point IEEE 754 Addition Subtraction, Floating Point IEEE 754 Division, Floating Point IEEE 754 Multiplication, Fraction Multiplier, High Radix Multiplier, I2C and SPI Protocols, LFSR and CFSR, Logarithm Implementation, Mealy and Moore State Machine Implementation of Sequence Detector, Modified Booth Algorithm, Pipelined Multiplier, Restoring and Non Restoring Division, Sequential Multiplier, Shift and Add Binary Multiplier, Traffic Light Controller, Universal_Shift_Register, BCD Adder, Dual Address RAM and Dual Address ROM

Language:Verilog | License:NOASSERTION | Stargazers:488 | Issues:8 | Issues:4

distrifuser

[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

Language:Python | License:MIT | Stargazers:459 | Issues:8 | Issues:12

HIPIFY

HIPIFY: Convert CUDA to Portable C++ Code

Language:C++ | License:MIT | Stargazers:427 | Issues:25 | Issues:232

marlin

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes of 16-32 tokens.

Language:Python | License:Apache-2.0 | Stargazers:398 | Issues:13 | Issues:21

Stable-Diffusion-ONNX-FP16

Example code and documentation on how to get Stable Diffusion running with ONNX FP16 models on DirectML. Can run accelerated on all DirectML supported cards including AMD and Intel.

Language:Python | License:GPL-3.0 | Stargazers:270 | Issues:13 | Issues:35

Tensile

Stretching GPU performance for GEMMs and tensor contractions.

Language:Python | License:MIT | Stargazers:196 | Issues:55 | Issues:91

Atom

[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

Language:Jupyter Notebook | License:Apache-2.0 | Stargazers:170 | Issues:4 | Issues:4

allo

Allo: A Programming Model for Composable Accelerator Design

Language:Python | License:Apache-2.0 | Stargazers:87 | Issues:11 | Issues:28

RTL-Coder

A new LLM solution for RTL code generation, achieving state-of-the-art performance in non-commercial solutions and outperforming GPT-3.5.

CGRA-Mapper

An LLVM pass that can generate CDFG and map the target loops onto a parameterizable CGRA.

Language:C++ | License:BSD-3-Clause | Stargazers:50 | Issues:3 | Issues:13

TileFlow

TileFlow is a performance analysis tool based on Timeloop for fusion dataflows

Language:C++ | License:MIT | Stargazers:49 | Issues:0 | Issues:0

tvm.tl

An extension of TVMScript to write simple and high-performance GPU kernels with Tensor Cores.

Language:Python | License:Apache-2.0 | Stargazers:45 | Issues:0 | Issues:1

RISC-V-TensorCore

Transactional Verilog design and Verilator Testbench for a RISC-V TensorCore Vector co-processor for reproducible linear algebra

Language:Verilog | License:MIT | Stargazers:43 | Issues:5 | Issues:0

lakeroad

FPGA synthesis tool powered by program synthesis

Language:Racket | License:MIT | Stargazers:30 | Issues:6 | Issues:248

GSOC_TensorCore

TensorCore Vector Processor for Deep Learning - Google Summer of Code Project

Language:Verilog | License:Apache-2.0 | Stargazers:20 | Issues:2 | Issues:0

allo-pldi24-artifact

Artifact evaluation of PLDI'24 paper "Allo: A Programming Model for Composable Accelerator Design"

Language:VHDL | License:Apache-2.0 | Stargazers:11 | Issues:0 | Issues:0

PIM-Toolchain

EDA toolchain for processing-in-memory architectures, including an architecture synthesizer, a compiler, and a simulator

Stargazers:5 | Issues:0 | Issues:0
Language:Verilog | Stargazers:3 | Issues:0 | Issues:0