LeiWang1999

Institute of Computing Technology, UCAS

Peking

https://leiblog.wang

Organizations

microsoft

Lei Wang's repositories

ZYNQ-NVDLA

NVDLA (an open-source DL accelerator framework) implementation on FPGA.

Language: Verilog | 283 8 28

tvm_gpu_gemm

Playing with GEMM in TVM.

Language: Cuda | 79 4 1

AutoGPTQ.tvm

GPTQ inference TVM kernel

Language: Cuda | 34 3 2

VehicleFlowDetection

Implementation of vehicle flow statistics based on TensorFlow and YOLOv3, with a PyQt5 GUI.

Language: Python | 18 3 3

leiblog.wang

My New Blog Powered by HEXO http://leiblog.wang

Language: HTML | 5 20

rocblas-benchmark

Language: C++ | 5 20

BitBLAS

Language: Python | License: MIT | 4 30

memfusion_artifact

Language: Python | 400

tvm

Open deep learning compiler stack for cpu, gpu and specialized accelerators

Language: Python | License: Apache-2.0 | 4 10

cv

resume.

Language: TeX | 3 20

mlc-benchmark

Language: Python | 300

cutlass

Language: C++ | License: NOASSERTION | 2 20

cutlass_fpA_intB_gemm

A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer

License: Apache-2.0 | 200

Ladder

@DataStructures_Cbased I'm Coming!

Language: Python | License: MIT | 2 10

Roller

Build and train AlexNet with PyTorch, run prediction with both TVM and PyTorch, and compare their performance.

Language: Python | 2 20

vllm-bitblas

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python | License: Apache-2.0 | 2 10

MSBitBLAS

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

Language: Python | License: MIT | 100

nnfusion

A flexible and efficient deep neural network (DNN) compiler that generates a high-performance executable from a DNN model description.

Language: C++ | License: MIT | 1 10

vLLM

Language: Python | License: Apache-2.0 | 1 10

apex

A PyTorch extension: tools for easy mixed-precision and distributed training in PyTorch.

Language: Python | License: BSD-3-Clause | 010

AutoGPTQ

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

Language: Python | License: MIT | 010

AutoGPTQ_nf

Language: Python | License: MIT | 010

gptq_faster

Faster 3-bit CUDA kernel for GPTQ.

Language: Python | License: Apache-2.0 | 010

LeiWang1999

010

mlc-llm

Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.

Language: Python | License: Apache-2.0 | 010

nmsparse

Language: HTML | 010

nni

An open-source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression, and hyper-parameter tuning.

Language: Python | License: MIT | 010

ppq

PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.

Language: Python | License: Apache-2.0 | 010

relax

Language: Python | License: Apache-2.0 | 010

Welder_artifacts

OSDI 2023 Welder artifacts

Language: Python | 010