Siyuan Li's starred repositories
tensorflow
An Open Source Machine Learning Framework for Everyone
FasterTransformer
Transformer-related optimizations, including BERT and GPT
tiny-cuda-nn
Lightning fast C++/CUDA neural network framework
micronet
micronet, a model compression and deployment library. Compression: (1) quantization: quantization-aware training (QAT), high-bit (>2b: DoReFa, "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference") and low-bit (≤2b: ternary and binary, TWN/BNN/XNOR-Net), plus post-training quantization (PTQ), 8-bit (TensorRT); (2) pruning: normal, regular, and group convolutional channel pruning; (3) group convolution structure; (4) batch-normalization fusion for quantization. Deployment: TensorRT with fp32/fp16/int8 (PTQ calibration), op adaptation (upsample), and dynamic shapes.
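The QAT approach named above works by inserting "fake quantization" ops that round weights and activations to an integer grid during training, so the network learns to tolerate the rounding error. A minimal, framework-free sketch of symmetric per-tensor fake quantization (the function name and details are illustrative, not micronet's actual API):

```python
def fake_quantize(values, num_bits=8):
    """Round-trip floats through a symmetric integer grid (QAT-style).

    Values are scaled to [-qmax, qmax], rounded, clamped, then rescaled
    back to floats, exposing the quantization error to the training loop.
    """
    qmax = 2 ** (num_bits - 1) - 1                 # e.g. 127 for 8-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # avoid div-by-zero on all-zero input
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return [q * scale for q in quantized]
```

In a real QAT framework the same round-trip is applied inside the forward pass, with a straight-through estimator passing gradients through the rounding step.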
tvm_mlir_learn
A collection of compiler learning resources.
awesome-ocr
A curated list of promising OCR resources
gpgpu-sim_distribution
GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism, as well as a performance visualization tool, AerialVision, and an integrated energy model, GPUWattch.
How_to_optimize_in_GPU
A series of GPU optimization topics covering in detail how to optimize CUDA kernels, including several basic kernel optimizations: elementwise, reduce, sgemv, sgemm, etc. The performance of these kernels is at or near the theoretical limit.
ClipperDocCN
The documentation of ClipperLib in Chinese
yolo_quantization
Based on the paper "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference"
ncnn_breakdown
A breakdown of NCNN
llvm-project-fork
Fork of LLVM Project containing a Colossus IPU backend implementation