Ningxin Zheng's repositories
CUDALibrarySamples
CUDA Library Samples
cutlass
CUDA Templates for Linear Algebra Subroutines
FasterTransformer
Transformer-related optimizations, including BERT and GPT
flux
A fast communication-overlapping library for tensor parallelism on GPUs.
LeViT
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference
linux
Linux kernel source tree
MLPruning
MLPruning: structured pruning for BERT NLP models in PyTorch
nn_pruning
Prune a model while fine-tuning or training.
nnfusion
A flexible and efficient deep neural network (DNN) compiler that generates high-performance executables from a DNN model description.
pytorch_block_sparse
Fast block-sparse matrices for PyTorch
sputnik
A library of GPU kernels for sparse matrix operations.
transformers
🤗 Transformers: State-of-the-art Natural Language Processing for Pytorch, TensorFlow, and JAX.
TurboTransformers
A fast and user-friendly runtime for transformer inference (BERT, ALBERT, GPT-2, decoders, etc.) on CPU and GPU.
tvm
Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators