Yufeng Li's repositories
bitsandbytes
8-bit CUDA functions for PyTorch
cutlass
CUDA Templates for Linear Algebra Subroutines
diffusers
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch
FasterTransformer
Transformer-related optimization, including BERT, GPT
flash-attention
Fast and memory-efficient exact attention
mmperf
MatMul Performance Benchmarks for a Single CPU Core comparing both hand-engineered and codegen kernels.
onnxruntime
ONNX Runtime: cross-platform, high performance scoring engine for ML models
llama
Inference code for LLaMA models
neural-speed
An innovative library for efficient LLM inference via low-bit quantization and sparsity
optimum
🏎️ Accelerate training and inference of 🤗 Transformers with easy-to-use hardware optimization tools
pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
triton
Development repository for the Triton language and compiler
tutorials
Tutorials for creating and using ONNX models
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
whisper
Robust Speech Recognition via Large-Scale Weak Supervision
Windows-Machine-Learning
Samples for Windows ML.