Yuqing's repositories
AITemplate
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. It is specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
alphafold
Open source code for AlphaFold.
antares
Antares: an automatic engine for multi-platform kernel generation and optimization, supporting CPU, CUDA, ROCm, DirectX12, and GraphCore platforms.
chatgpt-api
Node.js client for the official ChatGPT API. 🔥
cutlass
CUDA Templates for Linear Algebra Subroutines
cuvs
cuVS - a library for vector search and clustering on the GPU
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
diffusers
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch
faiss
A library for efficient similarity search and clustering of dense vectors.
finetune-transformer-lm
Code and model for the paper "Improving Language Understanding by Generative Pre-Training"
flash-attention
Fast and memory-efficient exact attention
incubator-tvm
Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators
nnfusion
A flexible and efficient deep neural network (DNN) compiler that generates a high-performance executable from a DNN model description.
pytorch-lightning-transformers
Fine-tune transformers with pytorch-lightning
TASO
The Tensor Algebra SuperOptimizer for Deep Learning
ThunderKittens
Tile primitives for speedy kernels
triton
Development repository for the Triton language and compiler
tutel
Tutel MoE: An Optimized Mixture-of-Experts Implementation
unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities