Michael Mi's repositories
cutlass
CUDA Templates for Linear Algebra Subroutines
Language: C++ · License: NOASSERTION
flash_attention_inference
Benchmarks the performance of the C++ interfaces of flash attention, flash attention v2, and self-quantized decoding attention in large language model (LLM) inference scenarios.
Language: C++ · License: MIT
flashinfer
FlashInfer: Kernel Library for LLM Serving
Languages: Cuda, Python · License: Apache-2.0
MatmulTutorial
An easy-to-understand TensorOp Matmul tutorial
Language: C++ · License: Apache-2.0