Michael Mi's starred repositories
chatbot-ui
AI chat for every model.
flashinfer
FlashInfer: Kernel Library for LLM Serving
ByteTransformer
Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052
run-clang-format
A wrapper script around clang-format, suitable for linting multiple files and for use in continuous integration
cuda_hgemm
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.
MatmulTutorial
An Easy-to-understand TensorOp Matmul Tutorial
SA-Segment-Anything
Vision-oriented multimodal AI