Michael Mi (guocuimi)

Company: Vectorch

Location: Bellevue, WA

Michael Mi's repositories

minitf

Simplified version of Tensorflow for learning purposes.

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 3 | Issues: 2 | Issues: 1

ScaleLLM

A high-performance inference system for large language models, designed for production environments.

Language: C++ | License: Apache-2.0 | Stargazers: 1 | Issues: 0 | Issues: 0

cutlass

CUDA Templates for Linear Algebra Subroutines

Language: C++ | License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0

flash_attention_inference

Performance of the C++ interfaces of flash attention, flash attention v2, and self-quantized decoding attention in large language model (LLM) inference scenarios.

Language: C++ | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

flashinfer

FlashInfer: Kernel Library for LLM Serving

Language: Cuda | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0
Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0

MatmulTutorial

An Easy-to-understand TensorOp Matmul Tutorial

Language: C++ | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0