Bruce-Lee-LY

User data from GitHub: https://github.com/Bruce-Lee-LY

Company: Tsinghua University

Home Page: https://www.zhihu.com/people/mu-zi-zhi-6-28

GitHub: @Bruce-Lee-LY

Bruce-Lee-LY's repositories

cuda_hgemm

Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.

Language: Cuda | License: MIT | Stargazers: 382 | Issues: 5 | Issues: 14

cuda_hook

Hooks CUDA-related dynamic libraries using automated code generation tools.

Language: C | License: MIT | Stargazers: 166 | Issues: 2 | Issues: 12

cuda_hgemv

Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores.

Language: Cuda | License: MIT | Stargazers: 60 | Issues: 5 | Issues: 0

decoding_attention

Decoding Attention is specially optimized for MHA, MQA, GQA, and MLA, using CUDA cores for the decoding stage of LLM inference.

Language: C++ | License: BSD-3-Clause | Stargazers: 42 | Issues: 2 | Issues: 0

flash_attention_inference

Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.

Language: C++ | License: BSD-3-Clause | Stargazers: 35 | Issues: 1 | Issues: 4

cutlass_gemm

Multiple GEMM operators built with CUTLASS to support LLM inference.

Language: C++ | License: BSD-3-Clause | Stargazers: 17 | Issues: 1 | Issues: 0

matrix_multiply

Several common matrix multiplication methods implemented on CPU and NVIDIA GPU using C++11 and CUDA.

Language: C++ | License: MIT | Stargazers: 15 | Issues: 2 | Issues: 0

cuda_back2back_hgemm

Uses Tensor Cores to compute back-to-back HGEMM (half-precision general matrix multiplication) with MMA PTX instructions.

Language: Cuda | License: MIT | Stargazers: 11 | Issues: 2 | Issues: 1

memory_pool

A simple and efficient memory pool implemented in C++11.

Language: C++ | License: MIT | Stargazers: 8 | Issues: 2 | Issues: 0

thread_pool

A thread pool implemented in C++11 to process a task queue.

Language: C++ | License: MIT | Stargazers: 3 | Issues: 3 | Issues: 0

deep_learning

Training and inference of several common deep learning models, implemented with TensorFlow and PyTorch.

Language: Python | License: MIT | Stargazers: 1 | Issues: 2 | Issues: 0

algorithm_design

Solves several common problems in C++11 using a range of algorithm design methods.

Language: C++ | License: MIT | Stargazers: 0 | Issues: 2 | Issues: 0

crawler

Several fun crawler cases implemented in Python.

Language: Python | License: MIT | Stargazers: 0 | Issues: 2 | Issues: 0

data_structure

Several commonly used data structures implemented in C++11.

Language: C++ | License: MIT | Stargazers: 0 | Issues: 2 | Issues: 0

machine_learning

Implementations of several common machine learning algorithms using scikit-learn.

Language: Python | License: MIT | Stargazers: 0 | Issues: 2 | Issues: 0