Q7bao's starred repositories
cuda_hgemm
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.
GPTQ-triton
GPTQ inference Triton kernel
x-transformers
A simple but complete full-attention transformer with a set of promising experimental features from various papers
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
abnormal-floats
Code for the note "NF4 Isn't Information Theoretically Optimal (and that's Good)"
tuning_playbook
A playbook for systematically maximizing the performance of deep learning models.
fucking-algorithm
Grinding algorithm problems is all about patterns, and labuladong is all you need! English version supported! Crack LeetCode, not only how, but also why.
ScreenshotScraper
ScreenshotScraper is used to take screenshots of web pages and to create PDFs from these screenshots. There are two versions of ScreenshotScraper, the OS version, OS_Screenshot, and the web version, Web_Screenshot.