Subject_No_i's starred repositories
stable-diffusion-webui
Stable Diffusion web UI
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
FasterTransformer
Transformer-related optimizations, including BERT and GPT
chain-of-thought-hub
Benchmarking large language models' complex reasoning ability with chain-of-thought prompting
flashinfer
FlashInfer: Kernel Library for LLM Serving
unlimiformer
Public repo for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input"
CompilerGym
Reinforcement learning environments for compiler and program optimization tasks
How_to_optimize_in_GPU
A series of GPU optimization topics introducing in detail how to optimize CUDA kernels, covering several basic kernels: elementwise, reduce, sgemv, sgemm, etc. The performance of these kernels is at or near the theoretical limit.
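As a taste of the elementwise topic, here is a minimal sketch (not code from the repo; the kernel name and launch configuration are illustrative) of the grid-stride elementwise kernel such tutorials typically start from:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Grid-stride loop: each thread processes multiple elements, so a single
// launch configuration covers any problem size with coalesced accesses.
__global__ void elementwise_add(const float* a, const float* b, float* c, int n) {
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x) {
        c[i] = a[i] + b[i];
    }
}

int main() {
    const int n = 1 << 20;
    float *a, *b, *c;
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    elementwise_add<<<256, 256>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);  // expect 3.000000
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

Elementwise kernels are memory-bound, which is why the grid-stride pattern with coalesced loads is enough to approach the bandwidth limit the description refers to.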
how-to-optimize-gemm
Row-major matmul optimization
kokkos-tutorials
Tutorials for the Kokkos C++ Performance Portability Programming Ecosystem
cuda_hgemm
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.
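For context on the WMMA API mentioned above, here is a minimal sketch (assuming one warp, a single 16x16 output tile, row-major half-precision operands, and sm_70 or newer; not the repo's actual kernel):

```cuda
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

// One warp computes one 16x16 tile of C = A * B with Tensor Cores.
// A is 16xK row-major, B is Kx16 row-major, C is 16x16 row-major.
__global__ void wmma_hgemm_tile(const half* A, const half* B, float* C, int K) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;
    wmma::fill_fragment(c_frag, 0.0f);

    // March along K in steps of 16, accumulating partial products in registers.
    for (int k = 0; k < K; k += 16) {
        wmma::load_matrix_sync(a_frag, A + k, K);        // leading dimension of A is K
        wmma::load_matrix_sync(b_frag, B + k * 16, 16);  // leading dimension of B is 16
        wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);
    }
    wmma::store_matrix_sync(C, c_frag, 16, wmma::mem_row_major);
}
```

Launched with a single warp (<<<1, 32>>>), all 32 threads cooperate on the fragment operations; real HGEMM kernels like those in this repo tile C across many warps and stage A/B through shared memory.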
dnnweaver2
Open Source Specialized Computing Stack for Accelerating Deep Neural Networks.
dsa-framework
Release of the stream-specialization software/hardware stack.
RayTracingToInfinity
A feature-packed raytracer built with C++
tvm_gpu_gemm
Playing with GEMM in TVM
brainstorm
Compiler for Dynamic Neural Networks