Sin's starred repositories
llm-applications
A comprehensive guide to building RAG-based LLM applications for production.
ThunderKittens
Tile primitives for speedy kernels
CUDA-Learn-Notes
🎉 Modern CUDA Learn Notes with PyTorch: fp32, fp16, bf16, fp8/int8, flash_attn, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.
gpgpu-sim_distribution
GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as well as a performance visualization tool, AerialVision, and an integrated energy model, GPUWattch.
Awesome-LLM-RAG-Application
Resources on applications built with LLMs using the RAG pattern
clang-tutor
A collection of out-of-tree Clang plugins for teaching and learning
Learn-LLVM-12
Learn LLVM 12, published by Packt
nvbandwidth
A tool for bandwidth measurements on NVIDIA GPUs.
MatmulTutorial
An easy-to-understand TensorOp Matmul tutorial
cuda_hgemm
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.
RAG_langchain
A simple example of implementing RAG with langchain
gpu-benches
A collection of benchmarks to measure basic GPU capabilities
CppProjectTemplate
C++ project template with unit-tests, documentation, ci-testing and workflows.
llvm-tutorial
llvm-tutorial documentation, translation, and code repository
wmma_tensorcore_sample
Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core)
wmma_extension
An extension library of WMMA API (Tensor Core API)
online-softmax
Benchmark code for the "Online normalizer calculation for softmax" paper