yao-jiashu's repositories
KernelCodeGen
GEMM/FMHA CUDA/HIP kernel code generation using MLIR.
CuAssembler
An unofficial CUDA assembler, for all generations of SASS, hopefully :)
cuda-samples
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
cuda_hgemm
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.
flash-attention
Fast and memory-efficient exact attention
gpu-arch-microbenchmark
Dissecting NVIDIA GPU Architecture
HierarchicalKV
HierarchicalKV is part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. The key capability of Merlin-KV is to store key-value feature embeddings in the high-bandwidth memory (HBM) of GPUs and in host memory. It can also be used as generic key-value storage.
llvm-project
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. Note: the repository does not accept GitHub pull requests at this moment. Please submit your patches at http://reviews.llvm.org.
My-Leetcode-Solution
LeetCode solutions, along with the necessary data structures and algorithms
tvm
Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators
nvbench
CUDA Kernel Benchmarking Library
pdfs
Technically-oriented PDF Collection (Papers, Specs, Decks, Manuals, etc)
PPoPP2017_artifact
Third-party assembler and GEMM library for NVIDIA Kepler GPUs
transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.