MicroZHY's repositories
HPC-Lab-Docs
Documentation for HPC course
awesome-model-quantization
A list of papers, docs, and code about model quantization. This repo aims to provide information for model quantization research, and we are continuously improving the project. PRs for works (papers, repositories) that the repo has missed are welcome.
cuda-samples
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
CUDA-Learn-Note
🎉 CUDA notes / collection of frequently asked interview questions / C++ notes. Personal notes, updated irregularly: sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
cuda_hgemm
Several optimization methods for half-precision general matrix multiplication (HGEMM) using tensor cores with the WMMA API and MMA PTX instructions.
CUDATutorial
A CUDA tutorial for learning CUDA programming from scratch
DASP
Source code of the SC '23 paper: "DASP: Specific Dense Matrix Multiply-Accumulate Units Accelerated General Sparse Matrix-Vector Multiplication" by Yuechen Lu and Weifeng Liu.
DeepLearningSystem
An introduction to the core principles of deep learning systems.
FVENS
Finite volume Euler / Navier-Stokes solver
how-to-write-makefile
Remastered edition of "Let's Write a Makefile Together" (跟我一起写Makefile)
implicit-gemm-tensor-core-convolution
Simple example of how to write an Implicit GEMM Convolution in CUDA using the tensor core WMMA API and bindings for PyTorch.
leetcode-master
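"Code Thoughts" (代码随想录) LeetCode problem-solving guide: a recommended order for 200 classic problems, about 600,000 characters of detailed illustrated explanations, video walkthroughs of the difficult points, more than 50 mind maps, and solutions in C++, Java, Python, Go, JavaScript, and other languages. No more feeling lost when learning algorithms! 🔥🔥 Take a look, and you will wish you had found it sooner! 🚀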
MatmulTutorial
An Easy-to-understand TensorOp Matmul Tutorial
mixed-precision-ir
Mixed Precision Iterative Refinement
MixedPrecisionBlockQR
CUDA implementation of mixed-precision block QR decomposition
pbbsbench
New version of pbbs benchmarks
SPARTA
SParse AcceleRation on Tensor Architecture
TC-GNN_ATC23
Artifact for USENIX ATC'23: TC-GNN: Bridging Sparse GNN Computation and Dense Tensor Cores on GPUs.
tensor-cores-numerical-behavior
Test suite for probing the numerical behavior of NVIDIA tensor cores
wmma_extension
An extension library of WMMA API (Tensor Core API)