MicroZHY

MicroZHY's repositories

HPC-Lab-Docs

Documentation for HPC course

Language: Makefile

awesome-model-quantization

A list of papers, docs, and code about model quantization. This repo aims to provide information for model quantization research and is continuously being improved. PRs adding works (papers, repositories) that the repo is missing are welcome.

conv2d_direct

Language: Cuda

cuda-samples

Samples for CUDA developers that demonstrate features in the CUDA Toolkit

Language: C, License: NOASSERTION

ECE408

Language: Cuda

ConvStencil

Language: Cuda, License: MIT

CUDA-Learn-Note

🎉CUDA notes / collection of frequently asked interview questions / C++ notes; personal notes, updated irregularly: sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.

License: GPL-3.0

cuda_hgemm

Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruction.

License: MIT

CUDATutorial

A CUDA tutorial for learning CUDA programming from scratch

DASP

Source code of the SC '23 paper: "DASP: Specific Dense Matrix Multiply-Accumulate Units Accelerated General Sparse Matrix-Vector Multiplication" by Yuechen Lu and Weifeng Liu.

License: AGPL-3.0

DeepLearningSystem

An introduction to the core principles of deep learning systems.

License: Apache-2.0

DTC-SpMM_ASPLOS24

FVENS

Finite volume Euler / Navier-Stokes solver

Language: C++, License: GPL-3.0

how-to-write-makefile

Let's Write a Makefile Together (remastered edition)

Language: Python

implicit-gemm-tensor-core-convolution

Simple example of how to write an Implicit GEMM Convolution in CUDA using the tensor core WMMA API and bindings for PyTorch.

License: MIT

kamacoder-solutions

Complete collection of solutions to KamaCoder (卡码网) problems

leetcode-master

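LeetCode problem-solving guide from 《代码随想录》: a recommended order for 200 classic problems, 600,000 characters of detailed illustrated explanations, video walkthroughs of the difficult points, and more than 50 mind maps, with versions in C++, Java, Python, Go, JavaScript, and other languages, so that learning algorithms is no longer bewildering! 🔥🔥 Take a look and you will wish you had found it sooner! 🚀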

MatmulTutorial

An easy-to-understand TensorOp Matmul tutorial

MixedPrecisionBlockQR

CUDA implementation of mixed-precision block QR decomposition

Language: Cuda, License: MIT

pbbsbench

New version of pbbs benchmarks

License: MIT

randLS

SPARTA

SParse AcceleRation on Tensor Architecture

superlu

Supernodal sparse direct solver. https://portal.nersc.gov/project/sparse/superlu/

License: NOASSERTION

TC-GNN_ATC23

Artifact for USENIX ATC'23: TC-GNN: Bridging Sparse GNN Computation and Dense Tensor Cores on GPUs.

ted-join-hipc22

Language: Cuda

tensor-cores-numerical-behavior

Test suite for probing the numerical behavior of NVIDIA tensor cores

License: GPL-2.0

tensorcore-via-register-lsu

Language: Cuda

Tetris-artifact-evalution

TileSpMSpV

wmma_extension

An extension library of WMMA API (Tensor Core API)

Language: Cuda, License: MIT