MicroZHY's repositories

CPP

Lecture notes, projects, and other materials for the course 'CS205 C/C++ Program Design' at Southern University of Science and Technology.

Language: C++ · Stargazers: 1 · Issues: 0

HPC-Lab-Docs

Documentation for HPC course

Language: Makefile · Stargazers: 1 · Issues: 0

awesome-model-quantization

A list of papers, docs, and code about model quantization. This repo aims to provide information for model quantization research and is continuously being improved. PRs adding works (papers, repositories) that the repo has missed are welcome.

Stargazers: 0 · Issues: 0
Language: Cuda · Stargazers: 0 · Issues: 0

cuda-samples

Samples for CUDA developers that demonstrate features in the CUDA Toolkit

Language: C · License: NOASSERTION · Stargazers: 0 · Issues: 0
Language: Cuda · Stargazers: 0 · Issues: 0
Language: Cuda · License: MIT · Stargazers: 0 · Issues: 0

CUDA-Learn-Note

🎉 CUDA notes / collection of frequently asked interview questions / C++ notes. Personal notes, updated irregularly: sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.

License: GPL-3.0 · Stargazers: 0 · Issues: 0

cuda_hgemm

Several optimization methods for half-precision general matrix multiplication (HGEMM) using tensor cores with the WMMA API and MMA PTX instructions.

License: MIT · Stargazers: 0 · Issues: 0

CUDATutorial

A CUDA tutorial for learning CUDA programming from scratch

Stargazers: 0 · Issues: 0

DASP

Source code of the SC '23 paper: "DASP: Specific Dense Matrix Multiply-Accumulate Units Accelerated General Sparse Matrix-Vector Multiplication" by Yuechen Lu and Weifeng Liu.

License: AGPL-3.0 · Stargazers: 0 · Issues: 0

DeepLearningSystem

An introduction to the core principles of deep learning systems.

License: Apache-2.0 · Stargazers: 0 · Issues: 0
Stargazers: 0 · Issues: 0

FVENS

Finite volume Euler / Navier-Stokes solver

Language: C++ · License: GPL-3.0 · Stargazers: 0 · Issues: 0

how-to-write-makefile

Remastered edition of "跟我一起写Makefile" ("Write Makefiles with Me")

Language: Python · Stargazers: 0 · Issues: 0

implicit-gemm-tensor-core-convolution

A simple example of how to write an implicit GEMM convolution in CUDA using the tensor core WMMA API, with bindings for PyTorch.

License: MIT · Stargazers: 0 · Issues: 0

leetcode-master

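《代码随想录》 LeetCode practice guide: a recommended order for working through 200 classic problems, detailed illustrated explanations totaling 600,000 characters, video breakdowns of the difficult points, more than 50 mind maps, and solutions in C++, Java, Python, Go, JavaScript, and other languages, so that learning algorithms is no longer bewildering! 🔥🔥 Take a look, and you will wish you had found it sooner! 🚀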

Stargazers: 0 · Issues: 0

MatmulTutorial

An easy-to-understand TensorOp Matmul tutorial

Stargazers: 0 · Issues: 0

mixed-precision-ir

Mixed Precision Iterative Refinement

License: MIT · Stargazers: 0 · Issues: 0

MixedPrecisionBlockQR

CUDA implementation of mixed-precision block QR decomposition

Language: Cuda · License: MIT · Stargazers: 0 · Issues: 0

pbbsbench

New version of pbbs benchmarks

License: MIT · Stargazers: 0 · Issues: 0
Stargazers: 0 · Issues: 0

SPARTA

SParse AcceleRation on Tensor Architecture

Stargazers: 0 · Issues: 0

TC-GNN_ATC23

Artifact for USENIX ATC'23: TC-GNN: Bridging Sparse GNN Computation and Dense Tensor Cores on GPUs.

Stargazers: 0 · Issues: 0
Language: Cuda · Stargazers: 0 · Issues: 0

tensor-cores-numerical-behavior

Test suite for probing the numerical behavior of NVIDIA tensor cores

License: GPL-2.0 · Stargazers: 0 · Issues: 0
Language: Cuda · Stargazers: 0 · Issues: 0
Stargazers: 0 · Issues: 0
Stargazers: 0 · Issues: 0

wmma_extension

An extension library for the WMMA API (Tensor Core API)

Language: Cuda · License: MIT · Stargazers: 0 · Issues: 0