jundaf's repositories
CUDA-INT8-GEMM
CUDA 8-bit Tensor Core Matrix Multiplication based on m16n16k16 WMMA API
dnn-test-framework
DNN unit test framework
GPU-Tensor-Permute
permute sequence data on GPU with high bandwidth
adaptive-filtering-algorithms
Adaptive Algorithms
cutlass-b2bgemm
an extension to the cutlass half-precision b2b gemm example
cutlass-kernel-volta-gemm
volta fp16 gemm kernel
FelixFu520-README
A pupil in the computer world. (Felix Fu)
cuda-samples
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
cuda_hook
Hooks CUDA-related dynamic libraries using automated code generation tools.
EasyChatGPT-API
A simple implementation of calling the ChatGPT API using Python and Flask
EasyWeChatBot
Build a WeChat chatbot with the ChatGPT API in 1 minute
flash-attention
Fast and memory-efficient exact attention
gpu-gym
a toy used for keeping all gpus on a machine busy using nccl
GPU-Philox
cuda philox in a single kernel (easily used in fusion)
Heterogeneous-GPUs
Heterogeneous Nvidia (CUDA) and Intel (OpenCL) GPU Programming
llvm-project
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. Note: the repository does not accept github pull requests at this moment. Please submit your patches at http://reviews.llvm.org.
matxscript
A high-performance, extensible Python AOT compiler.