jundaf2 / cutlass-kernel-volta-gemm

Volta FP16 GEMM kernel

cutlass-kernel-volta-gemm

This is a CUTLASS-based, kernel-level FP16 GEMM for the NVIDIA Volta architecture.

Dependencies

  • pytorch
  • pytest

Notes

The work is partitioned at two levels: first assign tiles to thread blocks, then subdivide each block tile among warps.

  • Block
    • BM = 64
    • BN = 64
  • Warp
    • WM = 16
    • WN = 16

NUM_WARPS = (BM × BN) / (WM × WN) = 16
NUM_THREADS_PER_CTA = WARP_SIZE × NUM_WARPS = 32 × 16 = 512
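
The tile arithmetic above can be checked with a few compile-time constants. The following is a minimal sketch in plain C++ (it should also compile as CUDA host code), not the repository's actual code. The names `WARPS_M`, `WARPS_N`, `TileOrigin`, `block_tile_origin`, and `warp_tile_origin` are hypothetical, and the row-major ordering of warps inside a block tile is an assumption made only for illustration.

```cpp
#include <cassert>

// Tile sizes taken from the notes above.
constexpr int BM = 64;         // block tile rows
constexpr int BN = 64;         // block tile columns
constexpr int WM = 16;         // warp tile rows
constexpr int WN = 16;         // warp tile columns
constexpr int WARP_SIZE = 32;  // threads per warp

static_assert(BM % WM == 0 && BN % WN == 0,
              "warp tiles must evenly divide the block tile");

// Hypothetical names: number of warp tiles along each block dimension.
constexpr int WARPS_M = BM / WM;  // 4
constexpr int WARPS_N = BN / WN;  // 4

constexpr int NUM_WARPS = WARPS_M * WARPS_N;                // 16
constexpr int NUM_THREADS_PER_CTA = WARP_SIZE * NUM_WARPS;  // 512

static_assert(NUM_WARPS == (BM * BN) / (WM * WN),
              "matches NUM_WARPS = (BM x BN) / (WM x WN)");
static_assert(NUM_THREADS_PER_CTA == 512, "32 x 16 = 512");

// Hypothetical helper type: top-left corner of a tile, in elements.
struct TileOrigin {
    int row;
    int col;
};

// Level 1: a block at grid position (block_row, block_col) owns a
// BM x BN tile of the output matrix starting at this origin.
inline TileOrigin block_tile_origin(int block_row, int block_col) {
    return {block_row * BM, block_col * BN};
}

// Level 2: a warp owns a WM x WN tile inside its block tile.
// Assumption: warps are laid out row-major within the block tile.
inline TileOrigin warp_tile_origin(int warp_id) {
    assert(warp_id >= 0 && warp_id < NUM_WARPS);
    return {(warp_id / WARPS_N) * WM, (warp_id % WARPS_N) * WN};
}
```

Under these assumptions, adding the two origins gives the output-matrix coordinates of the 16×16 tile a given warp computes. For example, `warp_tile_origin(5)` lands at row 16, column 16 within its block tile.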

