TiledTensor / TiledCUDA

TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.

The GEMM kernel does not use a swizzled shared memory layout.

haruhi55 opened this issue

using SmemLayoutAtom = cute::Layout<Shape<_8, _32>, Stride<_32, _1>>;
using SmemLayoutA =
    decltype(tile_to_shape(SmemLayoutAtom{}, Shape<Int<kTM>, Int<kTK>>{}));
using SmemLayoutB =
    decltype(tile_to_shape(SmemLayoutAtom{}, Shape<Int<kTN>, Int<kTK>>{}));

The GEMM kernel does not utilize swizzled shared memory: the layouts above tile a plain row-major 8 x 32 atom, and no swizzle is composed into it.
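
To make the concern concrete, here is a minimal sketch in plain C++ (no CuTe headers) of what an XOR swizzle would change for the 8 x 32 atom above. It is an illustration under assumptions, not the fix adopted in this repository: the element type is assumed to be 16-bit, the parameters `<2, 3, 3>` are one plausible choice for that case, and the helper names `swizzle`, `atom_offset`, and `distinct_chunk_slots` are hypothetical. In CuTe itself the equivalent would usually be written by composing a swizzle with the atom, for example `composition(Swizzle<2, 3, 3>{}, SmemLayoutAtom{})`, but the right parameters depend on the element type and access pattern.

```cpp
#include <cassert>

// Plain C++ model of an XOR swizzle on element offsets. For S >= 0 this
// performs the same bit manipulation as cute::Swizzle<B, M, S>: take B bits
// starting at bit (M + S) and XOR them into the B bits starting at bit M.
template <int B, int M, int S>
constexpr int swizzle(int offset) {
    constexpr int mask = ((1 << B) - 1) << (M + S);
    return offset ^ ((offset & mask) >> S);
}

// Row-major element offset inside the 8 x 32 atom (Stride<_32, _1>).
constexpr int atom_offset(int row, int col) { return row * 32 + col; }

// Assumption: 16-bit elements, so 8 elements form one 16-byte chunk and one
// 128-byte line of shared memory (32 banks x 4 bytes) holds 8 chunk slots.
// Returns how many distinct chunk slots are touched when all 8 rows read the
// chunk that starts at column `col`. 8 means every row lands in a different
// group of banks; a smaller number means rows collide on the same banks.
constexpr int distinct_chunk_slots(int col, bool swizzled) {
    unsigned seen = 0;
    for (int row = 0; row < 8; ++row) {
        int off = atom_offset(row, col);
        if (swizzled) off = swizzle<2, 3, 3>(off);
        seen |= 1u << ((off % 64) / 8);
    }
    int count = 0;
    for (; seen != 0; seen >>= 1) count += static_cast<int>(seen & 1u);
    return count;
}

// Without the swizzle the 8 rows share only 2 chunk slots (4-way conflict);
// with it they spread over all 8 slots.
static_assert(distinct_chunk_slots(0, false) == 2, "plain atom collides");
static_assert(distinct_chunk_slots(0, true) == 8, "swizzled atom spreads out");
```

The swizzle only permutes offsets within the atom, so the tiling step (`tile_to_shape`) and the logical coordinates seen by the kernel would stay the same; only the physical placement in shared memory changes.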