KnowingNothing / MatmulTutorial

An Easy-to-understand TensorOp Matmul Tutorial

Potential Problem in the Naive GEMM

MyNewAcc1234 opened this issue

Hello,

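I think there should be a __syncthreads() before "storeAccum(SC, Accum);". The shared memory that holds the A/B tiles is reused for C. Without this barrier, a fast warp can start storing its accumulators into shared memory while a slower warp is still reading A/B from the same addresses, so the slower warp reads values that have already been overwritten.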
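To make the hazard concrete without a GPU, here is a CPU analogy in C++. It is not the tutorial's kernel, just a sketch under my own assumptions. Each std::thread stands in for a warp, and the vector `smem` stands in for the shared-memory buffer that first holds A and B and is then reused for C. The hypothetical `Barrier` class stands in for __syncthreads(). The names `tile_gemm` and `Barrier` and the row-major tile layout are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical reusable barrier standing in for __syncthreads().
class Barrier {
public:
    explicit Barrier(int count) : count_(count) {}

    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        const int gen = generation_;
        if (++waiting_ == count_) {
            // Last thread to arrive releases everyone and starts a new phase.
            waiting_ = 0;
            ++generation_;
            cv_.notify_all();
        } else {
            cv_.wait(lock, [&] { return gen != generation_; });
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int count_;
    int waiting_ = 0;
    int generation_ = 0;
};

// Hypothetical tile GEMM: C (M x N) = A (M x K) * B (K x N), all row-major.
// `smem` models shared memory: A and B are loaded into it first, and C is
// later stored over the same storage (the A/B and C reuse described above).
std::vector<float> tile_gemm(const std::vector<float>& A,
                             const std::vector<float>& B,
                             int M, int N, int K, int num_warps) {
    assert(M % num_warps == 0);
    std::vector<float> smem(std::max(M * K + K * N, M * N));
    std::copy(A.begin(), A.end(), smem.begin());          // like loadSmemA
    std::copy(B.begin(), B.end(), smem.begin() + M * K);  // like loadSmemB

    Barrier sync(num_warps);
    const int rows_per_warp = M / num_warps;
    std::vector<std::thread> warps;
    for (int w = 0; w < num_warps; ++w) {
        warps.emplace_back([&, w] {
            const int row0 = w * rows_per_warp;
            std::vector<float> accum(rows_per_warp * N, 0.0f);
            // Phase 1: every "warp" reads A and B from smem.
            for (int i = 0; i < rows_per_warp; ++i)
                for (int j = 0; j < N; ++j)
                    for (int k = 0; k < K; ++k)
                        accum[i * N + j] +=
                            smem[(row0 + i) * K + k] * smem[M * K + k * N + j];
            // The proposed fix: no one overwrites smem until all reads are done.
            sync.wait();
            // Phase 2 (like storeAccum): C is written over the A/B storage.
            for (int i = 0; i < rows_per_warp; ++i)
                for (int j = 0; j < N; ++j)
                    smem[(row0 + i) * N + j] = accum[i * N + j];
        });
    }
    for (auto& t : warps) t.join();
    return std::vector<float>(smem.begin(), smem.begin() + M * N);
}
```

For example, with M = N = 4, K = 2 and two threads, the C region covers both A and B. If you delete the `sync.wait()` call, one thread can overwrite B (or rows of A) while the other may still be reading it. That is the same kind of race I suspect in the kernel.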

With the current tile size the result may still happen to be correct. However, when I increase the tile size along the K dimension from 32 to 64, I get inf and nan. With a larger K tile, the missing synchronization between warps matters more, so the overwrite actually shows up.

(By the way, I modified both the K tile size and the functions loadSmemA and loadSmemB to use 128-bit loads, and that is when inf and nan appeared in my result. I checked my code many times, then tried adding this __syncthreads(), and got the correct result. So I'm actually not sure whether the inf and nan come exactly from the missing __syncthreads().)