deepseek-ai / DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Repository from GitHub: https://github.com/deepseek-ai/DeepGEMM

m_grouped_fp8_gemm_nt_contiguous stuck on matrix shape (1, 1, 24576, 1536)

lizhiqihhh opened this issue

lizhiqihhh commented

Hi there,

I am testing the kernel m_grouped_fp8_gemm_nt_contiguous with [group, m per group, N, K] = [1, 1, 24576, 1536] on an H200. However, the program hangs. Could you please advise on how to resolve this?

Many thanks!

Hi @lizhiqihhh, I tested m_grouped_fp8_gemm_nt_contiguous with the matrix shape [1, 1, 24576, 1536] and found it didn’t get stuck. In addition, this kernel assumes that the m of each group is aligned to 128; otherwise, it could cause low performance or other unexpected issues.
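For reference, the alignment requirement mentioned above can be checked on the caller side before launching the kernel. The snippet below is only a rough sketch in plain Python: the names `M_ALIGNMENT`, `align_up`, and `padded_layout` are made up for illustration and are not part of DeepGEMM. It assumes the groups are stacked along M and that each group is padded up to a multiple of 128; how the padding rows are filled or masked is left to the caller.

```python
from typing import List, Tuple

# Alignment taken from the reply above; the constant name is hypothetical.
M_ALIGNMENT = 128


def align_up(m: int, alignment: int = M_ALIGNMENT) -> int:
    """Round m up to the nearest multiple of alignment."""
    return (m + alignment - 1) // alignment * alignment


def padded_layout(
    group_ms: List[int], alignment: int = M_ALIGNMENT
) -> Tuple[List[int], List[int], int]:
    """Compute padded group sizes, the starting row of each group,
    and the total number of rows when groups are stacked along M."""
    padded = [align_up(m, alignment) for m in group_ms]
    offsets = []
    start = 0
    for size in padded:
        offsets.append(start)  # first row of this group in the stacked matrix
        start += size
    return padded, offsets, start


# The shape from this issue: one group with m = 1.
padded, offsets, total_m = padded_layout([1])
print(padded, offsets, total_m)
```

With the shape from this issue, the single group of m = 1 would be padded to 128 rows, so the left-hand matrix passed to the kernel would have 128 rows instead of 1.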

commented

Thanks