m_grouped_fp8_gemm_nt_contiguous stuck on matrix shape (1, 1, 24576, 1536)

Question

lizhiqihhh opened this issue 19 days ago

Hi there,

I am testing the kernel m_grouped_fp8_gemm_nt_contiguous with [group, m per group, N, K] = [1, 1, 24576, 1536] on an H200. However, the program hangs and never returns. Could you please advise on how to resolve this?

Many thanks!

pyhan commented 16 days ago

Thanks

Chenggang Zhao · Answer 1 · Fri Aug 29 2025 15:41:36 GMT+0800 (China Standard Time)

cc @zheanxu

Zhean Xu · Answer 2 · Fri Aug 29 2025 16:11:27 GMT+0800 (China Standard Time)

Hi @lizhiqihhh, I tested m_grouped_fp8_gemm_nt_contiguous with [group, m per group, N, K] = [1, 1, 24576, 1536] and could not reproduce the hang. Note that this kernel assumes the m of each group is aligned to 128; with an unaligned m (such as m = 1 here), it may perform poorly or cause other unexpected issues.
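
For anyone hitting the same thing, here is a minimal sketch of padding each group's m up to the 128 alignment mentioned above before building the contiguous layout. It is plain Python and does not call the library; the helper names `align_up` and `padded_group_layout` are hypothetical, and the value 128 is taken from this answer rather than queried from the kernel.

```python
# Alignment stated in the answer above; hard-coded here as an assumption.
M_ALIGNMENT = 128


def align_up(m: int, alignment: int = M_ALIGNMENT) -> int:
    """Round m up to the next multiple of alignment."""
    return (m + alignment - 1) // alignment * alignment


def padded_group_layout(m_per_group):
    """Return (padded sizes, starting row offsets, total rows) for the groups.

    Each group's rows are padded to a multiple of M_ALIGNMENT so that every
    group starts on an aligned row when the groups are concatenated along m.
    """
    padded = [align_up(m) for m in m_per_group]
    offsets = []
    total = 0
    for size in padded:
        offsets.append(total)
        total += size
    return padded, offsets, total


# The shape from this issue: one group with m = 1.
padded, offsets, total = padded_group_layout([1])
print(padded, offsets, total)  # [128] [0] 128
```

With the shape from this issue, the single group would occupy 128 rows, of which only the first holds real data. How the padding rows should be filled or marked is up to the caller and the kernel's documented layout, so treat this only as the size arithmetic.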