m_grouped_fp8_gemm_nt_contiguous stuck on matrix shape (1, 1,24576, 1536)
lizhiqihhh opened this issue
Hi there,
I am testing the kernel m_grouped_fp8_gemm_nt_contiguous with [group, m per group, N, K] = [1, 1, 24576, 1536] on an H200. However, the program hangs. Could you please advise on how to resolve this?
Many thanks!
cc @zheanxu
Hi @lizhiqihhh, I tested m_grouped_fp8_gemm_nt_contiguous with the matrix shape [1, 1, 24576, 1536] and it did not hang for me. Note, however, that this kernel assumes the m of each group is aligned to 128, and an m of 1 per group does not satisfy that. Unaligned shapes can lead to poor performance or other unexpected issues.
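To illustrate the alignment requirement, here is a minimal sketch of how each group's m could be rounded up to a multiple of 128 before the groups are laid out contiguously. The helper names `align_up` and `padded_group_layout` are hypothetical and are not part of the library; only the value 128 comes from the comment above. How the padding rows should be filled or masked is not covered here.

```python
# Hypothetical helpers for illustration only; not part of the kernel library.
M_ALIGNMENT = 128  # per-group alignment mentioned in this thread


def align_up(m: int, alignment: int = M_ALIGNMENT) -> int:
    """Round m up to the nearest multiple of `alignment`."""
    return (m + alignment - 1) // alignment * alignment


def padded_group_layout(m_per_group):
    """Return ([(row_start, valid_rows, padded_rows), ...], total_rows).

    Each group starts on an aligned row boundary, so no group
    shares a 128-row block with another group.
    """
    layout = []
    start = 0
    for m in m_per_group:
        padded = align_up(m)
        layout.append((start, m, padded))
        start += padded
    return layout, start


if __name__ == "__main__":
    # The shape from this issue: one group with m = 1.
    layout, total_rows = padded_group_layout([1])
    print(layout, total_rows)
```

For the shape in this issue, the single group would occupy a 128-row block of which only the first row holds real data.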
Thanks