deepseek-ai / DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Repository from GitHub https://github.com/deepseek-ai/DeepGEMM

Replacement for deep_gemm.wgrad_gemm_fp8_fp8_fp32_nt after "Add more GPU architectures support" (#112)

goldhuang opened this issue

Hello,

Thanks for adding the backward support! However, the original backward kernel wgrad_gemm_fp8_fp8_fp32_nt was removed by #112.
Which kernel should I use with the latest main branch?

Thanks!

Sorry, we have removed it for now and plan to add a faster implementation later, possibly within the next two weeks.