There are 0 repositories under the cuda-core topic.
Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores.
Decoding Attention is specially optimized for multi-head attention (MHA) in the decoding stage of LLM inference, using CUDA cores.