karpathy / llm.c

LLM training in simple, raw C/CUDA

Repository from GitHub: https://github.com/karpathy/llm.c

I cannot understand the `cublasGemmStridedBatchedEx` call in `attention_forward`

echosprint opened this issue

https://github.com/karpathy/llm.c/blob/b1171051dd0540519aec7433ac781ff902b2264f/llmc/attention.cuh#L217C34-L217C55

In `attention_forward` we need to compute Q * Kᵀ, where Q and K are stored row-major with shape (B, NH, T, HS). But `cublasGemmStridedBatchedEx` would treat each (T, HS) slice as an (HS, T) matrix, because cuBLAS is column-major by default. So why do we pass (T, T, HS) for the (M, N, K) parameters instead of (HS, T, HS)?

    cublasCheck(cublasGemmStridedBatchedEx(cublas_handle,
                                     CUBLAS_OP_T, CUBLAS_OP_N,              // transa (applied to k), transb (applied to q)
                                     T, T, HS, &alpha,                      // M, N, K, alpha
                                     k, CUBLAS_LOWP, HS, T * HS,            // A, Atype, lda, strideA
                                     q, CUBLAS_LOWP, HS, T * HS,            // B, Btype, ldb, strideB
                                     &beta, preatt, CUBLAS_LOWP, T, T * T,  // beta, C, Ctype, ldc, strideC
                                     B * NH, cublas_compute, CUBLAS_GEMM_DEFAULT)); // batchCount, computeType, algo
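
To make the question concrete, here is a tiny CPU sketch (my own illustration, not code from the repo) of what I think this call is meant to produce for a single (batch, head) slice: `preatt[t1][t2] = dot(Q row t1, K row t2)`, i.e. Q * Kᵀ stored row-major as (T, T). The loop bounds show where an (M, N, K) of (T, T, HS) would come from, which is exactly what confuses me about the column-major view.

    /* CPU sketch (my own illustration, not repo code): what I think the
       batched GEMM should produce for one (batch, head) slice.
       q and k are row-major (T, HS); preatt is row-major (T, T) = Q * K^T. */
    #include <stdio.h>

    #define T  3   /* sequence length (toy value) */
    #define HS 2   /* head size (toy value) */

    int main(void) {
        /* row-major (T, HS): element (t, h) lives at index t * HS + h */
        float q[T * HS] = {1, 2,  3, 4,  5, 6};
        float k[T * HS] = {1, 0,  0, 1,  1, 1};
        float preatt[T * T];

        for (int t1 = 0; t1 < T; t1++) {        /* output rows    -> M = T  */
            for (int t2 = 0; t2 < T; t2++) {    /* output columns -> N = T  */
                float acc = 0.0f;
                for (int h = 0; h < HS; h++)    /* contraction    -> K = HS */
                    acc += q[t1 * HS + h] * k[t2 * HS + h];
                preatt[t1 * T + t2] = acc;      /* dot(Q row t1, K row t2) */
            }
        }

        for (int t1 = 0; t1 < T; t1++) {
            for (int t2 = 0; t2 < T; t2++)
                printf("%5.1f ", preatt[t1 * T + t2]);
            printf("\n");
        }
        return 0;
    }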

Can anyone explain?