karpathy / llm.c

LLM training in simple, raw C/CUDA

Repository from GitHub: https://github.com/karpathy/llm.c

Miscalculation of the head index and time index

dndnda opened this issue · comments


This bug is in dev/cuda/attention_forward.cu, in the kernel function 'attention_softmax_kernel1':

[screenshot of the kernel's index computation]

The shapes of 'preatt' and 'att' are (B, NH, T, T), and the total number of threads is B * NH * T (one thread per query row). So the head index 'h' should be:
h = (idx / T) % NH
and the time index 't' should be:
t = idx % T