Miscalculation of head index and series index

Question

dndnda opened this issue 9 months ago · comments

This bug is in /dev/forward/attention_forward.cu, in the kernel function 'attention_softmax_kernel1'.

Both 'preatt' and 'att' have shape (B, NH, T, T), and the total number of threads is 'B * NH * T', i.e. one thread per (b, h, t) row. The head index 'h' should therefore be:
h = (idx / T) % NH
and the time index 't' should be:
t = idx % T
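
To make the proposed arithmetic easier to check, here is a minimal host-side sketch in plain C++ (which is also valid inside a .cu file). It assumes the flat thread index enumerates rows in (B, NH, T) row-major order, i.e. idx = (b * NH + h) * T + t. The names RowIndex, decompose_row_index, row_offset and check_round_trip are hypothetical and not taken from the repository, and the batch index b = idx / (NH * T) is not stated above; it is what the same layout assumption would imply.

```cpp
#include <cassert>

// Hypothetical helper type (not from the repository): coordinates of one attention row.
struct RowIndex { int b, h, t; };

// Decompose a flat row index idx in [0, B*NH*T), assuming (B, NH, T) row-major
// order, i.e. idx = (b * NH + h) * T + t.
constexpr RowIndex decompose_row_index(int idx, int NH, int T) {
    return RowIndex{ idx / (NH * T), (idx / T) % NH, idx % T };
}

// Offset of the first element of that row inside a (B, NH, T, T) tensor.
constexpr long row_offset(RowIndex r, int NH, int T) {
    return ((static_cast<long>(r.b) * NH + r.h) * T + r.t) * T;
}

// Walk every row index and check that the decomposition stays in range
// and maps back to the same flat index.
inline void check_round_trip(int B, int NH, int T) {
    for (int idx = 0; idx < B * NH * T; ++idx) {
        RowIndex r = decompose_row_index(idx, NH, T);
        assert(r.b < B && r.h < NH && r.t < T);
        assert((r.b * NH + r.h) * T + r.t == idx);
        assert(row_offset(r, NH, T) == static_cast<long>(idx) * T);
    }
}
```

Calling check_round_trip(2, 3, 4), for example, visits all 24 rows. Under this layout each thread's row starts at idx * T, so consecutive threads read consecutive rows of 'preatt'.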