naklecha / llama3-from-scratch

llama3 implementation one matrix multiplication at a time

Missing output needs to be corrected

ruirui3364 opened this issue

In the "using dot product of complex numbers to rotate a vector" section, the code "freqs_cis.shape" is missing its output, which may confuse readers.
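For reference, a minimal sketch of how that section builds `freqs_cis` and what its shape output would look like. The dimensions (17 tokens, head_dim 128, so 64 rotary frequency pairs) and the rope_theta value of 500000.0 are assumptions for illustration, not necessarily the repo's exact values:

```python
import torch

# Hypothetical dims for illustration: 17 tokens, head_dim 128 -> 64 frequency pairs
seq_len, head_dim = 17, 128

# Rotary frequencies: theta ** -(i / (head_dim/2)); 500000.0 assumed as rope_theta
zero_to_one = torch.arange(head_dim // 2) / (head_dim // 2)
freqs = 1.0 / (500000.0 ** zero_to_one)

# One rotation angle per (token position, frequency pair)
freqs_for_each_token = torch.outer(torch.arange(seq_len), freqs)

# Complex numbers of unit magnitude encoding those rotations
freqs_cis = torch.polar(torch.ones_like(freqs_for_each_token), freqs_for_each_token)

print(freqs_cis.shape)  # torch.Size([17, 64])
```

So the missing output in the notebook would presumably be a `torch.Size` of (number of tokens, head_dim // 2).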

In the "multi head attention" section, the line "qkv_attention = torch.matmul(qk_per_token_after_masking_after_softmax, v_per_token)" is repeated twice.

Thank you.