karpathy / llm.c

LLM training in simple, raw C/CUDA

Possible bugs in the data loading functions

PeterZhizhin opened this issue

First, each call reads B*T+1 tokens but advances the file position by only B*T tokens, so consecutive batches overlap by one token.

Then, there is this if statement:

    if (loader->current_position + (loader->num_processes * B * T + 1) * sizeof(int) > loader->file_size)

Possibly, we should remove the loader->num_processes multiplication here.

We should verify that both of these behaviors are intended.
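To make the question about the bounds check concrete, here is a minimal sketch that puts the condition quoted above next to the variant without the num_processes factor. The function names are hypothetical and not taken from llm.c, and the sketch takes no position on which condition the loader should use. It only shows where the two disagree.

```c
#include <assert.h>
#include <stddef.h>

/* Condition as quoted in the issue. pos and file_size are in bytes.
 * Hypothetical helper name, not part of llm.c. */
static int wraps_with_num_processes(size_t pos, size_t file_size,
                                    size_t num_processes, size_t B, size_t T) {
    assert(B > 0 && T > 0 && num_processes > 0);
    return pos + (num_processes * B * T + 1) * sizeof(int) > file_size;
}

/* Variant suggested in the issue: only ask whether the B*T+1 tokens
 * of a single read still fit in the file. Hypothetical helper name. */
static int wraps_without_num_processes(size_t pos, size_t file_size,
                                       size_t B, size_t T) {
    assert(B > 0 && T > 0);
    return pos + (B * T + 1) * sizeof(int) > file_size;
}
```

For example, with a file of 100 tokens, B=2, T=4 and 4 processes, a position at token 70 triggers the wrap under the quoted condition (70 + 33 > 100), even though a single read of 9 tokens would still fit (70 + 9 <= 100).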

Hey @PeterZhizhin, feel free to close this issue. The +1 is not a bug: when you load the first batch, the extra token is used only as a target, and in the next batch it is part of the input rather than a target, so it's actually fine.
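The overlap described above can be sketched as follows. This is a simplified, single-process illustration over an in-memory token array, not the llm.c loader itself. The helper names (read_batch, batch_inputs, batch_targets) are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy the B*T+1 tokens of batch number `step`
 * from an in-memory token stream into `batch`. The start index advances
 * by B*T per step, so adjacent batches share exactly one token. */
static size_t read_batch(const int *tokens, size_t num_tokens,
                         size_t B, size_t T, size_t step, int *batch) {
    size_t start = step * B * T;
    assert(start + B * T + 1 <= num_tokens); /* the read must fit */
    for (size_t i = 0; i < B * T + 1; i++) {
        batch[i] = tokens[start + i];
    }
    return start;
}

/* Inputs are batch[0 .. B*T-1]. */
static const int *batch_inputs(const int *batch) { return batch; }

/* Targets are the same tokens shifted by one: batch[1 .. B*T]. */
static const int *batch_targets(const int *batch) { return batch + 1; }
```

With B=2 and T=2, the first batch reads tokens 0..4 and the second reads tokens 4..8. Token 4 is the last target of the first batch and the first input of the second, so it is never used as a target twice.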