How to determine `warmup_tokens` and `final_tokens`?
fgolemo opened this issue
Florian Golemo commented
Hey folks,
Thanks a lot for this implementation @karpathy! I was wondering how you got the values in the addition example:
warmup_tokens=1024,
final_tokens=50 * len(train_dataset) * (ndigit + 1),
And how does one estimate these for a different task (e.g. based on vocabulary size, number of epochs, etc.)?
Cheers,
Florian
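
For reference, here is one plausible reading of the `final_tokens` expression, as a rough sketch rather than a confirmed description of the library. It assumes that `50` is the number of training epochs, that `ndigit + 1` is the number of target tokens counted per example, and that the two values drive a linear warmup followed by a cosine decay with a floor. The helper names `estimate_final_tokens` and `lr_multiplier` and the 10% floor are hypothetical and used only for illustration.

```python
import math


def estimate_final_tokens(max_epochs: int, num_examples: int, tokens_per_example: int) -> int:
    """Hypothetical helper: total tokens seen over the whole training run.

    Assumes the schedule should finish decaying exactly when training ends.
    """
    return max_epochs * num_examples * tokens_per_example


def lr_multiplier(tokens_seen: int, warmup_tokens: int, final_tokens: int,
                  floor: float = 0.1) -> float:
    """Hypothetical helper: linear warmup, then cosine decay down to `floor`.

    The floor of 0.1 is an assumption for this sketch, not a confirmed default.
    """
    if tokens_seen < warmup_tokens:
        # Ramp linearly from 0 to 1 over the warmup budget.
        return tokens_seen / max(1, warmup_tokens)
    # Fraction of the post-warmup budget already consumed, clamped to 1.
    progress = (tokens_seen - warmup_tokens) / max(1, final_tokens - warmup_tokens)
    progress = min(1.0, progress)
    return max(floor, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    # Example numbers only: 1000 training examples, ndigit = 2.
    ndigit = 2
    final_tokens = estimate_final_tokens(50, 1000, ndigit + 1)
    for seen in (0, 512, 1024, final_tokens // 2, final_tokens):
        print(seen, lr_multiplier(seen, warmup_tokens=1024, final_tokens=final_tokens))
```

Under these assumptions, adapting to a different task would mean swapping in that task's epoch count, dataset size, and counted tokens per example, and choosing `warmup_tokens` as a small fraction of the total.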