How to determine `warmup_tokens` and `final_tokens`?

Question

fgolemo opened this issue 2 years ago

Hey folks,

Thanks a lot for this implementation @karpathy! I was wondering how you got the values in the addition example:

    warmup_tokens=1024,
    final_tokens=50 * len(train_dataset) * (ndigit + 1),

And how would one estimate these for a different task (e.g., based on vocabulary size, number of epochs, etc.)?

Cheers,
Florian
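
For reference, here is one plausible reading of the formula above, written as a small sketch rather than as the repository's actual code. It assumes that `50` is the number of training epochs, that `ndigit + 1` is the number of tokens per example that count toward the schedule, and that the learning rate warms up linearly over `warmup_tokens` and then decays along a cosine until `final_tokens`. The names `estimate_final_tokens`, `lr_multiplier`, and `min_mult` are hypothetical and chosen only for illustration.

```python
import math


def estimate_final_tokens(max_epochs, num_examples, tokens_per_example):
    """Total tokens seen over training (assumed reading of the formula):
    epochs * examples per epoch * counted tokens per example."""
    return max_epochs * num_examples * tokens_per_example


def lr_multiplier(tokens_seen, warmup_tokens, final_tokens, min_mult=0.1):
    """Assumed schedule: linear warmup, then cosine decay down to min_mult."""
    if tokens_seen < warmup_tokens:
        # Linear warmup from 0 to 1 over the first warmup_tokens tokens.
        return tokens_seen / max(1, warmup_tokens)
    # Fraction of the decay phase completed; clamped to 1.0 (an addition
    # in this sketch) so the multiplier does not rise again past final_tokens.
    progress = (tokens_seen - warmup_tokens) / max(1, final_tokens - warmup_tokens)
    progress = min(1.0, progress)
    return max(min_mult, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Hypothetical numbers: 50 epochs, 9000 training examples, 3 counted tokens each.
final_tokens = estimate_final_tokens(50, 9000, 3)
for seen in (0, 512, 1024, final_tokens // 2, final_tokens):
    print(seen, lr_multiplier(seen, 1024, final_tokens))
```

Under this reading, `final_tokens` for a different task would be the planned number of epochs times the number of counted tokens per epoch, and `warmup_tokens` would be a small fraction of that total, chosen empirically.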