Removing max_steps in pretrain
rasbt opened this issue
As commented in args.py:

# TODO: `pretrain` is the only script using `max_tokens` explicitly. replace it with epoch_size*epochs?
I also wonder if max_steps might be sufficient here? I do think max_tokens could be useful, though, if we want to reproduce certain models, e.g. the TinyLlama models. For now, we can maybe keep it, but it's something to discuss.
I'd like to keep max_tokens for pretraining if possible. It is unintuitive and error-prone to calculate max steps when I know max tokens. Also, in LLM training, max steps are somewhat irrelevant: batch sizes are large and scale up and down a lot with the model size and the number of GPUs/machines, so the number of steps just isn't an intuitive metric in my opinion.
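For illustration, here is a rough back-of-the-envelope sketch (not litgpt's actual code; the parameter names below are assumptions) of why the step count corresponding to a fixed token budget keeps changing with the hardware setup:

```python
# Hypothetical conversion from a token budget to a step count. Every factor
# below changes with the cluster configuration, so the "right" max_steps
# changes too, while max_tokens stays fixed.
def steps_for_token_budget(
    max_tokens: int,                  # total training token budget
    micro_batch_size: int,            # samples per device per forward/backward
    gradient_accumulation_steps: int, # micro-batches per optimizer step
    num_devices: int,                 # GPUs across all machines
    block_size: int,                  # sequence length in tokens
) -> int:
    tokens_per_step = (
        micro_batch_size * gradient_accumulation_steps * num_devices * block_size
    )
    return max_tokens // tokens_per_step

# Same 3T-token budget, two cluster sizes -> very different step counts.
print(steps_for_token_budget(3_000_000_000_000, 8, 4, 16, 2048))  # ~2.86M steps
print(steps_for_token_budget(3_000_000_000_000, 8, 4, 64, 2048))  # ~0.72M steps
```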
In pretrain.py, max_steps is currently "unsupported" (lines 381 to 383 at f951f93), which I think is for the best for now. What do you think?
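For context, a minimal sketch of what such an "unsupported argument" guard might look like; this is an assumption about the shape of the check, not the actual contents of those lines:

```python
# Hypothetical validation guard: reject arguments the pretraining script
# does not honor yet, instead of silently ignoring them.
def validate_unsupported(train_args) -> None:
    if getattr(train_args, "max_steps", None) is not None:
        raise NotImplementedError(
            "`max_steps` is not supported in pretraining; use `max_tokens` instead."
        )
```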
This sounds reasonable. And max_steps being unsupported sounds good!