Removing max_steps in pretrain
rasbt opened this issue
As commented in args.py:

# TODO: `pretrain` is the only script using `max_tokens` explicitly. replace it with epoch_size*epochs?
I also wonder if max_steps might be sufficient here? I do think max_tokens could be useful, though, if we want to reproduce certain models, e.g. the TinyLlama models. For now, we can maybe keep it, but it's something to discuss.
I'd like to keep max_tokens for pretraining if possible. It is unintuitive and error-prone to calculate max steps when I know max tokens. Also, in LLM training, max steps are somewhat irrelevant: batch sizes are large and scale up and down a lot with the model size and the number of GPUs/machines, so the number of steps just isn't an intuitive metric in my opinion.
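For illustration, here is a rough back-of-the-envelope sketch (not litgpt's actual code; the parameter names below are assumptions) of why the step count corresponding to a fixed token budget keeps changing with the hardware setup:

```python
# Hypothetical conversion from a token budget to a step count. Every factor
# below changes with the cluster configuration, so the "right" max_steps
# changes too, while max_tokens stays fixed.
def steps_for_token_budget(
    max_tokens: int,                  # total training token budget
    micro_batch_size: int,            # samples per device per forward/backward
    gradient_accumulation_steps: int, # micro-batches per optimizer step
    num_devices: int,                 # GPUs across all machines
    block_size: int,                  # sequence length in tokens
) -> int:
    tokens_per_step = (
        micro_batch_size * gradient_accumulation_steps * num_devices * block_size
    )
    return max_tokens // tokens_per_step

# Same 3T-token budget, two cluster sizes -> very different step counts.
print(steps_for_token_budget(3_000_000_000_000, 8, 4, 16, 2048))  # ~2.86M steps
print(steps_for_token_budget(3_000_000_000_000, 8, 4, 64, 2048))  # ~0.72M steps
```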
In pretrain.py, max_steps is currently "unsupported" (lines 381 to 383 at f951f93), which I think is for the best for now. What do you think?
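For context, a minimal sketch of what such an "unsupported argument" guard might look like; this is an assumption about the shape of the check, not the actual contents of those lines:

```python
# Hypothetical validation guard: reject arguments the pretraining script
# does not honor yet, instead of silently ignoring them.
def validate_unsupported(train_args) -> None:
    if getattr(train_args, "max_steps", None) is not None:
        raise NotImplementedError(
            "`max_steps` is not supported in pretraining; use `max_tokens` instead."
        )
```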
This sounds reasonable. And max_steps being unsupported sounds good!