Lightning-AI / litgpt

Pretrain, finetune, deploy 20+ LLMs on your own data. Uses state-of-the-art techniques: flash attention, FSDP, 4-bit, LoRA, and more.

Home Page: https://lightning.ai

Removing max_steps in pretrain

rasbt opened this issue

As commented in args.py:

# TODO: `pretrain` is the only script using `max_tokens` explicitly. replace it with epoch_size*epochs?

I also wonder whether max_steps would be sufficient here. That said, max_tokens could be useful if we want to reproduce certain models, e.g., the TinyLlama models. For now we can keep it, but it's something to discuss.

I'd like to keep max_tokens for pretraining if possible. It is unintuitive and error-prone to calculate max steps when I know the token budget. Also, in LLM training, max steps is a somewhat irrelevant metric: batch sizes are large and scale up and down considerably with the model size and the number of GPUs/machines, so the number of steps just isn't an intuitive quantity in my opinion.
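To illustrate why that conversion is error-prone, here is a minimal sketch with made-up numbers (the micro batch size, gradient accumulation, world size, and sequence length below are illustrative assumptions, not values from litgpt): the step count corresponding to a fixed token budget changes whenever any of these factors changes.

# Hypothetical configuration; none of these numbers come from the issue.
max_tokens = 3_000_000_000_000   # e.g. a TinyLlama-style 3T-token budget
micro_batch_size = 4             # samples per device per forward pass
gradient_accumulation = 16       # micro-batches per optimizer step
world_size = 64                  # number of GPUs
block_size = 2048                # sequence length in tokens

# Tokens consumed by one optimizer step across all devices.
tokens_per_step = micro_batch_size * gradient_accumulation * world_size * block_size
max_steps = max_tokens // tokens_per_step
print(max_steps)  # ~357k steps here; a different GPU count gives a different answer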

In pretrain.py, max_steps is currently "unsupported":

litgpt/litgpt/pretrain.py

Lines 381 to 383 in f951f93

unsupported = [
    (train, ["max_steps", "epochs"]),
    (eval, ["max_new_tokens"]),
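For context, here is a minimal sketch of how such an unsupported list might be used to reject arguments the pretrain script ignores (the dataclass fields and the validate_args body below are simplified assumptions, not the actual litgpt implementation):

from dataclasses import dataclass
from typing import Optional

@dataclass
class TrainArgs:  # simplified stand-in for litgpt's TrainArgs
    max_tokens: Optional[int] = None
    max_steps: Optional[int] = None
    epochs: Optional[int] = None

@dataclass
class EvalArgs:  # simplified stand-in for litgpt's EvalArgs
    max_new_tokens: Optional[int] = None

def validate_args(unsupported):
    # Raise if the user set any argument that pretrain does not support.
    for args, names in unsupported:
        for name in names:
            if getattr(args, name) is not None:
                raise ValueError(f"{type(args).__name__}.{name} is not supported by pretrain")

# Variable names mirror the pretrain.py snippet above.
train, eval = TrainArgs(max_tokens=3_000_000_000_000), EvalArgs()
unsupported = [
    (train, ["max_steps", "epochs"]),
    (eval, ["max_new_tokens"]),
]
validate_args(unsupported)  # passes; setting train.max_steps would raise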

I think keeping max_steps unsupported is for the best for now. What do you think?

This sounds reasonable. And keeping max_steps unsupported sounds good!