Lightning-AI / litgpt

Pretrain, finetune, deploy 20+ LLMs on your own data. Uses state-of-the-art techniques: flash attention, FSDP, 4-bit, LoRA, and more.

Home Page: https://lightning.ai

Wrong epoch number on last line

rasbt opened this issue

The epoch number is incremented one iteration too early, so the last log line before training finishes reports the wrong epoch. This affects all finetuning scripts:

Epoch 4 | iter 961 step 961 | loss train: 1.062, val: 1.057 | iter time: 529.46 ms (step)
Epoch 4 | iter 962 step 962 | loss train: 0.937, val: 1.057 | iter time: 503.53 ms (step)
Epoch 4 | iter 963 step 963 | loss train: 0.971, val: 1.057 | iter time: 522.10 ms (step)
Epoch 4 | iter 964 step 964 | loss train: 0.902, val: 1.057 | iter time: 115.27 ms (step)
Epoch 5 | iter 965 step 965 | loss train: 1.182, val: 1.057 | iter time: 743.31 ms (step)
Training time: 583.36s
Memory used: 14.49 GB
Saving LoRA weights to 'out/finetune/lora-tiny-llama-1.1b/final/lit_model.pth.lora'
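A minimal sketch of the likely off-by-one, using hypothetical numbers (193 iters per epoch, 5 epochs, so the final iter is 965) rather than litgpt's actual internals: if the logged epoch is derived from the completed iteration count with integer division, the very last iteration rolls over into the next epoch one line too early.

```python
# Hypothetical reconstruction of the bug; not litgpt's actual code.
iters_per_epoch = 193  # assumed: 5 epochs * 193 iters = 965 total iters

def epoch_buggy(it: int) -> int:
    # Epoch derived from the 1-based iter count: at the final iter,
    # 965 // 193 == 5, so the log jumps to "Epoch 5" prematurely.
    return it // iters_per_epoch

def epoch_fixed(it: int) -> int:
    # Deriving the epoch from the zero-based iter index keeps the
    # last iteration inside the epoch it actually belongs to.
    return (it - 1) // iters_per_epoch

print(epoch_buggy(964), epoch_buggy(965))  # 4 5  <- last line jumps an epoch
print(epoch_fixed(964), epoch_fixed(965))  # 4 4
```

With the fixed indexing, iter 965 logs as part of epoch 4, matching the lines that precede it.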