Lightning-AI / litgpt

Pretrain, finetune, deploy 20+ LLMs on your own data. Uses state-of-the-art techniques: flash attention, FSDP, 4-bit, LoRA, and more.

Home Page: https://lightning.ai

Wrong epoch number on last line

rasbt opened this issue

The epoch number is incremented one iteration too early, so the last log line before training finishes reports the wrong epoch. This affects all finetuning scripts:

Epoch 4 | iter 961 step 961 | loss train: 1.062, val: 1.057 | iter time: 529.46 ms (step)
Epoch 4 | iter 962 step 962 | loss train: 0.937, val: 1.057 | iter time: 503.53 ms (step)
Epoch 4 | iter 963 step 963 | loss train: 0.971, val: 1.057 | iter time: 522.10 ms (step)
Epoch 4 | iter 964 step 964 | loss train: 0.902, val: 1.057 | iter time: 115.27 ms (step)
Epoch 5 | iter 965 step 965 | loss train: 1.182, val: 1.057 | iter time: 743.31 ms (step)
Training time: 583.36s
Memory used: 14.49 GB
Saving LoRA weights to 'out/finetune/lora-tiny-llama-1.1b/final/lit_model.pth.lora'
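A minimal sketch of the likely off-by-one, using hypothetical numbers (193 iters per epoch, 5 epochs, so the final iter is 965) rather than litgpt's actual internals: if the logged epoch is derived from the completed iteration count with integer division, the very last iteration rolls over into the next epoch one line too early.

```python
# Hypothetical reconstruction of the bug; not litgpt's actual code.
iters_per_epoch = 193  # assumed: 5 epochs * 193 iters = 965 total iters

def epoch_buggy(it: int) -> int:
    # Epoch derived from the 1-based iter count: at the final iter,
    # 965 // 193 == 5, so the log jumps to "Epoch 5" prematurely.
    return it // iters_per_epoch

def epoch_fixed(it: int) -> int:
    # Deriving the epoch from the zero-based iter index keeps the
    # last iteration inside the epoch it actually belongs to.
    return (it - 1) // iters_per_epoch

print(epoch_buggy(964), epoch_buggy(965))  # 4 5  <- last line jumps an epoch
print(epoch_fixed(964), epoch_fixed(965))  # 4 4
```

With the fixed indexing, iter 965 logs as part of epoch 4, matching the lines that precede it.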