Lightning-AI / lit-llama

Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.


It seems the hash of the train data is lost, so it's impossible to continue fine-tuning after a stop

drazdra opened this issue

Upon every restart of finetune I see:
"train data seems to have changed. restarting shuffled epoch."

I looked up where it happens and added a debugging line, and it turned out that
train->shuffle_samples_hash returns zero.
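
To illustrate the failure mode, here is a minimal sketch of the kind of check that would print that warning, with the debugging print added; the struct layout, function name, and surrounding logic are assumptions based on the message above, not the actual source:

```cpp
// Hypothetical sketch only: names and layout are assumptions, not the real code.
#include <cstdint>
#include <cstdio>

struct train_state {
    uint64_t shuffle_samples_hash = 0; // hash of the train data, as read back from the checkpoint
};

static void check_train_data(train_state * train, uint64_t current_hash) {
    // debugging line added to see why the check always fails:
    std::fprintf(stderr, "stored hash = %llu, current hash = %llu\n",
                 (unsigned long long) train->shuffle_samples_hash,
                 (unsigned long long) current_hash);

    if (train->shuffle_samples_hash != current_hash) {
        // the stored hash is 0 on every restart, so this branch always fires
        std::fprintf(stderr, "train data seems to have changed. restarting shuffled epoch.\n");
        train->shuffle_samples_hash = current_hash; // begin a fresh shuffled epoch
    }
}

int main() {
    train_state train;                       // freshly loaded checkpoint state (hash stuck at 0)
    check_train_data(&train, 0x12345678ULL); // hash computed from the current train data
    return 0;
}
```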

So it's either not being stored or not being read. As a result, fine-tuning always starts from scratch and cannot be continued.

At least that's how it is for me :).
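
For reference, the fix I would expect has roughly this shape: the hash needs to be written into the checkpoint on save and read back on load, so the comparison above sees a non-zero value. This is purely a sketch with made-up binary-file helpers, not the actual checkpoint format or code:

```cpp
// Sketch only: hypothetical helpers showing where the hash would need to be
// persisted and restored for resuming to work.
#include <cstdint>
#include <fstream>

struct train_state {
    uint64_t shuffle_samples_hash = 0;
};

// on save: persist the hash alongside the rest of the training state
static void save_train_state(const train_state & train, const char * path) {
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char *>(&train.shuffle_samples_hash),
              sizeof(train.shuffle_samples_hash));
}

// on load: read it back; if this step is missing, the field stays 0
static void load_train_state(train_state & train, const char * path) {
    std::ifstream in(path, std::ios::binary);
    in.read(reinterpret_cast<char *>(&train.shuffle_samples_hash),
            sizeof(train.shuffle_samples_hash));
}

int main() {
    train_state saved;
    saved.shuffle_samples_hash = 0xDEADBEEFULL;
    save_train_state(saved, "train_state.bin");

    train_state resumed;
    load_train_state(resumed, "train_state.bin");
    // if the hash were never saved or never loaded, resumed.shuffle_samples_hash
    // would stay 0 and the "train data seems to have changed" check would always fire
    return 0;
}
```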