Lightning-AI / lit-llama

Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4-bit quantization, LoRA and LLaMA-Adapter fine-tuning, and pre-training. Apache 2.0-licensed.

[question] nan loss value and run time error

nevermet opened this issue · comments

Dear all,

I am fine-tuning on my own data with OpenLLaMA. While running finetune/lora.py, the loss becomes NaN:
...
iter 3198: loss nan, time: 134.94ms

and during validation, it fails with this error:
...
File ".../lit-llama/generate.py", line 74, in generate
idx_next = torch.multinomial(probs, num_samples=1).to(dtype=dtype)
RuntimeError: probability tensor contains either inf, nan or element < 0
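For context on how these two symptoms are connected: once the training loss is NaN, the model's logits are NaN as well, and softmax turns a single NaN logit into an all-NaN probability vector, which is exactly what `torch.multinomial` rejects. A minimal pure-Python sketch (illustrative only, not lit-llama code) of the propagation and a guard one could add before sampling:

```python
import math

def softmax(logits):
    # A single NaN logit poisons every probability:
    # exp(NaN) is NaN, the sum is NaN, and NaN/NaN is NaN.
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def check_probs(probs):
    # Mirrors the condition torch.multinomial complains about:
    # "probability tensor contains either inf, nan or element < 0"
    if any(math.isnan(p) or math.isinf(p) or p < 0 for p in probs):
        raise ValueError("probability tensor contains either inf, nan or element < 0")

probs = softmax([2.0, float("nan"), 0.5])
print(all(math.isnan(p) for p in probs))  # every entry is NaN
```

Such a guard only makes the failure explicit earlier; the underlying fix is to stop the loss from diverging in the first place (common culprits are a too-high learning rate, malformed training samples, or fp16 overflow).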

Could you tell me how I can resolve this?

Thanks in advance.