[question] nan loss value and run time error
nevermet opened this issue
nevermet commented
Dear all,
I am fine-tuning OpenLLaMA on my own data. While running finetune/lora.py, the loss is reported as nan instead of a number:
...
iter 3198: loss nan, time: 134.94ms
Then, during validation, the run fails with an error:
...
File ".../lit-llama/generate.py", line 74, in generate
idx_next = torch.multinomial(probs, num_samples=1).to(dtype=dtype)
RuntimeError: probability tensor contains either inf, nan or element < 0
Could you tell me how I can resolve this?
Thanks in advance.