karpathy / llm.c

LLM training in simple, raw C/CUDA

Repository from GitHub: https://github.com/karpathy/llm.c

Different batch_size results in different evaluation loss.

iminfine opened this issue

I modified the code in train_gpt2.py so that it can load a pre-trained model and conveniently report its evaluation loss, by inserting a break statement at the beginning of the training loop:

if args.init_from == 'eval':
    break

immediately after this line:

print0(f"val loss {val_loss}")
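For reference, here is a rough sketch of how the surrounding evaluation block looks with the break added. This is only an illustration: names such as val_loader, val_max_steps, and the 'eval' value for args.init_from are paraphrased or assumed, not copied verbatim from train_gpt2.py.

# Hypothetical sketch (not verbatim from train_gpt2.py): evaluate the loaded
# checkpoint once, print the val loss, then leave the training loop.
for step in range(args.num_iterations + 1):
    if step % args.val_loss_every == 0:
        model.eval()
        val_loader.reset()
        with torch.no_grad():
            val_loss = 0.0
            for _ in range(args.val_max_steps):
                x, y = val_loader.next_batch()
                _, loss = model(x, y, return_logits=False)
                val_loss += loss.item()
            val_loss /= args.val_max_steps
        print0(f"val loss {val_loss}")
        if args.init_from == 'eval':
            break  # the inserted early exit: only evaluate, never train
    # ... rest of the training step (forward, backward, optimizer) ...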

I unexpectedly observed that when the evaluation batch size differs from the batch size used during training, the evaluation loss changes as well. This is counterintuitive: the evaluation loss should be the same regardless of the evaluation batch size. Does anyone know why?
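One quantity that does depend on the batch size, assuming the evaluation loop averages the loss over a fixed number of batches (val_max_steps) rather than over a fixed amount of data, is how many validation tokens actually get scored. A minimal, self-contained illustration of that arithmetic (val_max_steps and T are assumed values, not taken from the repository):

# Minimal illustration, assuming the eval loop always runs a fixed number of
# batches (val_max_steps), regardless of batch size B and sequence length T.
val_max_steps = 20    # assumed number of evaluation batches
T = 1024              # assumed sequence length
for B in (4, 8, 16):  # different evaluation batch sizes
    tokens_seen = val_max_steps * B * T
    print(f"batch_size={B:2d} -> {tokens_seen:>8d} validation tokens scored")

If that assumption holds, different batch sizes average the loss over different slices of the validation set, which could explain at least some of the variation; I have not verified this.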