Different batch_size results in different evaluation loss.
iminfine opened this issue
I have modified the code in train_gpt2.py so I can load a pre-trained model and conveniently obtain its evaluation loss, by inserting a break statement at the beginning of the training loop, just after line 762 in commit 72698a5:

```python
if args.init_from == 'eval':
    break
```
I unexpectedly observed that when the evaluation batch size differs from the one used during training, the evaluation loss changes as well. This is counterintuitive: the evaluation loss should be independent of the evaluation batch size. Does anyone know why?
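For what it's worth, here is a minimal sketch of one way this can happen, assuming the evaluation loop averages per-batch mean losses over a fixed number of batches (the `eval_loss` helper and `num_batches` parameter below are hypothetical, not taken from the repo): with a fixed batch count, a different batch_size means a different slice of the validation data gets evaluated, so the reported loss shifts.

```python
import numpy as np

rng = np.random.default_rng(0)
# Fake per-example losses standing in for a validation set.
losses = rng.normal(loc=3.0, scale=0.5, size=10_000)

def eval_loss(per_example, batch_size, num_batches=20):
    # Mean of per-batch means over a fixed number of batches,
    # mirroring a typical `val_loss /= val_num_batches` pattern.
    batch_means = [
        per_example[i * batch_size : (i + 1) * batch_size].mean()
        for i in range(num_batches)
    ]
    return float(np.mean(batch_means))

for bs in (8, 32, 128):
    # Different batch sizes cover different subsets of the data,
    # so the averaged loss differs even with identical model weights.
    print(bs, eval_loss(losses, bs))
```

Floating-point reduction order can also change slightly with different batch shapes, but coverage differences like the one sketched above would typically dominate.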