Different batch_size results in different evaluation loss.
iminfine opened this issue
I have modified the code in train_gpt2.py so I can load a pre-trained model and conveniently obtain its evaluation loss, by inserting a break statement at the beginning of the training loop, just after line 762 in commit 72698a5:

```python
if args.init_from == 'eval':
    break
```
I unexpectedly observed that when the evaluation batch size differs from the one used during training, the evaluation loss changes as well. This is counterintuitive: the evaluation loss should be independent of the evaluation batch size. Does anyone know why?
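For what it's worth, here is a minimal sketch of one way this can happen, assuming the evaluation loop averages per-batch mean losses over a fixed number of batches (the `eval_loss` helper and `num_batches` parameter below are hypothetical, not taken from the repo): with a fixed batch count, a different batch_size means a different slice of the validation data gets evaluated, so the reported loss shifts.

```python
import numpy as np

rng = np.random.default_rng(0)
# Fake per-example losses standing in for a validation set.
losses = rng.normal(loc=3.0, scale=0.5, size=10_000)

def eval_loss(per_example, batch_size, num_batches=20):
    # Mean of per-batch means over a fixed number of batches,
    # mirroring a typical `val_loss /= val_num_batches` pattern.
    batch_means = [
        per_example[i * batch_size : (i + 1) * batch_size].mean()
        for i in range(num_batches)
    ]
    return float(np.mean(batch_means))

for bs in (8, 32, 128):
    # Different batch sizes cover different subsets of the data,
    # so the averaged loss differs even with identical model weights.
    print(bs, eval_loss(losses, bs))
```

Floating-point reduction order can also change slightly with different batch shapes, but coverage differences like the one sketched above would typically dominate.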