Get GPT loss to decrease to 0 for single batch

Question

bclarkson-code opened this issue 3 months ago

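To make sure that everything is working, we should be able to train the model on a single batch until the loss drops to (nearly) 0, i.e. the model memorises that batch. With a cross-entropy loss the value only approaches 0, because softmax probabilities never reach exactly 1. If the loss doesn't get close to 0, there are bugs that need fixing.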
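A minimal sketch of this sanity check, assuming a cross-entropy loss and plain gradient descent. It does not use the project's GPT. A hypothetical toy "model" (a table of next-token logits, i.e. a bigram model) stands in for it, and the names `overfit_single_batch`, `cross_entropy` and `softmax` are illustrative only. The pattern is what carries over: fix one batch, train on it repeatedly, and check that the loss heads towards 0.

```python
import math
import random


def softmax(row):
    # Numerically stable softmax over one row of logits.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(weights, batch):
    # Mean negative log-likelihood of each target token given its input token.
    loss = 0.0
    for x, y in batch:
        loss -= math.log(softmax(weights[x])[y])
    return loss / len(batch)


def overfit_single_batch(batch, vocab_size, steps=500, lr=5.0, seed=0):
    # Hypothetical toy model: weights[x] holds the logits for the token after x.
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 0.02) for _ in range(vocab_size)]
               for _ in range(vocab_size)]
    history = []
    for _ in range(steps):
        history.append(cross_entropy(weights, batch))
        # Gradient of mean cross-entropy w.r.t. logits: (softmax - one_hot) / N.
        grads = [[0.0] * vocab_size for _ in range(vocab_size)]
        for x, y in batch:
            probs = softmax(weights[x])
            for j in range(vocab_size):
                target = 1.0 if j == y else 0.0
                grads[x][j] += (probs[j] - target) / len(batch)
        # Plain gradient descent step on the same batch every time.
        for i in range(vocab_size):
            for j in range(vocab_size):
                weights[i][j] -= lr * grads[i][j]
    history.append(cross_entropy(weights, batch))
    return history


# One fixed batch of (input token, target token) pairs with no conflicting targets.
batch = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
history = overfit_single_batch(batch, vocab_size=5)
print(f"initial loss {history[0]:.3f}, final loss {history[-1]:.5f}")
```

The loss starts near log(5) and keeps falling towards 0 without reaching it exactly. One caveat applies to the real model as well: if the batch contains the same context with two different targets, no model can fit both, so the loss has a floor above 0 even when the code is correct.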