inf loss at big batch
karpathy opened this issue
Just creating a TODO. Large batch sizes work now that the size_t bug is fixed:
./train_gpt2cu -b 36 -v 200 -s 200 -i data/TinyStories
works, but batch size 48, which should also fit in memory, doesn't:
./train_gpt2cu -b 48 -v 200 -s 200 -i data/TinyStories
The val loss is -nan and the train loss stays at inf.
TODO: track down why this happens and how to prevent it.