[Suggestion] Add a note about the training of Bengio et al. MLP

Question

OmriKaduri opened this issue 2 years ago · comments

Hi @karpathy, thanks for that great repo!

It might be worth noting in your code that while you train by minimizing the cross-entropy (CE) loss, Bengio et al. actually maximized the log-likelihood. I know the two are equivalent in this case (one-hot vectors as ground truth, so the CE reduces to the negative log-likelihood of the correct token), but that's not the case in general, so a short note might help. Thanks!
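
To make the point concrete, here is a minimal sketch in plain Python (not taken from the repo or the paper; the function names `log_softmax`, `cross_entropy`, and `neg_log_likelihood`, and the example logits, are all made up for illustration). It shows that the CE against a one-hot target matches the negative log-likelihood of the correct class, while the CE against a soft target does not:

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax: subtract the max before exponentiating.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def cross_entropy(target_dist, logits):
    # General cross-entropy H(t, p) = -sum_i t_i * log p_i.
    logp = log_softmax(logits)
    return -sum(t * lp for t, lp in zip(target_dist, logp))

def neg_log_likelihood(target_index, logits):
    # Negative log-probability the model assigns to the correct class.
    return -log_softmax(logits)[target_index]

# Hypothetical logits over a 3-token vocabulary; the correct token is index 0.
logits = [2.0, -1.0, 0.5]
target_index = 0
one_hot = [1.0, 0.0, 0.0]
soft = [0.8, 0.1, 0.1]  # e.g. a smoothed target, assumed for illustration

nll = neg_log_likelihood(target_index, logits)
ce_one_hot = cross_entropy(one_hot, logits)
ce_soft = cross_entropy(soft, logits)

print(f"NLL of correct class: {nll:.4f}")
print(f"CE with one-hot:      {ce_one_hot:.4f}")
print(f"CE with soft target:  {ce_soft:.4f}")
```

With one-hot targets, minimizing the CE is the same as maximizing the log-likelihood of the data; with soft targets the CE also penalizes the log-probabilities of the other classes, so the two objectives differ.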