karpathy / makemore

An autoregressive character-level language model for making more things

[Suggestion] Add a note about the training of Bengio et al. MLP

OmriKaduri opened this issue

Hi @karpathy, thanks for this great repo!

It might be worth noting in the code that while you train by minimizing the cross-entropy (CE) loss, Bengio et al. actually maximized the log-likelihood. I know the two are equivalent in this case (one-hot vectors as ground truth), but that is not true in general, so a short note might help. Thanks!
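To illustrate the point, here is a minimal sketch in plain Python (not code from the repo). The logits, the target index, and the soft target distribution are made-up values, and the helper names are hypothetical. It shows that CE against a one-hot target matches the negative log-likelihood of the observed class, while CE against a soft target generally does not.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target_dist, probs):
    # H(q, p) = -sum_i q_i * log(p_i); terms with q_i == 0 contribute nothing.
    return -sum(q * math.log(p) for q, p in zip(target_dist, probs) if q > 0)

def negative_log_likelihood(target_index, probs):
    # -log p(observed class)
    return -math.log(probs[target_index])

# Made-up model outputs for a 3-class problem (assumption for illustration).
logits = [2.0, -1.0, 0.5]
probs = softmax(logits)

# Case 1: one-hot ground truth, as in next-character prediction.
target_index = 2
one_hot = [0.0, 0.0, 1.0]
ce_one_hot = cross_entropy(one_hot, probs)
nll = negative_log_likelihood(target_index, probs)

# Case 2: a soft target (hypothetical values), where CE is no longer
# the negative log-likelihood of a single observed class.
soft_target = [0.25, 0.05, 0.7]
ce_soft = cross_entropy(soft_target, probs)

print("CE (one-hot):", ce_one_hot)
print("NLL         :", nll)
print("CE (soft)   :", ce_soft)
```

With one-hot targets, minimizing the first quantity and maximizing the log-likelihood are the same objective, so a one-line comment next to the loss would probably be enough.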