karpathy / karpathy.github.io

my blog

Include short explanation of "normalization" in hacker's guide

iandanforth opened this issue

This line: " just normalizes this change by the tweak amount we used."

This is a great example of where the word "just" is a "tutorial smell." Normalization is an extremely important concept that is not totally intuitive. Why do you want to keep a range between 0 and 1? Or 1 and 100? Or 0 and 255? What are the conventions in this domain and why were they established?

A few words here on the topic could help comprehension greatly I believe.

Thanks for the comment! I think in this case we have an arbitrary parameter "h" that we chose as the step size, so it makes sense to divide by it. That way you get roughly the same answer for any small h, instead of a raw change in the output that shrinks or grows with whatever h you happened to pick. Of course you want h to be as small as possible without running into numerical issues, since the derivative is defined as the limit as h->0. I could add some more discussion surrounding this, I suppose.
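To make the point about dividing by h concrete, here is a minimal sketch in Python. It is not code from the guide: the function `f(x, y) = x * y`, the inputs `x = -2, y = 3`, and the helper name `numerical_gradient_x` are assumptions chosen for illustration.

```python
# Hypothetical example function (an assumption, not taken from the guide's source).
def f(x, y):
    return x * y


def numerical_gradient_x(f, x, y, h):
    """Estimate df/dx with a forward difference.

    The raw change f(x + h, y) - f(x, y) scales with h, so we divide
    by h to express it as change in output per unit change in x.
    """
    return (f(x + h, y) - f(x, y)) / h


x, y = -2.0, 3.0
step_sizes = (0.1, 0.01, 0.0001)

# Raw output change for each step size: depends directly on h.
raw_changes = {h: f(x + h, y) - f(x, y) for h in step_sizes}

# Change divided by h: roughly the same value for every h.
estimates = {h: numerical_gradient_x(f, x, y, h) for h in step_sizes}

for h in step_sizes:
    print(f"h={h}: raw change={raw_changes[h]:.6f}, divided by h={estimates[h]:.6f}")
```

Under these assumptions the raw change gets smaller as h does, while the value after dividing by h stays close to 3 (which is y, the analytic derivative of x * y with respect to x). That is all "normalizes" means here: it has nothing to do with squeezing values into a range like 0 to 1.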

Thanks!