interpreting coord checks
llucid-97 opened this issue
Hi there, I'm working on a Flax port of this, and I'm trying to use the coord check scripts on a variant of your MLP example to see if I've done it right. I'm struggling to interpret the results, though:
The point I'm confused about is the green line in the muP graph at step 1: if I understood your paper correctly, this should be a flat line, right?
Looking through my code, I can't spot the mistake, though, so I must ask: is my assumption about step 1 of the coord check wrong?
Hi! Does the green curve correspond to the last layer? If so, this is expected. It is for a related reason that we recommend initializing the last layer weights to zero.
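To make that concrete, here is a toy sketch in plain Python. It is not code from the mup package or from a Flax port, and `readout_at_init` is a hypothetical helper. It assumes the last hidden activations have coordinates of size Θ(1) (modeled as standard normal draws) and writes the muP-style readout as N(0, 1) weights with a 1/width multiplier, which is one common way to express it. Under those assumptions, the output at initialization should shrink roughly like 1/sqrt(width) instead of staying flat, while a zero-initialized readout gives exactly 0 at every width.

```python
import random
import statistics


def readout_at_init(width, zero_init=False, trials=200, seed=0):
    """Mean |output| of a toy muP-style readout at initialization.

    Hypothetical helper for illustration only; the parametrization
    (N(0, 1) weights with a 1/width multiplier) is an assumption.
    """
    rng = random.Random(seed)
    outputs = []
    for _ in range(trials):
        # Stand-in for the last hidden activations: Theta(1) coordinates.
        hidden = [rng.gauss(0.0, 1.0) for _ in range(width)]
        # Readout weights: N(0, 1) entries, or all zeros.
        weights = [0.0 if zero_init else rng.gauss(0.0, 1.0) for _ in range(width)]
        # Assumed muP-style readout: 1/width multiplier on the dot product.
        out = sum(w * h for w, h in zip(weights, hidden)) / width
        outputs.append(abs(out))
    return statistics.fmean(outputs)


widths = [64, 256, 1024]
random_init = {n: readout_at_init(n) for n in widths}
zero_init = {n: readout_at_init(n, zero_init=True) for n in widths}

for n in widths:
    # Columns: width, mean |output| with random init, mean |output| with zero init.
    print(n, random_init[n], zero_init[n])
```

With the random readout, the printed value should drop by roughly a factor of two each time the width quadruples, which is the kind of downward slope a coord check plot would show for the last layer at step 1. With the zero readout the curve is trivially flat at that step.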
Aah I see. Yes it is for the last layer. Thanks