Unclear `assert_hidden_size_inf` triggers
dreavjr opened this issue
Eduardo Valle commented
My code is triggering the "has infinite fan-in and finite fan-out dimensions but is not type MuReadout" assertion in "non-obvious" situations (not on the last linear layer of the model):
- Often it happens on the first linear layer of the module when I change the size of the last linear layer;
- Sometimes it happens on intermediate attention layers of a module.
What am I doing wrong? Is there a good way to debug those situations?
Eduardo Valle commented
Okay, I think I finally got it!
I cannot simply apply mup to the individual parameters of a vanilla model/layer/block and expect it to work every time: sometimes the model/layer/block has to be reparameterized. In particular, all layers in an MLP-like block have to grow or shrink in tandem, except possibly the output layer of the model.
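To make the "in tandem" point concrete, here is a minimal sketch in plain Python. It assumes, as far as I understand it, that a dimension counts as "infinite" when it differs between the base and the delta model. The helpers `mlp_shapes` and `find_suspect_layers` are hypothetical names made up for this illustration. They are not part of mup; they only mimic the condition named in the assertion message.

```python
# Hypothetical helpers for illustration only; not part of the mup API.

def mlp_shapes(d_in, h1, h2, n_classes):
    """Weight shapes of a 3-layer MLP, stored as (fan_out, fan_in) like torch.nn.Linear."""
    return {"fc1": (h1, d_in), "fc2": (h2, h1), "head": (n_classes, h2)}


def find_suspect_layers(base_shapes, delta_shapes, readouts=()):
    """Return layers with infinite fan-in but finite fan-out that are not readouts.

    Assumption: a dimension is "infinite" if it differs between base and delta.
    """
    suspects = []
    for name, (base_out, base_in) in base_shapes.items():
        delta_out, delta_in = delta_shapes[name]
        fan_out_inf = base_out != delta_out
        fan_in_inf = base_in != delta_in
        if fan_in_inf and not fan_out_inf and name not in readouts:
            suspects.append(name)
    return suspects


base = mlp_shapes(d_in=32, h1=64, h2=64, n_classes=10)

# Only the first hidden width changes: fc2 now has a growing input but a fixed output.
only_h1 = mlp_shapes(d_in=32, h1=128, h2=64, n_classes=10)
print(find_suspect_layers(base, only_h1, readouts={"head"}))

# Both hidden widths change together: only the readout has a fixed output.
tandem = mlp_shapes(d_in=32, h1=128, h2=128, n_classes=10)
print(find_suspect_layers(base, tandem, readouts={"head"}))
```

In the first case the check flags `fc2`, an intermediate layer, which matches the "non-obvious" triggers described above. In the second case nothing is flagged, because the only layer with a growing input and a fixed output is the one marked as a readout.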
I am closing this for now.