microsoft / mup

maximal update parametrization (µP)

Home Page: https://arxiv.org/abs/2203.03466

Unclear `assert_hidden_size_inf` triggers

dreavjr opened this issue

My code is triggering the "has infinite fan-in and finite fan-out dimensions but is not type MuReadout" assertion in non-obvious situations (i.e., on layers other than the last linear layer of the model):

  • Often it happens on the first linear layer of the module when I change the size of the last linear layer;
  • Sometimes it happens on intermediate attention layers of a module.

What am I doing wrong? Is there a good way to debug those situations?
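One way to debug this is to list, before calling mup, every linear layer whose input width changes between the base and delta models while its output width does not. The sketch below is a hypothetical helper in plain Python (it is not part of the mup package and needs neither mup nor PyTorch). It assumes that a dimension counts as "infinite" when its size differs between the base and delta models, which roughly mirrors how mup infers infinite dimensions from the models passed to `set_base_shapes`. Shapes are written as (fan_out, fan_in), the layout of `nn.Linear.weight`; the layer names and sizes are made up for illustration.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, int]  # (fan_out, fan_in), matching the nn.Linear.weight layout


def infinite_dims(base: Shape, delta: Shape) -> Tuple[bool, bool]:
    """Return (fan_out_is_inf, fan_in_is_inf) for one weight matrix.

    Assumption: a dimension is "infinite" if it differs between base and delta.
    """
    return (base[0] != delta[0], base[1] != delta[1])


def readout_like_layers(
    base_shapes: Dict[str, Shape],
    delta_shapes: Dict[str, Shape],
) -> List[str]:
    """List layers with infinite fan-in but finite fan-out.

    Such layers behave like a readout layer; if one of them is not the model's
    output layer, some width is probably being scaled without its neighbours.
    """
    flagged = []
    for name, base in base_shapes.items():
        fan_out_inf, fan_in_inf = infinite_dims(base, delta_shapes[name])
        if fan_in_inf and not fan_out_inf:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    # Hypothetical MLP block d_model -> d_ff -> d_model where only d_ff varies.
    base = {"mlp.fc1": (256, 64), "mlp.fc2": (64, 256), "head": (10, 64)}
    delta = {"mlp.fc1": (512, 64), "mlp.fc2": (64, 512), "head": (10, 64)}
    print(readout_like_layers(base, delta))  # ['mlp.fc2']
```

Here the second layer of the block is flagged even though it sits in the middle of the model, because only its input side grows.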

Okay, I think I finally got it!

I cannot simply apply mup to the individual parameters of a vanilla model/layer/block and expect it to work every time: sometimes the model/layer/block has to be reparameterized. In particular, all layers in an MLP-like block have to grow or shrink in tandem, except, possibly, the output layer of the model.
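A minimal sketch of what "growing in tandem" can look like, assuming every hidden size is derived from a single width knob. The toy model, its layer names, and the `d_ff = 4 * width` ratio are hypothetical; shapes are (fan_out, fan_in) as in `nn.Linear.weight`, and "infinite" again means "differs between the base and delta widths".

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, int]  # (fan_out, fan_in), the nn.Linear.weight layout


def mlp_model_shapes(width: int, vocab: int = 10) -> Dict[str, Shape]:
    """Weight shapes of a hypothetical toy model: input layer, one MLP block, output head.

    Every hidden size comes from the single `width` knob, so all of them grow or
    shrink together (d_ff = 4 * width is an assumed ratio).
    """
    d_model, d_ff = width, 4 * width
    return {
        "input": (d_model, vocab),   # finite fan-in -> infinite fan-out
        "mlp.fc1": (d_ff, d_model),  # infinite -> infinite
        "mlp.fc2": (d_model, d_ff),  # infinite -> infinite
        "head": (vocab, d_model),    # infinite fan-in -> finite fan-out
    }


def readout_like_layers(base: Dict[str, Shape], delta: Dict[str, Shape]) -> List[str]:
    """Names whose fan-in differs between base and delta but whose fan-out does not."""
    return [
        name
        for name, (out_b, in_b) in base.items()
        if in_b != delta[name][1] and out_b == delta[name][0]
    ]


if __name__ == "__main__":
    base, delta = mlp_model_shapes(64), mlp_model_shapes(128)
    # Only the real output layer behaves like a readout, so only it needs MuReadout.
    print(readout_like_layers(base, delta))  # ['head']
```

Because `fc1` and `fc2` now have both dimensions infinite, the only layer with infinite fan-in and finite fan-out is the model's output head.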

I am closing this for now.