Are parameters with no "infinite" dimensions allowed?

Question

callumm-graphcore opened this issue 2 years ago · comments

Hi,

Is it valid to have parameters that have no "infinite" dimensions? This line suggests that it is, but I can't find anything in the paper that explains how this case should be dealt with.

With thanks,
Callum

Edward Hu · Answer 1 · Mon Nov 28 2022 22:34:48 GMT+0800 (China Standard Time)

Hi Callum,

Yes, it's possible to have parameters with only finite dimensions. For example, given a finite output dimension d_out, the bias vector for the last layer will have dimension 1 x d_out.

Callum · Answer 2 · Mon Nov 28 2022 22:39:53 GMT+0800 (China Standard Time)

Thanks Edward! Is there a part of the paper that explains what the correct scaling is in this case? Would this apply even if you had a linear layer where neither the input nor the output dimension was scaled?

Edward Hu · Answer 3 · Mon Nov 28 2022 23:20:18 GMT+0800 (China Standard Time)

The bias example I gave is covered under "input weights & biases" in Tables 3, 8, and 9, where it has a constant init and LR.

Yes, this also applies to a linear layer in which neither the input nor the output dimension is scaled. We may not have discussed it specifically in the paper because it's less common, but you should use a constant init and LR.
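
As a rough illustration of the rule above, here is a minimal sketch in plain Python. It assumes a dimension counts as "infinite" when its size differs between two instantiations of the same model at different widths. The helper names `infinite_dims` and `uses_constant_init_and_lr` are hypothetical and are not part of any library API.

```python
from typing import List, Sequence


def infinite_dims(base_shape: Sequence[int], delta_shape: Sequence[int]) -> List[int]:
    """Return the indices of dimensions whose size changes with model width.

    Assumption: the same parameter is inspected in two models of different
    width, and a dimension is treated as "infinite" if its size differs.
    """
    if len(base_shape) != len(delta_shape):
        raise ValueError("shapes must have the same number of dimensions")
    return [i for i, (b, d) in enumerate(zip(base_shape, delta_shape)) if b != d]


def uses_constant_init_and_lr(base_shape: Sequence[int], delta_shape: Sequence[int]) -> bool:
    """True if the parameter has no infinite dimensions.

    Per the discussion in this thread, such parameters keep a constant
    (width-independent) init and learning rate.
    """
    return len(infinite_dims(base_shape, delta_shape)) == 0


# Last-layer bias with a fixed d_out = 10: same shape at every width.
print(uses_constant_init_and_lr((10,), (10,)))            # True

# Linear layer with fixed input (4) and fixed output (10) dimensions.
print(uses_constant_init_and_lr((10, 4), (10, 4)))        # True

# Hidden weight whose dimensions both grow with width: the constant rule
# does not apply here, and the scaling tables in the paper apply instead.
print(uses_constant_init_and_lr((256, 256), (512, 512)))  # False
```

The sketch only classifies parameters. It deliberately does not encode the scaling for parameters that do have infinite dimensions, since that is what the tables in the paper specify.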

Callum · Answer 4 · Mon Nov 28 2022 23:26:25 GMT+0800 (China Standard Time)

Ah, OK, I see now. Thank you very much!

Greg Yang · Answer 5 · Tue Nov 29 2022 01:24:57 GMT+0800 (China Standard Time)

Yes, that is allowed.
