Learning rate of nn.Linear

Question

doubbblek opened this issue 3 years ago

Thanks for open-sourcing such great work.
I notice that the learning rate of every linear layer is multiplied by 5, including the linear layers inside all of the temporal adaptive modules. I know that a larger learning rate on the last fully connected layer normally brings better performance, but here it applies to every nn.Linear. Is this a mistake, or does it produce better results?

zyLiu · Answer 1 · Tue Nov 09 2021 19:11:36 GMT+0800 (China Standard Time)

This behavior is borrowed from TSN, and we normally use this default setting when training TANet.
In my experience, the x5 learning rate on linear layers has little impact on performance. MMAction2 reproduces our work without using a special learning rate schedule for linear layers and even achieves better performance. Hope this clears up your confusion.
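
For readers unfamiliar with how a per-layer multiplier like this is usually set up, here is a minimal sketch using PyTorch's per-parameter optimizer options. It is not the code from this repository or from TSN: the helper `build_param_groups`, the `linear_lr_mult` argument, and the toy model are all hypothetical names chosen for illustration, and the base learning rate of 0.01 is an arbitrary assumption.

```python
import torch
from torch import nn


def build_param_groups(model, base_lr, linear_lr_mult=5.0):
    """Hypothetical helper: put nn.Linear parameters in their own group.

    Returns a list of parameter-group dicts suitable for a torch optimizer.
    """
    linear_params, other_params = [], []
    for module in model.modules():
        # recurse=False returns only the parameters owned directly by this
        # module, so each parameter is collected exactly once.
        params = list(module.parameters(recurse=False))
        if isinstance(module, nn.Linear):
            linear_params.extend(params)
        else:
            other_params.extend(params)
    return [
        {"params": other_params, "lr": base_lr},
        {"params": linear_params, "lr": base_lr * linear_lr_mult},
    ]


# Toy model (an assumption for illustration, not TANet).
model = nn.Sequential(
    nn.Conv1d(3, 8, kernel_size=3),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 6, 4),
)

param_groups = build_param_groups(model, base_lr=0.01)
optimizer = torch.optim.SGD(param_groups, lr=0.01, momentum=0.9)

for group in optimizer.param_groups:
    print(len(group["params"]), group["lr"])
```

Setting `linear_lr_mult=1.0` gives every layer the same learning rate, which corresponds to training without a special learning rate for linear layers.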

doubbblek · Answer 2 · Tue Nov 09 2021 19:17:07 GMT+0800 (China Standard Time)

Thanks for the reply.