liu-zhy / temporal-adaptive-module

TAM: Temporal Adaptive Module for Video Recognition

Learning rate of nn.Linear

doubbblek opened this issue

Thanks for open-sourcing such great work.
I notice that the learning rate of every linear layer is multiplied by 5, including the linear layers inside the temporal adaptive module. I know that a larger learning rate for the last fully connected layer usually improves performance. Is applying it to all linear layers a mistake, or does it produce better results?

commented

This behavior is borrowed from TSN, and we normally use that default setting when training TANet.
In my experience, the x5 learning rate for linear layers has little impact on performance. MMAction2 reproduces our work without a special learning rate schedule for linear layers and even achieves better performance. Hope this clears up your confusion.
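
For readers who want to test this themselves, here is a minimal sketch of the general technique (per-parameter-group learning rates in PyTorch). It is not the code from this repository or from TSN. The helper `build_param_groups`, the `linear_mult` argument, and the toy network are all hypothetical names chosen for illustration.

```python
import torch
from torch import nn


def build_param_groups(model: nn.Module, base_lr: float, linear_mult: float = 5.0):
    """Hypothetical helper: give nn.Linear parameters a scaled learning rate."""
    linear_params, other_params = [], []
    seen = set()  # guards against parameters shared between modules
    for module in model.modules():
        # recurse=False: each parameter is visited through the module that owns it.
        for param in module.parameters(recurse=False):
            if id(param) in seen:
                continue
            seen.add(id(param))
            if isinstance(module, nn.Linear):
                linear_params.append(param)
            else:
                other_params.append(param)
    return [
        {"params": other_params, "lr": base_lr},
        {"params": linear_params, "lr": base_lr * linear_mult},
    ]


# Toy network standing in for a real backbone (an assumption, not TANet).
toy_model = nn.Sequential(
    nn.Conv1d(3, 8, kernel_size=3, padding=1),
    nn.AdaptiveAvgPool1d(1),
    nn.Flatten(),
    nn.Linear(8, 4),
)

base_lr = 0.01
groups = build_param_groups(toy_model, base_lr=base_lr, linear_mult=5.0)
# The per-group "lr" entries override the optimizer-wide default lr.
optimizer = torch.optim.SGD(groups, lr=base_lr, momentum=0.9)
```

Passing `linear_mult=1.0` gives every layer the same learning rate, which should correspond to the uniform setup described above for MMAction2, so the two settings can be compared by changing a single argument.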

Thanks for the reply.