In the architecture search, why use SGD for the operation weights but Adam for `arch_params`?
JarveeLee opened this issue · comments
- Adam provides adaptive per-parameter learning rates, which suits the architecture parameters.
- We follow previous NAS work (e.g. DARTS) in using the Adam optimizer to tune `arch_params`.
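The split described above can be sketched in PyTorch. This is a minimal illustration under assumed hyperparameters, not the repository's actual code: a tiny supernet (`TinySupernet` is a hypothetical name) whose operation weights are updated with SGD plus momentum, while its architecture logits `alpha` get a separate Adam optimizer.

```python
import torch
import torch.nn as nn

# Hypothetical toy supernet: several candidate ops mixed by softmax(alpha),
# in the spirit of DARTS-style differentiable search.
class TinySupernet(nn.Module):
    def __init__(self, n_ops=3):
        super().__init__()
        self.ops = nn.ModuleList(nn.Linear(4, 4) for _ in range(n_ops))
        # architecture parameters: one logit per candidate operation
        self.alpha = nn.Parameter(1e-3 * torch.randn(n_ops))

    def forward(self, x):
        weights = torch.softmax(self.alpha, dim=-1)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

model = TinySupernet()

# SGD with momentum for the operation weights ...
w_optimizer = torch.optim.SGD(
    (p for n, p in model.named_parameters() if n != "alpha"),
    lr=0.025, momentum=0.9, weight_decay=3e-4)

# ... and Adam, with its adaptive per-parameter step sizes,
# for the architecture parameters alone.
a_optimizer = torch.optim.Adam([model.alpha], lr=3e-4, weight_decay=1e-3)

# One illustrative update step on random data.
x, y = torch.randn(8, 4), torch.randn(8, 4)
loss = nn.functional.mse_loss(model(x), y)
w_optimizer.zero_grad()
a_optimizer.zero_grad()
loss.backward()
w_optimizer.step()  # updates the operation weights
a_optimizer.step()  # updates alpha
```

In the actual bilevel setup, the two optimizers would step on different batches (train data for weights, validation data for `alpha`); here both step on one batch purely to keep the sketch short.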
Codes for our paper "Progressive Differentiable Architecture Search: Bridging the Depth Gap between Search and Evaluation"