In the architecture search, why use SGD for the operation weights but Adam for `arch_params`?
JarveeLee opened this issue · comments
- Adam provides adaptive per-parameter learning rates, which suits the architecture parameters.
- We follow previous NAS work (e.g. DARTS) in using the Adam optimizer to tune `arch_params`.
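The split described above can be sketched in PyTorch. This is a minimal illustration under assumed hyperparameters, not the repository's actual code: a tiny supernet (`TinySupernet` is a hypothetical name) whose operation weights are updated with SGD plus momentum, while its architecture logits `alpha` get a separate Adam optimizer.

```python
import torch
import torch.nn as nn

# Hypothetical toy supernet: several candidate ops mixed by softmax(alpha),
# in the spirit of DARTS-style differentiable search.
class TinySupernet(nn.Module):
    def __init__(self, n_ops=3):
        super().__init__()
        self.ops = nn.ModuleList(nn.Linear(4, 4) for _ in range(n_ops))
        # architecture parameters: one logit per candidate operation
        self.alpha = nn.Parameter(1e-3 * torch.randn(n_ops))

    def forward(self, x):
        weights = torch.softmax(self.alpha, dim=-1)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

model = TinySupernet()

# SGD with momentum for the operation weights ...
w_optimizer = torch.optim.SGD(
    (p for n, p in model.named_parameters() if n != "alpha"),
    lr=0.025, momentum=0.9, weight_decay=3e-4)

# ... and Adam, with its adaptive per-parameter step sizes,
# for the architecture parameters alone.
a_optimizer = torch.optim.Adam([model.alpha], lr=3e-4, weight_decay=1e-3)

# One illustrative update step on random data.
x, y = torch.randn(8, 4), torch.randn(8, 4)
loss = nn.functional.mse_loss(model(x), y)
w_optimizer.zero_grad()
a_optimizer.zero_grad()
loss.backward()
w_optimizer.step()  # updates the operation weights
a_optimizer.step()  # updates alpha
```

In the actual bilevel setup, the two optimizers would step on different batches (train data for weights, validation data for `alpha`); here both step on one batch purely to keep the sketch short.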
Codes for our paper "Progressive Differentiable Architecture Search: Bridging the Depth Gap between Search and Evaluation"