D-X-Y / AutoDL-Projects

Automated deep learning algorithms implemented in PyTorch.

Why not keep the Gumbel-softmax trick during the retraining stage?

d12306 opened this issue

Hello @D-X-Y, thanks for your implementation. I noticed that NAS papers that employ the Gumbel-softmax trick in the searching stage do not keep the same sampling procedure in the evaluation stage. Why do you keep this inconsistency between training and evaluation?

Does adding the Gumbel sampling hurt the evaluation performance?

Thanks,

Would you mind providing more details? What do you mean by "adding Gumbel sampling in the evaluation stage", and which paper are you referring to?

@D-X-Y, I am referring to papers such as GDAS. By "I noticed that NAS papers that employ the Gumbel-softmax trick in the searching stage do not keep the same sampling procedure in the evaluation stage," I mean that during the search you sample from the Gumbel-softmax distribution to obtain the weights for the different operations. Once training finishes, you take the latest weights of the ops and derive the architecture (keeping the two most probable ops for each edge). However, Gumbel-softmax is influenced by random noise, so the op weights are not deterministic given fixed logits.

That being said, the derived architecture can be quite different if we repeatedly sample from the Gumbel-softmax distribution. What is the point of always using just one of them (which can be regarded as a point estimate)?

Thanks and please correct me if anything is wrong.

@d12306 Sorry for the late response and thanks for the clarification.
The goal of differentiable NAS is to learn a distribution over architectures (defined by a set of variables $\alpha$). GDAS uses Gumbel-softmax to update this distribution. After searching, since this distribution has been optimized, we regard the op with the highest probability as the best.

Although the op weights produced by Gumbel sampling are influenced by random noise, the raw logits themselves are deterministic, right?
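To make the distinction concrete, here is a minimal PyTorch sketch of the two stages (the number of candidate ops and the temperature are made up for illustration):

```python
import torch
import torch.nn.functional as F

# Architecture logits (the alpha variables) for, say, 5 candidate ops on one edge.
logits = torch.randn(5, requires_grad=True)

# Search stage: draw a (hard) one-hot sample via Gumbel-softmax.
# Each call injects fresh Gumbel noise, so repeated samples can pick
# different ops, yet gradients still flow back to `logits`.
sample_a = F.gumbel_softmax(logits, tau=1.0, hard=True)
sample_b = F.gumbel_softmax(logits, tau=1.0, hard=True)  # may differ from sample_a

# Derivation / evaluation stage: no noise is injected; the chosen op is the
# argmax of the learned logits, which is deterministic for fixed logits.
best_op = logits.argmax(dim=-1)
```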

Hi, I am sorry for asking such a question, but I am new to NAS and have been trying to perform a grid search over the NAS-Bench-201 search space without finding a way so far. Can you give me a head start?
I would appreciate any help.

@tehseenmayar Thanks for your interest.

We have extended our NAS-Bench-201 to NATS-Bench, which has more architecture information and a more efficient and robust API. I would recommend you use NATS-Bench instead of NAS-Bench-201. If you want to start with it, see the sketch below.
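A rough sketch of an exhaustive grid search with the NATS-Bench API (assuming `pip install nats-bench` and the downloaded benchmark files; the dataset name, the `hp='12'` budget, and the result keys are my assumptions, so double-check them against the NATS-Bench README):

```python
from nats_bench import create

# Build the API for the topology search space ('tss'); fast_mode defers
# loading the full result files until they are actually queried.
api = create(None, 'tss', fast_mode=True, verbose=False)

# An exhaustive "grid search" is simply a loop over every indexed
# architecture in the benchmark.
best_index, best_acc = -1, -1.0
for index in range(len(api)):
    info = api.get_more_info(index, 'cifar10-valid', hp='12')
    if info['valid-accuracy'] > best_acc:
        best_index, best_acc = index, info['valid-accuracy']

print(best_index, best_acc)
```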

Hi, thanks for your great work. I have run the following command and got the results:

```bash
python ./exps/NATS-algos/random_wo_share.py --dataset cifar100 --search_space sss
```

Can you tell me how to get the final discovered architecture?
I would appreciate your help.
Thanks


Can you see something like "The best arch is xxx" in the saved log file? Here "xxx" is the final discovered architecture.

(screenshot of the saved log file)

Thank you for your response. This is the kind of information I am looking for. How can I get the information you mentioned above? I need to train the found architecture to validate its accuracy.

Sorry to bother you again: can you also tell me how I can retrain the final discovered architecture to obtain its validation accuracy?

Thank you

If you are using our NAS-Bench-201 or NATS-Bench (https://xuanyidong.com/assets/projects/NATS-Bench), you do not need to re-train the model. You can directly query the performance of the discovered architecture via our API.
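For example, a sketch along these lines should work for the run above (the architecture string is a placeholder to be replaced by the one from your log; the `hp='90'` budget and the `'test-accuracy'` key are my assumptions about the size search space, so verify them in the NATS-Bench README):

```python
from nats_bench import create

# The 'sss' (size) search space matches the --search_space sss flag above.
api = create(None, 'sss', fast_mode=True, verbose=False)

# Paste the architecture string reported as "The best arch is ..." in the log;
# for the size search space it is a list of channel sizes such as the one below.
arch = '64:64:64:64:64'  # placeholder

index = api.query_index_by_arch(arch)
info = api.get_more_info(index, 'cifar100', hp='90')  # 90-epoch training results
print('test accuracy:', info['test-accuracy'])
```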