D-X-Y / AutoDL-Projects

Automated deep learning algorithms implemented in PyTorch.

Why not keep the Gumbel-softmax trick during the retraining stage?

d12306 opened this issue

Hello @D-X-Y, thanks for your implementation. I noticed that NAS papers that employ the Gumbel-softmax trick in the searching stage do not keep the same sampling procedure in the evaluation stage. Why do you keep this inconsistency between training and evaluation?

Does adding the Gumbel sampling hurt the evaluation performance?

Thanks,

Would you mind providing more details? What do you mean by "adding Gumbel sampling in the evaluation stage", and which paper are you referring to?

@D-X-Y, I am referring to papers such as GDAS. By "I noticed that NAS papers that employ the Gumbel-softmax trick in the searching stage do not keep the same sampling procedure in the evaluation stage," I mean that during the search you sample from the Gumbel-softmax distribution to obtain the weights for the different operations. Once training finishes, you take the latest weights of the ops and derive the architecture (keeping the two most probable ops for each edge). However, Gumbel-softmax is influenced by random noise, so the op weights are not deterministic given fixed logits.

That being said, the derived architecture can be quite different if we repeatedly sample from the Gumbel-softmax distribution. What is the point of always using just one of them (which can be regarded as a point estimate)?

Thanks and please correct me if anything is wrong.

@d12306 Sorry for the late response and thanks for the clarification.
The goal of differentiable NAS is to learn a distribution over architectures (defined by a set of variables $\alpha$). GDAS uses Gumbel-softmax to update this distribution. After searching, since this distribution has been optimized, we regard the op with the highest probability as the best.

Although the op weights produced by Gumbel sampling are influenced by random noise, the raw logits themselves are deterministic, right?
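To make the distinction concrete, here is a minimal PyTorch sketch of the two stages (the number of candidate ops and the temperature are made up for illustration):

```python
import torch
import torch.nn.functional as F

# Architecture logits (the alpha variables) for, say, 5 candidate ops on one edge.
logits = torch.randn(5, requires_grad=True)

# Search stage: draw a (hard) one-hot sample via Gumbel-softmax.
# Each call injects fresh Gumbel noise, so repeated samples can pick
# different ops, yet gradients still flow back to `logits`.
sample_a = F.gumbel_softmax(logits, tau=1.0, hard=True)
sample_b = F.gumbel_softmax(logits, tau=1.0, hard=True)  # may differ from sample_a

# Derivation / evaluation stage: no noise is injected; the chosen op is the
# argmax of the learned logits, which is deterministic for fixed logits.
best_op = logits.argmax(dim=-1)
```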

Hi, I am sorry for asking such a question, but I am new to NAS and have been trying to perform a grid search over the NAS-Bench-201 search space without finding a way so far. Can you give me a head start?
I would appreciate any help.

@tehseenmayar Thanks for your interest.

We have extended our NAS-Bench-201 to NATS-Bench, which has more architecture information and a more efficient and robust API. I would recommend you use NATS-Bench instead of NAS-Bench-201. If you want to start with it, see the sketch below.
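A rough sketch of an exhaustive grid search with the NATS-Bench API (assuming `pip install nats-bench` and the downloaded benchmark files; the dataset name, the `hp='12'` budget, and the result keys are my assumptions, so double-check them against the NATS-Bench README):

```python
from nats_bench import create

# Build the API for the topology search space ('tss'); fast_mode defers
# loading the full result files until they are actually queried.
api = create(None, 'tss', fast_mode=True, verbose=False)

# An exhaustive "grid search" is simply a loop over every indexed
# architecture in the benchmark.
best_index, best_acc = -1, -1.0
for index in range(len(api)):
    info = api.get_more_info(index, 'cifar10-valid', hp='12')
    if info['valid-accuracy'] > best_acc:
        best_index, best_acc = index, info['valid-accuracy']

print(best_index, best_acc)
```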

Hi, thanks for your great work. I have run the following command and got the results:

```bash
python ./exps/NATS-algos/random_wo_share.py --dataset cifar100 --search_space sss
```

Can you tell me how to get the final discovered architecture?
I would appreciate your help.
Thanks


Can you see something like "The best arch is xxx" in the saved log file? Here "xxx" is the final discovered architecture.

(screenshot of the saved log file)

Thank you for your response. This is the kind of information I am looking for. How can I get the information you mentioned above? I need to train the found architecture to validate its accuracy.

Sorry to bother you again: can you also tell me how I can retrain the final discovered architecture to obtain its validation accuracy?

Thank you

If you are using our NAS-Bench-201 or NATS-Bench (https://xuanyidong.com/assets/projects/NATS-Bench), you do not need to re-train the model. You can directly query the performance of the discovered architecture via our API.
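For example, a sketch along these lines should work for the run above (the architecture string is a placeholder to be replaced by the one from your log; the `hp='90'` budget and the `'test-accuracy'` key are my assumptions about the size search space, so verify them in the NATS-Bench README):

```python
from nats_bench import create

# The 'sss' (size) search space matches the --search_space sss flag above.
api = create(None, 'sss', fast_mode=True, verbose=False)

# Paste the architecture string reported as "The best arch is ..." in the log;
# for the size search space it is a list of channel sizes such as the one below.
arch = '64:64:64:64:64'  # placeholder

index = api.query_index_by_arch(arch)
info = api.get_more_info(index, 'cifar100', hp='90')  # 90-epoch training results
print('test accuracy:', info['test-accuracy'])
```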