titu1994 / neural-architecture-search

neural-architecture-search/controller.py

Line 364 in d5f5c9d

    
           ce_loss = tf.nn.softmax_cross_entropy_with_logits_v2(logits=classifier, labels=labels)

Hi @titu1994, great work on the implementation. I just have one question about the loss function.
Based on the REINFORCE algorithm and the original paper, the log-likelihood of the actions is averaged over m models sampled from the same policy network. In your implementation, however, it seems that you sample only one model and compute the cross entropy between the raw output logits and the input state. What does that mean?
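
For reference, the REINFORCE estimate with m sampled models is commonly written as below. The notation is assumed here rather than taken from the repository: T is the number of controller decisions per model, a_t the action at step t, R_k the reward (validation accuracy) of the k-th sampled model, b a baseline, and θ_c the controller parameters.

$$
\nabla_{\theta_c} J(\theta_c) \approx \frac{1}{m} \sum_{k=1}^{m} \sum_{t=1}^{T} \nabla_{\theta_c} \log P\left(a_t \mid a_{(t-1):1}; \theta_c\right) \left(R_k - b\right)
$$

Setting m = 1 keeps the same form with a single term in the outer sum, so the estimate has the same expectation but higher variance.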

The paper lacked a lot of implementation details, and therefore I had to make a few calls on how the code was written. I've stated in the readme that this is a best effort version of the paper, and does not follow it closely.

As for why I used just 1 sampled policy rather than m: it keeps the computational cost of training down. It also makes my implementation much less stable.
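
To make the relationship between the cross-entropy term and the REINFORCE log-likelihood concrete, here is a minimal pure-Python sketch. It is not the code in `controller.py`: the function names (`log_softmax`, `softmax_cross_entropy`, `reinforce_loss`) and the episode layout are hypothetical, and it assumes the labels are one-hot encodings of the sampled actions. Under that assumption, softmax cross entropy equals the negative log-probability of the sampled action, and m = 1 is just the one-sample case of the averaged loss.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over a list of raw logits.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def softmax_cross_entropy(logits, one_hot):
    # Cross entropy between softmax(logits) and a one-hot label.
    # With a one-hot label this reduces to -log p(sampled action).
    log_p = log_softmax(logits)
    return -sum(y * lp for y, lp in zip(one_hot, log_p))

def reinforce_loss(episodes, baseline=0.0):
    # episodes: list of (step_logits, actions, reward) tuples, one per
    # sampled model (hypothetical layout, assumed for illustration).
    # Returns the surrogate loss whose gradient is the negated REINFORCE
    # estimate, averaged over the m = len(episodes) sampled models.
    total = 0.0
    for step_logits, actions, reward in episodes:
        log_lik = sum(log_softmax(l)[a] for l, a in zip(step_logits, actions))
        total += -log_lik * (reward - baseline)
    return total / len(episodes)
```

Calling `reinforce_loss([episode])` with a single episode corresponds to the m = 1 choice described above; passing several independently sampled episodes averages their terms, which costs m child-model trainings per update but reduces the variance of the gradient estimate.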

@titu1994 Thanks for your quick reply.

Loss function