Loss function
dxywill opened this issue
neural-architecture-search/controller.py
Line 364 in d5f5c9d
Hi @titu1994, great work on the implementation. I just have one question about the loss function.
Based on the REINFORCE algorithm and the original paper, the log-likelihoods of the actions are computed over m models sampled from the same policy network. However, in your implementation it seems to me that you sample only one model and calculate the cross-entropy between the raw output logits and the input state. What does that mean?
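For reference, the policy-gradient estimate the question refers to is usually written as below. The symbols are as I recall them from the paper and should be checked against it: m is the number of models sampled per update, T the number of controller decisions per model, R_k the reward (validation accuracy) of the k-th model, and b a baseline.

```latex
\nabla_{\theta_c} J(\theta_c) \approx \frac{1}{m} \sum_{k=1}^{m} \sum_{t=1}^{T}
  \nabla_{\theta_c} \log P\left(a_t \mid a_{(t-1):1};\, \theta_c\right) \left(R_k - b\right)
```

With m = 1 the outer average disappears and the estimate is built from a single sampled model.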
The paper lacked a lot of implementation details, and therefore I had to make a few calls on how the code was written. I've stated in the readme that this is a best effort version of the paper, and does not follow it closely.
As to why I used just 1 sampled policy rather than m: it is to reduce the computational cost of training. It also makes my implementation much more unstable, since a gradient estimated from a single sample has higher variance.
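To make the trade-off between 1 and m samples concrete, here is a minimal sketch of a REINFORCE-style surrogate loss in plain Python. It is not the code in controller.py. The function names, the toy logits, and the rewards are all hypothetical, and in practice an autodiff framework would differentiate this loss with respect to the controller weights. It relies on one fact: softmax cross-entropy against the one-hot encoding of a sampled action equals minus the log-probability of that action.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_norm for x in logits]

def sequence_log_prob(step_logits, actions):
    """Sum of log P(a_t) over the T decisions made for one sampled model.

    Each term is minus the softmax cross-entropy between that step's logits
    and the one-hot encoding of the sampled action.
    """
    return sum(log_softmax(logits)[a] for logits, a in zip(step_logits, actions))

def reinforce_loss(batch_logits, batch_actions, rewards, baseline=0.0):
    """Surrogate loss averaged over the m sampled models in the batch.

    Minimizing it follows the REINFORCE gradient estimate; m is simply
    len(rewards), so m = 1 gives a single-sample estimate.
    """
    total = 0.0
    for step_logits, actions, reward in zip(batch_logits, batch_actions, rewards):
        total += -(reward - baseline) * sequence_log_prob(step_logits, actions)
    return total / len(rewards)

# Hypothetical toy data: T = 3 decisions, 4 choices each, uniform logits.
uniform = [[0.0] * 4 for _ in range(3)]
loss_m = reinforce_loss([uniform, uniform], [[0, 1, 2], [3, 0, 1]], [1.0, 0.5])  # m = 2
loss_1 = reinforce_loss([uniform], [[0, 1, 2]], [1.0])                            # m = 1
```

Each extra sample means training one more child model to obtain its reward, which is where the computational cost comes from. Averaging over more samples is what reduces the variance of the estimate.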
@titu1994 Thanks for your quick reply.