How is torch.gather relevant to policy gradients?
migom6 opened this issue · comments
From my understanding, the policy network outputs a mean and variance for a single action. After that, torch.gather is used to calculate the log_prob. Can someone help me understand this process?
Thanks for the help.
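For context, here is a minimal sketch (not the repository's actual code) of the usual pattern: `torch.gather` is typically used with a *discrete* action space to select, per batch row, the log-probability of the action that was taken. With a continuous Gaussian policy (mean and variance), `gather` is generally not needed, because `torch.distributions.Normal` computes `log_prob` directly. All tensor shapes and values below are illustrative assumptions.

```python
import torch

# Discrete case: gather picks each row's log-prob for the action taken.
logits = torch.randn(4, 3)                    # batch of 4, 3 discrete actions
log_probs = torch.log_softmax(logits, dim=1)  # (4, 3) log-probabilities
actions = torch.tensor([[0], [2], [1], [0]])  # actions taken, shape (4, 1)
chosen = log_probs.gather(1, actions)         # (4, 1): log pi(a_t | s_t)

# Continuous (mean/variance) case: the distribution API gives log_prob
# directly, so gather is unnecessary.
dist = torch.distributions.Normal(loc=torch.zeros(4), scale=torch.ones(4))
sampled = dist.sample()                       # one continuous action per row
cont_log_prob = dist.log_prob(sampled)        # (4,)
```

The `gather(1, actions)` call indexes along the action dimension, which is why the index tensor must have the same number of dimensions as `log_probs`.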