seungeunrho / minimalRL

Implementations of basic RL algorithms with minimal lines of codes! (pytorch based)

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

torch.gather in relevant to policy gradient

migom6 opened this issue Β· comments

As from my understanding the policy network is giving an output of mean and variance for a single action. After that torch.gather is used to calculate the log_prob. Can someone help me to understand the process?
Thanks for the help. πŸ˜ƒ