[ Components ] Policy Component needs API-method cleanup and return value cleanup
sven1977 opened this issue · comments
Sven Mika commented
The Policy Component needs some cleanup, as its API methods are becoming less and less organized.
- Some API methods are named "...parameters_log_probs". True log-probs are only returned for discrete action spaces, so the "_log_probs" suffix should be dropped from these method names, and log-probs should be returned only for categorical distributions. (For all other distributions, the values currently returned as "log_probs" are actually log(mean), log(stddev), etc.)
- API methods that return the actual log-likelihoods for pdf-type continuous distributions will be renamed and reorganized, and the log-likelihood of a given action will be returned under the key "log_likelihood" rather than "log_probs".
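To illustrate the distinction the two points above draw, here is a minimal plain-Python sketch. It is not RLGraph code: the function names and the "parameters" key are hypothetical, and only the "log_likelihood" and "log_probs" keys come from this issue. It assumes a categorical distribution parameterized by logits and a univariate Gaussian parameterized by mean and stddev.

```python
import math


def categorical_log_probs(logits):
    """Log-softmax over logits: genuine log-probabilities (discrete actions)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def gaussian_log_likelihood(action, mean, stddev):
    """Log of the normal pdf evaluated at `action` (continuous actions)."""
    var = stddev ** 2
    return -0.5 * math.log(2.0 * math.pi * var) - (action - mean) ** 2 / (2.0 * var)


def categorical_output(logits, action):
    # Hypothetical return layout: log-probs exist only in the discrete case.
    log_probs = categorical_log_probs(logits)
    return {
        "parameters": logits,
        "log_probs": log_probs,
        "log_likelihood": log_probs[action],
    }


def gaussian_output(mean, stddev, action):
    # Hypothetical return layout: no "log_probs" key for a continuous pdf,
    # only the distribution parameters and the action's log-likelihood.
    return {
        "parameters": (mean, stddev),
        "log_likelihood": gaussian_log_likelihood(action, mean, stddev),
    }
```

In this sketch, log(mean) or log(stddev) never appear under a "log_probs" key; the continuous case exposes only its parameters and the log-likelihood of the chosen action.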
Sven Mika commented
This has been mostly completed in 0.4.1 and 0.4.2.
RLGraphObsoletedError messages will indicate which API methods have been renamed and what their new names are.
Closing this issue.
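For readers unfamiliar with this kind of rename hint, the pattern can be sketched roughly as follows. This is an assumption-laden illustration, not RLGraph's implementation: `ObsoletedError`, `ExamplePolicy`, `old_method` and `new_method` are hypothetical stand-ins, and the real RLGraphObsoletedError may have a different signature and message format.

```python
class ObsoletedError(Exception):
    """Hypothetical stand-in for an 'obsoleted API' error with a rename hint."""

    def __init__(self, old_name, new_name):
        super().__init__(
            "'{}' is obsolete; use '{}' instead.".format(old_name, new_name)
        )
        self.old_name = old_name
        self.new_name = new_name


class ExamplePolicy:
    """Hypothetical component with one renamed API method."""

    def new_method(self, x):
        # The current name of the API method.
        return x

    def old_method(self, x):
        # The old name is kept only to point callers at the new one.
        raise ObsoletedError("old_method", "new_method")
```

Calling `ExamplePolicy().old_method(1)` raises an error whose message names the replacement method, so callers can update their code without searching the changelog.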