[ Components ] Policy Component needs API-method cleanup and return value cleanup

Question

sven1977 opened this issue 6 years ago

The Policy Component needs some cleanup as its API-methods are becoming less and less organized.

Some API methods are named "...parameters_log_probs". Log-probs are only meaningful for discrete action spaces, so the "_log_probs" suffix should be removed from these API-method names entirely, and log-probs should only be returned for categorical distributions (for all other distributions, the returned "log_probs" are currently actually log(mean), log(stddev), ...).
The API methods that return the actual log-likelihoods for continuous (pdf-type) distributions will be better named and organized, and the log-likelihood returned for a given action will have the key "log_likelihood" rather than "log_probs".
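
To illustrate the intended split, here is a minimal, library-independent sketch of what the return values could look like. The function names (`categorical_outputs`, `gaussian_outputs`) and the "parameters" key are assumptions for illustration only and are not RLGraph API; only the "log_probs" and "log_likelihood" keys come from the description above.

```python
import math


def categorical_outputs(logits, action):
    """Hypothetical helper: outputs for a discrete (categorical) policy."""
    # Numerically stable log-softmax over the logits.
    max_logit = max(logits)
    log_norm = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    log_probs = [l - log_norm for l in logits]
    return {
        "parameters": logits,                 # assumed key name
        "log_probs": log_probs,               # only meaningful for categorical
        "log_likelihood": log_probs[action],  # log-likelihood of the given action
    }


def gaussian_outputs(mean, stddev, action):
    """Hypothetical helper: outputs for a continuous (Gaussian) policy."""
    # Log of the normal pdf evaluated at `action`.
    log_likelihood = (
        -0.5 * ((action - mean) / stddev) ** 2
        - math.log(stddev)
        - 0.5 * math.log(2.0 * math.pi)
    )
    return {
        "parameters": (mean, stddev),         # assumed key name; no "log_probs" here
        "log_likelihood": log_likelihood,
    }
```

With this layout, a caller can always read "log_likelihood" regardless of the action space, while "log_probs" only appears where a full probability vector over actions exists.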

Sven Mika · Answer 1 · Thu May 23 2019 03:19:43 GMT+0800 (China Standard Time)

This has been mostly completed in 0.4.1 and 0.4.2.

RLGraphObsoletedError messages will hint at which API-methods have been renamed to which new methods.
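
As a rough sketch of how such a rename hint can work (this is not RLGraph's actual implementation): the `ObsoletedError` class below is a hypothetical stand-in for RLGraphObsoletedError, and both method names on `Policy` are invented for illustration.

```python
class ObsoletedError(Exception):
    """Hypothetical stand-in for RLGraphObsoletedError (signature is assumed)."""

    def __init__(self, kind, old_name, new_name):
        self.old_name = old_name
        self.new_name = new_name
        super().__init__(
            "{} '{}' is obsolete; use '{}' instead.".format(kind, old_name, new_name)
        )


class Policy:
    """Toy policy with one renamed API-method (both names are hypothetical)."""

    def get_action_and_log_likelihood(self, state):
        # New-style method: returns the "log_likelihood" key.
        return {"action": 0, "log_likelihood": 0.0}

    def get_action_and_log_probs(self, state):
        # Old-style name is kept only to point callers at the replacement.
        raise ObsoletedError(
            "API-method", "get_action_and_log_probs", "get_action_and_log_likelihood"
        )
```

Calling the old method fails immediately, and the error message names the replacement, so existing code can be migrated one call site at a time.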

Closing this issue.