sparisi / mips

Minimal Policy Search Toolbox

Questions on the PMGA algorithm

shutongcs opened this issue

Hi, I'm interested in your PMGA algorithm, but I have some questions after reading the code.

  1. In GaussianLinearFixedvarDiagmean.m, you define two functions that return the derivative of the logarithm of the policy and the Hessian matrix of the logarithm of the policy, respectively. How do you obtain the exact formulations? I can't find the derivation in your papers (AAAI'15 and the extension); could you please provide some references for these?

  2. In params_lqr_mo.m (and in Sec. 6.1 of your Journal of Artificial Intelligence Research (2016) paper), you give the formulation of \rho for the different parameterization types. I also wonder about the derivation of, or references for, this.

Looking forward to your reply. Thanks!

Best regards,
Shutong

Hi Shutong,

  1. It's just the derivative of a Gaussian. We know that pi(a|s) ∝ exp( -1/2 (a - mean)' Sigma^-1 (a - mean) ), where mean = phi(s)*theta (in my code). You just need to differentiate the logarithm of the policy wrt theta and Sigma (once for dlogpi, twice for hlogpi) and you get the functions in my code. It is pretty straightforward if you follow the rules in the Matrix Cookbook by Petersen and Pedersen (see pic attached, and the sketch below).

  2. The constrained and unconstrained parameterizations are presented in my JAIR paper, Section 6.2.1. We decided to use them because they work well for the LQR. The neural network parameterization is not in the paper; I implemented it just out of curiosity, because Matlab is definitely not a good option for neural networks and the symbolic toolbox is rather slow.

Best,
Simone

[attached image: dlogpi]
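A minimal sketch of those derivatives, assuming the linear-mean, fixed-covariance Gaussian described above; the names (Phi, Sigma, theta, a) and the function itself are illustrative and do not reproduce the exact code in GaussianLinearFixedvarDiagmean.m:

```matlab
% Sketch only: gradient and Hessian of log pi(a|s) wrt theta for a Gaussian
% policy with fixed covariance Sigma and linear mean mu = Phi * theta.
function [dlogpi, hlogpi] = gauss_logpi_derivs(Phi, Sigma, theta, a)
% Phi   : d x p feature matrix phi(s), for a d-dimensional action
% Sigma : d x d fixed covariance matrix
% theta : p x 1 mean parameters, so that mu = Phi * theta
% a     : d x 1 sampled action
    mu = Phi * theta;
    % log pi = -1/2 * (a - mu)' * Sigma^-1 * (a - mu) + const
    dlogpi = Phi' * (Sigma \ (a - mu));   % p x 1 gradient wrt theta
    hlogpi = -Phi' * (Sigma \ Phi);       % p x p Hessian wrt theta (does not depend on a)
end
```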


Hi Simone,

Thanks for your answer!

I also have a question. I ran your PMGA.m and obtained the learned frontier front_learned given by pol_iter. But when I have a certain weight, how can I choose the corresponding policy from the set of candidate policies? (Or, how can I get the reward weighting that each obtained point of the Pareto frontier corresponds to?)

Looking forward to your reply. Thanks!

Best regards,
Shutong

Hi Shutong,

There is no weighting; this is a manifold-based approach. For each J in the front there is a "t" point that generated it. Given "t", you can retrieve the corresponding policy, but you cannot get a weighting over the rewards.

Best,
Simone
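A purely illustrative sketch of the idea, not the actual PMGA.m API (phi_rho, rho, t_grid, J_target and the layout of front_learned are assumptions): every point of the learned front was generated by some t, so mapping a desired front point back to its t and re-evaluating the manifold there recovers the policy parameters.

```matlab
% Hypothetical example: pick the front point closest to a desired return
% vector J_target and recover the policy that generated it.
t_grid = linspace(0, 1, size(front_learned, 1));           % t points used to sample the front (assumed 1-D manifold)
[~, idx] = min(vecnorm(front_learned - J_target, 2, 2));   % index of the closest front point
t_star = t_grid(idx);                                      % the t that generated that point
theta_star = phi_rho(rho, t_star);                         % hypothetical manifold map: (rho, t) -> policy parameters
```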


Hi Simone,

Got it, many thanks!

Best regards,
Shutong

Hi Simone,

Sorry for digging up an old thread.

Could you link the textbook where you got these screenshots? I currently cannot find it online!

Thanks!

@conorfhayes They are from some notes I took for myself.
You can derive those equations using the Matrix Cookbook.
At the time, I also found these notes useful.
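For reference, a short recap of the derivation those notes cover, assuming the linear-mean, fixed-covariance Gaussian from the reply above (two standard matrix-calculus identities, as in the Matrix Cookbook):

```latex
\log \pi_\theta(a \mid s) = -\tfrac{1}{2}\,(a - \Phi\theta)^\top \Sigma^{-1} (a - \Phi\theta) + \text{const}, \qquad \Phi = \phi(s)

\nabla_\theta \log \pi_\theta(a \mid s) = \Phi^\top \Sigma^{-1} (a - \Phi\theta)

\nabla_\theta^2 \log \pi_\theta(a \mid s) = -\,\Phi^\top \Sigma^{-1} \Phi
```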