sparisi / mips

Minimal Policy Search Toolbox

Questions on the PMGA algorithm

shutongcs opened this issue

Hi, I'm interested in your PMGA algorithm, but I have some questions after reading the code.

  1. In GaussianLinearFixedvarDiagmean.m, you define two functions that return the derivative of the logarithm of the policy and the Hessian matrix of the logarithm of the policy, respectively. How do you obtain the exact formulations? I can't find the derivation in your papers (AAAI'15 and the extension); could you please provide some references for these?

  2. In params_lqr_mo.m (and in Sec. 6.1 of your Journal of Artificial Intelligence Research (2016) paper), you give the formulation of \rho for the different parameterization types. I also wonder about the derivation of, or references for, this.

Looking forward to your reply. Thanks!

Best regards,
Shutong

Hi Shutong,

  1. It's just the derivative of a Gaussian. We know that pi(a|s) ∝ exp( -1/2 (a - mean)' Sigma^-1 (a - mean) ), where mean = phi(s)*theta (in my code). You just need to differentiate the logarithm of the policy wrt theta and Sigma (once for dlogpi, twice for hlogpi) and you get the functions in my code. It is pretty straightforward if you follow the rules in the Matrix Cookbook by Petersen and Pedersen (see pic attached, and the sketch below).

  2. The constrained and unconstrained parameterizations are presented in my JAIR paper, Section 6.2.1. We decided to use them because they work well for the LQR. The neural network parameterization is not in the paper; I implemented it just out of curiosity, because Matlab is definitely not a good option for neural networks and the symbolic toolbox is rather slow.

Best,
Simone

[attached image: dlogpi]
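A minimal sketch of those derivatives, assuming the linear-mean, fixed-covariance Gaussian described above; the names (Phi, Sigma, theta, a) and the function itself are illustrative and do not reproduce the exact code in GaussianLinearFixedvarDiagmean.m:

```matlab
% Sketch only: gradient and Hessian of log pi(a|s) wrt theta for a Gaussian
% policy with fixed covariance Sigma and linear mean mu = Phi * theta.
function [dlogpi, hlogpi] = gauss_logpi_derivs(Phi, Sigma, theta, a)
% Phi   : d x p feature matrix phi(s), for a d-dimensional action
% Sigma : d x d fixed covariance matrix
% theta : p x 1 mean parameters, so that mu = Phi * theta
% a     : d x 1 sampled action
    mu = Phi * theta;
    % log pi = -1/2 * (a - mu)' * Sigma^-1 * (a - mu) + const
    dlogpi = Phi' * (Sigma \ (a - mu));   % p x 1 gradient wrt theta
    hlogpi = -Phi' * (Sigma \ Phi);       % p x p Hessian wrt theta (does not depend on a)
end
```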


Hi Simone,

Thanks for your answer!

I also have a question. I ran your PMGA.m and obtained the learned frontier front_learned given by pol_iter. But when I have a certain weight, how can I choose the corresponding policy from the set of candidate policies? (Or, how can I get the reward weighting that each obtained point of the Pareto frontier corresponds to?)

Looking forward to your reply. Thanks!

Best regards,
Shutong

Hi Shutong,

There is no weighting; this is a manifold-based approach. For each J in the front there is a "t" point that generated it. Given "t", you can retrieve the corresponding policy, but you cannot get a weighting over the rewards.

Best,
Simone
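A purely illustrative sketch of the idea, not the actual PMGA.m API (phi_rho, rho, t_grid, J_target and the layout of front_learned are assumptions): every point of the learned front was generated by some t, so mapping a desired front point back to its t and re-evaluating the manifold there recovers the policy parameters.

```matlab
% Hypothetical example: pick the front point closest to a desired return
% vector J_target and recover the policy that generated it.
t_grid = linspace(0, 1, size(front_learned, 1));           % t points used to sample the front (assumed 1-D manifold)
[~, idx] = min(vecnorm(front_learned - J_target, 2, 2));   % index of the closest front point
t_star = t_grid(idx);                                      % the t that generated that point
theta_star = phi_rho(rho, t_star);                         % hypothetical manifold map: (rho, t) -> policy parameters
```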


Hi Simone,

Got it, many thanks!

Best regards,
Shutong

Hi Simone,

Sorry for digging up an old thread.

Could you link the textbook where you got these screenshots? I currently cannot find it online!

Thanks!

@conorfhayes They are from some notes I took for myself.
You can derive those equations using the Matrix Cookbook.
At the time, I also found these notes useful.
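For reference, a short recap of the derivation those notes cover, assuming the linear-mean, fixed-covariance Gaussian from the reply above (two standard matrix-calculus identities, as in the Matrix Cookbook):

```latex
\log \pi_\theta(a \mid s) = -\tfrac{1}{2}\,(a - \Phi\theta)^\top \Sigma^{-1} (a - \Phi\theta) + \text{const}, \qquad \Phi = \phi(s)

\nabla_\theta \log \pi_\theta(a \mid s) = \Phi^\top \Sigma^{-1} (a - \Phi\theta)

\nabla_\theta^2 \log \pi_\theta(a \mid s) = -\,\Phi^\top \Sigma^{-1} \Phi
```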