Implementation of the feasibility-based results from Andrew Ng's paper for recovering an MDP's rewards given its optimal policy. Shows how regularization can shrink the feasible set of rewards by 30% (and increase precision).
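As a rough illustration of what "feasible" means here, the sketch below checks whether a candidate state-reward vector is consistent with a given policy being optimal. It assumes the paper in question is Ng and Russell's "Algorithms for Inverse Reinforcement Learning", a finite MDP with state-only rewards, and a known transition model. The function name `is_feasible_reward` and its parameters are hypothetical and are not taken from this repository.

```python
import numpy as np

def is_feasible_reward(P, policy, R, gamma=0.9, tol=1e-9):
    """Check whether reward R makes `policy` optimal (hypothetical helper).

    P      : array of shape (A, S, S), P[a, s, s'] = transition probability
    policy : int array of shape (S,), the action the policy takes in each state
    R      : array of shape (S,), candidate state rewards
    """
    n_actions, n_states, _ = P.shape
    # Transition matrix under the policy: row s is P[policy[s], s, :].
    P_pi = P[policy, np.arange(n_states), :]
    # Policy value: V = (I - gamma * P_pi)^{-1} R.
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R)
    # The policy is optimal iff no action yields a higher expected next value.
    expected_under_policy = P_pi @ V
    for a in range(n_actions):
        if np.any(P[a] @ V > expected_under_policy + tol):
            return False
    return True

# Toy 2-state MDP (an assumption for illustration): action 0 stays, action 1 switches.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
policy = np.array([1, 0])  # move to state 1, then stay there

feasible_good = is_feasible_reward(P, policy, np.array([0.0, 1.0]))
feasible_bad = is_feasible_reward(P, policy, np.array([1.0, 0.0]))
feasible_zero = is_feasible_reward(P, policy, np.zeros(2))
```

Note that the all-zero reward passes the check for any policy, so the feasibility conditions alone leave a large set of candidate rewards. This degeneracy is the usual motivation for adding a regularization term when selecting a reward from the feasible set.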