MushroomRL / mushroom-rl

Python library for Reinforcement Learning.

Potential simple regressor for car on the hill FQI example

kishanpb opened this issue · comments

Currently the car on the hill FQI example works with the ExtraTreesRegressor. Is it possible for it to work with much simpler regressors, say GaussianProcessRegressor?

I tried using the sklearn.gaussian_process.GaussianProcessRegressor with kernel 1.0 * RBF(1.0), but the approximator gets stuck in the first optimization update! I'll be grateful for any possible tricks to make it work.

Nothing prevents mushroom from using any sklearn approximator that exposes the fit method, particularly in FQI.

Are you sure that it is stuck? Maybe the fit of the Gaussian process is just extremely slow. Try with fewer samples...
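One way to try "fewer samples" without touching the dataset is to subsample inside the regressor. The following is only a minimal sketch using plain sklearn: `SubsampledGP`, `max_samples` and `n_fit_` are hypothetical names made up for illustration, not part of mushroom or sklearn, and how the object is handed to FQI should follow the existing example.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF


class SubsampledGP:
    """Hypothetical wrapper: fits a GP on a random subset of the data."""

    def __init__(self, max_samples=200, seed=0):
        self.max_samples = max_samples
        self.rng = np.random.default_rng(seed)
        # Same kernel as in the question; alpha adds a little noise/jitter.
        self.model = GaussianProcessRegressor(kernel=1.0 * RBF(1.0), alpha=1e-3)

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        n = len(X)
        if n > self.max_samples:
            # Keep a uniform random subset so the kernel matrix stays small.
            idx = self.rng.choice(n, size=self.max_samples, replace=False)
            X, y = X[idx], y[idx]
        self.model.fit(X, y)
        self.n_fit_ = len(X)  # number of samples actually used
        return self

    def predict(self, X):
        return self.model.predict(np.asarray(X))
```

If the fit finishes quickly with a small `max_samples` and slows down sharply as it grows, the approximator is slow rather than stuck.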

Yeah, large sample size seems to be the issue here. Wonder how random forest is faster!

If one wants to use the LinearApproximator class in MushroomRL as the approximator, do we need to wrap it in the Regressor class and then pass it?

I'm not surprised at all that random trees are faster than GPs. It's normal, random trees are simply... trees. Nothing is simpler than that.
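To put a rough number on it: an exact GP builds and factorizes an n x n kernel matrix, so memory grows quadratically and the factorization roughly cubically with the number of samples, while trees never build such a matrix. The sketch below only does the memory arithmetic. The dataset sizes are hypothetical, not the size of the dataset in the example, and `gp_kernel_matrix_gib` is a name made up for illustration.

```python
def gp_kernel_matrix_gib(n_samples, bytes_per_float=8):
    """Memory (in GiB) of one dense n x n float64 kernel matrix."""
    return n_samples ** 2 * bytes_per_float / 2 ** 30


# Hypothetical dataset sizes, just to show the quadratic growth.
for n in (1_000, 10_000, 100_000):
    print(n, round(gp_kernel_matrix_gib(n), 2))
# 1000 0.01
# 10000 0.75
# 100000 74.51
```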

The LinearApproximator has to be used exactly like any other mushroom/sklearn approximator, exactly as in the example.
However, FQI doesn't support features, which makes a simple linear approximator almost useless in this scenario.

In the future, we might want to reintroduce the features directly in the linear approximator. When we designed this approximator, I decided to separate the features, as it is often convenient to use them outside the approximator, but now I partially regret that decision.

If you still want to implement a linear approximator with RBF for FQI, you might want to create an approximator that computes the features before applying the linear combination...
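A minimal sketch of that idea, written against plain NumPy and sklearn rather than the mushroom feature classes: `RBFLinearApproximator` and all of its parameters are hypothetical names, the grid of Gaussian centers is just one possible choice of features, and how the action is fed in alongside the state is left to whatever the FQI example already does.

```python
import numpy as np
from sklearn.linear_model import Ridge


class RBFLinearApproximator:
    """Hypothetical approximator: Gaussian RBF features + linear model."""

    def __init__(self, low, high, n_centers_per_dim=5, ridge=1e-6):
        # n_centers_per_dim must be at least 2 so the widths are non-zero.
        low, high = np.asarray(low, float), np.asarray(high, float)
        axes = [np.linspace(l, h, n_centers_per_dim) for l, h in zip(low, high)]
        grid = np.meshgrid(*axes, indexing="ij")
        # One center per grid point, shape (n_centers, n_dims).
        self.centers = np.stack([g.ravel() for g in grid], axis=1)
        # Width per dimension equal to the spacing between centers.
        self.widths = (high - low) / (n_centers_per_dim - 1)
        self.model = Ridge(alpha=ridge)

    def _features(self, X):
        X = np.atleast_2d(np.asarray(X, float))
        diff = (X[:, None, :] - self.centers[None, :, :]) / self.widths
        return np.exp(-0.5 * np.sum(diff ** 2, axis=2))

    def fit(self, X, y):
        # Features are computed inside fit, so the caller passes raw inputs.
        self.model.fit(self._features(X), np.asarray(y, float))
        return self

    def predict(self, X):
        return self.model.predict(self._features(X))
```

For example, `RBFLinearApproximator(low=[-1.0, -3.0], high=[1.0, 3.0])` would place a 5 x 5 grid over a position/velocity box; those bounds are an assumption and should be checked against the environment.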

However, remember that FQI is designed specifically to use tree approximations, due to its algorithmic structure. If you want to use linear approximations, you should try LSPI.
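For context, the structure referred to here is a loop of supervised regressions on bootstrapped targets. The sketch below is a generic, simplified illustration with sklearn's `ExtraTreesRegressor`, not mushroom's implementation: the function name, its arguments, and the choice to append the action as an extra input column are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor


def fitted_q_iteration(s, a, r, s_next, absorbing, n_actions,
                       n_iterations=5, gamma=0.95):
    """Simplified FQI: each iteration refits Q from scratch on new targets."""
    sa = np.column_stack([s, a])  # regress on (state, action) pairs
    model = None
    for _ in range(n_iterations):
        if model is None:
            target = r  # first iteration: Q_1 is fit to the reward
        else:
            # max over actions of the previous Q at the next state
            q_next = np.column_stack([
                model.predict(
                    np.column_stack([s_next, np.full(len(s_next), act)]))
                for act in range(n_actions)
            ])
            target = r + gamma * (1 - absorbing) * q_next.max(axis=1)
        model = ExtraTreesRegressor(n_estimators=20, random_state=0)
        model.fit(sa, target)
    return model
```

Every iteration is a full regression over the whole dataset, so the cost of a single `fit` call is multiplied by the number of iterations.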