Exploration strategy

Question

adrigrillo opened this issue 6 years ago · comments

Adrián Rodríguez Grillo commented 6 years ago

I have changed the exploration strategy to something similar to what we discussed. Now, when you instantiate an agent, the following parameters can be configured:

init_eps: Initial epsilon. Default: 1.0.
min_eps: Minimal epsilon. Default: 0.01.
eps_decay: Number of steps over which epsilon converges to the minimal value, counted from the step at which the memory starts to be used. Default: 500.
per_init_eps_memory: Fraction of the initial epsilon that remains when the memory starts to be used. Default: 0.8.

So, for now, while the agent is not yet using the memory, a linear decay is used that goes from init_eps to init_eps * per_init_eps_memory. In the default case, that is from 1.0 to 0.8.
Then, once the memory is in use, an exponential decay is applied. It goes from init_eps * per_init_eps_memory to min_eps, taking the step at which the memory starts being used as the reference point.
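
To make the two phases concrete, here is a minimal sketch of such a schedule. It is not the code from the repository. The name epsilon_at and the memory_start_step argument (the step at which the memory starts being used) are hypothetical, and the exact exponential form is an assumption: the rate is chosen so that epsilon reaches min_eps exactly eps_decay steps after the memory starts being used.

```python
def epsilon_at(step, memory_start_step, init_eps=1.0, min_eps=0.01,
               eps_decay=500, per_init_eps_memory=0.8):
    """Return epsilon for a given step (illustrative sketch only).

    memory_start_step is a hypothetical argument: the step at which the
    agent starts using its memory. Assumes min_eps > 0 and init_eps > 0.
    """
    # Value epsilon should have at the moment the memory starts being used.
    eps_at_memory = init_eps * per_init_eps_memory

    # Phase 1: linear decay from init_eps down to eps_at_memory.
    if step < memory_start_step:
        fraction = step / memory_start_step
        return init_eps + fraction * (eps_at_memory - init_eps)

    # Phase 2: exponential decay, measured from memory_start_step.
    steps_since_memory = step - memory_start_step
    if steps_since_memory >= eps_decay:
        return min_eps

    # Assumed form: geometric interpolation that hits min_eps after
    # exactly eps_decay steps.
    ratio = min_eps / eps_at_memory
    return eps_at_memory * ratio ** (steps_since_memory / eps_decay)
```

With the defaults and, say, memory_start_step=100, epsilon falls linearly from 1.0 to 0.8 during the first 100 steps and then decays exponentially to 0.01 over the following 500 steps. A different time constant, for example exp(-t / eps_decay), would also fit the description above, so the form used here is only one possible reading.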

I have been looking at other exploration methods, such as the Boltzmann one, but I do not think they would bring an improvement for our simple tasks, and they would imply greater changes. However, it could be done quickly.
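
For reference, Boltzmann (softmax) exploration picks each action with probability proportional to exp(Q / temperature). A rough sketch of what that could look like follows. The function names, the q_values list, and the temperature parameter are illustrative assumptions, not part of the current agent.

```python
import math
import random


def boltzmann_probabilities(q_values, temperature=1.0):
    """Softmax over action values; a higher temperature means more exploration."""
    # Subtract the maximum before exponentiating for numerical stability.
    max_q = max(q_values)
    exps = [math.exp((q - max_q) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def boltzmann_action(q_values, temperature=1.0, rng=random):
    """Sample an action index according to the Boltzmann probabilities."""
    probs = boltzmann_probabilities(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

Unlike epsilon-greedy, this would need the temperature to be scheduled instead of epsilon, which is part of why it would imply greater changes.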