Exploration strategy
adrigrillo opened this issue · comments
I have changed the exploration strategy to something similar to what we talked about. Now, when you instantiate an agent, the following parameters can be configured:
- `init_eps`: Initial epsilon. Default: 1.0.
- `min_eps`: Minimal epsilon. Default: 0.01.
- `eps_decay`: Number of steps for epsilon to converge to the minimal value, counted from the moment the memory starts being used. Default: 500.
- `per_init_eps_memory`: Percentage of the initial epsilon that will remain when the memory starts to be used. Default: 0.8.
So, for now, while the agent is not yet using the memory, a linear decay is applied that goes from `init_eps` to `init_eps * per_init_eps_memory`. In the default case, from 1.0 to 0.8.
Then, once the memory is in use, an exponential decay is applied. It goes from `init_eps * per_init_eps_memory` to `min_eps`, taking as its reference point the step at which the memory starts being used.
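To make the two phases concrete, here is a minimal sketch of the schedule described above. It is not the code from the repository: the function name `epsilon_at` and the argument `memory_start_step` (the step at which the memory starts being used) are hypothetical, and the exact exponential form is an assumption. Here a geometric interpolation is used so that epsilon reaches `min_eps` exactly `eps_decay` steps after the memory kicks in.

```python
def epsilon_at(step, memory_start_step, init_eps=1.0, min_eps=0.01,
               eps_decay=500, per_init_eps_memory=0.8):
    """Return epsilon for a given step (illustrative sketch, not the repo code).

    `memory_start_step` is a hypothetical argument: the step at which the
    agent starts using the memory.
    """
    # Epsilon value at the moment the memory starts being used.
    eps_memory = init_eps * per_init_eps_memory

    if step < memory_start_step:
        # Phase 1: linear decay from init_eps to eps_memory.
        fraction = step / memory_start_step
        return init_eps + fraction * (eps_memory - init_eps)

    # Phase 2: exponential decay, measured from the memory start step.
    steps_with_memory = step - memory_start_step
    if steps_with_memory >= eps_decay:
        return min_eps
    # Assumed form: geometric interpolation from eps_memory to min_eps.
    return eps_memory * (min_eps / eps_memory) ** (steps_with_memory / eps_decay)
```

With the defaults and, say, `memory_start_step=100`, epsilon starts at 1.0, is 0.8 at step 100, and stays at 0.01 from step 600 onwards.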
I have been looking at other exploration methods, such as Boltzmann exploration, but I do not think they would bring an improvement for our simple tasks, and they would imply greater changes. However, it could be done quickly.
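For reference, Boltzmann (softmax) exploration samples actions with probability proportional to `exp(Q / temperature)` instead of choosing uniformly at random with probability epsilon. A rough sketch of what it could look like, assuming the agent exposes a list of Q-values for the current state; the name `boltzmann_action` and the `temperature` parameter are hypothetical:

```python
import math
import random


def boltzmann_action(q_values, temperature=1.0, rng=random):
    """Sample an action index with probability proportional to exp(Q / T).

    Illustrative sketch only; `q_values` is assumed to be a list of floats,
    one per action.
    """
    # Subtract the maximum for numerical stability; this does not change
    # the resulting probabilities.
    max_q = max(q_values)
    weights = [math.exp((q - max_q) / temperature) for q in q_values]
    return rng.choices(range(len(q_values)), weights=weights, k=1)[0]
```

A high temperature makes the choice close to uniform, while a temperature near zero makes it close to greedy, so the temperature would need its own decay schedule in place of epsilon.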