Evaluation function for programs
dvberkel opened this issue
The latest example got me thinking about evaluation functions for ast::structure::Programs.
Observations
For a given starting point and set of world parameters, it takes a particular amount of time to free-fall to the ground. It is exactly 20 iterations in the example. So there is a minimum amount of time you could stay alive by doing nothing.
There is also a maximum amount of time you can stay alive without landing, i.e. thrust straight up until your fuel runs out, and then free-fall to the ground. I have not proven this, but it seems a good strategy, if not for the horrible crash landing at the end. It seems impossible to stay up longer than with this strategy, which has the effect of limiting our simulation window.
Saving fuel seems like a good idea, so it should be rewarded.
Landing sooner also seems like a good idea, so this should be rewarded as well.
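To make the two time bounds above concrete, here is a minimal sketch that counts iterations until the ground is reached, once for doing nothing and once for burning until the fuel is gone. The `World` and `Lander` types, their fields, and the simple per-iteration update are assumptions for illustration, not the project's actual simulation. The upper bound is as unproven here as it is above.

```rust
/// Hypothetical world parameters (names and units are assumptions).
#[derive(Clone, Copy)]
struct World {
    gravity: f64, // downward acceleration per iteration, must be > 0
    thrust: f64,  // upward acceleration while burning
}

/// Hypothetical lander state (assumption, not the project's type).
#[derive(Clone, Copy)]
struct Lander {
    height: f64,
    velocity: f64, // positive is up
    fuel: u32,     // one unit is consumed per burning iteration
}

/// Counts iterations until the lander reaches the ground.
/// With `burn == false` the lander free-falls; with `burn == true`
/// it thrusts straight up on every iteration while fuel remains.
fn iterations_until_ground(world: World, mut lander: Lander, burn: bool) -> u32 {
    let mut steps = 0;
    while lander.height > 0.0 {
        let mut acceleration = -world.gravity;
        if burn && lander.fuel > 0 {
            acceleration += world.thrust;
            lander.fuel -= 1;
        }
        lander.velocity += acceleration;
        lander.height += lander.velocity;
        steps += 1;
    }
    steps
}
```

Calling `iterations_until_ground(world, lander, false)` gives the do-nothing baseline, and the same call with `true` gives a candidate length for the simulation window.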
Considerations
With the above observations it seems a good idea to base the evaluation function mainly on time, i.e. how quickly you can safely bring the lander down. It should factor in landing safely, of course :-). Saving fuel should be a bonus.
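A rough sketch of such an evaluation function could look like this. The `Outcome` type, its fields, and all weights are made up for illustration; the only thing taken from the discussion is the ordering of priorities: safe landing first, time second, fuel as a bonus.

```rust
/// Hypothetical summary of one simulated run (field names are assumptions).
struct Outcome {
    landed_safely: bool,
    iterations: u32, // iterations until touchdown
    fuel_left: u32,
}

/// Higher is better. The weights are placeholders, not tuned values.
fn score(outcome: &Outcome, max_iterations: u32) -> f64 {
    const SAFE_LANDING_REWARD: f64 = 1000.0;
    const TIME_WEIGHT: f64 = 1.0;
    const FUEL_WEIGHT: f64 = 0.1;

    // A crash scores zero in this sketch, which is a deliberately blunt choice.
    if !outcome.landed_safely {
        return 0.0;
    }
    // Reward landing sooner, measured against the simulation window.
    let time_saved = max_iterations.saturating_sub(outcome.iterations) as f64;
    SAFE_LANDING_REWARD + TIME_WEIGHT * time_saved + FUEL_WEIGHT * outcome.fuel_left as f64
}
```

Here `max_iterations` stands for the simulation window, so a faster safe landing always beats a slower one with the same fuel left.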
Agreed. We might even penalize height: lower positions are valued more, to encourage the algorithm to decrease its height and avoid the shooting-off strategy.
Orrrr, we penalize being "outside" the playing field (to be defined).
All I know is, it helps if the scoring gradient is smooth, to help guide the algorithm to the right solution.
Good call. It should be possible to monotonically decrease to a safe landing.
Smooth is not going to be a problem. We can always smooth out discrete steps.
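As a sketch of what smooth scoring terms could look like: a penalty that is zero inside the playing field and grows gradually outside it, and a logistic curve standing in for the hard "safe landing or crash" step. The function names, the `ceiling` bound, and the `sharpness` parameter are assumptions, since the playing field is still to be defined.

```rust
/// Penalty for being above a hypothetical playing-field ceiling.
/// Zero inside the field, growing quadratically with the excess height,
/// so there is no sudden jump at the boundary.
fn out_of_field_penalty(height: f64, ceiling: f64) -> f64 {
    let excess = (height - ceiling).max(0.0);
    excess * excess
}

/// Smooth stand-in for the step "safe if impact_speed <= safe_speed".
/// Returns a value between 0 and 1 that falls off around `safe_speed`;
/// a larger `sharpness` brings it closer to the hard step.
fn soft_safe_landing(impact_speed: f64, safe_speed: f64, sharpness: f64) -> f64 {
    1.0 / (1.0 + (sharpness * (impact_speed - safe_speed)).exp())
}
```

Both terms change gradually with the lander's state, so a slightly better program gets a slightly better score.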