This project was done for the course Intelligent Agents.
The maze environment consists of four types of tiles:
1. Wall: unreachable state
2. Green tile: reward +1
3. Brown tile: reward -1
4. White tile: reward -0.4
The transition model of the agent is described by:
1. The intended outcome of an action occurs with a probability of 0.8.
2. The agent moves at a right angle to the intended direction with a probability of 0.1 (for each of the two perpendicular directions, so that the probabilities sum to 1).
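The transition model above can be sketched in Python as follows. This is only an illustration: the names `GRID`, `MOVES`, `SIDES`, and `transitions` are hypothetical, the 2x2 grid is made up, and the sketch assumes that a move into a wall or off the grid leaves the agent where it is, which the text does not state.

```python
# Hypothetical 2x2 maze: 'G' green, ' ' white, 'W' wall.
GRID = ["G ",
        " W"]

# Row/column offsets for each action.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

# The two directions at a right angle to each intended action.
SIDES = {"up": ("left", "right"), "down": ("left", "right"),
         "left": ("up", "down"), "right": ("up", "down")}


def transitions(grid, state, action):
    """Return {next_state: probability} for taking `action` in `state`.

    Assumption: bumping into a wall or the maze border keeps the agent in place.
    """
    def move(direction):
        dr, dc = MOVES[direction]
        r, c = state[0] + dr, state[1] + dc
        inside = 0 <= r < len(grid) and 0 <= c < len(grid[0])
        return (r, c) if inside and grid[r][c] != "W" else state

    side_a, side_b = SIDES[action]
    probs = {}
    # 0.8 for the intended direction, 0.1 for each perpendicular direction.
    for direction, p in ((action, 0.8), (side_a, 0.1), (side_b, 0.1)):
        nxt = move(direction)
        probs[nxt] = probs.get(nxt, 0.0) + p
    return probs


print(transitions(GRID, (0, 0), "right"))
```

Outcomes that end in the same tile (for example, two blocked moves) have their probabilities added together, so the distribution always sums to 1.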
There is no terminal state in the maze of the Tile world.
The utility of each state is updated in every iteration according to the update equation, whose terms are defined as follows:
U_i(s) = utility of state s in the i-th iteration
R(s) = reward of state s
P(s'|s, a) = probability of reaching state s', given state s and action a
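Given these definitions, the update is presumably the standard Bellman update used in value iteration, U_{i+1}(s) = R(s) + γ · max_a Σ_{s'} P(s'|s, a) · U_i(s'). The sketch below illustrates one such update sweep under that assumption. The discount factor `GAMMA`, the 2x2 grid, and all function names are hypothetical and not taken from the project. Blocked moves are assumed to leave the agent in place.

```python
# Tile rewards from the list above; 'W' marks a wall and has no utility.
REWARD = {"G": 1.0, "B": -1.0, " ": -0.4}
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
SIDES = {"up": ("left", "right"), "down": ("left", "right"),
         "left": ("up", "down"), "right": ("up", "down")}
GAMMA = 0.99  # assumed discount factor; the text does not give one


def step(grid, s, direction):
    """Deterministic move; blocked moves keep the agent in place (assumption)."""
    r, c = s[0] + MOVES[direction][0], s[1] + MOVES[direction][1]
    ok = 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] != "W"
    return (r, c) if ok else s


def expected_utility(grid, U, s, a):
    """Sum over s' of P(s'|s, a) * U(s') for the 0.8 / 0.1 / 0.1 model."""
    side_a, side_b = SIDES[a]
    return (0.8 * U[step(grid, s, a)]
            + 0.1 * U[step(grid, s, side_a)]
            + 0.1 * U[step(grid, s, side_b)])


def bellman_sweep(grid, U):
    """One synchronous update of every state's utility."""
    new_U = {}
    for s in U:
        best = max(expected_utility(grid, U, s, a) for a in MOVES)
        new_U[s] = REWARD[grid[s[0]][s[1]]] + GAMMA * best
    return new_U


# Hypothetical maze: green and white on top, brown and wall below.
GRID = ["G ",
        "BW"]
U = {(r, c): 0.0
     for r, row in enumerate(GRID)
     for c, tile in enumerate(row) if tile != "W"}
for _ in range(50):
    U = bellman_sweep(GRID, U)
print(U)
```

Because the maze has no terminal state, a discount factor below 1 is what keeps the utilities bounded in this sketch.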
The utility value of each state is normalized to a maximum of 2. The normalization factor needs to be chosen based on the rewards. The algorithm also gives acceptable results when it is run without normalization. In fact, in some cases normalization gives worse results, as in the maze setup below: