Consider the 3x3 world shown in the following figure:
The agent has four actions Up, Down, Right and Left.
The transition model is: 80% of the time the agent goes in the direction it selects; the rest of
the time it moves at right angles to the intended direction. A collision with a wall results in no
movement.