An RL agent that plays tic-tac-toe using epsilon-greedy action selection and Monte Carlo updates.
Training is done through self-play over 20,000 episodes of the game.
Requirements:
- NumPy
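The sketch below illustrates the approach described above: an epsilon-greedy player and constant-step-size Monte Carlo updates trained through self-play. It is not taken from this project. The representation is an assumption: a single table of afterstate values scored from X's perspective, keyed by the board's bytes, with a board encoded as 1 = X, -1 = O, 0 = empty. The function names and the `epsilon`/`alpha` defaults are also hypothetical.

```python
import numpy as np

# Hypothetical encoding (not from the original project): length-9 board,
# 1 = X, -1 = O, 0 = empty. Values are estimated from X's perspective.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 1 or -1 if that player has three in a row, else 0."""
    for a, b, c in LINES:
        s = int(board[a]) + int(board[b]) + int(board[c])
        if s == 3:
            return 1
        if s == -3:
            return -1
    return 0


def epsilon_greedy(board, player, values, epsilon, rng):
    """With probability epsilon play a random empty cell; otherwise play the
    cell whose resulting afterstate has the best estimated value for `player`."""
    moves = np.flatnonzero(board == 0)
    if rng.random() < epsilon:
        return int(rng.choice(moves))
    best_move, best_value = None, -np.inf
    for m in moves:
        after = board.copy()
        after[m] = player
        # Table is from X's view, so O (player = -1) maximizes the negated value.
        v = player * values.get(after.tobytes(), 0.0)
        if v > best_value:
            best_move, best_value = int(m), v
    return best_move


def play_episode(values, epsilon, rng):
    """Self-play one game; both sides share the same value table."""
    board = np.zeros(9, dtype=np.int8)
    visited, player = [], 1
    while True:
        m = epsilon_greedy(board, player, values, epsilon, rng)
        board[m] = player
        visited.append(board.tobytes())
        w = winner(board)
        if w != 0 or not (board == 0).any():
            return visited, float(w)  # outcome: 1 X wins, -1 O wins, 0 draw
        player = -player


def monte_carlo_update(values, visited, outcome, alpha):
    """Move each visited afterstate's value toward the episode's return.
    With no intermediate rewards and no discounting, the return is the outcome."""
    for key in visited:
        v = values.get(key, 0.0)
        values[key] = v + alpha * (outcome - v)


def train(episodes=20000, epsilon=0.1, alpha=0.1, seed=0):
    rng = np.random.default_rng(seed)
    values = {}
    for _ in range(episodes):
        visited, outcome = play_episode(values, epsilon, rng)
        monte_carlo_update(values, visited, outcome, alpha)
    return values
```

Calling `train()` runs 20,000 self-play episodes and returns the learned value table. Keeping values from X's side lets one table serve both players: O simply flips the sign. A separate table per player would work equally well.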