A simple implementation of Q-learning for Tic Tac Toe.
Q-learning parameters:
Alpha = 0.9, Learning rate
Gamma = 1.0, Discount rate
Epsilon = 0.8, Probability of choosing a random move
Converges in approximately 50 000 training games.
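
A minimal sketch of how these three parameters typically enter tabular Q-learning, not necessarily how this implementation does it. The board encoding (a 9-character string with " " for empty cells), the names Q, legal_moves, choose_move and update, and the reward values are all assumptions made for illustration. In a real two-player game, next_state would normally be the position after the opponent has replied.

```python
import random
from collections import defaultdict

ALPHA = 0.9    # learning rate
GAMMA = 1.0    # discount rate
EPSILON = 0.8  # probability of choosing a random move

# Hypothetical Q-table: maps (state, action) to a value, defaulting to 0.0.
Q = defaultdict(float)


def legal_moves(state):
    """Return the indices of the empty cells on a 9-character board string."""
    return [i for i, cell in enumerate(state) if cell == " "]


def choose_move(state, rng=random):
    """Epsilon-greedy choice: random with probability EPSILON, else greedy."""
    moves = legal_moves(state)
    if rng.random() < EPSILON:
        return rng.choice(moves)
    return max(moves, key=lambda a: Q[(state, a)])


def update(state, action, reward, next_state, terminal):
    """Apply one Q-learning update for the transition (state, action)."""
    if terminal:
        best_next = 0.0
    else:
        best_next = max(
            (Q[(next_state, a)] for a in legal_moves(next_state)),
            default=0.0,
        )
    target = reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])
```

With Gamma = 1.0 the final reward is propagated back through the game undiscounted, and with Alpha = 0.9 each update moves the stored value most of the way toward the new target.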