Reinforcement Learning: Tic Tac Toe

[Tic Tac Toe illustration from Computer Hope]

This code is written in Python 3.

Dependencies

- numpy
- random (Python standard library)
- termcolor (for printing colored text)

Introduction

Basic Introduction:

This Python code trains a model to play Tic Tac Toe. The model learns by playing the game against itself several thousand times; during these games, it learns which moves to take in order to win (reinforcement learning). After the model is trained, the user can play Tic Tac Toe against it.

More Specific Introduction:

Because Tic Tac Toe is a fairly simple game, the model is a single neuron. Training is done using gradient descent. After each game finishes, a value is assigned to every state of that game, following the approach described in the book "Machine Learning" by Tom Mitchell. The model uses the following features (a minimal sketch of the model follows the list):
- Number of open paths for the query move that already contain 2 team members
- Number of paths in which the query move blocks at least 1 enemy
- Number of paths in which the query move blocks 2 enemies
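
The sketch below shows what this single-neuron (linear) value function could look like. The bias term, the feature ordering, and the `predicted_value` helper name are illustrative assumptions, not the repository's exact code.

```python
# Minimal sketch of the single-neuron value model: a weighted sum of the
# three board features, with a bias term stored in weights[0] (an assumption).
import numpy as np

def predicted_value(features, weights):
    """Forward propagation of the single neuron: bias + dot(feature weights, features)."""
    return weights[0] + np.dot(weights[1:], features)

# features = [open paths with 2 team members,
#             paths blocking >= 1 enemy,
#             paths blocking 2 enemies]
weights = np.array([0.0, 2.0, 0.5, 3.0])    # bias + one weight per feature (illustrative)
features = np.array([1, 2, 1])
print(predicted_value(features, weights))    # linear score for this candidate move
```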

---- Code Instructions ----

Functions:

-Board_Analysis(board,team):

This function analyses the board and extracts the features described above from it.
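
As a rough illustration of this kind of feature extraction, the sketch below counts the three features for one candidate cell. It assumes a 3x3 numpy board storing 1/-1 for the two teams and 0 for empty cells; the repository's encoding and exact counting rules may differ.

```python
import numpy as np

def move_features(board, team, move):
    """Count the three intro features for placing `team` at `move` = (row, col)."""
    row, col = move
    lines = [board[row, :], board[:, col]]            # row and column through the move
    if row == col:
        lines.append(board.diagonal())                # main diagonal
    if row + col == 2:
        lines.append(np.fliplr(board).diagonal())     # anti-diagonal
    open_2_team = blocked_1_enemy = blocked_2_enemy = 0
    for line in lines:
        team_count = int((line == team).sum())
        enemy_count = int((line == -team).sum())
        if team_count == 2 and enemy_count == 0:
            open_2_team += 1          # open path already holding 2 team members
        if enemy_count >= 1 and team_count == 0:
            blocked_1_enemy += 1      # playing here blocks a path with at least 1 enemy
        if enemy_count == 2 and team_count == 0:
            blocked_2_enemy += 1      # playing here blocks an imminent enemy win
    return np.array([open_2_team, blocked_1_enemy, blocked_2_enemy])
```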

-Endgame_Check(board):

This function checks if the game has ended, and if so, who has won the game.
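A minimal sketch of such a check, again assuming a 3x3 numpy board with 1/-1 for the two teams and 0 for empty cells (the repository's board representation may differ):

```python
import numpy as np

def endgame_check(board):
    """Return (game_over, winner); winner is 1 or -1, 0 for a draw, None otherwise."""
    lines = list(board) + list(board.T) + [board.diagonal(), np.fliplr(board).diagonal()]
    for line in lines:
        if abs(line.sum()) == 3:                 # three identical marks on one line
            return True, int(np.sign(line.sum()))
    if not (board == 0).any():                   # full board with no winner: draw
        return True, 0
    return False, None
```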

-Experiment_Generator():

This function creates the initial board state when the model is playing against itself in the training phase.

-Best_Move(Move_Attributes,Weights):

This function finds the best move to take by choosing the state with the maximum predicted value (forward propagation).
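A minimal sketch of that selection step, assuming Move_Attributes amounts to a list of (move, feature vector) pairs for the legal moves; the helper name and data layout are assumptions:

```python
import numpy as np

def best_move(move_attributes, weights):
    """Pick the move whose feature vector gets the highest predicted value."""
    scores = [weights[0] + np.dot(weights[1:], feats) for _, feats in move_attributes]
    return move_attributes[int(np.argmax(scores))][0]
```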

-Actual_Scores_Calc(Board_States, Winner, Weights):

This function calculates the actual value of each state after the game has ended.
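
In Mitchell's scheme, the terminal state gets a fixed score and every earlier state is assigned the value the current model predicts for a later state (V_train(b) ← V̂(Successor(b))). The sketch below assumes terminal scores of +100 / -100 / 0 and uses the next stored state as the successor; both are assumptions about the repository's exact choices.

```python
import numpy as np

def actual_scores(board_features, outcome, weights, win=100, loss=-100, draw=0):
    """board_features: one feature vector per state, in play order; outcome in {1, -1, 0}."""
    targets = np.empty(len(board_features))
    targets[-1] = win if outcome == 1 else (loss if outcome == -1 else draw)
    for i in range(len(board_features) - 2, -1, -1):
        # V_train(b) <- V_hat(successor(b)), evaluated with the current weights
        targets[i] = weights[0] + np.dot(weights[1:], board_features[i + 1])
    return targets
```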

-Predicted_Scores_Calc(...):

This function calculates the values that the model predicted during the game, so that the error between the predicted values and the actual values can be computed.
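
A corresponding sketch, reusing the same linear scoring as the model sketch above:

```python
import numpy as np

def predicted_scores(board_features, weights):
    """Predicted value of every stored state, for comparison with the actual scores."""
    return np.array([weights[0] + np.dot(weights[1:], x) for x in board_features])
```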

-Update_Weight_Values(...):

This function updates the weights of our neuron based on the error.
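This corresponds to the LMS training rule from Mitchell's book, w_i ← w_i + η (V_train − V̂) x_i. The sketch below performs one such pass over the stored states; the bias handling and default learning rate are illustrative assumptions.

```python
import numpy as np

def update_weights(weights, features_list, targets, learning_rate=0.01):
    """One LMS / gradient-descent pass. weights: 1-D float array, bias first."""
    for x, v_train in zip(features_list, targets):
        x = np.asarray(x, dtype=float)
        v_hat = weights[0] + np.dot(weights[1:], x)
        error = v_train - v_hat
        weights[0] += learning_rate * error          # bias weight
        weights[1:] += learning_rate * error * x     # feature weights
    return weights
```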

-Play_and_Learn(Num_Times,Initial_Weights,Learning_Rate,NumIteration):

This function has the model play %Num_Times games against itself, updating the weights %NumIteration times using gradient descent after each game. The weights are updated from the plays of both agents, i.e. both the agent that lost the game and the agent that won.
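
Putting the pieces together, the outline below shows how such a self-play loop could be structured. It assumes the helper sketches above (move_features, best_move, endgame_check, actual_scores, update_weights) are defined in the same module and that both agents share a single weight vector; parameter names mirror the signature above, but the details are illustrative rather than the repository's exact logic.

```python
import numpy as np

def play_and_learn(num_times, initial_weights, learning_rate, num_iteration):
    weights = np.asarray(initial_weights, dtype=float)
    for _ in range(num_times):
        board = np.zeros((3, 3), dtype=int)
        history = {1: [], -1: []}                        # feature vectors seen by each agent
        team, winner = 1, None
        while True:
            candidates = [(m, move_features(board, team, m))
                          for m in zip(*np.where(board == 0))]
            move = best_move(candidates, weights)
            history[team].append(move_features(board, team, move))
            board[move] = team
            game_over, winner = endgame_check(board)
            if game_over:
                break
            team = -team
        # Both agents contribute training data, whether they won or lost.
        for agent, feats in history.items():
            outcome = 0 if winner == 0 else (1 if winner == agent else -1)
            targets = actual_scores(feats, outcome, weights)
            for _ in range(num_iteration):
                weights = update_weights(weights, feats, targets, learning_rate)
    return weights
```

Under this sketch, something like `learned = play_and_learn(5000, np.ones(4), 0.01, 1)` would train the shared weight vector (bias + 3 features) over 5000 self-play games.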

-Computer_Move(board, computer_team, Learned_Weights):

This function makes the model's move when it plays against a human, based on the weights learned during the training phase.

-Human_Move(board, human_team):

This function prompts the human player for the move they want to play against the computer.


License

MIT License

