Simple reinforcement learning algorithms implemented for CartPole on OpenAI Gym.
This code goes along with my post about learning CartPole, which is inspired by an OpenAI request for research.
## Algorithms implemented
Random search: Repeatedly sample random weights from [-1, 1] and greedily keep the best-performing set.
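
A minimal sketch of the random search loop, not the code from this repo. It assumes a 4-element weight vector (CartPole observations have 4 components) and a hypothetical `evaluate(weights)` callback that returns the total reward of one episode. Here `toy_evaluate` is a made-up stand-in so the snippet runs without Gym installed.

```python
import numpy as np

def random_search(evaluate, n_params=4, n_trials=1000, seed=0):
    """Sample weights uniformly from [-1, 1] and keep the best-scoring set."""
    rng = np.random.default_rng(seed)
    best_weights, best_reward = None, -np.inf
    for _ in range(n_trials):
        weights = rng.uniform(-1.0, 1.0, size=n_params)
        reward = evaluate(weights)
        # Greedy rule: only replace the incumbent on strict improvement.
        if reward > best_reward:
            best_weights, best_reward = weights, reward
    return best_weights, best_reward

def toy_evaluate(weights):
    """Hypothetical stand-in for 'total reward of one CartPole episode'."""
    target = np.array([0.5, -0.5, 0.25, 0.0])
    return -float(np.sum((weights - target) ** 2))

best_weights, best_reward = random_search(toy_evaluate)
print(best_weights, best_reward)
```

To use this with a real environment, `evaluate` would run an episode with a policy built from `weights` and return the summed reward.
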
Hill climbing: Start from a random initialization, add a little noise to the weights every iteration, and keep the new set if it improves the reward.
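
A minimal sketch of hill climbing under the same assumptions: a 4-element weight vector and a hypothetical `evaluate(weights)` callback returning an episode's total reward. The `noise_scale` value and the uniform noise distribution are illustrative choices, not taken from this repo.

```python
import numpy as np

def hill_climb(evaluate, n_params=4, n_iters=1000, noise_scale=0.1, seed=0):
    """Perturb the current weights each iteration; keep the change if reward improves."""
    rng = np.random.default_rng(seed)
    weights = rng.uniform(-1.0, 1.0, size=n_params)  # random initialization
    best_reward = evaluate(weights)
    for _ in range(n_iters):
        # Small random perturbation around the current best weights.
        candidate = weights + noise_scale * rng.uniform(-1.0, 1.0, size=n_params)
        reward = evaluate(candidate)
        if reward > best_reward:
            weights, best_reward = candidate, reward
    return weights, best_reward

def toy_evaluate(weights):
    """Hypothetical stand-in for 'total reward of one CartPole episode'."""
    target = np.array([0.5, -0.5, 0.25, 0.0])
    return -float(np.sum((weights - target) ** 2))

climbed_weights, climbed_reward = hill_climb(toy_evaluate)
print(climbed_weights, climbed_reward)
```

Unlike random search, each candidate here depends on the current best weights, so the search can stall if the first few weight sets all score the same.
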
Policy gradient: Use a softmax policy and compute a value function using discounted Monte Carlo returns. Update the policy to favor state-action pairs that yield a higher total reward than the average total reward from that state. Read my post about learning CartPole for a better explanation of this.
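
A rough NumPy sketch of one policy gradient update from a single recorded episode. It is an illustration under stated assumptions, not the implementation in this repo: both the softmax policy and the value function are assumed to be linear in the observation, and `gamma`, `policy_lr`, and `value_lr` are arbitrary illustrative values. The advantage is the discounted Monte Carlo return minus the value estimate for that state.

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - np.max(logits))  # shift for numerical stability
    return e / e.sum()

def discounted_returns(rewards, gamma=0.97):
    """Discounted Monte Carlo return for every time step of one episode."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def policy_gradient_step(policy_w, value_w, states, actions, rewards,
                         gamma=0.97, policy_lr=0.01, value_lr=0.01):
    """policy_w has shape (n_obs, n_actions); value_w has shape (n_obs,)."""
    states = np.asarray(states, dtype=float)
    returns = discounted_returns(rewards, gamma)
    # Advantage: how much better the observed return was than the estimate.
    advantages = returns - states @ value_w
    policy_grad = np.zeros_like(policy_w)
    for s, a, adv in zip(states, actions, advantages):
        probs = softmax(s @ policy_w)
        one_hot = np.zeros(policy_w.shape[1])
        one_hot[a] = 1.0
        # Gradient of log pi(a|s) for a linear softmax policy, scaled by advantage.
        policy_grad += adv * np.outer(s, one_hot - probs)
    new_policy_w = policy_w + policy_lr * policy_grad / len(states)
    # Move the value estimate toward the observed returns (squared-error gradient).
    value_grad = states.T @ (states @ value_w - returns) / len(states)
    new_value_w = value_w - value_lr * value_grad
    return new_policy_w, new_value_w

# Made-up three-step episode, only to show the call shape.
states = [[0.1, 0.0, -0.1, 0.0], [0.1, 0.2, -0.1, -0.3], [0.12, 0.0, -0.12, 0.0]]
actions = [1, 0, 1]
rewards = [1.0, 1.0, 1.0]
policy_w, value_w = policy_gradient_step(np.zeros((4, 2)), np.zeros(4),
                                         states, actions, rewards)
print(policy_w.shape, value_w.shape)
```

In practice this step would be repeated over many episodes collected by sampling actions from the current softmax policy.
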