Implementations of two policy gradient algorithms: REINFORCE and Actor-Critic. Some parts are adapted from https://github.com/andrecianflone/rl_at_ammi.
Example:
conda create -n policy_grad
conda activate policy_grad
pip install -r requirements.txt
sudo apt-get install ffmpeg
REINFORCE without a baseline
python reinforce.py
REINFORCE with a learned baseline
python reinforce_with_learned_baseline.py
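The difference between the two REINFORCE variants can be sketched in a few lines. This is a minimal illustration, not the code in reinforce.py or reinforce_with_learned_baseline.py: the function names, the gamma default, and the idea of passing per-step baseline values as a plain list are all assumptions made for the example. Without a baseline, each log-probability is weighted by the discounted return; with a baseline, the baseline's estimate is subtracted first, which typically reduces the variance of the gradient estimate.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    running = 0.0
    # Walk the episode backwards so each return reuses the one after it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def policy_gradient_weights(rewards, gamma=0.99, baselines=None):
    """Weights that multiply log pi(a_t | s_t) in the REINFORCE loss.

    `baselines` is a hypothetical list of per-step value estimates
    (for example, from a learned value network). If it is None, the
    plain discounted returns are used.
    """
    returns = discounted_returns(rewards, gamma)
    if baselines is None:
        return returns
    return [g - b for g, b in zip(returns, baselines)]
```

For example, policy_gradient_weights([1.0, 1.0, 1.0], gamma=0.5) gives the raw returns, while passing baselines=[1.0, 1.0, 1.0] shifts each weight down by the baseline's estimate for that step.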
Actor-Critic: train from scratch
python actor_critic.py
Actor-Critic: use a trained agent
python actor_critic.py --load-model
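As a rough illustration of how a boolean flag such as --load-model can switch between the two modes above, here is a small sketch using Python's standard argparse module. It is an assumption about the general pattern, not a copy of actor_critic.py: the function names build_parser and choose_mode and the mode strings are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser with a single on/off flag (illustrative only)."""
    parser = argparse.ArgumentParser(description="Actor-critic runner (sketch)")
    # store_true makes the flag False by default and True when it is passed.
    parser.add_argument(
        "--load-model",
        action="store_true",
        help="load a previously trained agent instead of training from scratch",
    )
    return parser


def choose_mode(argv):
    """Return a hypothetical mode name based on the parsed flag."""
    args = build_parser().parse_args(argv)
    # argparse exposes --load-model as the attribute load_model.
    return "evaluate" if args.load_model else "train"
```

Calling choose_mode([]) corresponds to running the script with no arguments, and choose_mode(["--load-model"]) corresponds to the second command.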