tommykwh / Policy-Gradient-and-Actor-Critic

The homework for Cutting-Edge of Deep Learning, aka CEDL, from NTHU

Home Page: https://thecedl.github.io

Homework 3: Policy Gradient

In this homework, you will use a neural network to learn a parameterized policy that can select actions without consulting a value function. A value function may still be used to learn the policy weights, but it is not required for action selection.

Policy-based algorithms have several advantages:

  • Policy-based methods offer useful ways of dealing with continuous action spaces.
  • For some tasks, the policy function is simpler and thus easier to approximate.
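
To make the idea concrete, here is a minimal sketch of a policy parameterized by a small neural network, assuming a TensorFlow 1.x-style API and CartPole's 4-dimensional observations with 2 discrete actions. This is illustrative only, not the assignment's starter code:

import tensorflow as tf

obs_dim, n_actions, n_hidden = 4, 2, 16

# Observation placeholder and a small two-layer policy network
observations = tf.placeholder(tf.float32, [None, obs_dim])
hidden = tf.layers.dense(observations, n_hidden, activation=tf.tanh)
logits = tf.layers.dense(hidden, n_actions)
action_probs = tf.nn.softmax(logits)  # pi(a | s; theta)

# Sample an action directly from the policy -- no value function needed
sampled_action = tf.multinomial(logits, num_samples=1)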

Introduction

We will use CartPole-v0 as the environment in this homework. The following GIF visualizes the CartPole environment:

For a more detailed description, please see here
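
If you have not used gym before, the loop below runs one episode of CartPole-v0 with a random policy, assuming the classic (2017-era) gym API in which step() returns four values. The random action is only for illustration; your policy network will replace it:

import gym

env = gym.make('CartPole-v0')
observation = env.reset()
total_reward, done = 0.0, False
while not done:
    action = env.action_space.sample()  # random action, for illustration
    observation, reward, done, info = env.step(action)
    total_reward += reward
print('episode return:', total_reward)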

Setup

  • Python 3.5.3
  • OpenAI gym
  • tensorflow
  • numpy
  • matplotlib
  • ipython

We encourage you to install Anaconda or Miniconda on your laptop to avoid tedious dependency problems.

For lazy people:

conda env create -f environment.yml
source activate cedl
# deactivate when you want to leave the environment
source deactivate cedl

TODO

  • [60%] Problems 1, 2, 3: Policy gradient
  • [20%] Problem 5: Baseline bootstrapping
  • [10%] Problem 6: Generalized Advantage Estimation (a numpy sketch of the computation follows this list)
    • for lazy people, you can refer to here
  • [10%] Report
  • [5%] Bonus: share your code and what you learned on GitHub or your personal blog, such as this
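
For Problem 6, Generalized Advantage Estimation combines the TD residuals delta_t = r_t + gamma * V(s_{t+1}) - V(s_t) into the advantage A_t = sum_l (gamma * lambda)^l * delta_{t+l}. Below is a hedged numpy sketch of that computation; the names rewards, values, and the helper gae_advantages are illustrative, not from the assignment code:

import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    # rewards: 1-D float array of length T
    # values: 1-D float array of length T+1; the extra entry is the value
    # of the state after the last reward (use 0 if the episode terminated)
    deltas = rewards + gamma * values[1:] - values[:-1]
    advantages = np.zeros_like(rewards)
    running = 0.0
    # Accumulate discounted TD residuals from the end of the episode backward
    for t in reversed(range(len(rewards))):
        running = deltas[t] + gamma * lam * running
        advantages[t] = running
    return advantages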

Other

  • Deadline: 11/2 23:59, 2017
  • Some of the code is credited to Yen-Chen Lin 😄
  • Office hour: 2-3 pm in 資電館 (EECS Building) Room 711 with Yuan-Hong Liao.
  • Contact andrewliao11@gmail.com for bug reports or any questions.
