Human-Level Control through Deep Reinforcement Learning

This implementation contains:

Deep Q-network and Q-learning
Experience replay memory
- to reduce the correlations between consecutive updates
Network for Q-learning targets is held fixed for intervals
- to reduce the correlations between target and predicted Q-values
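
The components listed above can be sketched in plain Python. This is a minimal illustration, not the code from this repository: the names `ReplayMemory`, `q_targets`, and `sync_target` are hypothetical, `target_q` is assumed to be any callable that returns a list of Q-values for a state, and plain lists stand in for network weights.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-size buffer of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        # A bounded deque drops the oldest transition once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling mixes old and new transitions, which reduces
        # the correlation between consecutive updates.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def q_targets(batch, target_q, gamma=0.99):
    """Compute r + gamma * max_a' Q_target(s', a') for each transition.

    `target_q` is the frozen target network (hypothetical callable).
    Terminal transitions get no bootstrap term.
    """
    targets = []
    for _state, _action, reward, next_state, done in batch:
        bootstrap = 0.0 if done else gamma * max(target_q(next_state))
        targets.append(reward + bootstrap)
    return targets


def sync_target(step, online_weights, target_weights, interval):
    """Copy online weights into the target network every `interval` steps."""
    if step % interval == 0:
        target_weights[:] = online_weights  # in-place copy
        return True
    return False
```

Between syncs the target weights do not change, so the targets the online network is regressed toward stay stable while its own predictions move.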

Requirements

First, install prerequisites with:

$ pip install tqdm gym[all]

To train a model for Breakout:

$ python main.py --env_name=Breakout-v0 --is_train=True
$ python main.py --env_name=Breakout-v0 --is_train=True --display=True

To test and record the screen with gym:

$ python main.py --is_train=False
$ python main.py --is_train=False --display=True

Result of training for 24 hours using a GTX 980 Ti.

Details of Breakout with model m2 (red) for 18 hours using a GTX 980 Ti.

(The plot label "episode/min reward" is a typo; it should read "episode/average reward".)

Details of Breakout with models m1 (green), m2 (purple), m3 (blue) and m4 (red) for 15 hours using a GTX 980 Ti.

MIT License.

TensorFlow implementation of Human-Level Control through Deep Reinforcement Learning
