levitation / dqn

TensorFlow & Keras implementation of DQN with HER (Hindsight Experience Replay)

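HER lets the agent learn from failed episodes: after an episode ends, stored transitions are additionally relabeled with goals the agent actually achieved, so sparse rewards become informative. Below is a minimal sketch of that relabeling step (the "future" strategy from the HER paper); it illustrates the idea and is not the exact code from this repo:

    import random

    def her_relabel(episode, reward_fn, k=4):
        # episode: list of (obs, action, reward, next_obs, goal) tuples.
        # reward_fn(achieved, goal): recomputes the reward under a new goal.
        # For each transition, also store k copies whose goal is replaced by
        # a state reached later in the same episode ("future" strategy).
        relabeled = []
        for t, (obs, action, reward, next_obs, goal) in enumerate(episode):
            relabeled.append((obs, action, reward, next_obs, goal))
            for _ in range(k):
                future = random.randint(t, len(episode) - 1)
                new_goal = episode[future][3]  # an achieved next state
                new_reward = reward_fn(next_obs, new_goal)
                relabeled.append((obs, action, new_reward, next_obs, new_goal))
        return relabeled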

Hardware

If TensorFlow finds a GPU, you will see Creating TensorFlow device (/device:GPU:0) at the beginning of the log, and the code will use 1 GPU + 1 CPU. If it doesn't find a GPU, it will use 1 CPU.
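To check what TensorFlow sees before committing to a long run, you can list the local devices with device_lib (a generic TensorFlow 1.x check, not code from this repo):

    from tensorflow.python.client import device_lib

    # Prints every device TensorFlow can use; with a working GPU setup
    # you should see /device:GPU:0 alongside the CPU.
    for device in device_lib.list_local_devices():
        print(device.name, device.device_type)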

A Tesla K40 + Intel i5 (Haswell) give about 80 steps/s during training. 1M training steps + 200k evaluation steps (20k evaluation steps after every 100k training steps) take about 3.5 hours on the K40.
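The 3.5-hour figure matches the training steps alone at the quoted throughput (evaluation runs without gradient updates, so it is typically faster):

    # 1M training steps at about 80 steps/s.
    print(1000000 / 80 / 3600.0)  # ~3.47 hours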

I'd recommend about 10 GB of RAM to train safely. With REPLAY_BUFFER_SIZE = 100000 and 4 stacked 84x84 frames per observation, the replay buffer alone already uses 84 * 84 * 4 * 100000 bytes ≈ 2.6 GiB of RAM.
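You can reproduce that number with a quick back-of-the-envelope calculation; it assumes one byte per grayscale pixel (uint8 frames), which is what the 2.6 GiB figure implies:

    # Four stacked 84x84 grayscale frames per observation, one byte per
    # pixel, REPLAY_BUFFER_SIZE = 100000 stored observations.
    bytes_per_observation = 84 * 84 * 4
    total_bytes = bytes_per_observation * 100000
    print(total_bytes / 2.0**30)  # ~2.63, i.e. about 2.6 GiB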

Install

  1. Clone this repo: git clone https://github.com/AdamStelmaszczyk/dqn.git.
  2. Install conda for dependency management.
  3. Create dqn conda environment: conda create -n dqn python=3.5.2 -y.
  4. Activate dqn conda environment: source activate dqn. All the following commands should be run in the activated dqn environment.
  5. Install basic dependencies: pip install -r requirements.txt.
  6. If you wish to use a GPU: pip install -r requirements-gpu.txt. Also install the matching CUDA and cuDNN versions. For TensorFlow 1.4, this is CUDA 8 or 9 and cuDNN 6 or 7.

There is an automatic build on Travis that performs the same steps.

Uninstall

  1. Deactivate conda environment: source deactivate.
  2. Remove dqn conda environment: conda env remove -n dqn -y.

Usage

The basic command is neptune run --offline, which runs run.py locally. For all the possible flags, check neptune.yaml.

Train

neptune run --offline -- --env Pong

There are 60 games you can choose from:

AirRaid, Alien, Amidar, Assault, Asterix, Asteroids, Atlantis, BankHeist, BattleZone, BeamRider, Berzerk, Bowling, Boxing, Breakout, Carnival, Centipede, ChopperCommand, CrazyClimber, DemonAttack, DoubleDunk, ElevatorAction, Enduro, FishingDerby, Freeway, Frostbite, Gopher, Gravitar, Hero, IceHockey, Jamesbond, JourneyEscape, Kangaroo, Krull, KungFuMaster, MontezumaRevenge, MsPacman, NameThisGame, Phoenix, Pitfall, Pong, Pooyan, PrivateEye, Qbert, Riverraid, RoadRunner, Robotank, Seaquest, Skiing, Solaris, SpaceInvaders, StarGunner, Tennis, TimePilot, Tutankham, UpNDown, Venture, VideoPinball, WizardOfWor, YarsRevenge, Zaxxon
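Each name maps to an Atari environment in OpenAI Gym. The exact environment id that run.py builds from the --env flag isn't shown here, so the Deterministic-v4 suffix below is an assumption (a common choice in DQN experiments), not taken from this repo's code:

    import gym

    # Hypothetical id built from '--env Pong'; the 'Deterministic-v4'
    # suffix is an assumption, not taken from run.py.
    env = gym.make('PongDeterministic-v4')
    observation = env.reset()
    print(env.action_space)  # Discrete(6) for Pong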

Play using the same observations as DQN

neptune run --offline -- --play True

Keys:

  • W - up
  • S - down
  • A - left
  • D - right
  • SPACE - fire button (the exact action depends on the game)

Generate GIFs

  1. Generate images: neptune run --offline -- --images True --model PONG_MODEL.h5 --env Pong.
  2. We will use the convert tool, which is part of ImageMagick; see ImageMagick's installation instructions.
  3. Convert the images from episode 1 to a GIF: convert -layers optimize-frame 1_*.png 1.gif. A pure-Python alternative is sketched below.
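If ImageMagick isn't convenient, the same GIF can be assembled in Python with the imageio library (not one of this repo's dependencies; pip install imageio first):

    import glob
    import imageio

    # Gather episode 1's frames; lexicographic sort matches frame order
    # only if the frame numbers in the filenames are zero-padded.
    frames = [imageio.imread(f) for f in sorted(glob.glob('1_*.png'))]
    imageio.mimsave('1.gif', frames)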

Best scores observed using the same hyperparameters as in the code

  • Pong: 21 after 0.5M steps
  • Breakout: 419 after 2M steps
  • SpaceInvaders: 1370 after 6.5M steps
  • BeamRider: 7111 after 5.5M steps
  • Seaquest: 8040 after 6.5M steps


License

GNU General Public License v3.0

