levitation / dqn

TensorFlow & Keras implementation of DQN with HER (Hindsight Experience Replay)

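HER lets the agent learn from failed episodes: after an episode ends, stored transitions are additionally relabeled with goals the agent actually achieved, so sparse rewards become informative. Below is a minimal sketch of that relabeling step (the "future" strategy from the HER paper); it illustrates the idea and is not the exact code from this repo:

    import random

    def her_relabel(episode, reward_fn, k=4):
        # episode: list of (obs, action, reward, next_obs, goal) tuples.
        # reward_fn(achieved, goal): recomputes the reward under a new goal.
        # For each transition, also store k copies whose goal is replaced by
        # a state reached later in the same episode ("future" strategy).
        relabeled = []
        for t, (obs, action, reward, next_obs, goal) in enumerate(episode):
            relabeled.append((obs, action, reward, next_obs, goal))
            for _ in range(k):
                future = random.randint(t, len(episode) - 1)
                new_goal = episode[future][3]  # an achieved next state
                new_reward = reward_fn(next_obs, new_goal)
                relabeled.append((obs, action, new_reward, next_obs, new_goal))
        return relabeled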

Hardware

If TensorFlow finds a GPU, you will see Creating TensorFlow device (/device:GPU:0) at the beginning of the log, and the code will use 1 GPU + 1 CPU. If it doesn't find a GPU, it will use 1 CPU.
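To check what TensorFlow sees before committing to a long run, you can list the local devices with device_lib (a generic TensorFlow 1.x check, not code from this repo):

    from tensorflow.python.client import device_lib

    # Prints every device TensorFlow can use; with a working GPU setup
    # you should see /device:GPU:0 alongside the CPU.
    for device in device_lib.list_local_devices():
        print(device.name, device.device_type)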

A Tesla K40 + Intel i5 (Haswell) give about 80 steps/s during training. 1M training steps + 200k evaluation steps (20k evaluation steps after every 100k training steps) take about 3.5 hours on the K40.
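The 3.5-hour figure matches the training steps alone at the quoted throughput (evaluation runs without gradient updates, so it is typically faster):

    # 1M training steps at about 80 steps/s.
    print(1000000 / 80 / 3600.0)  # ~3.47 hours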

I'd recommend about 10 GB of RAM to train safely. With REPLAY_BUFFER_SIZE = 100000 and 4 stacked 84x84 frames per observation, the replay buffer alone already uses 84 * 84 * 4 * 100000 bytes ≈ 2.6 GiB of RAM.
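You can reproduce that number with a quick back-of-the-envelope calculation; it assumes one byte per grayscale pixel (uint8 frames), which is what the 2.6 GiB figure implies:

    # Four stacked 84x84 grayscale frames per observation, one byte per
    # pixel, REPLAY_BUFFER_SIZE = 100000 stored observations.
    bytes_per_observation = 84 * 84 * 4
    total_bytes = bytes_per_observation * 100000
    print(total_bytes / 2.0**30)  # ~2.63, i.e. about 2.6 GiB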

Install

  1. Clone this repo: git clone https://github.com/AdamStelmaszczyk/dqn.git.
  2. Install conda for dependency management.
  3. Create dqn conda environment: conda create -n dqn python=3.5.2 -y.
  4. Activate dqn conda environment: source activate dqn. All the following commands should be run in the activated dqn environment.
  5. Install basic dependencies: pip install -r requirements.txt.
  6. If you wish to use a GPU: pip install -r requirements-gpu.txt. Also install the matching CUDA and cuDNN versions. For TensorFlow 1.4, this is CUDA 8 or 9 and cuDNN 6 or 7.

There is an automatic build on Travis that performs the same steps.

Uninstall

  1. Deactivate conda environment: source deactivate.
  2. Remove dqn conda environment: conda env remove -n dqn -y.

Usage

The basic command is neptune run --offline, which runs run.py locally. For all the possible flags, check neptune.yaml.

Train

neptune run --offline -- --env Pong

There are 60 games you can choose from:

AirRaid, Alien, Amidar, Assault, Asterix, Asteroids, Atlantis, BankHeist, BattleZone, BeamRider, Berzerk, Bowling, Boxing, Breakout, Carnival, Centipede, ChopperCommand, CrazyClimber, DemonAttack, DoubleDunk, ElevatorAction, Enduro, FishingDerby, Freeway, Frostbite, Gopher, Gravitar, Hero, IceHockey, Jamesbond, JourneyEscape, Kangaroo, Krull, KungFuMaster, MontezumaRevenge, MsPacman, NameThisGame, Phoenix, Pitfall, Pong, Pooyan, PrivateEye, Qbert, Riverraid, RoadRunner, Robotank, Seaquest, Skiing, Solaris, SpaceInvaders, StarGunner, Tennis, TimePilot, Tutankham, UpNDown, Venture, VideoPinball, WizardOfWor, YarsRevenge, Zaxxon
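Each name maps to an Atari environment in OpenAI Gym. The exact environment id that run.py builds from the --env flag isn't shown here, so the Deterministic-v4 suffix below is an assumption (a common choice in DQN experiments), not taken from this repo's code:

    import gym

    # Hypothetical id built from '--env Pong'; the 'Deterministic-v4'
    # suffix is an assumption, not taken from run.py.
    env = gym.make('PongDeterministic-v4')
    observation = env.reset()
    print(env.action_space)  # Discrete(6) for Pong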

Play using the same observations as DQN

neptune run --offline -- --play True

Keys:

  • W - up
  • S - down
  • A - left
  • D - right
  • SPACE - fire button (the exact action depends on the game)

Generate GIFs

  1. Generate images: neptune run --offline -- --images True --model PONG_MODEL.h5 --env Pong.
  2. We will use the convert tool, which is part of ImageMagick; see ImageMagick's installation instructions.
  3. Convert the images from episode 1 to a GIF: convert -layers optimize-frame 1_*.png 1.gif. A pure-Python alternative is sketched below.
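If ImageMagick isn't convenient, the same GIF can be assembled in Python with the imageio library (not one of this repo's dependencies; pip install imageio first):

    import glob
    import imageio

    # Gather episode 1's frames; lexicographic sort matches frame order
    # only if the frame numbers in the filenames are zero-padded.
    frames = [imageio.imread(f) for f in sorted(glob.glob('1_*.png'))]
    imageio.mimsave('1.gif', frames)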

Best scores observed using the same hyperparameters as in the code

  • Pong: 21 after 0.5M steps
  • Breakout: 419 after 2M steps
  • SpaceInvaders: 1370 after 6.5M steps
  • BeamRider: 7111 after 5.5M steps
  • Seaquest: 8040 after 6.5M steps


License

GNU General Public License v3.0

