TensorFlow & Keras implementation of DQN with HER (Hindsight Experience Replay)
If TensorFlow finds a GPU, you will see `Creating TensorFlow device (/device:GPU:0)` at the beginning of the log, and the code will use 1 GPU + 1 CPU. If it doesn't find a GPU, it will use 1 CPU.
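For context, the HER part refers to relabeling replayed transitions with goals that were actually achieved later in the episode (the "future" strategy from the HER paper linked at the bottom). A minimal sketch, assuming a hypothetical `(state, action, reward, next_state, goal)` transition format rather than this repo's actual replay entries:

```python
import random

def her_relabel(episode, k=4, rng=random):
    """Hindsight relabeling sketch ('future' strategy).

    Assumed transition format (hypothetical, not this repo's actual
    replay entries): (state, action, reward, next_state, goal),
    with reward 1.0 when next_state equals the goal, else 0.0.
    """
    out = []
    for i, (s, a, r, s2, g) in enumerate(episode):
        out.append((s, a, r, s2, g))  # keep the original transition
        future = episode[i:]          # transitions later in the episode
        for _ in range(k):
            # substitute a goal that was actually achieved later on
            new_goal = rng.choice(future)[3]
            new_r = 1.0 if s2 == new_goal else 0.0
            out.append((s, a, new_r, s2, new_goal))
    return out
```

Each original transition yields `k` extra relabeled copies, so even episodes that never reach the original goal produce transitions with positive reward.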
A Tesla K40 + Intel i5 Haswell give about 80 steps/s during training. 1M training + 200k evaluation steps (20k evaluation steps every 100k training steps) take about 3.5 hours on a K40.
I'd recommend about 10 GB of RAM to train safely. With `REPLAY_BUFFER_SIZE = 100000` and 4 frames stacked per observation, the replay buffer alone uses 84 * 84 * 4 * 100000 bytes ≈ 2.6 GB of RAM.
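The back-of-envelope arithmetic above can be checked directly (the 84x84 frame size and 1-byte-per-pixel grayscale assumption follow the standard DQN preprocessing):

```python
# Replay buffer memory estimate: one observation is 4 stacked
# 84x84 grayscale frames at 1 byte per pixel.
FRAME_H, FRAME_W = 84, 84
STACKED_FRAMES = 4
REPLAY_BUFFER_SIZE = 100000

bytes_needed = FRAME_H * FRAME_W * STACKED_FRAMES * REPLAY_BUFFER_SIZE
gib = bytes_needed / 2**30
print(f"{bytes_needed} bytes = {gib:.1f} GiB")
```

The rest of the 10 GB budget covers TensorFlow itself, the model, and per-step bookkeeping.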
- Clone this repo:

  ```
  git clone https://github.com/AdamStelmaszczyk/dqn.git
  ```

- Install conda for dependency management.
- Create the `dqn` conda environment:

  ```
  conda create -n dqn python=3.5.2 -y
  ```

- Activate the `dqn` conda environment:

  ```
  source activate dqn
  ```

  All the following commands should be run in the activated `dqn` environment.
- Install basic dependencies:

  ```
  pip install -r requirements.txt
  ```

- If you wish to use a GPU:

  ```
  pip install -r requirements-gpu.txt
  ```

  Also install the correct CUDA and cuDNN versions. For TensorFlow 1.4, this is CUDA 8 or 9 and cuDNN 6 or 7.
There is an automatic build on Travis which does the same.
- Deactivate the conda environment:

  ```
  source deactivate
  ```

- Remove the `dqn` conda environment:

  ```
  conda env remove -n dqn -y
  ```
The basic command is `neptune run --offline`, which runs `run.py` locally. For all the possible flags, check `neptune.yaml`.

For example, to train on Pong:

```
neptune run --offline -- --env Pong
```
There are 60 games you can choose from:
AirRaid, Alien, Amidar, Assault, Asterix, Asteroids, Atlantis, BankHeist, BattleZone, BeamRider, Berzerk, Bowling, Boxing, Breakout, Carnival, Centipede, ChopperCommand, CrazyClimber, DemonAttack, DoubleDunk, ElevatorAction, Enduro, FishingDerby, Freeway, Frostbite, Gopher, Gravitar, Hero, IceHockey, Jamesbond, JourneyEscape, Kangaroo, Krull, KungFuMaster, MontezumaRevenge, MsPacman, NameThisGame, Phoenix, Pitfall, Pong, Pooyan, PrivateEye, Qbert, Riverraid, RoadRunner, Robotank, Seaquest, Skiing, Solaris, SpaceInvaders, StarGunner, Tennis, TimePilot, Tutankham, UpNDown, Venture, VideoPinball, WizardOfWor, YarsRevenge, Zaxxon
To play a game yourself:

```
neptune run --offline -- --play True
```

Keys:
- W - up
- S - down
- A - left
- D - right
- SPACE - fire button (the concrete action depends on the game)
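How a key press turns into an action depends on the game's action set, since each Atari game exposes only a subset of the 18 ALE actions. A hedged illustration (the `KEY_MEANING` table and `resolve_action` helper are hypothetical, not this repo's actual code), using ALE-style action meaning strings:

```python
# Hypothetical key-to-action resolution: the same key can map to a
# different action index, or to nothing, depending on the game.
KEY_MEANING = {'W': 'UP', 'S': 'DOWN', 'A': 'LEFT', 'D': 'RIGHT', ' ': 'FIRE'}

def resolve_action(key, action_meanings):
    """Return the index of the pressed key's action in this game's
    action set, or 0 (NOOP) if the game doesn't support it."""
    meaning = KEY_MEANING.get(key.upper(), 'NOOP')
    return action_meanings.index(meaning) if meaning in action_meanings else 0
```

For Breakout, whose action set is `['NOOP', 'FIRE', 'RIGHT', 'LEFT']`, pressing A yields action 3, while W falls back to NOOP.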
- Generate images:

  ```
  neptune run --offline -- --images True --model PONG_MODEL.h5 --env Pong
  ```

- We will use the `convert` tool, which is part of ImageMagick; here are the installation instructions.
- Convert the images from episode 1 to a GIF:

  ```
  convert -layers optimize-frame 1_*.png 1.gif
  ```
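If you recorded several episodes, the per-episode `convert` calls can be generated from the frame filenames (a sketch; the `<episode>_<step>.png` naming pattern is assumed from the `1_*.png` example above):

```python
from collections import defaultdict

def gif_commands(filenames):
    """Group frame files by episode prefix and emit one convert
    command per episode (assumes '<episode>_<step>.png' names)."""
    episodes = defaultdict(list)
    for name in filenames:
        episode = name.split('_', 1)[0]
        episodes[episode].append(name)
    return [f"convert -layers optimize-frame {ep}_*.png {ep}.gif"
            for ep in sorted(episodes, key=int)]
```

Print the returned commands, or feed them to a shell, to build one GIF per recorded episode.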
- https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf
- https://web.stanford.edu/class/psych209/Readings/MnihEtAlHassibis15NatureControlDeepRL.pdf
- https://becominghuman.ai/lets-build-an-atari-ai-part-1-dqn-df57e8ff3b26
- https://blog.openai.com/openai-baselines-dqn
- https://danieltakeshi.github.io/2016/11/25/frame-skipping-and-preprocessing-for-deep-q-networks-on-atari-2600-games/
- https://github.com/openai/baselines/blob/master/baselines/common/atari_wrappers.py
- https://github.com/openai/gym/blob/master/gym/envs/__init__.py#L483
- https://github.com/kuz/DeepMind-Atari-Deep-Q-Learner
- https://medium.com/mlreview/speeding-up-dqn-on-pytorch-solving-pong-in-30-minutes-81a1bd2dff55
- https://arxiv.org/pdf/1707.01495.pdf