# Asynchronous deep reinforcement learning
An attempt to reproduce Google DeepMind's paper "Asynchronous Methods for Deep Reinforcement Learning."
The Asynchronous Advantage Actor-Critic (A3C) method for playing Atari Pong is implemented with TensorFlow. Both A3C-FF and A3C-LSTM are implemented.
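As a rough sketch of the update A3C performs (illustrative only, not this repository's actual code), each thread collects a short rollout, computes n-step returns bootstrapped from the critic's value estimate, and uses the advantage R_t - V(s_t) to scale the policy gradient. The function names, `gamma`, and the sample numbers below are all assumptions for illustration:

```python
# Sketch of A3C's n-step return / advantage computation
# (illustrative only, not the repository's actual code).

def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    """Compute discounted n-step returns R_t for one rollout.

    rewards         : rewards r_t collected over the rollout
    bootstrap_value : V(s_T) from the critic, or 0.0 if the episode ended
    """
    R = bootstrap_value
    returns = []
    for r in reversed(rewards):   # accumulate from the last step backwards
        R = r + gamma * R
        returns.append(R)
    returns.reverse()
    return returns

def advantages(returns, values):
    """Advantage A_t = R_t - V(s_t), used to scale the policy gradient."""
    return [R - v for R, v in zip(returns, values)]

# Tiny worked example with a 3-step rollout ending in a terminal reward:
rets = n_step_returns([0.0, 0.0, 1.0], bootstrap_value=0.0, gamma=0.99)
advs = advantages(rets, [0.5, 0.5, 0.5])
print(rets)
print(advs)
```

The backward accumulation is the same trick used in the paper's pseudocode: the return at each step reuses the partially accumulated return of the step after it.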
The learning result movie after 26 hours of training (A3C-FF) looks like this.
Any advice or suggestions are welcome in the issues thread.
## How to build
First we need to build a multi-thread-ready version of the Arcade Learning Environment. I made some modifications to it so that it runs in a multi-threaded environment.
```
$ git clone https://github.com/miyosuda/Arcade-Learning-Environment.git
$ cd Arcade-Learning-Environment
$ cmake -DUSE_SDL=ON -DUSE_RLGLUE=OFF -DBUILD_EXAMPLES=OFF .
$ make -j 4
$ pip install .
```
I recommend installing it in a virtualenv environment.
## How to run
To display the result with game play, run the display script.
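For example (the script names below are assumptions; check the repository for the actual entry points):

```shell
# Train (hypothetical entry point):
python a3c.py

# Watch the trained agent play (hypothetical entry point):
python a3c_display.py
```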
To enable the GPU, change the "USE_GPU" flag in "constants.py".
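A flag like this typically just selects which TensorFlow device string the graph is placed on. A minimal sketch of that mechanism (the constant name `USE_GPU` comes from `constants.py`; everything else here is an illustrative assumption, not the repository's code):

```python
# Illustrative sketch of how a USE_GPU flag can map to a TensorFlow
# device string (assumed mechanism, not the repository's actual code).

USE_GPU = False   # flip to True to place ops on the GPU

def device_name(use_gpu):
    """Return the device string for the chosen hardware."""
    return "/gpu:0" if use_gpu else "/cpu:0"

# The training graph would then be built under, e.g.:
#   with tf.device(device_name(USE_GPU)):
#       ... build the network ...
print(device_name(USE_GPU))   # -> /cpu:0
```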
When running with 8 parallel game environments, the speeds on GPU (GTX980Ti) and CPU (Core i7 6700) were as follows. (Recorded with the LOCAL_T_MAX=20 setting.)

|                    | A3C-FF             | A3C-LSTM          |
|--------------------|--------------------|-------------------|
| GPU (GTX980Ti)     | 1722 steps per sec | 864 steps per sec |
| CPU (Core i7 6700) | 1077 steps per sec | 540 steps per sec |
The score plots of the local threads for Pong were as follows (with GTX980Ti).
A3C-LSTM LOCAL_T_MAX = 5
A3C-LSTM LOCAL_T_MAX = 20
Unlike the original paper, the scores are per local thread and are not averaged using the global network.
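LOCAL_T_MAX is the maximum rollout length each actor-learner thread collects before computing one gradient update, so for the same number of environment steps a larger value means fewer (but larger) updates. A toy illustration of that trade-off (assumed semantics; episode-end truncation is ignored, and nothing here is the repository's code):

```python
# Toy illustration of how LOCAL_T_MAX controls the update frequency
# of each actor-learner thread (assumed semantics, not real code).

def count_updates(total_steps, local_t_max):
    """One gradient update is made per LOCAL_T_MAX environment steps
    (episode-end truncation is ignored in this toy version)."""
    return total_steps // local_t_max

# Same number of environment steps, different update counts:
print(count_updates(10000, 5))    # -> 2000 updates
print(count_updates(10000, 20))   # -> 500 updates
```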
## Requirements

- TensorFlow r1.0
This project uses the settings described in muupan's wiki: [muupan/async-rl](https://github.com/muupan/async-rl/wiki).
## Acknowledgements

- @aravindsrinivas for providing information about some of the hyperparameters.