4SkyNet / POP3D

Policy Optimization with Penalized Point Probability Distance: an Alternative to Proximal Policy Optimization

POP3D

Source code (TensorFlow) for Policy Optimization with Penalized Point Probability Distance: An Alternative to Proximal Policy Optimization (https://arxiv.org/abs/1807.00442v2)
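As a rough illustration of the idea in the title, POP3D replaces PPO's clipping with a penalty on the point probability distance between the old and new policies at the sampled action. This assumes the paper's definition D_pp = (π_old(a|s) − π(a|s))² for discrete actions. The NumPy sketch below is not the repository's TensorFlow code. The helper name `pop3d_surrogate` and the penalty weight `beta` are hypothetical.

```python
import numpy as np


def pop3d_surrogate(pi_new, pi_old, advantages, beta=1.0):
    """Hypothetical sketch of a POP3D-style objective (to be maximized).

    pi_new, pi_old: probabilities of the sampled actions under the current
    and old policies (discrete actions assumed).
    advantages: advantage estimates for the same samples.
    beta: assumed penalty coefficient on the point probability distance.
    """
    pi_new = np.asarray(pi_new, dtype=float)
    pi_old = np.asarray(pi_old, dtype=float)
    advantages = np.asarray(advantages, dtype=float)

    ratio = pi_new / pi_old           # importance ratio, as in PPO
    d_pp = (pi_old - pi_new) ** 2     # point probability distance (assumed form)
    # Penalize the distance instead of clipping the ratio.
    return float(np.mean(ratio * advantages - beta * d_pp))
```

A training loop would minimize the negative of this value. When the two policies agree on every sampled action, the penalty vanishes and the objective reduces to the mean advantage.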

Prerequisites

  • gym[mujoco,atari]
  • scipy
  • tqdm
  • joblib
  • zmq
  • dill
  • mpi4py
  • cloudpickle
  • tensorflow>=1.4.0
  • opencv-python
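Before training, you can quickly check that the packages listed above are importable. This sketch assumes the usual import names: `cv2` for opencv-python and `zmq` for the ZeroMQ bindings. The helper name is hypothetical. The check does not verify versions such as tensorflow>=1.4.0, and it does not confirm a working MuJoCo installation.

```python
import importlib.util

# pip package name -> module name it is imported as (assumed mapping).
REQUIRED_MODULES = {
    "gym": "gym",
    "scipy": "scipy",
    "tqdm": "tqdm",
    "joblib": "joblib",
    "zmq": "zmq",
    "dill": "dill",
    "mpi4py": "mpi4py",
    "cloudpickle": "cloudpickle",
    "tensorflow": "tensorflow",
    "opencv-python": "cv2",
}


def missing_prerequisites(modules=REQUIRED_MODULES):
    """Return the package names whose module cannot be found (hypothetical helper)."""
    return [pkg for pkg, mod in modules.items()
            if importlib.util.find_spec(mod) is None]


if __name__ == "__main__":
    missing = missing_prerequisites()
    print("Missing:", ", ".join(missing) if missing else "none")
```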

Training

To run all games (49 Atari or 7 MuJoCo) on 2 GPUs, use the commands below. You can change how the tasks are distributed across GPUs to match your hardware.

  • Atari
python -m baselines.ppo2.run_all_atari
  • Mujoco
python -m baselines.ppo2.run_all_mujoco
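The exact task split used by the run_all scripts is not documented here. If you need to redistribute the games over a different number of GPUs, one simple option is round-robin assignment. The helpers below are hypothetical. `gpu_env` pins a worker process to one GPU through the standard `CUDA_VISIBLE_DEVICES` environment variable.

```python
import os
from itertools import cycle


def assign_to_gpus(games, num_gpus=2):
    """Round-robin games over GPU ids; returns {gpu_id: [games]} (hypothetical helper)."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be >= 1")
    plan = {gpu: [] for gpu in range(num_gpus)}
    for game, gpu in zip(games, cycle(range(num_gpus))):
        plan[gpu].append(game)
    return plan


def gpu_env(gpu):
    """Copy of the current environment that exposes only the given GPU to a child process."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu)
    return env
```

For example, `assign_to_gpus(["AlienNoFrameskip-v4", "BreakoutNoFrameskip-v4", "PongNoFrameskip-v4"], 2)` puts Alien and Pong on GPU 0 and Breakout on GPU 1. Each list could then be launched with `env=gpu_env(gpu)` passed to `subprocess.run`.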

To train a single game, for example Atari Alien with seed 10:

  • Use PPO
python -m baselines.ppo2.run_atari --env AlienNoFrameskip-v4  --num-timesteps 10000000 --seed 10
  • Use POP3D
python -m baselines.ppo2.run_atari --env AlienNoFrameskip-v4  --num-timesteps 10000000 --seed 10 --use-penal 1
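You can also build the two commands above in a script to compare PPO and POP3D on one game over several seeds. This sketch uses only the flags shown in this README. The helper names are hypothetical.

```python
import subprocess
import sys


def build_command(env_id, seed, num_timesteps=10000000, use_penal=False):
    """Argument list for baselines.ppo2.run_atari (hypothetical helper)."""
    cmd = [sys.executable, "-m", "baselines.ppo2.run_atari",
           "--env", env_id,
           "--num-timesteps", str(num_timesteps),
           "--seed", str(seed)]
    if use_penal:
        cmd += ["--use-penal", "1"]  # switches training from PPO to POP3D
    return cmd


def run_comparison(env_id, seeds):
    """Run PPO, then POP3D, for each seed, one process at a time."""
    for seed in seeds:
        for use_penal in (False, True):
            subprocess.run(build_command(env_id, seed, use_penal=use_penal), check=True)
```

For example, `run_comparison("AlienNoFrameskip-v4", seeds=(10, 20, 30))` would launch six full 10M-timestep training runs in sequence. The seeds other than 10 are arbitrary.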

Results

Results for three seeds can be downloaded from Google Drive: https://drive.google.com/file/d/1c79TqWn74mHXhLjoTWaBKfKaQOsfD2hg/view?usp=sharing. We release them to make the paper's results easy to reproduce.

Atari results

[Figure: Atari results]

MuJoCo results

[Figure: MuJoCo results]

Acknowledgements

Our code is based on OpenAI's baselines (https://github.com/openai/baselines.git); we thank its authors.
