Jason-CKY / pytorch_DDPG

PyTorch implementation of DDPG on OpenAI Gym environments

PyTorch implementation of DDPG

This is a PyTorch implementation of Deep Deterministic Policy Gradients (DDPG), which uses an Ornstein–Uhlenbeck process for exploration in a continuous action space while acting with a deterministic policy.
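
As a rough sketch of that exploration noise (the mu, theta, and sigma defaults follow the DDPG paper; the values used in this repo may differ):

import numpy as np

class OUNoise:
    # Ornstein-Uhlenbeck process: temporally correlated noise for exploration.
    def __init__(self, action_dim, mu=0.0, theta=0.15, sigma=0.2):
        self.mu = mu * np.ones(action_dim)
        self.theta = theta    # strength of mean reversion
        self.sigma = sigma    # scale of the Gaussian noise
        self.state = self.mu.copy()

    def reset(self):
        self.state = self.mu.copy()

    def sample(self):
        # dx = theta * (mu - x) dt + sigma * dW, with dt = 1
        dx = self.theta * (self.mu - self.state) + self.sigma * np.random.randn(len(self.state))
        self.state = self.state + dx
        return self.state

At action-selection time the sampled noise is typically added to the deterministic action and the result is clipped to the environment's action bounds.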

Environments are provided by OpenAI Gym.

The base environment and agent are written to the RL-Glue standard, providing the library and abstract classes to inherit from for reinforcement learning experiments; a sketch of the agent interface follows.
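
A minimal sketch of such an abstract base class (method names follow the RL-Glue convention and may not match this repo's exact signatures):

from abc import ABC, abstractmethod

class BaseAgent(ABC):
    # Abstract agent following the RL-Glue interface.

    @abstractmethod
    def agent_init(self, agent_info=None):
        # Set up the agent once, before the experiment starts.
        ...

    @abstractmethod
    def agent_start(self, observation):
        # Return the first action of an episode.
        ...

    @abstractmethod
    def agent_step(self, reward, observation):
        # Return an action given the last reward and the new observation.
        ...

    @abstractmethod
    def agent_end(self, reward):
        # Clean up when the episode terminates.
        ...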

Results

Reward sum per episode, plotted for each environment (plot images are in the repo):

  • LunarLanderContinuous-v2
  • MountainCarContinuous-v0
  • Pendulum-v0
  • BipedalWalker-v3

DDPG Algorithm

Policy Estimation (Actor)

The actor network is a 3-layer neural network that takes the state (s) as input and outputs the action (a) to take, denoted π(s). The two hidden layers have 400 and 300 units respectively, as in the paper by Lillicrap et al.

The input and hidden layers use the ReLU activation function, and the output layer uses tanh.
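
A minimal sketch of that architecture (assuming actions bounded in [−1, 1]; real environments may need the output rescaled):

import torch.nn as nn

class Actor(nn.Module):
    # 3-layer actor: state -> action, hidden sizes 400 and 300.
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 400), nn.ReLU(),
            nn.Linear(400, 300), nn.ReLU(),
            nn.Linear(300, action_dim), nn.Tanh(),  # tanh keeps actions in [-1, 1]
        )

    def forward(self, state):
        return self.net(state)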

Policy Evaluation (Critic)

The critic network is a 3-layer neural network that takes the state (s) and the corresponding action (a) as input and outputs the state-action value, denoted Q(s, a). The two hidden layers have 400 and 300 units respectively, as in the paper by Lillicrap et al.

The input and hidden layers use the ReLU activation function.
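
A matching sketch (this version concatenates state and action at the input; the original paper instead injects the action at the second layer):

import torch
import torch.nn as nn

class Critic(nn.Module):
    # 3-layer critic: (state, action) -> scalar Q(s, a), hidden sizes 400 and 300.
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 400), nn.ReLU(),
            nn.Linear(400, 300), nn.ReLU(),
            nn.Linear(300, 1),  # scalar state-action value
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=1))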

Actor Optimization

The actor network is optimized by minimizing -Q(s, π(s)).mean(), i.e. performing gradient ascent on the critic's estimate of the policy's value.
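
In PyTorch terms, one actor update might look like this (function and variable names are illustrative, not the repo's):

def actor_update(actor, critic, actor_optimizer, s):
    # Maximize Q(s, pi(s)) over a minibatch of states by descending its negative.
    loss = -critic(s, actor(s)).mean()
    actor_optimizer.zero_grad()
    loss.backward()
    actor_optimizer.step()
    return loss.item()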

Critic Optimization

The critic network is optimized by minimizing the mean-squared TD error mse_loss(Q(s, a), r + γ·Q′(s′, π′(s′))), where Q′ and π′ are the target networks updated via the soft updates described below.
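
A corresponding sketch of one critic update (names are illustrative; the `done` mask zeroes the bootstrap term on terminal transitions, a standard detail the formula above leaves implicit):

import torch
import torch.nn.functional as F

def critic_update(critic, target_critic, target_actor, critic_optimizer,
                  s, a, r, s2, done, gamma=0.99):
    with torch.no_grad():
        # TD target from the slowly-moving target networks.
        y = r + gamma * (1.0 - done) * target_critic(s2, target_actor(s2))
    loss = F.mse_loss(critic(s, a), y)
    critic_optimizer.zero_grad()
    loss.backward()
    critic_optimizer.step()
    return loss.item()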

Initialization Details

  • Replay buffer size of 1,000,000.

Critic Network

  • Weight decay = 10⁻²
  • Discount factor γ = 0.99
  • Soft target updates with τ = 0.001 (see the sketch after this list)
  • Final layer weights and biases initialized from the uniform distribution [−3×10⁻³, 3×10⁻³]
  • All other layers' weights initialized from uniform distributions [−1/√f, 1/√f], where f is the fan-in of the layer
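
The soft target update is Polyak averaging, θ′ ← τθ + (1 − τ)θ′. A minimal sketch:

import torch

def soft_update(target, source, tau=0.001):
    # target <- tau * source + (1 - tau) * target, parameter by parameter.
    with torch.no_grad():
        for t_param, param in zip(target.parameters(), source.parameters()):
            t_param.mul_(1.0 - tau).add_(tau * param)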

Actor Network

  • Final layer weights and biases initialized from the uniform distribution [−3×10⁻³, 3×10⁻³]
  • All other layers' weights initialized from uniform distributions [−1/√f, 1/√f], where f is the fan-in of the layer (see the initialization sketch after this list)
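
Both rules can be captured in one helper (a sketch; applying the fan-in bound to the bias as well is an assumption, since the bullets above only mention weights for the hidden layers):

import math
import torch.nn as nn

def init_layer(layer: nn.Linear, final=False):
    # Final layer: U(-3e-3, 3e-3); hidden layers: U(-1/sqrt(fan_in), 1/sqrt(fan_in)).
    bound = 3e-3 if final else 1.0 / math.sqrt(layer.weight.size(1))  # size(1) = fan-in
    nn.init.uniform_(layer.weight, -bound, bound)
    nn.init.uniform_(layer.bias, -bound, bound)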

Dependencies

How to use

Training a model on an OpenAI Gym environment

git clone https://github.com/Jason-CKY/pytorch_DDPG.git
cd pytorch_DDPG

Edit the experiment parameters in main.py:
  • change environment_parameters['gym_environment'] to the desired environment to train in

python main.py

Testing trained model performance

python test.py
usage: test.py [-h] [--env ENV] [--checkpoint CHECKPOINT] [--gif]

optional arguments:
  -h, --help                show this help message and exit
  --env ENV                 Environment name
  --checkpoint CHECKPOINT   Name of checkpoint.pth file under model_weights/env/
  --gif                     Save rendered episode as a gif to model_weights/env/recording.gif
