BipedalWalkerTD3

This project is my own implementation of the TD3 paper, applied to the continuous control problem of the Bipedal Walker environment in OpenAI Gym.

Overview

Twin Delayed Deep Deterministic Policy Gradient (TD3) is an actor-critic reinforcement learning algorithm that improves on Deep Deterministic Policy Gradient (DDPG). It uses two critic networks and takes the smaller of their two value estimates when forming the learning target, which limits overestimation bias. It also updates the policy and the target networks less often than the critics, which reduces per-update error.
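The two mechanisms described above can be sketched in a few lines of framework-free Python. This is only an illustration, not the code used in this repository: the function names (td3_target, should_update_actor, soft_update) and the default values for gamma, tau and policy_delay are assumptions chosen for the example, and plain floats stand in for network outputs and weights.

```python
# Illustrative sketch only; names and default values are assumptions,
# not taken from this repository's notebook.


def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Learning target for both critics.

    q1_next and q2_next are the two target critics' value estimates for the
    next state. Bootstrapping from the smaller one limits overestimation.
    """
    min_q = min(q1_next, q2_next)
    # When the episode has ended (done == 1.0) there is no future value.
    return reward + gamma * (1.0 - done) * min_q


def should_update_actor(step, policy_delay=2):
    """Delayed updates: the critics train every step, the actor and the
    target networks only on every policy_delay-th step."""
    return step % policy_delay == 0


def soft_update(target_weights, online_weights, tau=0.005):
    """Move each target weight a small fraction tau toward the online weight."""
    return [(1.0 - tau) * t + tau * o
            for t, o in zip(target_weights, online_weights)]


if __name__ == "__main__":
    # The two critics disagree (10.0 vs 8.0); the target uses the smaller one.
    print(td3_target(reward=1.0, done=0.0, q1_next=10.0, q2_next=8.0, gamma=0.5))

    # Over six training steps the actor is updated on steps 2, 4 and 6.
    print([step for step in range(1, 7) if should_update_actor(step)])
```

In a real training loop the same target would be computed per batch with tensors, and soft_update would be applied to the weights of the target actor and both target critics whenever should_update_actor returns True.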

Requirements

This project requires the following dependencies:

Python
TensorFlow
NumPy
OpenAI Gym
Jupyter Notebook (optional)

Trying Out

To try out the trained agent, download the weights from the repository and put them in the same folder as the notebook. Then run every cell except the training cell, which has the comment #Training at the top.

HimGautam / BipedalWalkerTD3