ksajan / BipedalWalker-v2


BipedalWalker-v2

BipedalWalker-v2 defines "solving" as getting an average reward of 300 over 100 consecutive trials. Reward is given for moving forward, totaling 300+ points up to the far end. If the robot falls, it gets -100. Applying motor torque costs a small amount of points, so a more optimal agent will get a better score. The state consists of hull angle speed, angular velocity, horizontal speed, vertical speed, position of the joints, joint angular speeds, leg contact with the ground, and 10 lidar rangefinder measurements. There are no coordinates in the state vector.
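For reference, a minimal interaction loop with the environment looks like this (assuming gym with Box2D installed; in newer gym releases the environment is registered as BipedalWalker-v3):

```python
import gym

# Observations are 24-dimensional (hull state, joint states, leg-ground
# contacts, 10 lidar readings); actions are 4 torques in [-1, 1].
env = gym.make("BipedalWalker-v2")
print(env.observation_space.shape)  # (24,)
print(env.action_space.shape)       # (4,)

obs = env.reset()
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()  # random policy, just to show the API
    obs, reward, done, info = env.step(action)
    total_reward += reward
print("episode reward:", total_reward)
```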

A3C LSTM

A3C LSTM playing BipedalWalker-v2

This repo implements the Asynchronous Advantage Actor-Critic (A3C) algorithm in PyTorch, from Google DeepMind's paper "Asynchronous Methods for Deep Reinforcement Learning", applied to BipedalWalker-v2.
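For intuition, the kind of actor-critic network involved can be sketched as below. This is a simplified illustration, not the repo's actual model code: the hidden sizes are assumptions, and the real MLP/LSTM variants live in this repository's model files. The network outputs a value estimate (critic) and the mean/std of a Gaussian policy over the 4 continuous torques (actor):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActorCriticMLP(nn.Module):
    """Illustrative A3C head for continuous control: shared trunk,
    a value output (critic), and Gaussian policy parameters (actor)."""

    def __init__(self, obs_dim=24, act_dim=4, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.value_head = nn.Linear(hidden, 1)        # critic: V(s)
        self.mu_head = nn.Linear(hidden, act_dim)     # actor: action mean
        self.sigma_head = nn.Linear(hidden, act_dim)  # actor: pre-softplus std

    def forward(self, obs):
        h = self.trunk(obs)
        mu = torch.tanh(self.mu_head(h))               # torques in [-1, 1]
        sigma = F.softplus(self.sigma_head(h)) + 1e-5  # keep std positive
        return self.value_head(h), mu, sigma
```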

Requirements

  • Python 3.6+
  • OpenAI Gym (refer to the gym installation instructions)
  • PyTorch (for a system without a GPU: conda install pytorch-cpu torchvision-cpu -c pytorch)
  • setproctitle (pip install setproctitle)

Algorithm: Asynchronous Advantage Actor-Critic

Define Parameters

  • lr: 0.0001 : learning rate
  • gamma: 0.99 : discount factor for rewards
  • tau: 1.0 : lambda parameter for generalized advantage estimation (GAE)
  • seed: 1 : random seed
  • workers: 6 : number of worker processes (cores) assigned to training
  • num_steps: 20 : number of forward steps per A3C update
  • max_episode_length: 10000 : maximum number of steps per episode
  • env: BipedalWalker-v2 : environment to train on
  • optimizer: Adam : optimizer used for training
  • model: MLP : model used for training
  • stack_frames: 1 : number of frames stacked per observation
  • gpu_ids: [-1] : GPU ids to use (-1 runs on CPU)

The sketch below shows how gamma and tau enter a typical A3C update.
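In this sketch, an A3C worker turns a rollout of num_steps transitions into policy and value losses; tau acts as the lambda of generalized advantage estimation. The exact loss code in this repo may differ in details:

```python
import torch

def a3c_losses(rewards, values, log_probs, R, gamma=0.99, tau=1.0):
    """Compute A3C policy/value losses for one rollout.

    rewards, values, log_probs: per-step lists from the rollout;
    R: bootstrapped value of the state after the last step (0 if terminal).
    """
    policy_loss = 0.0
    value_loss = 0.0
    gae = torch.zeros(1)
    values = values + [R]
    for t in reversed(range(len(rewards))):
        R = gamma * R + rewards[t]  # n-step discounted return
        advantage = R - values[t]
        value_loss = value_loss + 0.5 * advantage.pow(2)
        # Generalized advantage estimation; tau is the GAE lambda.
        delta_t = rewards[t] + gamma * values[t + 1].detach() - values[t].detach()
        gae = gae * gamma * tau + delta_t
        policy_loss = policy_loss - log_probs[t] * gae
    return policy_loss, value_loss
```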

Training

While training this model, please make sure you assign workers efficiently; otherwise it will take forever to train. Training with 6 workers (cores) on my Predator Helios 300 took about 3 hours to reach good rewards.

python main.py --workers 6 --env BipedalWalker-v2 --save-max True --model MLP --stack-frames 1
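Under the hood, the --workers flag controls how many asynchronous training processes run against a shared model. A simplified sketch of that launch logic is below; train_fn stands in for this repo's actual per-worker training function:

```python
import torch.multiprocessing as mp

def launch_workers(shared_model, optimizer, args, train_fn):
    """Spawn args.workers asynchronous A3C training processes."""
    # All workers read and update the same weights in shared memory.
    shared_model.share_memory()
    processes = []
    for rank in range(args.workers):
        p = mp.Process(target=train_fn, args=(rank, args, shared_model, optimizer))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
```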

Evaluation

To run a 100-episode gym evaluation with a trained model:

python gym_eval.py --env BipedalWalkerHardcore-v2 --num-episodes 100 --stack-frames 4 --model CONV --new-gym-eval True
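Schematically, the evaluation amounts to playing the trained policy deterministically (acting with the policy mean) and averaging episode rewards. A sketch, assuming the model interface from the network example above:

```python
import gym
import torch

def evaluate(model, env_name="BipedalWalker-v2", num_episodes=100):
    """Run a deterministic evaluation of a trained actor-critic model."""
    env = gym.make(env_name)
    returns = []
    for _ in range(num_episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            with torch.no_grad():
                _, mu, _ = model(torch.from_numpy(obs).float().unsqueeze(0))
            obs, reward, done, _ = env.step(mu.squeeze(0).numpy())
            total += reward
        returns.append(total)
    print(f"mean reward over {num_episodes} episodes:", sum(returns) / len(returns))
```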

References

Mnih, V., et al. (2016). Asynchronous Methods for Deep Reinforcement Learning. ICML.

Konda, V. R., & Tsitsiklis, J. N. (2000). Actor-Critic Algorithms. NIPS.

Bergdahl, J. Asynchronous Advantage Actor-Critic with Adam Optimization and a Layer Normalized Recurrent Network.

Vision Enhanced Asynchronous Advantage Actor-Critic on Racing Games

Mnih, V., Silver, D., & Riedmiller, M. (2013). Playing Atari with Deep Reinforcement Learning, 1–9.

Salimans, T., Ho, J., Chen, X., & Sutskever, I. (2017). Evolution Strategies as a Scalable Alternative to Reinforcement Learning, 1–13.

Deep Reinforcement Learning using Memory-based Approaches

Simple Reinforcement Learning with Tensorflow Part 8: Asynchronous Actor-Critic Agents (A3C)

Note

  • I will update the files as the project progresses. Right now I am facing difficulty training the BipedalWalker-v2 model on GPU, and on CPU training is taking a lot of time. As of now the model passes the reward threshold value but cannot keep it up; the reward still varies a lot, ranging roughly from -200 to 200.
