BHER

Code for "Addressing Hindsight Bias in Multi-Goal Reinforcement Learning"


Prerequisites

The environments in this benchmark come from OpenAI Gym Robotics, which requires a working MuJoCo installation (via mujoco-py).

Install

git clone https://github.com/Baichenjia/BHER.git
cd BHER
pip install -e .

Implemented Algorithms

The benchmark covers five methods: BHER (ours) and the baselines HER, ARCHER, MEP, and CHER.

Structure Overview

BHER
|-- BHER
    |-- archer
    |-- bher
    |-- her
    |-- cher
    |-- mep
  • The HER and ARCHER code comes from the official implementation.
  • The MEP code comes from the official implementation.
  • The CHER code comes from the official implementation.

We collect all methods in our benchmark to facilitate experimental reproducibility and to encourage adoption by other researchers.

For all methods, we train with 19 CPU cores; each epoch contains 50 cycles, each cycle contains 40 batches, and the batch size is 256.
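
Per epoch, this schedule amounts to 50 × 40 = 2,000 gradient updates over 2,000 × 256 = 512,000 sampled transitions. A quick sanity check of that arithmetic in Python:

cycles_per_epoch = 50
batches_per_cycle = 40
batch_size = 256

updates_per_epoch = cycles_per_epoch * batches_per_cycle  # 2,000 updates per epoch
samples_per_epoch = updates_per_epoch * batch_size        # 512,000 transitions per epoch
print(updates_per_epoch, samples_per_epoch)               # -> 2000 512000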

Environments

The environments are from OpenAI Gym Robotics. They are as follows:

  • FetchReach-v1
  • FetchPush-v1
  • FetchPickAndPlace-v1
  • FetchSlide-v1
  • HandReach-v0
  • HandManipulateBlockRotateZ-v0
  • HandManipulateBlockRotateParallel-v0
  • HandManipulateBlockRotateXYZ-v0
  • HandManipulateBlockFull-v0
  • HandManipulateEggRotate-v0
  • HandManipulateEggFull-v0
  • HandManipulatePenRotate-v0
  • HandManipulatePenFull-v0
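
All of these are goal-based tasks: reset() returns a dictionary with the current observation, the achieved goal, and the desired goal, and the reward function is exposed through compute_reward, which is what makes hindsight relabelling possible. A minimal interaction sketch (this assumes gym with the robotics suite and a working MuJoCo setup; it is not part of this repo):

import gym

env = gym.make("FetchReach-v1")
obs = env.reset()
print(sorted(obs.keys()))  # ['achieved_goal', 'desired_goal', 'observation']

obs, reward, done, info = env.step(env.action_space.sample())
# Goal-based envs let you recompute the reward for any goal, which is the
# operation hindsight experience replay relies on:
recomputed = env.compute_reward(obs["achieved_goal"], obs["desired_goal"], info)
assert recomputed == reward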

Usage

Run BHER

The following commands train a BHER agent on each task for 50 epochs. The two Egg tasks additionally override the bias-related hyper-parameters --r_bias and --bias_clip_high.

cd BHER/bher/experiment/
python train.py --env_name FetchReach-v1 --logdir result/FetchReach
python train.py --env_name FetchPush-v1 --logdir result/FetchPush
python train.py --env_name FetchPickAndPlace-v1 --logdir result/FetchPickAndPlace
python train.py --env_name FetchSlide-v1 --logdir result/FetchSlide
python train.py --env_name HandManipulateBlockRotateZ-v0 --logdir result/HandManipulateBlockRotateZ
python train.py --env_name HandManipulateBlockRotateParallel-v0 --logdir result/HandManipulateBlockRotateParallel
python train.py --env_name HandManipulateBlockRotateXYZ-v0 --logdir result/HandManipulateBlockRotateXYZ
python train.py --env_name HandManipulateBlockFull-v0 --logdir result/HandManipulateBlockFull
python train.py --env_name HandManipulatePenRotate-v0 --logdir result/HandManipulatePenRotate
python train.py --env_name HandManipulatePenFull-v0 --logdir result/HandManipulatePenFull
python train.py --env_name HandManipulateEggRotate-v0 --logdir result/HandManipulateEggRotate --r_bias 0.001 --bias_clip_high 5
python train.py --env_name HandManipulateEggFull-v0 --logdir result/HandManipulateEggFull --r_bias 0.001 --bias_clip_high 5

Run Other Baselines

The following commands train an agent on HandReach-v0 for 50 epochs with each of the baseline methods.

HER

cd BHER/her/experiment/
python train.py --env_name HandReach-v0 --logdir result-her/HandReach

ARCHER

(with factors 2.0 and 1.0)

cd BHER/archer/experiment/
python train.py --env_name HandReach-v0 --logdir result-archer/HandReach

MEP

cd BHER/mep/experiment/
python train.py --env_name HandReach-v0 --logdir result-mep/HandReach

CHER

cd BHER/cher/experiment/
python train.py --env_name HandReach-v0 --logdir result-cher/HandReach

Execution

Each run directory contains:

  • log.txt. Monitor file written by the logger from OpenAI Baselines.
  • params.json. All hyper-parameters used in training.
  • progress.csv. The same data as log.txt, in CSV format; convenient for plotting the learning curve (see the sketch after this list).
  • total_rbias_mean.npy (optional, BHER only). The mean bias of each training batch.
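
A minimal sketch for inspecting these files (pandas and matplotlib are assumptions, not dependencies of this repo, and the CSV column names follow the OpenAI Baselines logger, so check the header of your own run):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

run_dir = "result/FetchReach"  # a run directory from the commands above

# Learning curve from progress.csv.
df = pd.read_csv(f"{run_dir}/progress.csv")
print(df.columns.tolist())                      # inspect the available columns
plt.plot(df["epoch"], df["test/success_rate"])  # assumed column names
plt.xlabel("epoch")
plt.ylabel("test success rate")
plt.show()

# BHER runs additionally save the mean bias of each training batch.
rbias = np.load(f"{run_dir}/total_rbias_mean.npy")
print(rbias.shape, rbias.mean())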

License

The MIT License
