The official implementation of "On Offline Reinforcement Learning for Sparse Reward Tasks".
To reproduce reported results, please follow the steps below inside the project folder:
./install.sh
The experiments cover three benchmarks:

- D4RL, with artificially delayed-reward tasks and sparse-reward tasks (see the sketch after this list).
- NeoRL, with artificially delayed-reward tasks.
- RecS, with real-world simulated sparse-reward tasks.
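To make the delayed-reward setting concrete, here is a minimal sketch of how a dense reward sequence could be turned into a constant-interval delayed one. The function name and the interval parameter are hypothetical illustrations, not the repository's actual code; the real construction is controlled by --delay_mode inside the training scripts.

import numpy as np

def delay_rewards_constant(rewards, interval=20):
    # Hypothetical illustration: accumulate dense rewards and release
    # the running sum every `interval` steps (and at the final step);
    # all other steps receive zero reward.
    delayed = np.zeros(len(rewards))
    acc = 0.0
    for t, r in enumerate(rewards):
        acc += r
        if (t + 1) % interval == 0 or t == len(rewards) - 1:
            delayed[t] = acc
            acc = 0.0
    return delayed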
All run scripts are placed under the scripts folder; some examples are provided below.
To run a D4RL delayed-reward task:
python train_d4rl.py --algo_name=mopo --strategy=average \
--task=halfcheetah-medium-expert-v0 --delay_mode=constant --seed=10
To run a D4RL sparse-reward task:
python train_d4rl.py --algo_name=mopo --strategy=average \
--task=antmaze-medium-play-v2 --delay_mode=none --seed=10
To run a NeoRL delayed-reward task:
python train_neorl.py --algo_name=mopo --strategy=average \
--task=Halfcheetah-v3-low-100 --delay_mode=constant --seed=10
To run a RecS sparse-reward task:
python train_recs.py --algo_name=mopo --strategy=average \
--task=recs-random-v0 --seed=10
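The --strategy flag selects how a delayed reward is spread back over the steps that produced it. As one hedged illustration of what strategy=average could mean, the sketch below distributes each segment's summed reward uniformly across that segment; the function name and signature are hypothetical, and the repository's actual strategies may differ.

import numpy as np

def redistribute_average(delayed_rewards, interval=20):
    # Hypothetical illustration: spread each interval's summed reward
    # uniformly over the steps of that interval (not the repo's API).
    delayed = np.asarray(delayed_rewards, dtype=float)
    dense = np.zeros_like(delayed)
    for start in range(0, len(delayed), interval):
        end = min(start + interval, len(delayed))
        dense[start:end] = delayed[start:end].sum() / (end - start)
    return dense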
This project records training logs with TensorBoard (in the local directory logs/) and with Wandb (on the Wandb website).
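A minimal sketch of how such dual logging is typically wired up is shown below; the project name, run name, and metric are placeholders, and the actual logging hooks live inside the training scripts.

from torch.utils.tensorboard import SummaryWriter
import wandb

# Placeholder names: the repository's actual log layout may differ.
writer = SummaryWriter(log_dir="logs/example-run")
wandb.init(project="OfflineRLSparseReward", name="example-run")

for step in range(1000):
    eval_return = 0.0  # placeholder metric; replace with a real evaluation
    writer.add_scalar("eval/return", eval_return, step)
    wandb.log({"eval/return": eval_return}, step=step)

writer.close()
wandb.finish()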
This project includes experiments on the D4RL and NeoRL benchmarks; our implementation is based on the OfflineRL codebase for efficiency.
To cite this repository:
@misc{offlinerlsparse,
  author = {Ritchie Huang and Kuo Li},
  title = {OfflineRLSparseReward},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/RITCHIEHuang/OfflineRLSparseReward}}
}