The official implementation of "On Offline Reinforcement Learning for Sparse Reward Tasks".
To reproduce reported results, please follow the steps below inside the project folder:
./install.sh
The experiments cover three benchmarks:

- D4RL, with artificially delayed-reward tasks and sparse-reward tasks (see the sketch after this list).
- NeoRL, with artificially delayed-reward tasks.
- RecS, with real-world simulated sparse-reward tasks.
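To make the delayed-reward setting concrete, here is a minimal sketch of how a dense reward sequence could be turned into a constant-interval delayed one. The function name and the interval parameter are hypothetical illustrations, not the repository's actual code; the real construction is controlled by --delay_mode inside the training scripts.

import numpy as np

def delay_rewards_constant(rewards, interval=20):
    # Hypothetical illustration: accumulate dense rewards and release
    # the running sum every `interval` steps (and at the final step);
    # all other steps receive zero reward.
    delayed = np.zeros(len(rewards))
    acc = 0.0
    for t, r in enumerate(rewards):
        acc += r
        if (t + 1) % interval == 0 or t == len(rewards) - 1:
            delayed[t] = acc
            acc = 0.0
    return delayed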
All run scripts are placed under the scripts folder; some examples are provided below.
To run a D4RL delayed-reward task:
python train_d4rl.py --algo_name=mopo --strategy=average \
--task=halfcheetah-medium-expert-v0 --delay_mode=constant --seed=10
To run a D4RL sparse-reward task:
python train_d4rl.py --algo_name=mopo --strategy=average \
--task=antmaze-medium-play-v2 --delay_mode=none --seed=10
To run a NeoRL delayed-reward task:
python train_neorl.py --algo_name=mopo --strategy=average \
--task=Halfcheetah-v3-low-100 --delay_mode=constant --seed=10
To run a RecS sparse-reward task:
python train_recs.py --algo_name=mopo --strategy=average \
--task=recs-random-v0 --seed=10
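The --strategy flag selects how a delayed reward is spread back over the steps that produced it. As one hedged illustration of what strategy=average could mean, the sketch below distributes each segment's summed reward uniformly across that segment; the function name and signature are hypothetical, and the repository's actual strategies may differ.

import numpy as np

def redistribute_average(delayed_rewards, interval=20):
    # Hypothetical illustration: spread each interval's summed reward
    # uniformly over the steps of that interval (not the repo's API).
    delayed = np.asarray(delayed_rewards, dtype=float)
    dense = np.zeros_like(delayed)
    for start in range(0, len(delayed), interval):
        end = min(start + interval, len(delayed))
        dense[start:end] = delayed[start:end].sum() / (end - start)
    return dense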
This project records training logs with TensorBoard (in the local directory logs/) and with Wandb (on the Wandb website).
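A minimal sketch of how such dual logging is typically wired up is shown below; the project name, run name, and metric are placeholders, and the actual logging hooks live inside the training scripts.

from torch.utils.tensorboard import SummaryWriter
import wandb

# Placeholder names: the repository's actual log layout may differ.
writer = SummaryWriter(log_dir="logs/example-run")
wandb.init(project="OfflineRLSparseReward", name="example-run")

for step in range(1000):
    eval_return = 0.0  # placeholder metric; replace with a real evaluation
    writer.add_scalar("eval/return", eval_return, step)
    wandb.log({"eval/return": eval_return}, step=step)

writer.close()
wandb.finish()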
This project includes experiments on the D4RL and NeoRL benchmarks; our implementation is based on the OfflineRL codebase for efficiency.
To cite this repository:
@misc{offlinerlsparse,
  author = {Ritchie Huang and Kuo Li},
  title = {OfflineRLSparseReward},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/RITCHIEHuang/OfflineRLSparseReward}}
}