Leveraging Fully Observable Policies for Learning under Partial Observability
This repository contains the code for our paper Leveraging Fully Observable Policies for Learning under Partial Observability, accepted at CoRL 2022. If you use this code, please cite:
@article{nguyen2022leveraging,
title={Leveraging Fully Observable Policies for Learning under Partial Observability},
author={Nguyen, Hai and Baisero, Andrea and Wang, Dian and Amato, Christopher and Platt, Robert},
journal={arXiv preprint arXiv:2211.01991},
year={2022}
}
Setup
- Install Anaconda
- Create and activate a conda environment
conda create --name cosil python=3.8.5
conda activate cosil
- Clone this repository (including its submodules) and install the required packages
git clone --recursive https://github.com/hai-h-nguyen/cosil-corl22.git
cd cosil-corl22
pip install -r requirements.txt
- Install the domains
cd pomdp_robot_domains
pip install -r requirements.txt
pip install -e .
cd ..
cd pomdp-domains
pip install -e .
cd ..
- Install PyTorch (I used 1.12.0 with CUDA 10.2, but other versions should also work)
conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=10.2 -c pytorch
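To confirm that PyTorch is installed and can see a GPU before training, a small check such as the following may help. It is a sketch, not part of this repository: the helper name `torch_info` is hypothetical, and it only reads the public `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()` attributes.

```python
def torch_info() -> dict:
    """Report the installed PyTorch version and whether CUDA is usable.

    Hypothetical helper (not part of this repository). Returns None values
    instead of raising when PyTorch is not installed.
    """
    try:
        import torch
    except ImportError:
        return {"torch": None, "cuda_build": None, "cuda_available": False}
    return {
        # e.g. "1.12.0"; pip wheels may add a local suffix such as "+cu102"
        "torch": torch.__version__,
        # CUDA version PyTorch was built against; None for CPU-only builds
        "cuda_build": torch.version.cuda,
        # True only if a compatible GPU and driver are visible at runtime
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(torch_info())
```

If `cuda_available` is False, the GPU runs below will not work until the driver or CUDA toolkit mismatch is resolved.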
Train
Before Training
export PYTHONPATH=${PWD}:$PYTHONPATH
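If you launch runs from a Python script instead of an interactive shell, roughly the same effect can be obtained by prepending the repository root to PYTHONPATH in the child process environment. This is a minimal sketch; `env_with_repo_on_path` is a hypothetical helper, not part of this repository.

```python
import os
from typing import Mapping, Optional


def env_with_repo_on_path(repo_root: str,
                          base_env: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment with repo_root prepended to PYTHONPATH.

    Hypothetical helper that roughly mirrors
    `export PYTHONPATH=${PWD}:$PYTHONPATH` run from the repository root.
    The base environment is copied, never modified in place.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTHONPATH", "")
    # Prepend so packages in the repository take precedence on sys.path.
    env["PYTHONPATH"] = repo_root if not existing else repo_root + os.pathsep + existing
    return env
```

For example, `subprocess.run(["python3", "policies/main.py", "--cfg", "configs/pomdp/bumps_1d/rnn.yml", "--algo", "bcd", "--seed", "0", "--cuda", "0"], env=env_with_repo_on_path(os.getcwd()), check=True)` run from the repository root. Unlike the shell line, this avoids a trailing separator when PYTHONPATH was previously empty.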
Bumps-1D (Discrete Action)
- COSIL (sacde) / Behavior-Cloning (bcd) / Recurrent SAC (sacd) / Offpolicy-Advisor (sacda)
python3 policies/main.py --cfg configs/pomdp/bumps_1d/rnn.yml --algo sacde --target_entropy 1.0 --seed 0 --cuda 0
python3 policies/main.py --cfg configs/pomdp/bumps_1d/rnn.yml --algo sacda --target_entropy 0.0 --seed 0 --cuda 0
python3 policies/main.py --cfg configs/pomdp/bumps_1d/rnn.yml --algo bcd --seed 0 --cuda 0
python3 policies/main.py --cfg configs/pomdp/bumps_1d/rnn.yml --algo sacd --target_entropy 0.7 --seed 0 --cuda 0
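Each command above is a single run with one seed. When several seeds are needed, a small generator can produce the commands instead of editing them by hand. This is a hypothetical sketch (`build_command` and `seed_sweep` are not part of the repository) that uses only the flags shown in this README; it prints the commands rather than running them, so scheduling across GPUs is left to you.

```python
import shlex
from typing import Iterable, List, Optional


def build_command(cfg: str, algo: str, seed: int, cuda: int = 0,
                  target_entropy: Optional[float] = None) -> List[str]:
    """Build the argument list for one run of policies/main.py.

    --target_entropy is omitted when None, as in the bcd command above.
    """
    cmd = ["python3", "policies/main.py", "--cfg", cfg, "--algo", algo]
    if target_entropy is not None:
        cmd += ["--target_entropy", str(target_entropy)]
    cmd += ["--seed", str(seed), "--cuda", str(cuda)]
    return cmd


def seed_sweep(cfg: str, algo: str, seeds: Iterable[int], **kwargs) -> List[List[str]]:
    """Return one command per seed, sharing all other settings."""
    return [build_command(cfg, algo, seed, **kwargs) for seed in seeds]


if __name__ == "__main__":
    # COSIL on Bumps-1D with seeds 0, 1, 2 (same settings as the first command above).
    for cmd in seed_sweep("configs/pomdp/bumps_1d/rnn.yml", "sacde", range(3),
                          target_entropy=1.0):
        print(shlex.join(cmd))
```

Returning argument lists (rather than strings) lets the same output be passed directly to `subprocess.run` without shell quoting issues; `shlex.join` is used only for printing.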
Bumps-2D (Discrete Action)
- COSIL (sacde) / Behavior-Cloning (bcd) / Recurrent SAC (sacd) / Offpolicy-Advisor (sacda)
python3 policies/main.py --cfg configs/pomdp/bumps_2d/rnn.yml --algo sacde --target_entropy 1.0 --seed 0 --cuda 0
python3 policies/main.py --cfg configs/pomdp/bumps_2d/rnn.yml --algo sacda --target_entropy 0.0 --seed 0 --cuda 0
python3 policies/main.py --cfg configs/pomdp/bumps_2d/rnn.yml --algo bcd --seed 0 --cuda 0
python3 policies/main.py --cfg configs/pomdp/bumps_2d/rnn.yml --algo sacd --target_entropy 0.7 --seed 0 --cuda 0
LunarLander-P, -V (Continuous Action)
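- COSIL (sace) / Behavior-Cloning (bc) / Recurrent SAC (sac) / Offpolicy-Advisor (saca). The commands below use rnn_p.yml for LunarLander-P; use rnn_v.yml instead for LunarLander-V.
python3 policies/main.py --cfg configs/pomdp/lunarlander/rnn_p.yml --algo sace --target_entropy 1.0 --seed 0 --cuda 0
python3 policies/main.py --cfg configs/pomdp/lunarlander/rnn_p.yml --algo bc --seed 0 --cuda 0
python3 policies/main.py --cfg configs/pomdp/lunarlander/rnn_p.yml --algo saca --seed 0 --cuda 0
python3 policies/main.py --cfg configs/pomdp/lunarlander/rnn_p.yml --algo sac --seed 0 --cuda 0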
CarFlag (Continuous Action)
- COSIL (sace) / Behavior-Cloning (bc) / Recurrent SAC (sac) / Offpolicy-Advisor (saca)
python3 policies/main.py --cfg configs/pomdp/car_flag_continuous/rnn.yml --algo sace --target_entropy -1.0 --seed 0 --cuda 0
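python3 policies/main.py --cfg configs/pomdp/car_flag_continuous/rnn.yml --algo sac --seed 0 --cuda 0
python3 policies/main.py --cfg configs/pomdp/car_flag_continuous/rnn.yml --algo saca --seed 0 --cuda 0
python3 policies/main.py --cfg configs/pomdp/car_flag_continuous/rnn.yml --algo bc --seed 0 --cuda 0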
Block-Picking (Continuous Action)
- COSIL (sace) / Behavior-Cloning (bc) / Recurrent SAC (sac) / Offpolicy-Advisor (saca)
python3 policies/main.py --cfg configs/pomdp/blockpicking/rnn.yml --algo sace --target_entropy 0.0 --seed 0 --cuda 0
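python3 policies/main.py --cfg configs/pomdp/blockpicking/rnn.yml --algo sac --seed 0 --cuda 0
python3 policies/main.py --cfg configs/pomdp/blockpicking/rnn.yml --algo saca --seed 0 --cuda 0
python3 policies/main.py --cfg configs/pomdp/blockpicking/rnn.yml --algo bc --seed 0 --cuda 0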
Visualization using TensorBoard
tensorboard --logdir logs/folder_to_plot
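TensorBoard writes files named `events.out.tfevents.*`, and every directory under `--logdir` that contains one is shown as a separate run, so `--logdir` can also point at a parent folder to compare several runs. To see which folders under `logs/` actually hold runs, a small stdlib-only sketch such as the following may help; `find_tensorboard_runs` is a hypothetical helper, and it assumes runs are written under `logs/` as the command above suggests.

```python
import os
from typing import List


def find_tensorboard_runs(logdir: str) -> List[str]:
    """Return directories (relative to logdir) that contain TensorBoard event files.

    Hypothetical helper. Returns an empty list if logdir does not exist.
    """
    runs = []
    for dirpath, _dirnames, filenames in os.walk(logdir):
        # Any file with TensorBoard's event-file prefix marks this directory as a run.
        if any(name.startswith("events.out.tfevents") for name in filenames):
            runs.append(os.path.relpath(dirpath, logdir))
    return sorted(runs)


if __name__ == "__main__":
    for run in find_tensorboard_runs("logs"):
        print(run)
```

Any printed path can be appended to `logs/` and passed as `--logdir`.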
License
This code is released under the MIT License.
Acknowledgments
This codebase evolved from pomdp-baselines.