This project was submitted for the Robot Artificial Intelligence class.
- Model-based Offline Policy Optimization (MOPO)
- Causal Inference (CI)
We are currently working with the setup below.
- Ubuntu: 20.04
- Python: 3.8.18
- CARLA Simulator: latest
- Install the CARLA simulator by following the official CARLA installation guide.
- Clone the VCCR repository:

```bash
git clone --recursive https://github.com/brunoleej/VCCR.git
```
- Create an Anaconda environment from `venv_setup.yml` and install the packages:

```bash
cd VCCR
conda env create -f venv_setup.yml
conda activate vccr
pip install -e viskit
pip install -e .
```
- Baseline: Model-based Offline Policy Optimization (MOPO)
- Currently, only running locally is supported.
- To run our baseline algorithm, use the notebook at `vccr/MOPO/train_mopo_agent.ipynb`; the expert dataset is at `vccr/MOPO/expert_dataset.pickle`.
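MOPO trains the policy on short model rollouts whose rewards are penalized by the dynamics model's uncertainty, i.e. it optimizes against r̃(s, a) = r(s, a) − λ·u(s, a). A minimal sketch of that penalty, using the disagreement of a hypothetical dynamics ensemble as the uncertainty estimate (the function name, shapes, and the disagreement-based penalty here are illustrative assumptions, not the exact estimator in our code):

```python
import numpy as np

def mopo_penalized_reward(reward, ensemble_next_states, lam=1.0):
    """Penalize rewards by ensemble disagreement, a proxy for model uncertainty.

    reward: (batch,) rewards predicted by the dynamics model
    ensemble_next_states: (n_models, batch, state_dim) next-state predictions
    lam: penalty coefficient (lambda in the MOPO objective)
    """
    # u(s, a): max over models of the deviation from the ensemble mean prediction
    mean_pred = ensemble_next_states.mean(axis=0)            # (batch, state_dim)
    deviations = np.linalg.norm(
        ensemble_next_states - mean_pred, axis=-1)           # (n_models, batch)
    uncertainty = deviations.max(axis=0)                     # (batch,)
    return reward - lam * uncertainty

# When all ensemble members agree exactly, no penalty is applied.
preds = np.zeros((2, 3, 4))                                  # 2 models, batch 3
print(mopo_penalized_reward(np.ones(3), preds))              # -> [1. 1. 1.]
```

The larger `lam` is, the more the policy is pushed back toward regions of the state-action space where the model ensemble is confident.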
- Train an Autonomous Driving (AD) agent in the CARLA simulator.
- Train Autonomous Navigation
- We compared the autonomous navigation ability of VCCR against three model-free reinforcement learning (MFRL) algorithms: Deep Deterministic Policy Gradient (DDPG), Twin-Delayed Deep Deterministic Policy Gradient (TD3), and Soft Actor-Critic (SAC).
- To train the three MFRL algorithms, run `environment/sb_train.py`:

```bash
python environment/sb_train.py
```
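Of the three MFRL baselines, TD3's key change over DDPG is the clipped double-Q target: the bootstrap value uses the minimum of two target critics, evaluated at a noise-smoothed target action. A NumPy sketch of that target computation (function names and shapes are illustrative, not our training code, which relies on stable-baselines3):

```python
import numpy as np

def td3_target(rewards, dones, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q target: r + gamma * (1 - done) * min(Q1', Q2')."""
    min_q = np.minimum(q1_next, q2_next)   # pessimistic value estimate
    return rewards + gamma * (1.0 - dones) * min_q

def smoothed_target_action(action, noise_std=0.2, noise_clip=0.5, act_limit=1.0):
    """Target-policy smoothing: add clipped Gaussian noise, then clip to the range."""
    noise = np.clip(np.random.normal(0.0, noise_std, size=action.shape),
                    -noise_clip, noise_clip)
    return np.clip(action + noise, -act_limit, act_limit)

# On a terminal transition (done=1) the target collapses to the reward alone.
print(td3_target(np.array([1.0]), np.array([1.0]),
                 np.array([5.0]), np.array([3.0])))  # -> [1.]
```

Taking the minimum of the two critics counters the Q-value overestimation that plain DDPG suffers from, which is one reason TD3 is usually the stronger of the two baselines.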
- To run on a different environment, you can modify the provided template.
- For now, Weights & Biases (wandb) logging runs automatically. If you do not need it, you can comment out the wandb logging lines.
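Instead of commenting lines out, one low-friction option is to guard the wandb calls behind a flag so the import and logging only happen when wanted (the flag and function names here are illustrative, not how `sb_train.py` is currently structured):

```python
USE_WANDB = False  # set True to enable Weights & Biases logging

def log_metrics(step, metrics, use_wandb=USE_WANDB):
    """Log to wandb when enabled; otherwise fall back to printing."""
    if use_wandb:
        import wandb  # imported lazily so the package is only needed when logging
        wandb.log(metrics, step=step)
    else:
        print(f"step {step}: {metrics}")

log_metrics(0, {"episode_reward": 12.5})
```

With this pattern, disabling logging is a one-line change and the script still runs on machines without wandb installed.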
```bibtex
@article{MOPO,
  title   = {{MOPO:} Model-based Offline Policy Optimization},
  author  = {Tianhe Yu and Garrett Thomas and Lantao Yu and Stefano Ermon and James Zou and Sergey Levine and Chelsea Finn and Tengyu Ma},
  journal = {CoRR},
  year    = {2020},
}

@article{causal-rl-survey,
  title   = {Causal Reinforcement Learning: A Survey},
  author  = {Deng, Zhihong and Jiang, Jing and Long, Guodong and Zhang, Chengqi},
  journal = {arXiv preprint arXiv:2307.01452},
  year    = {2023},
}
```
The underlying Deep Deterministic Policy Gradient (DDPG), Soft Actor-Critic (SAC), and Twin-Delayed Deep Deterministic Policy Gradient (TD3) implementations used in the Experiments section come from the stable-baselines3 codebase.