Official codebase for Residual Q-Learning: Offline and Online Policy Customization without Value, NeurIPS 2023
conda create --name <env_name> python=3.9
conda activate <env_name>
pip install swig cmake ffmpeg
pip install -r requirements.txt
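Before training, it can help to confirm that the environment came out as intended. The sketch below is a hypothetical helper, not part of the repository. It assumes that swig, cmake, and ffmpeg are expected as executables on PATH and that the interpreter should be Python 3.9, matching the conda command above.

```python
import shutil
import sys

# Assumption: these tools are needed as command-line executables on PATH.
REQUIRED_TOOLS = ("swig", "cmake", "ffmpeg")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that shutil.which cannot find on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def python_matches(major=3, minor=9):
    """True if the running interpreter has the given major.minor version."""
    return sys.version_info[:2] == (major, minor)
```

Run it inside the activated environment, e.g. `print(missing_tools(), python_matches())`; an empty list and `True` mean the checks passed.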
Run the script train_rl.py with arguments specifying the environment, the algorithm, and the number of expert rollouts to collect in the final iteration, which are used for GAIL training. For example:
python train_rl.py --env-name highway_basic --algo dqn_ME --rollout-save-n-episodes 1000
python train_rl.py --env-name parking_basic --algo sac --rollout-save-n-episodes 10000
python train_rl.py --env-name cartpole_basic --algo dqn_ME --rollout-save-n-episodes 1000
python train_rl.py --env-name mountain_car_basic --algo sac --rollout-save-n-episodes 10000
The experiment is logged in a directory of the form logs/{algo}/{env-name}_{timestamp}/. The policies are stored in its policies subfolder, and the final rollouts in its rollouts subfolder.
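To locate a finished run (for example, to fill in EXPERT_DIRECTORY in the GAIL command), a small helper can pick the newest logs/{algo}/{env-name}_* folder. This is a sketch, not repository code: `latest_run_dir` and `run_artifacts` are hypothetical names, the newest run is chosen by modification time rather than by parsing the timestamp (whose exact format is not specified here), and the rollout file name final.npz is taken from the GAIL example.

```python
from pathlib import Path


def latest_run_dir(algo, env_name, root="logs"):
    """Return the most recently modified {root}/{algo}/{env_name}_* directory, or None."""
    base = Path(root, algo)
    if not base.is_dir():
        return None
    runs = [p for p in base.glob(f"{env_name}_*") if p.is_dir()]
    if not runs:
        return None
    # Newest by filesystem modification time (assumption: no timestamp parsing).
    return max(runs, key=lambda p: p.stat().st_mtime)


def run_artifacts(run_dir):
    """Paths to the policies subfolder and the final rollout file of a run."""
    run_dir = Path(run_dir)
    return {
        "policies": run_dir / "policies",
        "rollouts": run_dir / "rollouts" / "final.npz",  # name assumed from the GAIL example
    }
```

For instance, `run_artifacts(latest_run_dir("sac", "parking_basic"))["rollouts"]` gives a candidate value for --demo-rollout-path.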
Run the script train_gail.py with arguments specifying the environment, the generator training algorithm, and the path to the demo rollouts. For example:
python train_gail.py --env-name parking_basic --gen-algo sac --demo-rollout-path EXPERT_DIRECTORY/rollouts/final.npz
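Before launching GAIL, it can be useful to check what the demo rollout file contains. The helper below is a hypothetical sketch using only NumPy; the array names stored in final.npz are not documented here, so it simply reports whatever keys it finds, with their shapes and dtypes.

```python
import numpy as np


def summarize_npz(path):
    """Map each array name in an .npz archive to its (shape, dtype string)."""
    summary = {}
    # allow_pickle=True is needed if the archive stores object arrays;
    # only use it on files you trust, since unpickling can execute code.
    with np.load(path, allow_pickle=True) as archive:
        for key in archive.files:
            arr = archive[key]
            summary[key] = (arr.shape, str(arr.dtype))
    return summary
```

Usage: `summarize_npz("EXPERT_DIRECTORY/rollouts/final.npz")`, with EXPERT_DIRECTORY replaced by the actual run directory.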
Run the script train.py with arguments specifying the residual algorithm, the environment, and the path to the prior model. We provide pretrained RL prior agents for the environments we have tested; run the following commands to train customized agents on top of them.
python -W ignore train.py --algo dqn_soft_residual --env highway-ME-basic-AddRightReward-v0 --prior-model-path ./logs/highway-ME-basic-v0_Example_Pretrained
python -W ignore train.py --algo sac_residual --env parking-basic-boundary-v0 --prior-model-path ./logs/parking-basic-v0_Example_Pretrained
python -W ignore train.py --algo dqn_soft_residual --env CartPole-modifed-morecenter-v1 --prior-model-path ./logs/CartPole-modifed-v1_Example_Pretrained
python -W ignore train.py --algo sac_residual --env MountainCarContinuous-modifed-lessleft-v0 --prior-model-path ./logs/MountainCarContinuous-modifed-v0_Example_Pretrained
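For intuition about the *_residual algorithms, below is a minimal tabular sketch of soft residual Q-learning for discrete actions. It is based on the paper's formulation as understood here and is not taken from the repository's dqn_soft_residual code. It assumes a known transition tensor P, a prior policy pi that is maximum-entropy optimal for a basic reward r at temperature alpha, and a customized reward omega * r + r_R learned at temperature alpha_hat. The residual Q-function is iterated as Q_R(s,a) <- r_R(s,a) + gamma * E_{s'}[alpha_hat * log sum_{a'} exp((Q_R(s',a') + omega * alpha * log pi(a'|s')) / alpha_hat)], so only log pi of the prior is needed, not its value function. All function names are hypothetical.

```python
import numpy as np


def logsumexp(x, axis):
    """Numerically stable log(sum(exp(x))) along `axis`."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))


def log_softmax(q, temp):
    """Log of the Boltzmann policy exp(q / temp), normalized over actions (axis 1)."""
    return q / temp - logsumexp(q / temp, axis=1)[:, None]


def soft_value_iteration(reward, P, gamma, alpha, iters=1000):
    """Soft-optimal Q (shape S x A) for `reward` at MaxEnt temperature alpha.

    P has shape (S, A, S) with P[s, a, s'] the transition probability.
    """
    q = np.zeros_like(reward)
    for _ in range(iters):
        v = alpha * logsumexp(q / alpha, axis=1)  # soft state value V(s)
        q = reward + gamma * (P @ v)              # (S, A, S) @ (S,) -> (S, A)
    return q


def residual_q_iteration(log_prior, residual_reward, P, gamma, omega, alpha,
                         alpha_hat, iters=1000):
    """Iterate the soft residual update; uses only log pi of the prior, not its Q."""
    q_r = np.zeros_like(residual_reward)
    for _ in range(iters):
        logits = q_r + omega * alpha * log_prior
        v = alpha_hat * logsumexp(logits / alpha_hat, axis=1)
        q_r = residual_reward + gamma * (P @ v)
    return q_r


def customized_log_policy(q_r, log_prior, omega, alpha, alpha_hat):
    """log pi_hat(a|s), with pi_hat proportional to exp((Q_R + omega*alpha*log pi) / alpha_hat)."""
    return log_softmax(q_r + omega * alpha * log_prior, alpha_hat)
```

As a sanity check on a small random MDP, the customized policy from `customized_log_policy` should match `log_softmax(soft_value_iteration(omega * r + r_R, P, gamma, alpha_hat), alpha_hat)` up to numerical error, and the converged Q_R should equal the total-reward soft Q minus omega times the prior's soft Q.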
Run the script train_rl.py to train agents on the tasks with total rewards. The corresponding environments are specified by the named configs cartpole_total, mountain_car_total, highway_total, and parking_total.
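The four total-reward runs can be scripted. The sketch below only builds the command lines and does not run them. It assumes, by analogy with the *_basic examples above, that the named configs are passed via --env-name and that each task uses the same algorithm as its *_basic counterpart; `ALGO_FOR_TASK` and `total_reward_command` are hypothetical names, and --rollout-save-n-episodes is omitted (add it if your runs need it).

```python
import shlex

# Assumption: same algorithm per task as in the *_basic train_rl.py examples.
ALGO_FOR_TASK = {
    "cartpole": "dqn_ME",
    "mountain_car": "sac",
    "highway": "dqn_ME",
    "parking": "sac",
}


def total_reward_command(task):
    """Build (but do not run) a train_rl.py command for the {task}_total config."""
    return ["python", "train_rl.py",
            "--env-name", f"{task}_total",
            "--algo", ALGO_FOR_TASK[task]]


for task in ALGO_FOR_TASK:
    print(shlex.join(total_reward_command(task)))
```

Each list can be passed to `subprocess.run(cmd, check=True)` to launch the corresponding run.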
We also provide a demo notebook, Training_demo_Highway.ipynb, for our experiments in the highway-env environment. It covers training and evaluating the RL prior and customized policies, and also includes a demo of maximum-entropy MCTS for zero-shot online customization.
@inproceedings{li2023residual,
title={Residual Q-Learning: Offline and Online Policy Customization without Value},
author={Li, Chenran and Tang, Chen and Nishimura, Haruki and Mercat, Jean and Tomizuka, Masayoshi and Zhan, Wei},
booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
year={2023}
}
Please feel free to email us if you have any questions.
Chenran Li (chenran_li@berkeley.edu)
Chen Tang (chen_tang@berkeley.edu)