imitation-learning offline-reinforcement-learning pytorch

Discriminator-Weighted Offline Imitation Learning from Suboptimal Demonstrations

This is the code for reproducing the results of the paper Discriminator-Weighted Offline Imitation Learning from Suboptimal Demonstrations accepted at ICML'2022. The paper can be found here.

Usage

Paper results were collected with MuJoCo 1.50 (and mujoco-py 1.50.1.1) in OpenAI gym 0.17.0 with the D4RL datasets. Networks are trained using PyTorch 1.4.0 and Python 3.6.

The paper results can be reproduced by running:

./run_dwbc.sh

You can also run DWBC on the setting used in DemoDICE and SMODICE by running main_setting_demodice.py:

python main_setting_demodice.py \
   --algorithm="DWBC" \  
   --env_e="hopper-expert-v2" \
   --env_o="hopper-random-v2" \
   --num_e=1 \  # expert trajectory num in D_e
   --num_o_e=200 \  # expert trajectory num in D_o
   --num_o_o=2000 \  # non-expert trajectory num in D_o

Bibtex

@inproceedings{xu2022discriminator,
  title     = {Discriminator-Weighted Offline Imitation Learning from Suboptimal Demonstrations},
  author    = {Xu, Haoran and Zhan, Xianyuan and Yin, Honglei and Qin, Huiling},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {24725-24742},
  year      = {2022},
}

About

[ICML 2022] The official implementation of DWBC in "Discriminator-Weighted Offline Imitation Learning from Suboptimal Demonstrations"

imitation-learning offline-reinforcement-learning pytorch

MIT License

Languages

Language:Python 96.1%Language:Shell 3.9%