The TRPO part is largely based on: https://github.com/ikostrikov/pytorch-trpo
- Python 3.6
- PyTorch 0.4.1
- gym
- mujoco
- numpy
- scipy
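
Before running anything, it can help to confirm that the packages listed above are visible to the interpreter. The following is a minimal sketch, not part of this repository. The import names are assumptions: PyTorch is assumed to import as torch and the MuJoCo bindings as mujoco_py. The helper names check_modules and report are hypothetical. It only looks modules up and does not import them or check their versions.

```python
import importlib.util

# Import names are assumptions: PyTorch is looked up as "torch" and the
# MuJoCo bindings as "mujoco_py"; adjust if your setup differs.
REQUIRED_MODULES = ["torch", "gym", "mujoco_py", "numpy", "scipy"]


def check_modules(names):
    """Map each top-level module name to True if it can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def report(names=REQUIRED_MODULES):
    """Print one status line per module; return True if all were found."""
    status = check_modules(names)
    for name, found in status.items():
        print(f"{name}: {'found' if found else 'MISSING'}")
    return all(status.values())
```

Calling report() prints one line per package and returns False if any of them is missing.
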
The .py scripts read trajectories and confidence data from the demonstrations folder and record the accumulated reward at each update in the log folder.
Use the commands below to run our methods and baselines. Traj-size option is the same as specifying
- IC_GAIL
python IC_GAIL.py --env Ant-v2 --num-epochs 5000 --traj-size 600
- 2IWIL
python 2IWIL.py --env Ant-v2 --num-epochs 5000 --traj-size 600 --weight
- GAIL (U+C)
python 2IWIL.py --env Ant-v2 --num-epochs 5000 --traj-size 600
- GAIL (C)
python 2IWIL.py --env Ant-v2 --num-epochs 5000 --traj-size 600 --weight --only --noconf
- GAIL (Reweight)
python 2IWIL.py --env Ant-v2 --num-epochs 5000 --traj-size 600 --weight --only
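
The five commands above differ only in the script name and a few trailing flags, so they can be generated from one table. The following is a minimal sketch, not part of this repository. The names COMMON_ARGS, METHODS, build_command, and run_all are hypothetical; the scripts and flags are copied from the commands above. By default run_all only prints the commands and launches nothing.

```python
import subprocess

# Arguments shared by every command listed above.
COMMON_ARGS = ["--env", "Ant-v2", "--num-epochs", "5000", "--traj-size", "600"]

# Method name -> (script, extra flags), copied from the commands above.
METHODS = {
    "IC_GAIL": ("IC_GAIL.py", []),
    "2IWIL": ("2IWIL.py", ["--weight"]),
    "GAIL (U+C)": ("2IWIL.py", []),
    "GAIL (C)": ("2IWIL.py", ["--weight", "--only", "--noconf"]),
    "GAIL (Reweight)": ("2IWIL.py", ["--weight", "--only"]),
}


def build_command(method, python="python"):
    """Return the argument list for one method or baseline."""
    script, extra = METHODS[method]
    return [python, script] + COMMON_ARGS + extra


def run_all(dry_run=True):
    """Print every command; execute them in order only if dry_run is False."""
    for method in METHODS:
        cmd = build_command(method)
        print(f"[{method}] {' '.join(cmd)}")
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Calling run_all() lists the five commands for inspection; run_all(dry_run=False) runs them one after another from the repository root and stops at the first failure.
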