A Pragmatic Look at Deep Imitation Learning

Imitation learning algorithms (with PPO [1]):

~~ABC [2]~~
AIRL [3]
BC [4]
DRIL [5]
FAIRL [6]
GAIL [7]
GMMIL [8]
~~nn-PUGAIL [9]~~
RED [11]

Requirements

The code runs on Python 3.7 (Ax requires >3.7). You can install most of the requirements by running:

pip install -r requirements.txt

Notable required packages are PyTorch, OpenAI Gym, D4RL-pybullet, Hydra and Ax. If installing d4rl-pybullet fails, install it with pip directly from git:

pip install git+https://github.com/takuseno/d4rl-pybullet

Note:

For hyperparameter optimization, the code uses the Hydra Ax sweeper plugin, which is not included in requirements.txt. Ax requires a specific version of PyTorch, so installing it into an existing environment might upgrade or downgrade PyTorch. The Ax sweeper can be installed with:

pip install hydra-ax-sweeper --upgrade

Run

Training of each imitation learning algorithm can be started with:

python main.py algorithm=ALG/ENV

For example, to train AIRL on Hopper:

python main.py algorithm=AIRL/hopper

Hyperparameters can be found in conf/config.yaml and conf/algorithm/ALG/ENV.yaml, with the latter containing the algorithm- and environment-specific hyperparameters that were tuned with Ax.

The resulting model will be saved in ./outputs/ENV_ALG/m-d-H-M, with the last subfolder indicating the date and time of the run (month-day-hour-minute).
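As a rough illustration of the naming scheme described above, the following sketch maps an `algorithm=ALG/ENV` override to the config files and output directory it implies. The helper `describe_run` is hypothetical and not part of the repository, and the zero-padded `%m-%d-%H-%M` timestamp format is an assumption based on the "m-d-H-M" description.

```python
from datetime import datetime
from pathlib import Path


def describe_run(algorithm: str, started: datetime) -> dict:
    """Hypothetical helper: list the paths implied by `algorithm=ALG/ENV`."""
    alg, env = algorithm.split("/")
    return {
        # Shared defaults plus the tuned algorithm/environment overrides.
        "configs": [
            Path("conf/config.yaml"),
            Path("conf/algorithm") / alg / f"{env}.yaml",
        ],
        # ENV_ALG directory with a month-day-hour-minute subfolder
        # (zero padding is assumed, not confirmed by the README).
        "output_dir": Path("outputs") / f"{env}_{alg}" / started.strftime("%m-%d-%H-%M"),
    }


run = describe_run("AIRL/hopper", datetime(2021, 3, 9, 14, 5))
print(run["output_dir"].as_posix())  # outputs/hopper_AIRL/03-09-14-05
```

The actual directory layout is determined by the Hydra configuration in the repository, so treat this only as a reading aid.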

Run hyperparameter optimization

Hyperparameter optimization can be run by adding the -m flag.

Example:

python main.py -m algorithm=AIRL/hopper hyperparam_opt=AIRL hydra/sweeper=ax

The hyperparam_opt option specifies which parameters to optimize (the default is IL, which contains all parameters). Note that the Ax sweeper must be installed for the above command to work (see the Requirements section).

Run with seeds

You can run each algorithm with different seeds with:

python main.py -m algorithm=AIRL/hopper seed=1,2,3,4,5

or use the existing bash script:

./scripts/run_seed_experiments.sh ALG ENV

The results will be available in the ./output/seed_sweeper_ENV_ALG folder (note: running this code twice will overwrite the previous results).
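Hydra's multirun mode expects the swept values as a single comma-separated override with no spaces, since the shell would otherwise split them into separate arguments. A minimal sketch of building such a command programmatically is shown below; `seed_sweep_command` is a hypothetical helper, not something the repository provides.

```python
from typing import List, Sequence


def seed_sweep_command(algorithm: str, seeds: Sequence[int]) -> List[str]:
    """Hypothetical helper: build the argv for a Hydra multirun over seeds."""
    # Join seeds without spaces so the override stays a single argument.
    seed_list = ",".join(str(seed) for seed in seeds)
    return ["python", "main.py", "-m", f"algorithm={algorithm}", f"seed={seed_list}"]


cmd = seed_sweep_command("AIRL/hopper", [1, 2, 3, 4, 5])
print(" ".join(cmd))  # python main.py -m algorithm=AIRL/hopper seed=1,2,3,4,5
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` if launching the sweep from Python is preferred over the bash script.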

Options that can be modified in config include:

State-only imitation learning: state-only: true/false
Absorbing state indicator [12]: absorbing: true/false
R1 gradient regularisation [13]: r1-reg-coeff: 0.5 (in each algorithm subfolder)
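For reference, the R1 gradient penalty from [13] is added to the discriminator loss and penalises the gradient norm of the discriminator on real data, which in the imitation learning setting would be the expert data. In the notation of [13], with discriminator $D_\psi$ and data distribution $p_{\mathcal{D}}$:

```latex
R_1(\psi) = \frac{\gamma}{2} \, \mathbb{E}_{x \sim p_{\mathcal{D}}}
            \left[ \lVert \nabla_x D_\psi(x) \rVert^2 \right]
```

Whether r1-reg-coeff corresponds to $\gamma$ or to $\gamma / 2$ in this codebase is not stated here, so check the implementation before comparing values against the paper.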

The state-only and absorbing options were not used for the reported results.
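For readers unfamiliar with the absorbing state indicator [12], the idea is to append one extra feature to every state: 0 for ordinary states and 1 for a special all-zero absorbing state entered after termination. The sketch below illustrates this on plain Python lists. The function names are hypothetical and this is not necessarily how the repository implements the option.

```python
from typing import List


def add_absorbing_indicator(state: List[float]) -> List[float]:
    """Append a 0 flag marking an ordinary (non-absorbing) state."""
    return list(state) + [0.0]


def absorbing_state(state_dim: int) -> List[float]:
    """All-zero features with the indicator flag set to 1."""
    return [0.0] * state_dim + [1.0]


def augment_episode(states: List[List[float]], terminated: bool) -> List[List[float]]:
    """Augment every state and add the absorbing state after a true termination."""
    augmented = [add_absorbing_indicator(s) for s in states]
    if terminated:
        augmented.append(absorbing_state(len(states[0])))
    return augmented


episode = augment_episode([[0.5, -1.0], [0.2, 0.3]], terminated=True)
print(episode)  # [[0.5, -1.0, 0.0], [0.2, 0.3, 0.0], [0.0, 0.0, 1.0]]
```

Episodes cut off by a time limit rather than a true termination would pass `terminated=False` in this sketch, so no absorbing state is appended.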

Results

Acknowledgements

@ikostrikov for https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail

Citation

If you find this work useful and would like to cite it, the following would be appropriate:

@misc{arulkumaran2020pragmatic,
  author = {Arulkumaran, Kai and Ogawa Lillrank, Dan},
  title = {A Pragmatic Look at Deep Imitation Learning},
  url = {https://github.com/Kaixhin/imitation-learning},
  year = {2020}
}

References

[1] Proximal Policy Optimization Algorithms
[2] Adversarial Behavioral Cloning
[3] Learning Robust Rewards with Adversarial Inverse Reinforcement Learning
[4] Efficient Training of Artificial Neural Networks for Autonomous Navigation
[5] Disagreement-Regularized Imitation Learning
[6] A Divergence Minimization Perspective on Imitation Learning Methods
[7] Generative Adversarial Imitation Learning
[8] Imitation Learning via Kernel Mean Embedding
[9] Positive-Unlabeled Reward Learning
[10] Primal Wasserstein Imitation Learning
[11] Random Expert Distillation: Imitation Learning via Expert Policy Support Estimation
[12] Discriminator-Actor-Critic: Addressing Sample Inefficiency and Reward Bias in Adversarial Imitation Learning
[13] Which Training Methods for GANs do actually Converge?
