soroushmehr / imitation-learning


A Pragmatic Look at Deep Imitation Learning

MIT License

Imitation learning algorithms (AIRL, BC, DRIL, FAIRL, GAIL, GMMIL, PUGAIL, RED), with PPO [1].

Requirements

The code runs on Python 3.7+ (Ax requires > 3.7). You can install most of the requirements by running

pip install -r requirements.txt
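If you prefer an isolated environment (optional, and not part of the repo's instructions), a minimal sketch using Python's built-in venv would be:

python3.7 -m venv venv
source venv/bin/activate
pip install -r requirements.txt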

Notable required packages are PyTorch, OpenAI Gym, D4RL-pybullet, Hydra and Ax. If installing d4rl-pybullet via requirements.txt fails, install it with pip directly from Git using the command

pip install git+https://github.com/takuseno/d4rl-pybullet

Note:

For hyperparameter optimization, the code uses the Hydra Ax sweeper plugin, which is not included in requirements.txt. Ax requires a specific version of PyTorch, and installing it into an existing environment may therefore upgrade or downgrade PyTorch. The Ax sweeper can be installed with:

pip install hydra-ax-sweeper --upgrade
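Since Ax may change your PyTorch version, it can be worth checking what ended up installed; a quick sanity check (not part of the repo's instructions) is:

python -c "import torch, ax, hydra; print(torch.__version__)"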

Run

The training of each imitation learning algorithm can be started with

python main.py algorithm=ALG/ENV

ALG should be one of [AIRL|BC|DRIL|FAIRL|GAIL|GMMIL|PUGAIL|RED] and ENV one of [ant|halfcheetah|hopper|walker2d]. For example:

python main.py algorithm=AIRL/hopper

Hyperparameters can be found in conf/config.yaml and conf/algorithm/ALG/ENV.yaml, with the latter containing algorithm- and environment-specific hyperparameters that were tuned with Ax.
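Because configuration is handled by Hydra, individual values can also be overridden on the command line instead of editing the YAML files; for example, to change the seed for a single run:

python main.py algorithm=AIRL/hopper seed=2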

The resulting model will be saved in ./outputs/ENV_ALGO/m-d-H-M, with the last subfolder indicating the start date (month-day-hour-minute).

Run hyperparameter optimization

Hyperparameter optimization can be run by adding the -m flag.

For example:

python main.py -m algorithm=AIRL/hopper hyperparam_opt=AIRL hydra/sweeper=ax

hyperparam_opt specifies which parameters to optimize (the default is IL, which contains all parameters). Note that you need the Ax sweeper installed for the above command to work (see the Requirements section).

Run with seeds

You can run each algorithm with different seeds with:

python main.py -m algorithm=AIRL/hopper seed=1,2,3,4,5

or use the existing bash script

./scripts/run_seed_experiments.sh ALG ENV

The results will be available in the ./output/seed_sweeper_ENV_ALG folder (note: running this twice will overwrite the previous results).
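For reference, a rough bash equivalent of such a seed sweep (a minimal sketch, not the repo's actual script, and without the seed_sweeper output layout) would be:

for seed in 1 2 3 4 5; do
  python main.py algorithm=AIRL/hopper seed=$seed
done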

Options that can be modified in config include:

  • State-only imitation learning: state-only: true/false
  • Absorbing state indicator [12]: absorbing: true/false
  • R1 gradient regularisation [13]: r1-reg-coeff: 0.5 (in each algorithm subfolder)

State-only imitation learning and the absorbing state indicator were not used for the reported results.
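These options are part of the Hydra config, so (assuming the keys can be passed as command-line overrides, which is untested here for the hyphenated names) they can also be toggled per run, e.g.:

python main.py algorithm=GAIL/hopper absorbing=true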

Results

(Figure: all_training_result)

Acknowledgements

Citation

If you find this work useful and would like to cite it, the following would be appropriate:

@misc{arulkumaran2020pragmatic,
  author = {Arulkumaran, Kai and Ogawa Lillrank, Dan},
  title = {A Pragmatic Look at Deep Imitation Learning},
  url = {https://github.com/Kaixhin/imitation-learning},
  year = {2020}
}

References

[1] Proximal Policy Optimization Algorithms
[2] Adversarial Behavioral Cloning
[3] Learning Robust Rewards with Adversarial Inverse Reinforcement Learning
[4] Efficient Training of Artificial Neural Networks for Autonomous Navigation
[5] Disagreement-Regularized Imitation Learning
[6] A Divergence Minimization Perspective on Imitation Learning Methods
[7] Generative Adversarial Imitation Learning
[8] Imitation Learning via Kernel Mean Embedding
[9] Positive-Unlabeled Reward Learning
[10] Primal Wasserstein Imitation Learning
[11] Random Expert Distillation: Imitation Learning via Expert Policy Support Estimation
[12] Discriminator-Actor-Critic: Addressing Sample Inefficiency and Reward Bias in Adversarial Imitation Learning
[13] Which Training Methods for GANs do actually Converge?
