DAMIIRL

This repository contains the implementation of the paper:

Deep Adaptive Multi-Intention Inverse Reinforcement Learning
Ariyan Bighashdel, Panagiotis Meletis, Pavol Jancura, Gijs Dubbelman
Accepted for presentation at ECML PKDD 2021

In this paper, two algorithms, "SEM-MIIRL" and "MCEM-MIIRL", are developed that can learn an a priori unknown number of nonlinear reward functions from unlabeled expert demonstrations. The algorithms are evaluated on two proposed environments, "M-ObjectWorld" and "M-BinaryWorld". Both the algorithms and the environments are implemented in this repository.
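The adaptive part of both algorithms is that, during EM, each demonstration is (re-)assigned either to an existing intention or to a newly created one, with the concentration parameter alpha controlling how readily new intentions appear. The snippet below is only an illustrative sketch of such a Chinese-restaurant-process-style assignment step, not the repository's actual code; the function name and the zero base log-likelihood for a new intention are our own assumptions.

import numpy as np

def sample_intention(demo_loglik, counts, alpha, rng=np.random):
    """Illustrative CRP-style assignment of one demonstration.

    demo_loglik -- log-likelihood of the demonstration under each of the
                   K existing intentions' reward models (length-K array)
    counts      -- number of demonstrations currently assigned to each
                   intention (length-K array, all entries >= 1)
    alpha       -- concentration parameter; larger values make creating
                   a brand-new intention (index K) more likely
    """
    # CRP prior: existing intentions in proportion to their counts,
    # a new intention in proportion to alpha
    log_prior = np.log(np.append(counts, alpha))
    # The zero base log-likelihood for the new intention is a
    # placeholder assumption made for illustration only
    log_post = log_prior + np.append(demo_loglik, 0.0)
    probs = np.exp(log_post - log_post.max())
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)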

If you find this code useful in your research, please cite:

@inproceedings{bighashdel2021deep,
  title={Deep Adaptive Multi-Intention Inverse Reinforcement Learning},
  author={Bighashdel, Ariyan and Meletis, Panagiotis and Jancura, Pavol and Dubbelman, Gijs},
  booktitle={Machine Learning and Knowledge Discovery in Databases (ECML PKDD)},
  publisher={Springer},
  year={2021}
}

Dependencies

The code was developed and tested on Ubuntu 18.04 with Python 3.6 and PyTorch 1.9.

You can install the dependencies by running:

pip install -r requirements.txt   # Install dependencies
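As a quick sanity check of your environment (a convenience snippet of ours, not part of the repository), you can print the installed versions and compare them with the tested configuration above:

import sys
import torch

# Tested configuration: Python 3.6 and PyTorch 1.9 (see above)
print("Python:", sys.version.split()[0])
print("PyTorch:", torch.__version__)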

Implementation of "Deep Adaptive Multi-intention Inverse Reinforcement Learning"

Training

A simple experiment with the default set of parameters can be run with:

python3 main.py

The following parameters are defined in main.py and can be set for various experiments:

  • miirl_type: the main algorithm, either 'SEM' (SEM-MIIRL) or 'MCEM' (MCEM-MIIRL)
  • game_type: the environment, either 'ow' (M-ObjectWorld) or 'bw' (M-BinaryWorld)
  • sample_length: the length of each demonstration sample
  • alpha: the concentration parameter
  • sample_size: the number of demonstrations per reward/intention
  • rewards_types: the intention/reward types; six are available in total: ['A','B','C','D','E','F']
  • mirl_maxiter: the maximum number of iterations

Experiment

We conduct an experiment with the following parameter settings (sketched in code after this list):

  • miirl_type = 'SEM'
  • game_type = 'ow'
  • sample_length = 8
  • alpha = 1
  • sample_size = 16
  • rewards_types = ['A','B']
  • mirl_maxiter = 200
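A sketch of this configuration, assuming the parameters are plain variables near the top of main.py (the exact layout of the actual file may differ):

# Hypothetical configuration block reproducing the experiment above;
# the variable names follow the parameter list, but their exact
# location in main.py may differ.
miirl_type    = 'SEM'        # SEM-MIIRL
game_type     = 'ow'         # M-ObjectWorld
sample_length = 8            # length of each demonstration sample
alpha         = 1            # concentration parameter
sample_size   = 16           # demonstrations per reward/intention
rewards_types = ['A', 'B']   # two of the six available types
mirl_maxiter  = 200          # maximum number of iterations

With these values in place, the experiment is started as before with python3 main.py.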

The figure included in the repository shows the true and predicted rewards for this experiment.
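To produce a similar side-by-side comparison yourself, the following minimal plotting sketch can be used. It assumes the true and predicted reward maps are available as 2-D NumPy arrays, one per intention; the function and argument names are our own and not part of the repository.

import numpy as np
import matplotlib.pyplot as plt

def plot_reward_comparison(true_rewards, predicted_rewards):
    """Plot true (top row) vs. predicted (bottom row) reward maps.

    Both arguments are lists of 2-D arrays, one per intention.
    """
    k = len(true_rewards)
    fig, axes = plt.subplots(2, k, figsize=(3 * k, 6), squeeze=False)
    for i, (true_r, pred_r) in enumerate(zip(true_rewards, predicted_rewards)):
        axes[0][i].imshow(true_r)
        axes[0][i].set_title("true reward %d" % i)
        axes[1][i].imshow(pred_r)
        axes[1][i].set_title("predicted reward %d" % i)
    for ax in axes.ravel():
        ax.set_xticks([])
        ax.set_yticks([])
    plt.tight_layout()
    plt.show()

# Example call with random 8x8 grids standing in for real reward maps
plot_reward_comparison([np.random.rand(8, 8) for _ in range(2)],
                       [np.random.rand(8, 8) for _ in range(2)])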

License

Apache License 2.0

