
Learning with AMIGo: Adversarially Motivated Intrinsic Goals

This is an implementation of Learning with AMIGo: Adversarially Motivated Intrinsic GOals.

The method described in the AMIGo paper listed below is implemented in monobeast/minigrid/monobeast_amigo.py of this repository. Please consult that file for details of the teacher and student policies, the losses used to train them, and other aspects of training.

The student policy is created in the class MinigridNet. The teacher policy is created in the class Generator. The training loop is defined in train() and is divided into act(), which collects the batches generated by the actors, and learn(), which updates the learner using V-trace. Training is based on the TorchBeast implementation of IMPALA (the Monobeast version).
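
To make the teacher-student interplay concrete, below is a minimal sketch of the teacher's reward rule described in the paper (illustrative Python only, not the code in monobeast_amigo.py; the function name and default values are stand-ins, with the negative value mirroring the --generator_reward_negative flag used in the example command further below):

def teacher_reward(goal_reached, steps_to_goal, t_star,
                   positive_reward=1.0, negative_reward=-0.3):
    """Sketch of the teacher's objective: the teacher (Generator) is rewarded
    for proposing goals that the student eventually reaches, but only when
    reaching them takes at least t_star steps; goals reached too quickly, or
    never reached, earn the negative reward. t_star grows as the student improves."""
    if goal_reached and steps_to_goal >= t_star:
        return positive_reward
    return negative_reward

# Example: once t_star has grown to 10, a goal reached in 3 steps is "too easy".
assert teacher_reward(goal_reached=True, steps_to_goal=3, t_star=10) == -0.3
assert teacher_reward(goal_reached=True, steps_to_goal=25, t_star=10) == 1.0

In the repository, both policies are then trained on top of these rewards with the V-trace loss mentioned above.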

If you have any questions or feel the code needs further clarification in the form of comments, please do not hesitate to raise an issue.

Citation

If you use AMIGo in your research and find it helpful, or are comparing against our results, please consider citing the following paper:

@article{campero2020learning,
  title={Learning with AMIGo: Adversarially Motivated Intrinsic Goals},
  author={Campero, Andres and Raileanu, Roberta and K{\"u}ttler, Heinrich and Tenenbaum, Joshua B and Rockt{\"a}schel, Tim and Grefenstette, Edward},
  journal={arXiv preprint arXiv:2006.12122},
  year={2020}
}

Installation

# create a new conda environment
conda create -n amigo python=3.7
conda activate amigo

# install dependencies
git clone git@github.com:facebookresearch/adversarially-motivated-intrinsic-goals.git
cd adversarially-motivated-intrinsic-goals
pip install -r requirements.txt
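
After installing, a quick import check can confirm that the environment resolved correctly (assuming torch and gym are among the pinned requirements; this is an assumption, not something read from requirements.txt):

# Minimal sanity check; package names are assumed rather than taken from requirements.txt.
import torch
import gym

print("torch", torch.__version__)
print("gym", gym.__version__)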

Running Experiments

Train AMIGo on MiniGrid

# Run AMIGo on MiniGrid Environment
OMP_NUM_THREADS=1 python -m monobeast.minigrid.monobeast_amigo --env MiniGrid-KeyCorridorS5R3-v0 \
--num_actors 40 --modify --generator_batch_size 150 --generator_entropy_cost .05 \
--generator_threshold -.5 --total_frames 600000000 \
--generator_reward_negative -.3 --disable_checkpoint \
--savedir ./experimentMinigrid

Please be sure to set --total_frames as in the paper:
6e8 for KeyCorridorS4R3-v0, KeyCorridorS5R3-v0, ObstructedMaze-2Dlhb-v0, and ObstructedMaze-1Q-v0
3e7 for KeyCorridorS3R3-v0 and ObstructedMaze-1Dl-v0

Train the baselines on MiniGrid

We used an open-source implementation of the exploration baselines (i.e., RIDE, RND, ICM, and Count). This code should be cloned into a separate local repository and run in its own conda environment.

# create a new conda environment
conda create -n ride python=3.7
conda activate ride 

# install dependencies
git clone git@github.com:facebookresearch/impact-driven-exploration.git
cd impact-driven-exploration
pip install -r requirements.txt

To reproduce the baseline results in the paper, run:

OMP_NUM_THREADS=1 python main.py --env MiniGrid-ObstructedMaze-1Q-v0 \
--intrinsic_reward_coef 0.01 --entropy_cost 0.0001

with the corresponding best values of --intrinsic_reward_coef and --entropy_cost reported in the paper for each model.

Set --model to ride, rnd, curiosity, or count for RIDE, RND, ICM, or Count, respectively.

Set --use_fullobs_policy to use a full view of the environment as input to the policy network.

Set --use_fullobs_intrinsic to use full views of the environment to compute the intrinsic reward.

The default uses a partial view of the environment for both the policy and the intrinsic reward.

License

The code in this repository is released under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
