One Solution is Not All You Need: Few-Shot Extrapolation via Structured MaxEnt RL

This repository is a PyTorch implementation of One Solution is Not All You Need (Kumar et al., 2020).

The DIAYN part of the code is based on @alirezakazemipour's DIAYN implementation.

Changes:

Save and load replay buffer to enable pause / resume training
Automatic tuning of entropy alpha
Consider env rewards when training the policy
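
For readers unfamiliar with automatic entropy tuning, the following is a minimal sketch of the standard SAC temperature update, not the code from this repository. The names alpha_loss, log_alpha and target_entropy, the learning rate, and the -|A| target heuristic are all assumptions made for illustration.

```python
import torch


def alpha_loss(log_alpha, log_probs, target_entropy):
    """SAC temperature loss; drives the policy entropy toward target_entropy."""
    # log_probs is detached so that only log_alpha receives gradients.
    return -(log_alpha * (log_probs.detach() + target_entropy)).mean()


# Assumption: 1-D action space, as in MountainCarContinuous-v0.
action_dim = 1
target_entropy = -float(action_dim)  # common heuristic: -|A|

# Optimise log(alpha) rather than alpha so that alpha stays positive.
log_alpha = torch.zeros(1, requires_grad=True)
alpha_optim = torch.optim.Adam([log_alpha], lr=3e-4)

# Stand-in for log pi(a|s) of a sampled batch (hypothetical values).
log_probs = torch.full((256,), -2.0)

loss = alpha_loss(log_alpha, log_probs, target_entropy)
alpha_optim.zero_grad()
loss.backward()
alpha_optim.step()

# The batch entropy estimate (2.0) is above the target (-1.0), so alpha shrinks.
alpha = log_alpha.exp().item()
```

In this sketch the temperature is lowered whenever the policy is more random than the target and raised when it is less random, which is what removes the need to hand-pick a fixed alpha.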

Dependencies

gym == 0.21
mujoco-py == 2.1.2.14
numpy == 1.23.3
opencv_contrib_python == 4.6.0
psutil == 5.9.2
torch == 1.12.1
tqdm == 4.64.1

Installation

pip3 install -r requirements.txt

Usage

Run train.sh with the environment name as its argument, e.g. train.sh MountainCarContinuous-v0. The script runs:

python main_os.py --agent_name SACa --reward_epsilon 10000 --mem_size=100000 --env_name="$1" --n_skills=1 --do_train --auto_entropy_tuning --alpha 0.0
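
As a rough guide to what the reward threshold controls, here is a sketch of the reward described in the Structured MaxEnt RL paper: the environment reward plus a DIAYN-style diversity bonus that is only switched on when the episode return is within epsilon of the best known return. The function smerl_reward and all of its parameter names are hypothetical, and how the command-line flags above map onto them in this repository is an assumption.

```python
import math


def smerl_reward(env_reward, skill_log_prob, n_skills,
                 episode_return, optimal_return, epsilon, alpha):
    """Environment reward plus a diversity bonus gated on near-optimality."""
    # DIAYN-style bonus: log q(z|s) - log p(z), with a uniform prior p(z) = 1/n_skills.
    diversity = skill_log_prob - math.log(1.0 / n_skills)
    # The bonus only applies when the episode return is within epsilon of optimal.
    near_optimal = episode_return >= optimal_return - epsilon
    return env_reward + (alpha * diversity if near_optimal else 0.0)


# Hypothetical numbers: 4 skills, discriminator assigns probability 0.5 to the active skill.
gated_on = smerl_reward(1.0, math.log(0.5), 4, 90.0, 100.0, 20.0, 0.5)
gated_off = smerl_reward(1.0, math.log(0.5), 4, 90.0, 100.0, 5.0, 0.5)
```

Under this formulation a very large epsilon keeps the bonus switched on for every episode, and with a single skill the bonus is zero because the discriminator can only assign probability 1 to that skill.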

Reference

One Solution is Not All You Need: Few-Shot Extrapolation via Structured MaxEnt RL, Kumar et al., 2020
Diversity is All You Need: Learning Skills without a Reward Function, Eysenbach et al., 2018

Acknowledgment

Most of the repo is based on @alirezakazemipour's implementation of DIAYN. Thanks also to:

@ben-eysenbach for sac.
@p-christ for DIAYN.py.
@johnlime for RlkitExtension.
@Dolokhow for rl-algos-tf2.
